Re: can I know how to write a html parser in C

From: Daniel Bruce (ircubic_at_gmail.com)
Date: 02/26/05


Date: Sat, 26 Feb 2005 07:44:33 +0100

Walter Roberson wrote:
> State 2: recognize and discard whitespace (including newline).
> When you get the first non-whitespace character, then if you had
> no whitespace or if tolower(character) is not 'h' then transit to state 4
> else transit to state 5
<snip>
> State 5: you have recognized up to "<img h". recognize and accept
> characters that match "ref=\"" and then enter url acceptance mode;
> if you hit something else, go to state 4
<snip>

Just a slight nitpick on a seemingly good text, though I have no idea about the
subject myself, so I can't really judge its quality. :)
I was under the impression that image URLs are stored in the src attribute, not
the href one. If so, States 2 and 5 would look for 's' and "rc=\"" instead of
'h' and "ref=\"". :) Easy to switch anyway.
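
For what it's worth, here is a rough sketch of the same idea with src in place
of href. It is not Walter's state machine, just a simplified scan that assumes
the value is double-quoted and that no attribute value contains '>' or the text
src=" itself. The names first_img_src and starts_with_ci are made up for this
example.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive test: does s begin with the lowercase word w? */
static int starts_with_ci(const char *s, const char *w)
{
    while (*w != '\0') {
        /* A '\0' in s never equals *w here, so we stop before overrunning s. */
        if (tolower((unsigned char)*s) != *w)
            return 0;
        s++;
        w++;
    }
    return 1;
}

/*
 * Hypothetical helper: copy the value of the first src="..." attribute found
 * in an <img> tag into out (truncated to outsize - 1 characters).
 * Returns 1 if a complete quoted value was found, 0 otherwise.
 */
int first_img_src(const char *html, char *out, size_t outsize)
{
    const char *p;

    if (outsize == 0)
        return 0;
    out[0] = '\0';

    for (p = html; *p != '\0'; p++) {
        /* Look for "<img" followed by whitespace, in any letter case. */
        if (!starts_with_ci(p, "<img") || !isspace((unsigned char)p[4]))
            continue;

        /* Inside the tag: an attribute name starts right after whitespace. */
        for (p += 4; *p != '\0' && *p != '>'; p++) {
            if (isspace((unsigned char)*p) && starts_with_ci(p + 1, "src=\"")) {
                size_t n = 0;

                /* Skip the whitespace plus the five characters of src=" */
                for (p += 6; *p != '\0' && *p != '"'; p++) {
                    if (n + 1 < outsize)
                        out[n++] = *p;
                }
                if (*p != '"') {        /* input ended before the closing quote */
                    out[0] = '\0';
                    return 0;
                }
                out[n] = '\0';
                return 1;
            }
        }
        if (*p == '\0')                 /* unterminated tag: nothing left to scan */
            break;
    }
    return 0;
}
```

Called with a buffer, e.g. first_img_src(page, buf, sizeof buf), it fills buf
with the URL of the first image that has a src attribute. Unquoted or
single-quoted values are not handled; a real parser would need more states.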


