Re: can I know how to write a html parser in C

From: infobahn (infobahn_at_btinternet.com)
Date: 02/23/05


Date: Wed, 23 Feb 2005 22:51:58 +0000 (UTC)

WUV999U wrote:
>
> Hi
>
> I am somewhat familiar with C, but not much.
>
> I want to know how I can write an HTML parser in C that only parses
> the HTML file for image files, and displays or prints
> all the images found in it.
>
> How should I go about it?
>
> Should I use a file pointer, read the HTML file into an array
> first, and then look for the img src,
> doing some string comparisons?

That's certainly a valid approach, if you are sure you have the
RAM to get the whole HTML file into memory (for this to be a problem,
it would have to be a tiny computer and one mother of a Web page!).
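A minimal sketch of the read-it-all-into-memory approach, in plain ISO C.
The name read_file is made up for illustration. It grows the buffer with
realloc() rather than using fseek()/ftell() to find the size first, which
keeps it working even when the size isn't known in advance.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the whole file at path into a malloc'd, NUL-terminated buffer.
   On success, stores the byte count in *len (if len is not NULL) and
   returns the buffer, which the caller must free(). Returns NULL if the
   file can't be opened, read, or memory runs out. */
char *read_file(const char *path, size_t *len)
{
    FILE *fp = fopen(path, "rb");
    char *buf = NULL;
    size_t used = 0, cap = 0, got;

    if (fp == NULL)
        return NULL;

    for (;;) {
        /* Make sure there is room for at least one more byte plus '\0'. */
        if (used + 1 >= cap) {
            size_t newcap = cap ? cap * 2 : 4096;
            char *tmp = realloc(buf, newcap);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap = newcap;
        }
        got = fread(buf + used, 1, cap - used - 1, fp);
        used += got;
        if (got == 0)            /* end of file or read error */
            break;
    }

    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    buf[used] = '\0';            /* used < cap is guaranteed above */
    if (len != NULL)
        *len = used;
    return buf;
}
```

Typical use would be char *html = read_file("page.html", &len); followed
by a NULL check, the scan, and a free(html) at the end (the file name is
just an example).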

Use strchr to find a '<' character. Now you know you have a tag.
I don't recall whether whitespace is allowed before the 'i' or 'I'
of img; to be certain, skip past it. isspace() will help
you there. When you get past the whitespace, compare the next
three characters, case-insensitively, to "img". If you have a
match, press on and look for "src", which isn't necessarily the
first thing after "img" (alt, width and friends may come first),
so be careful. Don't forget it might be "SRC" or even "sRc". After
"src", skip any whitespace, expect an '=', skip whitespace again,
and read the value, which may be wrapped in double quotes, single
quotes, or no quotes at all.
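Standard C has no case-insensitive string comparison (strcasecmp and
strncasecmp are POSIX, not ISO C), so a small helper built on tolower()
does the job. The sketch below uses the made-up names starts_with_ci and
find_src, and assumes p points just past the "img" of a tag. It skips the
quoted values of other attributes, so alt="src=x" isn't mistaken for the
real thing, and only accepts "src" when it is preceded by whitespace and
followed (after optional whitespace) by '=', so data-src and srcset are
not picked up. It does not decode entities such as &amp; in the URL.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if s begins with word, ignoring case. word must be lower case.
   Stops safely if s is shorter than word. */
static int starts_with_ci(const char *s, const char *word)
{
    while (*word != '\0') {
        if (tolower((unsigned char)*s) != *word)
            return 0;
        s++;
        word++;
    }
    return 1;
}

/* p points just past "img" inside a tag. Scan the attributes up to the
   closing '>' for src=...; copy the value into out (truncated to
   outsz - 1 chars) and return 1, or return 0 if there is no src. */
int find_src(const char *p, char *out, size_t outsz)
{
    while (*p != '\0' && *p != '>') {
        if (*p == '"' || *p == '\'') {
            /* Skip some other attribute's quoted value entirely. */
            const char *close = strchr(p + 1, *p);
            if (close == NULL)
                return 0;
            p = close + 1;
        } else if (isspace((unsigned char)*p) && starts_with_ci(p + 1, "src")) {
            const char *q = p + 4;               /* just past "src" */
            while (isspace((unsigned char)*q))
                q++;
            if (*q == '=') {
                char quote = 0;
                size_t n = 0;
                q++;
                while (isspace((unsigned char)*q))
                    q++;
                if (*q == '"' || *q == '\'')
                    quote = *q++;
                /* Quoted: copy up to the matching quote.
                   Unquoted: copy up to whitespace or '>'. */
                while (*q != '\0' &&
                       (quote ? *q != quote
                              : !isspace((unsigned char)*q) && *q != '>')) {
                    if (n + 1 < outsz)
                        out[n++] = *q;
                    q++;
                }
                if (outsz > 0)
                    out[n] = '\0';
                return 1;
            }
            p++;                                 /* e.g. "srcset": keep looking */
        } else {
            p++;
        }
    }
    return 0;
}
```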

If the first non-whitespace char after '<' is /not/ 'i' or 'I',
simply look for another '<'.

Keep going until you run out of file.
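The outer loop can be sketched along the same lines. The hypothetical
next_img() below uses strchr to hop from one '<' to the next, skips
whitespace, and checks case-insensitively for "img" followed by
whitespace, '/' or '>' (so that something like <imgx> or <image> is not
taken for an image). It returns a pointer just past "img", or NULL once
it runs out of text. It makes no attempt to skip comments or script
contents, so an <img> inside <!-- ... --> will still be reported.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer just past the "img" of the next <img ...> tag at or
   after p, or NULL if there are no more. */
const char *next_img(const char *p)
{
    while ((p = strchr(p, '<')) != NULL) {
        p++;                                   /* step past '<' */
        while (isspace((unsigned char)*p))     /* tolerate "< img" */
            p++;
        /* Short-circuit evaluation stops at a '\0', so no overrun. */
        if (tolower((unsigned char)p[0]) == 'i' &&
            tolower((unsigned char)p[1]) == 'm' &&
            tolower((unsigned char)p[2]) == 'g' &&
            (isspace((unsigned char)p[3]) || p[3] == '>' || p[3] == '/'))
            return p + 3;
        /* Not an img tag: simply look for another '<'. */
    }
    return NULL;                               /* ran out of text */
}
```

A caller would loop with for (p = next_img(html); p != NULL; p = next_img(p))
and pull the src attribute out of each tag that p points into.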

> Is there a sample on the net (not high-end code, a simple one) that I can
> look at to give me an idea of what I need to do?

Have a go at it yourself. If you get stuck, post your best-effort
code here, and I expect someone will help you get unstuck again.
