parsing HTML
From: Drew (drew_at_drew.com)
Date: 02/28/05
- Next in thread: Thomas Weidenfeller: "Re: parsing HTML"
- Reply: Thomas Weidenfeller: "Re: parsing HTML"
- Reply: TechBookReport: "Re: parsing HTML"
Date: Mon, 28 Feb 2005 11:02:02 -0500
Hi All:
I'm working on a mini HTML parser. Basically, what I need to do is
take an HTML file and parse through it. I want to pick out all of the
text that is between table data tags <td> and </td> and all of the
text between list item tags <li> and </li>.
Since it's possible that a line of HTML could have no spaces at all,
like the one below:
<tr><td>SomeFixture</td></tr>
I'm thinking that I'm going to need to read the HTML file one line at
a time, then look for < and its closing >. If the text between the
two is td or li, then start capturing text at the location of > + 1
and keep going until I hit another < with /td or /li after it.
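Roughly, the scan described above might look like the sketch below. It is
only a sketch under some assumptions: the whole file has already been read
into one String (rather than handled line by line, so a cell that spans
several lines is still captured), tag names are matched case-insensitively,
and any tags nested inside a cell are dropped while their text is kept. The
names CellTextScanner and extract are made up for illustration. Nested
tables, comments, and script blocks containing < are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: scans for '<' and '>' and captures text inside td/li.
class CellTextScanner {

    /** Returns the text found inside each <td>...</td> and <li>...</li> pair. */
    static List<String> extract(String html) {
        List<String> results = new ArrayList<>();
        StringBuilder current = null;   // non-null while inside a td or li
        String openTag = null;          // "td" or "li" while capturing
        int pos = 0;

        while (pos < html.length()) {
            int lt = html.indexOf('<', pos);
            if (lt < 0) {
                break;                  // no more tags, so no closing tag either
            }
            if (current != null) {
                current.append(html, pos, lt);   // text before the next tag
            }
            int gt = html.indexOf('>', lt + 1);
            if (gt < 0) {
                break;                  // unterminated tag: stop scanning
            }

            // Tag name: text after '<' up to the first whitespace, lowercased.
            String inside = html.substring(lt + 1, gt).trim();
            String name = inside.split("\\s+", 2)[0].toLowerCase(Locale.ROOT);

            if (current == null && (name.equals("td") || name.equals("li"))) {
                openTag = name;
                current = new StringBuilder();
            } else if (current != null && name.equals("/" + openTag)) {
                results.add(current.toString().trim());
                current = null;
                openTag = null;
            }
            // Any other tag (for example <b> inside a cell) is simply skipped.
            pos = gt + 1;
        }
        return results;
    }

    public static void main(String[] args) {
        // The sample line from this post; prints [SomeFixture]
        System.out.println(extract("<tr><td>SomeFixture</td></tr>"));
    }
}
```

Because the scan keys only on < and >, it does not matter whether the
markup contains spaces or line breaks between tags.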
Does this sound reasonable? Or am I coming up with too difficult a
solution? Does Java have any built-in HTML parsing methods that make
this easier?
Or even if there's an existing Java program that I could modify for
this, that's great too.
Any help is appreciated!
Drew