Re: HTML Processing in Java
- From: "Oliver Wong" <owong@xxxxxxxxxxxxxx>
- Date: Tue, 29 Nov 2005 16:50:33 GMT
"Honza" <jan.zeman@xxxxxxxxx> wrote in message
news:1133255497.231778.229120@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Hello,
>
> I would like to process HTML pages in Java. The very first task would
> be to ignore unnecessary information like comments (everything in <!--
> -->) or images.
> What would be the best starting point?
> I have found JTidy and HTML Parser on SourceForge, but neither of them is
> able to ignore tags - or did I miss it?
>
> Thank you for any clue
> Honza
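I haven't used the parsers you mention, but any SAX-based parser will
hand you a stream of "events", one for each "thing" it discovers in the
HTML document (start tag, end tag, text, comment and so on). You can
simply ignore the "comment" events, and likewise skip the events for any
tags you don't want, such as images. Note that SAX2 reports comments
through the optional LexicalHandler interface, so a handler that never
registers one will not see comments at all.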
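
To make the event idea concrete, here is a minimal sketch using only the
SAX classes that ship with the JDK. It rests on some assumptions that are
not from the original question: the input is well-formed XHTML (the JDK
parser is an XML parser, so real-world "tag soup" HTML would need an
HTML-aware SAX front end instead), and the class name
SkipCommentsAndImages, the FilteringHandler helper and the sample page are
all made up for illustration. Attributes are dropped and text is not
re-escaped, to keep the example short.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical example class: echoes a document while skipping
// comments and <img> elements.
public class SkipCommentsAndImages {

    // DefaultHandler2 implements both ContentHandler and LexicalHandler,
    // so one object can receive element, text and comment events.
    static class FilteringHandler extends DefaultHandler2 {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if (qName.equalsIgnoreCase("img")) {
                return; // ignore the image's start-tag event
            }
            out.append('<').append(qName).append('>'); // attributes dropped
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equalsIgnoreCase("img")) {
                return; // ignore the image's end-tag event
            }
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length); // text is copied without escaping
        }

        @Override
        public void comment(char[] ch, int start, int length) {
            // Comment event: deliberately do nothing.
        }
    }

    // Assumes well-formed XHTML input; malformed HTML makes parse() throw.
    static String filter(String xhtml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        FilteringHandler handler = new FilteringHandler();
        reader.setContentHandler(handler);
        // Comments are only delivered to a registered LexicalHandler.
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xhtml)));
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><!-- note -->"
            + "<p>Hello <img src=\"a.png\"/>world</p></body></html>";
        System.out.println(filter(page));
    }
}
```

With a SAX front end that tolerates real HTML, the handler itself should
need little or no change, since it only reacts to the events it receives.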
- Oliver