Re: building a meta search engine



"RoS" <no@xxxxxxxx> wrote in message news:z2Sng.99$IC4.92@xxxxxxxxxxxxxxxx
> Thanks for the reply. I am aware of the Java capabilities and how this
> task is done in theory. I am more interested in performance and
> practical issues. Parsing HTML is not a lightweight task, so languages
> like PHP or Perl are hardly suitable for it. I don't want to touch C, so
> Java seems to me like the best option... Another thing is that coming
> up with a parser from scratch seems like an awful waste of time to me,
> considering that languages like Python come with a basic XML/HTML parser
> out of the box. Are there any good Java parsers out there? Support for
> things like POST/GET requests, multiple results pages and a high degree
> of customization would be nice.

As for simply saving an HTML page, have a look at one of my articles:
http://jcsnippets.atspace.com/java/network-stuff/how-to-save-a-webpage.html
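
The article itself is not reproduced here. As a rough sketch of the same
idea using only the JDK (the SavePage class and savePage method are
hypothetical names, not taken from the article), the raw bytes behind a URL
can be streamed straight to disk:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class SavePage {
    // Hypothetical helper: copies the bytes behind `address` into `target`
    // and returns the number of bytes written. No charset decoding is done,
    // so the saved file keeps whatever encoding the server sent.
    public static long savePage(String address, Path target) throws IOException {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            // REPLACE_EXISTING overwrites an existing target file.
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java SavePage <url> <file>");
            return;
        }
        long bytes = savePage(args[0], Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```

Copying bytes rather than characters avoids guessing the page's encoding;
decoding can be left to whichever parser reads the file later.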

As far as I know, there is no default solution for HTML parsing in Java,
but there is a very good and easy-to-use framework on www.sourceforge.net
called HtmlParser (http://sourceforge.net/projects/htmlparser).
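
HtmlParser's own API is best taken from its documentation, so it is not
sketched here. For comparison only: the JDK does ship an old, HTML 3.2-level
parser in javax.swing.text.html.parser that copes with simple result pages.
A minimal link-extraction sketch with it, assuming a JDK-only setup
(LinkExtractor and extractLinks are hypothetical names):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractor {
    // Collects the href attribute of every <a> start tag, in document order.
    // Anchors without an href (e.g. <a name="...">) are skipped.
    public static List<String> extractLinks(Reader html) throws IOException {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // The final argument tells the parser to ignore charset declarations,
        // since the Reader has already decoded the text.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><a href=\"/r?page=1\">1</a> "
                + "<a href=\"/r?page=2\">2</a></body></html>";
        System.out.println(extractLinks(new StringReader(page)));
    }
}
```

The callback style means the document is never held as a tree, which keeps
memory use low; a dedicated library such as HtmlParser is likely the more
comfortable choice for messy real-world markup.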

I've just looked it up again; apparently there is currently a similar
framework available for PHP as well, so you could go either way.

Anyway, these will get you started in no time.
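
For the POST/GET and multiple-results-pages part of the question, plain
java.net.HttpURLConnection is enough to fetch result pages before handing
them to a parser. This is a sketch under stated assumptions: the endpoint,
the parameter names "q" and "page", the class name MetaSearchFetcher and the
UTF-8 response encoding are all placeholders, since every search engine uses
its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MetaSearchFetcher {
    // Builds one GET URL per results page. "q" and "page" are hypothetical
    // parameter names; URLEncoder applies form encoding (spaces become '+').
    public static List<String> pageUrls(String base, String query, int pages) {
        List<String> urls = new ArrayList<>();
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        for (int page = 1; page <= pages; page++) {
            urls.add(base + "?q=" + q + "&page=" + page);
        }
        return urls;
    }

    // Sends a form-encoded POST and returns the response body, assuming the
    // server answers in UTF-8.
    public static String post(String address, String formBody) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(formBody.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Placeholder endpoint in the reserved example.com domain.
        for (String url : pageUrls("http://search.example.com/search", "java html parser", 3)) {
            System.out.println(url);
        }
    }
}
```

Keeping URL building separate from fetching makes it easy to plug in one
URL template per engine, which is where most of the customization in a meta
search engine ends up.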

Best regards,

JayCee
--
http://jcsnippets.atspace.com/
a collection of source code, tips and tricks

