Re: building a meta search engine
- From: "jcsnippets.atspace.com" <admin@xxxxxxxxxxxxxxxxxxxxxx>
- Date: Mon, 26 Jun 2006 15:45:40 GMT
"RoS" <no@xxxxxxxx> wrote in message news:z2Sng.99$IC4.92@xxxxxxxxxxxxxxxx
Thanks for the reply. I am aware of the Java capabilities and how this
task is done in theory. I am more interested in performance and
practical issues. Parsing HTML is not a lightweight task, so languages
like PHP or Perl are hardly suitable for it. I don't want to touch C, so
Java seems to me like the best option... Another thing is that writing
a parser from scratch seems like an awful waste of time to me,
considering languages like Python come with a basic XML/HTML parser
out of the box. Are there any good Java parsers out there? Things like
POST/GET requests, support for multiple results pages, and a high degree
of customization would be nice.
As for simply saving an HTML page, have a look at one of my articles:
http://jcsnippets.atspace.com/java/network-stuff/how-to-save-a-webpage.html
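
In case the link goes stale, here is a minimal sketch of the idea using only
the JDK. It is not the code from the article; the class name PageSaver, the
method save and the 10-second timeouts are assumptions made for illustration.
It opens a connection to the given address and streams the raw bytes to a
file without interpreting them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not taken from the linked article.
public class PageSaver {

    // Downloads whatever `source` points at and writes the bytes to `target`.
    // Returns the number of bytes written.
    public static long save(URI source, Path target) throws IOException {
        URLConnection connection = source.toURL().openConnection();
        // Assumed timeouts so a slow server cannot hang the caller forever.
        connection.setConnectTimeout(10_000);
        connection.setReadTimeout(10_000);
        try (InputStream in = connection.getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Usage: java PageSaver <url> <output-file>
    public static void main(String[] args) throws IOException {
        long bytes = save(URI.create(args[0]), Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```

Because the bytes are copied as-is, the saved file keeps whatever character
encoding the server sent, which matters once you start parsing it.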
There is no default solution available in Java for HTML parsing as far as I
know, but there is a very good and easy-to-use framework to be found on
www.sourceforge.net - HtmlParser
(http://sourceforge.net/projects/htmlparser) is its name.
I've just looked it up again, and apparently there is now a similar framework
available for PHP, so you could go either way.
Anyway, these will get you started in no time.
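
To give a feel for what pulling links out of a results page involves, here is
a rough sketch. Note that it does not use HtmlParser: to stay free of
dependencies it swaps in the lenient HTML parser bundled with Swing
(javax.swing.text.html.parser.ParserDelegator), which only understands older
HTML and is no substitute for a dedicated library. The class name
LinkExtractor and the method extractLinks are made up for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper built on the Swing parser, not on HtmlParser.
public class LinkExtractor {

    // Collects the href value of every <a> tag, in document order.
    public static List<String> extractLinks(Reader html) throws IOException {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared in the page; the Reader already
        // decoded the bytes into characters.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><a href=\"http://example.com/1\">one</a></body></html>";
        for (String link : extractLinks(new StringReader(page))) {
            System.out.println(link);
        }
    }
}
```

A meta search engine would run something like this over each engine's results
page and then filter out navigation and advertising links, which is where a
richer library with tag and attribute filters starts to pay off.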
Best regards,
JayCee
--
http://jcsnippets.atspace.com/
a collection of source code, tips and tricks