Re: building a meta search engine
- From: RoS <no@xxxxxxxx>
- Date: Mon, 26 Jun 2006 14:20:15 GMT
Thanks for the reply. I am aware of Java's capabilities and of how this
task is done in theory. I am more interested in performance and
practical issues. Parsing HTML is not a lightweight task, so languages
like PHP or Perl are hardly suitable for it. I don't want to touch C, so
Java seems to me like the best option. Also, writing a parser from
scratch seems like an awful waste of time to me, considering that
languages like Python ship with a basic XML/HTML parser out of the box.
Are there any good Java parsers out there? Support for POST/GET
requests, multiple result pages and a high degree of customization
would be nice.
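For what it's worth, the JDK already ships a basic HTML parser: javax.swing.text.html.parser.ParserDelegator, driven by an HTMLEditorKit.ParserCallback. It is based on an old HTML 3.2 DTD, so it is forgiving rather than complete. A minimal sketch, assuming the result links you want are ordinary anchor tags with an href; the class name JdkLinkExtractor is hypothetical:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Collects the href of every anchor tag, using the HTML parser bundled with the JDK. */
class JdkLinkExtractor {

    static List<String> extractLinks(Reader html) throws IOException {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // true: ignore any charset named in a <meta> tag, since the Reader already yields chars
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    /** Convenience overload for HTML already held in memory. */
    static List<String> extractLinks(String html) {
        try {
            return extractLinks(new StringReader(html));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
    }
}
```

The callback style walks through the page without building a document tree, which keeps memory use low; for DOM-style navigation you would need a different parser.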
Roman
deep wrote:
Java is perfectly suitable for designing a meta search engine.
Read the HTML page and match its content, fetch the URLs it links to,
then open each of those pages, read it and match it against the search
term again... and so on, in a loop.
Finally, list the matching URLs.
There is no standard API for this process.
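The loop described above (read a page, match it, follow its links, repeat) might look like the following breadth-first sketch. It assumes a hypothetical fetcher function that maps a URL to its HTML, so the crawl logic can be tried without network access, and it uses a crude regex for double-quoted href values instead of a real parser. The visited set and the maxDepth limit keep it from looping forever on pages that link back to each other.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Breadth-first version of "read the page, match it, follow its links, repeat". */
class LinkFollowingSearch {

    // Crude on purpose: only double-quoted href values; a real HTML parser is more robust.
    private static final Pattern HREF =
            Pattern.compile("href=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /**
     * Returns, in visiting order, the URLs whose raw HTML contains the search term.
     * fetcher is a hypothetical hook mapping a URL to its HTML (null if unavailable);
     * maxDepth is how many link hops to follow from startUrl.
     */
    static List<String> search(String startUrl, String term, int maxDepth,
                               Function<String, String> fetcher) {
        List<String> matches = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        visited.add(startUrl);
        List<String> frontier = Collections.singletonList(startUrl);

        for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String url : frontier) {
                String html = fetcher.apply(url);
                if (html == null) continue;                 // unreachable page: skip it
                if (html.contains(term)) matches.add(url);  // matches markup too, not just text
                Matcher m = HREF.matcher(html);
                while (m.find()) {
                    String link = m.group(1);
                    if (visited.add(link)) next.add(link);  // add() is false for seen URLs
                }
            }
            frontier = next;                                // pages one hop further away
        }
        return matches;
    }
}
```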
RoS wrote:
Hello there,
I am building a web application, which involves submitting search
queries to a number of sites, processing and parsing search results and
returning them in an organized way. Basically, a meta search engine. As
there are no search APIs for those sites and I cannot access their
databases, I'll have to process raw HTML files and build a unique
parser for each site. As an underlying platform I use J2EE, Servlets and
Tomcat.
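The setup described above, one parser per site behind a common entry point, could be organized along these lines. SiteParser, SearchResult and MetaSearch are hypothetical names, and fetching is abstracted into a URL-to-HTML function so that a servlet could plug in real HTTP code. Results are merged in site order and deduplicated by URL.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** One search hit, normalized across sites. */
final class SearchResult {
    final String title;
    final String url;
    final String site;

    SearchResult(String title, String url, String site) {
        this.title = title;
        this.url = url;
        this.site = site;
    }
}

/** Hypothetical per-site adapter: one implementation per target search page. */
interface SiteParser {
    String name();                           // label shown next to each hit
    String queryUrl(String query);           // how this site expects the query
    List<SearchResult> parse(String html);   // site-specific HTML scraping
}

/** Sends the query to every site and merges the hits, dropping duplicate URLs. */
final class MetaSearch {
    private final List<SiteParser> sites;
    private final Function<String, String> fetcher; // hypothetical hook: URL -> HTML or null

    MetaSearch(List<SiteParser> sites, Function<String, String> fetcher) {
        this.sites = sites;
        this.fetcher = fetcher;
    }

    List<SearchResult> search(String query) {
        Map<String, SearchResult> byUrl = new LinkedHashMap<>(); // keeps first-seen order
        for (SiteParser site : sites) {
            String html = fetcher.apply(site.queryUrl(query));
            if (html == null) continue;          // site down: keep results from the others
            for (SearchResult r : site.parse(html)) {
                byUrl.putIfAbsent(r.url, r);     // first site to report a URL wins
            }
        }
        return new ArrayList<>(byUrl.values());
    }
}
```

With this split, adding a site means writing one new SiteParser, and a layout change on one site only touches that class.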
- Are there any ready-made Java open-source packages that would deal
with the task of handling POST/GET requests, parsing HTML and organizing
data?
- Is Java a suitable choice for this task? I was originally planning to
use PHP (mostly because I'd like to learn it), but considering this task
is quite CPU-intensive, I opted for Java. Python is another viable option.
- Does parsing HTML files seem feasible at all? Considering a single
change in the target site search page structure would require changes to
its parser, this approach looks painful. But on the other hand, I have no
idea of an alternative solution, other than bugging site owners to grant
database access or build a simple search API (on second thought, this
approach seems even more painful).
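On handling POST/GET requests: java.net.HttpURLConnection and URLEncoder from the standard library are enough for plain form submissions. A minimal sketch, assuming UTF-8 forms and UTF-8 responses; FormRequests is a hypothetical helper name:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** GET and POST helpers built only on java.net from the standard library. */
final class FormRequests {

    /** Encodes parameters as application/x-www-form-urlencoded (space becomes '+'). */
    static String encode(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
        return sb.toString();
    }

    /** GET: parameters go in the query string. */
    static String get(String baseUrl, Map<String, String> params) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(baseUrl + "?" + encode(params)).toURL().openConnection();
        return readBody(conn);
    }

    /** POST: parameters go in the request body, as an HTML form with method="post" sends them. */
    static String post(String url, Map<String, String> params) throws IOException {
        byte[] body = encode(params).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return readBody(conn);
    }

    // Assumes a UTF-8 response; real code should honor the charset in the Content-Type header.
    private static String readBody(HttpURLConnection conn) throws IOException {
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Fetching further result pages is usually just another GET with a different page parameter, whose name you would have to read off each site's own links.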
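On the fragility concern: one way to soften it is to keep each site's extraction rule in configuration rather than in compiled code, and to flag a parser as probably broken when a query that normally returns hits suddenly yields nothing. A sketch, assuming regex-based extraction with exactly one capturing group and a hypothetical "<site>.result" property key; ConfiguredSiteParser is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Site-specific extraction rule kept in configuration instead of in code. */
final class ConfiguredSiteParser {
    private final Pattern resultPattern;

    ConfiguredSiteParser(String site, Properties config) {
        String regex = config.getProperty(site + ".result"); // hypothetical key naming scheme
        if (regex == null) {
            throw new IllegalArgumentException("no pattern configured for " + site);
        }
        // DOTALL lets '.' span line breaks inside a result block
        this.resultPattern = Pattern.compile(regex, Pattern.DOTALL);
    }

    /** Returns capturing group 1 of every match; the pattern must define that group. */
    List<String> extract(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = resultPattern.matcher(html);
        while (m.find()) out.add(m.group(1));
        return out;
    }

    /** True if the pattern matches nothing on a page known to contain results. */
    boolean looksBroken(String htmlForKnownQuery) {
        return extract(htmlForKnownQuery).isEmpty();
    }
}
```

Loading the Properties from a file with Properties.load would let you adjust a pattern without redeploying, though regexes over HTML break just as easily as hand-written parsers when the markup changes substantially.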
Any thoughts/comments on the subject are greatly appreciated.
Cheers,
Roman