Re: building a meta search engine



Java is perfectly suitable for building a meta search engine.
Read the HTML page, match its content against the search term, and
extract the URLs it contains. Then open each of those URLs, read that
page, and match it against the search term again. Repeat this in a
loop, and finally list the URLs that matched.
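
Here is a rough sketch of that loop in plain Java, just to make the idea concrete. The class name MetaSearchSketch, the methods extractLinks and search, the fetch function and the maxPages limit are all made up for illustration. The regex is a crude stand-in for a real HTML parser, and relative URLs are not resolved, so treat this as a starting point under those assumptions rather than a finished crawler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of any standard API.
class MetaSearchSketch {

    // Crude href matcher: assumes quoted attribute values.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns every href value found in the page, in document order. */
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /**
     * Visits pages breadth-first starting at startUrl and returns the URLs
     * of pages whose text contains the search term (case-insensitive).
     * The fetch function maps a URL to its HTML, or null if unavailable;
     * passing it in keeps the loop independent of any HTTP library.
     */
    public static List<String> search(String startUrl, String term,
                                      Function<String, String> fetch,
                                      int maxPages) {
        List<String> hits = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        seen.add(startUrl);
        String needle = term.toLowerCase();
        int visited = 0;

        while (!queue.isEmpty() && visited < maxPages) {
            String url = queue.poll();
            String html = fetch.apply(url);
            visited++;
            if (html == null) {
                continue; // page could not be read, skip it
            }
            if (html.toLowerCase().contains(needle)) {
                hits.add(url); // this page matches the search item
            }
            for (String link : extractLinks(html)) {
                if (seen.add(link)) { // add() is false for URLs already queued
                    queue.add(link);
                }
            }
        }
        return hits;
    }
}
```

In a real application the fetch function would wrap an HTTP client, and you would probably want a proper HTML parser plus a separate extraction rule for each target site. The maxPages limit is there only so the loop cannot run forever.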

There is no standard API for this process.

RoS wrote:
Hello there,

I am building a web application, which involves submitting search
queries to a number of sites, processing and parsing search results and
returning them in an organized way. Basically, a meta search engine. As
there are no search APIs for those sites, nor can I access their
databases, I'll have to process raw HTML files and build a unique
parser for each site. As an underlying platform I use J2EE, Servlets and
Tomcat.

- Are there any ready-made Java open-source packages that would deal
with the task of handling POST/GET requests, parsing HTML and organizing
data?
- Is Java a suitable choice for this task? I was originally planning to
use PHP (mostly because I'd like to learn it), but considering this task
is quite CPU intensive, I opted for Java. Python is another viable option.
- Does parsing HTML files seem feasible at all? Considering a single
change in the target site search page structure would require changes to
its parser, this approach looks painful. But on the other hand I have no
idea about an alternative solution, other than bugging site owners to
grant database access or build a simple search API (on second
thought, this approach seems even more painful).

Any thoughts/comments on the subject are greatly appreciated.


Cheers,
Roman
