Re: building a meta search engine



Thanks for the reply. I am aware of Java's capabilities and of how this
task is done in theory; I am more interested in performance and
practical issues. Parsing HTML is not a lightweight task, so languages
like PHP or Perl are hardly suitable for it. I don't want to touch C, so
Java seems to me like the best option. Another thing: coming
up with a parser from scratch seems like an awful waste of time to me,
considering that languages like Python come with a basic XML/HTML parser
out of the box. Are there any good Java parsers out there? Things like
POST/GET requests, support for multiple result pages and a high degree of
customization would be nice.
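For the POST/GET and multiple-result-pages part, the JDK's own java.net.HttpURLConnection is enough; no extra library is needed. The following is a minimal sketch under stated assumptions: the class name SearchFetcher is hypothetical, the response is assumed to be UTF-8, and the paging parameter names "q" and "start" are made up for illustration, since every search engine uses its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class SearchFetcher {

    // Encode parameters as application/x-www-form-urlencoded (used by both GET and POST).
    static String buildQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
        return sb.toString();
    }

    // Read the response body; ASSUMES the server sends UTF-8 (check Content-Type in real code).
    static String readBody(HttpURLConnection conn) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) sb.append(line).append('\n');
        }
        return sb.toString();
    }

    // GET: parameters are appended to the URL as a query string.
    static String get(String baseUrl, Map<String, String> params) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(baseUrl + "?" + buildQuery(params)).toURL().openConnection();
        conn.setRequestMethod("GET");
        return readBody(conn);
    }

    // POST: the same encoded parameters are sent in the request body.
    static String post(String baseUrl, Map<String, String> params) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(baseUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(buildQuery(params).getBytes(StandardCharsets.UTF_8));
        }
        return readBody(conn);
    }

    // Parameters for result page `page` (0-based). "q" and "start" are HYPOTHETICAL
    // names; every engine uses its own query and paging parameters.
    static Map<String, String> pageParams(String query, int page, int perPage) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", query);
        p.put("start", String.valueOf(page * perPage));
        return p;
    }
}
```

Walking several result pages is then just a loop calling get(baseUrl, pageParams(query, page, 10)) for page = 0, 1, 2, ..., with each body handed to whatever HTML parser you choose.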

acme.com has a very interesting HTML parser; it is not exactly what you need, but worth a look.
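As for a parser that comes out of the box: the JDK itself ships a basic, lenient HTML parser in the Swing packages (javax.swing.text.html.parser.ParserDelegator, which reports what it finds to an HTMLEditorKit.ParserCallback). It is built on an old HTML 3.2 DTD, so treat the following as a sketch rather than a robust scraper. LinkExtractor and Link are hypothetical names, and collecting links with their anchor text is just one assumed use case (result URLs and titles on a results page).

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class LinkExtractor extends HTMLEditorKit.ParserCallback {

    // One extracted hyperlink: target URL and visible anchor text.
    static final class Link {
        final String href;
        final String text;
        Link(String href, String text) { this.href = href; this.text = text; }
    }

    private final List<Link> links = new ArrayList<>();
    private String currentHref;                        // non-null while inside <a href=...>
    private final StringBuilder currentText = new StringBuilder();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.A) {
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            currentHref = (href == null) ? null : href.toString();
            currentText.setLength(0);
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (currentHref != null) currentText.append(data); // only keep text inside a link
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.A && currentHref != null) {
            links.add(new Link(currentHref, currentText.toString().trim()));
            currentHref = null;
        }
    }

    // Parse the whole document and return its links in document order.
    static List<Link> extract(Reader html) throws IOException {
        LinkExtractor cb = new LinkExtractor();
        // true = ignore any <meta> charset declaration; the Reader already supplies decoded text
        new ParserDelegator().parse(html, cb, true);
        return cb.links;
    }
}
```

Calling LinkExtractor.extract(new StringReader(html)) on a results page returns each link in document order. The callback style builds no document tree, which keeps memory use low; that is relevant to the performance concerns raised in the question.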

Andrey

--
http://uio.imagero.com Unified I/O for Java
http://reader.imagero.com Java image reader
http://jgui.imagero.com Java GUI components and utilities
