Re: Html download challenge



Paul Battersby wrote:
I've spent days poking around the internet, reading help information, trying
to find working source code but no luck so far.

My problem, on the surface and to someone who knows what he/she is doing,
should be easy to solve.

All I want to do is download the HTML from the following URL:

  http://www.google.com/search?q=business

Sounds simple. I can type that into a browser and I will get a page full of
information. I try to download that using a Java program, and the server
seems to know I am not a browser (my code works with other URLs just fine).
I figure I need to pass some sort of header information or something so that
I appear to be a browser.

So, what I'm looking for, if anyone is up to the challenge, is a small piece
of Java source code that is capable of downloading the HTML from the
above-mentioned URL and printing it to the screen.

On my own, I think I'm looking at a pretty big learning curve (low level
HTTP protocol) to sort this out.

Any help is of course greatly appreciated.



There is an alternative approach. Google has a Java API. See http://www.google.com/apis/.

The licensing limits you to 1000 queries per day, and specifies
personal, non-commercial use only. I assume they are trying to prevent
anyone from constructing a rival search site with their own ads but
Google's search results.

As long as you meet the licensing restrictions, it is MUCH easier to
access the results using their API than by trying to parse their web
pages, even if you can get hold of them.
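
If you do still want to fetch the page directly, here is a minimal sketch of
the header idea you describe. It assumes the server is rejecting Java's default
User-Agent, which I have not confirmed, and the class name PageFetcher, the
helper names, the User-Agent string and the UTF-8 charset are all my own
placeholders rather than anything Google documents.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class PageFetcher {
    // Assumed browser-like value; substitute the string your own browser sends.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; PageFetcher/1.0)";

    // Prepares a connection with the User-Agent header set.
    // Nothing is sent over the network until the response is read.
    static HttpURLConnection openAsBrowser(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestProperty("User-Agent", USER_AGENT);
        return conn;
    }

    // Reads the whole stream as text, one line at a time.
    // UTF-8 is an assumption; the real charset comes from the response headers.
    static String readAll(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn =
                openAsBrowser("http://www.google.com/search?q=business");
        System.out.println(readAll(conn.getInputStream()));
    }
}
```

Whether the server accepts this particular header value is something you would
have to try; if it still refuses, compare against the headers a real browser
sends.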

Patricia