Re: Html download challenge
- From: Bill Tschumy <bill@xxxxxxxxxxxxxxxxxxx>
- Date: Thu, 30 Jun 2005 16:51:43 GMT
On Thu, 30 Jun 2005 09:59:11 -0500, Tor Iver Wilhelmsen wrote
(in article <ull4rriv4.fsf@xxxxxxxxxxx>):
> "Paul Battersby" <batman42ca@xxxxxxxx> writes:
>
>> I figure I need to pass some sort of header information or something so that
>> I appear to be a browser.
>
> So you are looking for URLConnection.setRequestProperty("User-Agent",
> "some browser string").
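Tor's suggestion sets the header on a single connection rather than for the whole process. Here is a minimal sketch. It assumes Java 5 or later, which added the per-connection timeout setters. The class name `UserAgentExample` and the browser-like string are hypothetical placeholders:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

public class UserAgentExample {
    // Hypothetical browser-like string; substitute whatever the server expects.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; ExampleFetcher/1.0)";

    /** Prepares a connection with a per-request User-Agent and timeouts; nothing is sent yet. */
    static URLConnection prepare(String urlStr) throws IOException {
        URLConnection conn = new URL(urlStr).openConnection();
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setConnectTimeout(10000); // milliseconds, Java 5+
        conn.setReadTimeout(10000);    // milliseconds, Java 5+
        return conn;
    }

    public static void main(String[] args) throws IOException {
        URLConnection conn = prepare("http://www.example.com/");
        System.out.println(conn.getRequestProperty("User-Agent"));
        // conn.getInputStream() would perform the actual request
    }
}
```

Nothing goes over the network until `getInputStream()` (or `connect()`) is called, so the header and timeouts must be set before that point.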
I have a product called Parsnips that uses Java to download URLs and index
them. It works fine with the URL you gave.
Here is the code snippet I use:
    // Global timeouts (in milliseconds) used by Sun's URL protocol handlers
    System.setProperty("sun.net.client.defaultConnectTimeout", "10000");
    System.setProperty("sun.net.client.defaultReadTimeout", "10000");
    // Global User-Agent header: "Parsnips/<version> (<os name>)"
    System.setProperty("http.agent",
        "Parsnips/" + Parsnips.CURRENT_VERSION + " (" + System.getProperty("os.name") + ")");

    URL url = new URL(urlStr);
    BufferedReader urlReader = new BufferedReader(
        new InputStreamReader(url.openStream()));
    try {
        // Parsnips' own callback class, which handles the parsed HTML
        callback = new ParserCallback(url, null);
        new ParserDelegator().parse(urlReader, callback, true);
    } finally {
        urlReader.close();
    }
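Swing's `HTMLEditorKit.ParserCallback` has only a no-argument constructor, so the `ParserCallback(url, null)` in the snippet appears to be a Parsnips-specific class. If you want to keep the Swing parser, here is a minimal sketch of a hypothetical callback, `TextCollector`, that just gathers the page's text. This is an illustration, not the Parsnips code:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TextCollector extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    // Called by the parser for each run of text between tags
    @Override
    public void handleText(char[] data, int pos) {
        if (text.length() > 0) {
            text.append(' ');
        }
        text.append(data);
    }

    String getText() {
        return text.toString();
    }

    /** Parses HTML from the reader and returns its text content. */
    static String extractText(Reader reader) throws IOException {
        TextCollector callback = new TextCollector();
        // true: ignore any charset declared in the page's META tag
        new ParserDelegator().parse(reader, callback, true);
        return callback.getText();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>Hello</p><p>world</p></body></html>";
        System.out.println(extractText(new StringReader(html)));
    }
}
```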
As I said, this works for me. You will probably want to replace the
ParserCallback part with some other way of reading the URL stream.
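One alternative is to read the whole page into a `String` and process it however you like. Here is a minimal sketch. The class name `UrlText` is hypothetical. Decoding as UTF-8 is an assumption, since a page may use another charset, declared in its `Content-Type` header or a META tag:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UrlText {
    /** Reads the whole resource at url into a String, decoding it with the given charset. */
    static String readAll(URL url, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        // try-with-resources closes the stream even if reading fails
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), charset))) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 is an assumption; the page may declare a different charset
        String html = readAll(new URL(args[0]), StandardCharsets.UTF_8);
        System.out.println(html.length() + " characters read");
    }
}
```

This still goes through `url.openStream()`, so the global `http.agent` and timeout properties set in the snippet continue to apply to HTTP URLs.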
--
Bill Tschumy
Otherwise -- Austin, TX
http://www.otherwise.com