Re: Html download challenge



On Thu, 30 Jun 2005 09:59:11 -0500, Tor Iver Wilhelmsen wrote
(in article <ull4rriv4.fsf@xxxxxxxxxxx>):

> "Paul Battersby" <batman42ca@xxxxxxxx> writes:
>
>> I figure I need to pass some sort of header information or something so that
>> I appear to be a browser.
>
> So you are looking for URLConnection.setRequestProperty("User-Agent",
> "some browser string").

I have a product called Parsnips that uses Java to download URLs and index
them. It works fine with the URL you gave.

Here is the code snippet I use:

// Global connect/read timeouts, in milliseconds, for the built-in URL handlers
System.setProperty("sun.net.client.defaultConnectTimeout", "10000");
System.setProperty("sun.net.client.defaultReadTimeout", "10000");
// Sets the User-Agent header sent with HTTP requests
System.setProperty("http.agent", "Parsnips/" + Parsnips.CURRENT_VERSION
        + " (" + System.getProperty("os.name") + ")");

URL url = new URL(urlStr);
BufferedReader urlReader = new BufferedReader(
        new InputStreamReader(url.openStream()));
// ParserCallback here is my own callback class
callback = new ParserCallback(url, null);
new ParserDelegator().parse(urlReader, callback, true);
urlReader.close();

As I say, this is working for me. You will probably want to replace the
ParserCallback stuff with some other way to read the URL stream.
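
For anyone who would rather not use a parser callback at all, here is a
minimal sketch of reading the stream directly. It is not the Parsnips code:
the class name PageFetcher, the agent string, and the UTF-8 decoding are
assumptions made up for illustration. It sets the User-Agent per connection
with setRequestProperty, as Tor suggested, instead of going through the
http.agent system property.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PageFetcher {
    // Hypothetical agent string; substitute whatever the server expects.
    static final String USER_AGENT = "ExampleFetcher/1.0";

    // Opens (but does not yet connect) a connection with a custom
    // User-Agent and per-connection timeouts in milliseconds.
    public static URLConnection openWithAgent(URL url, String userAgent,
                                              int timeoutMillis) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        return conn;
    }

    // Drains a Reader into a String.
    public static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Fetches the page body as text. Assumes the page is UTF-8, which
    // will not be true of every site.
    public static String fetch(String urlStr) throws IOException {
        URLConnection conn =
                openWithAgent(URI.create(urlStr).toURL(), USER_AGENT, 10000);
        try (Reader in = new BufferedReader(new InputStreamReader(
                conn.getInputStream(), StandardCharsets.UTF_8))) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java PageFetcher <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

Setting the header on the connection keeps the change local to one request,
whereas the system properties above affect every URL opened by the JVM.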

--
Bill Tschumy
Otherwise -- Austin, TX
http://www.otherwise.com
