Re: Downloading lots and lots and lots of files



coolneo (coolneo@xxxxxxxxx) wrote on MMMMDCCCXCIX September MCMXCIII in
<URL:news:1170081842.925051.117310@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>:
== First, what I am doing is legit... I'm NOT trying to grab someone
== else's content. I work for a non-profit organization and we have
== something going on with Google where they are providing digitized
== versions of our material. They (Google) provided some information on
== how to write a script (shell) to download the digitized version using
== wget.
==
== There are about 50,000 items, ranging in size from 15MB-600MB. My
== script downloads them fine, but it would be much faster if I could
== multi-thread(?) it. I'm running wget using the sys command on a
== Windows box (I know, I know, but the whole place is Windows so I don't
== have much of a choice).
==
== Am I on the right track? Or should I be doing this differently?


Before you do anything, first check with Google whether they allow
multiple connections and, if they do, how many you may open at once.
It won't do you much good to start 100 downloads in parallel if Google
holds up 95 of them.

Of course, it's quite likely that the network is the bottleneck.
Starting up many simultaneous connections isn't going to help in
that case.
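
To put a rough number on that, here is a back-of-envelope sketch in
Python. The item count and size range come from the question; the
100 Mbit/s link speed is purely an assumption, so substitute whatever
your connection actually sustains. Protocol overhead is ignored.

```python
# Back-of-envelope transfer time, ignoring protocol overhead.
ITEMS = 50_000              # from the question
MIN_MB, MAX_MB = 15, 600    # from the question
LINK_MBIT_PER_S = 100       # ASSUMPTION: usable bandwidth of the link


def transfer_days(total_mb, link_mbit_per_s):
    """Days needed to move total_mb megabytes over the given link."""
    seconds = total_mb * 8 / link_mbit_per_s
    return seconds / 86_400


low = transfer_days(ITEMS * MIN_MB, LINK_MBIT_PER_S)
high = transfer_days(ITEMS * MAX_MB, LINK_MBIT_PER_S)
print(f"between {low:.1f} and {high:.1f} days")
# prints "between 0.7 and 27.8 days"
```

If a single wget already fills the link, the total stays in that range
no matter how many downloads run side by side.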

Finally, I wouldn't use threads. I'd either fork() or use a select()
loop, depending on the details of the work that needs to be done.
But then, I'm a Unix person.
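
For what it's worth, here is a minimal sketch of the "several child
processes, no threads" idea in Python, which also runs on Windows,
where there is no fork(). Spawning child processes stands in for
fork() here; it is not the same mechanism. The function name
run_bounded and the max_parallel parameter are made up for this
example, and the cap should be whatever Google says is acceptable.

```python
import subprocess
import time


def run_bounded(commands, max_parallel):
    """Run each command as a child process, at most max_parallel at once.

    Returns a list of (command, exit_code) pairs in completion order.
    """
    pending = list(commands)
    running = []     # (command, Popen) pairs currently executing
    finished = []
    while pending or running:
        # Start new children until the cap is reached.
        while pending and len(running) < max_parallel:
            cmd = pending.pop(0)
            running.append((cmd, subprocess.Popen(cmd)))
        # Collect the children that have exited.
        still_running = []
        for cmd, proc in running:
            code = proc.poll()
            if code is None:
                still_running.append((cmd, proc))
            else:
                finished.append((cmd, code))
        nothing_exited = len(still_running) == len(running)
        running = still_running
        if running and nothing_exited:
            time.sleep(0.1)   # avoid busy-waiting while children work
    return finished
```

With a list of URLs in hand (hypothetical here), something like
run_bounded([["wget", "-c", url] for url in urls], max_parallel=4)
would keep up to four wget processes going and report each exit code,
so failed downloads can be retried later.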


Abigail
--
A perl rose: perl -e '@}-`-,-`-%-'