Re: Downloading lots and lots and lots of files



On 29 Jan 2007, coolneo@xxxxxxxxx wrote:

Google is kinda odd sometimes. It took them forever to allow multiple
download streams, and then they provide this web interface to recall
data in text format with wget. I mean, for Google, you figure they
could do better. I think they would prefer to not give us anything at
all. Once we have it there is always the chance we'll give it away or
lose it or have it stolen (by Microsoft!).

As a business decision it may make sense; technically it's nonsense :)

At the very least they should give you an rsync interface. It's a
single TCP stream, it's fast, and it can be resumed if the connection
should abort. HTTP is low on my list of transport mechanisms for
large files.

Another thing I didn't mention is that this can grow to much larger
than the 50,000, in which case, I'd much rather just auto-download,
than deal with media.

Sure. I was talking about your initial data load; subsequent loads
can be incremental.
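
To illustrate what I mean by incremental, here is a rough Python sketch,
not a finished tool. It assumes you can get the list of remote file names
somehow and that a file existing locally means it was fully downloaded;
files_to_fetch and its parameters are names I made up for this example.

```python
import os

def files_to_fetch(wanted, local_dir):
    """Return the names in `wanted` that are not yet in `local_dir`.

    Assumption: a file that exists locally is complete. A real job
    would probably also compare sizes or checksums.
    """
    have = set(os.listdir(local_dir))
    return [name for name in wanted if name not in have]
```

After the initial load, each run would then request only what this returns.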

I would also suggest limiting to N downloads per hour, to avoid bugs
or other situations (unmounted disk, for example) where you're
repeatedly requesting all the data you already have. That's a very
nasty situation.
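
Something like the following Python sketch is what I have in mind for the
limit. It is only one way to do it (a sliding one-hour window kept in
memory); HourlyLimiter, max_per_hour and the clock parameter are invented
for the example, and a real job would want to persist the timestamps
across restarts.

```python
import time
from collections import deque

class HourlyLimiter:
    """Allow at most max_per_hour downloads in any 3600-second window."""

    def __init__(self, max_per_hour, clock=time.time):
        self.max_per_hour = max_per_hour
        self.clock = clock      # injectable so the limiter can be tested
        self.stamps = deque()   # start times of recent downloads

    def allow(self):
        """Return True and record the download if under the limit."""
        now = self.clock()
        # Forget downloads that started an hour or more ago.
        while self.stamps and now - self.stamps[0] >= 3600:
            self.stamps.popleft()
        if len(self.stamps) >= self.max_per_hour:
            return False
        self.stamps.append(now)
        return True
```

Call allow() before each request and stop (or sleep) when it returns
False. If a bug makes the job think it has nothing on disk, it then
burns through N requests and halts instead of pulling everything again.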

Ted