Re: look up very large table

In article <hku3e0$3fs$1@xxxxxxxxxxxxxxxxxxxxxxxxx>, ela
<ela@xxxxxxxxxx> wrote:

> I have some large data in pieces, e.g.
>
> asia.gz.tar 300M
>
> roads1.gz.tar 100M
> roads2.gz.tar 100M
> roads3.gz.tar 100M
> roads4.gz.tar 100M
>
> I wonder whether I should concatenate them all into a single ultra-large
> file and then parse it into a large table (I don't know whether perl can
> handle that...).

There is no benefit that I can see to concatenating the files. Use the
File::Find module to find all files matching a certain naming convention,
then read and process each one in turn. As for the amount of information
Perl can handle, that is mostly determined by the available memory and by
how smart you are at condensing the data: keep only what you need and
throw away what you don't.
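Something along these lines, assuming the files sit under a single top directory and follow the roads<N>.gz.tar naming shown above (the file pattern and the processing stub are placeholders):

```perl
use strict;
use warnings;
use File::Find;

# Collect every plain file under $top whose name matches the
# roads<N>.gz.tar pattern; returns a sorted list of paths.
sub find_road_files {
    my ($top) = @_;
    my @files;
    find(
        sub {
            # $_ is the bare file name; $File::Find::name is the path
            push @files, $File::Find::name
                if -f && /^roads\d+\.gz\.tar$/;
        },
        $top
    );
    return sort @files;
}

# Process each file in turn instead of concatenating them first.
for my $file ( find_road_files('.') ) {
    print "processing $file\n";
    # open, read, and condense the data here
}
```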

> The final table should look like this:
>
> X1 Y9 san diego; california; West Coast; America; North America; Earth
> X2.3 H9 Beijing; China; Asia

Perl does not have tables. It has arrays and hashes. You can nest
arrays and hashes to store complex datasets in memory by using
references.
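For instance, a hash of hashes (really a hash of hash references) can hold one record per ID; the IDs and field names below are taken from the examples in the post:

```perl
use strict;
use warnings;

# A hash keyed by ID; each value is a reference to an inner hash
# holding that record's fields (CITY, NOTE, RACE from the post).
my %info = (
    X1 => { CITY => 'san diego', NOTE => 'West Coast' },
);

# Assigning through a nested key autovivifies the inner hashref:
$info{'X2.3'}{CITY} = 'Beijing';
$info{'X2.3'}{NOTE} = 'Capital';
$info{'X2.3'}{RACE} = 'Chinese';

print "$info{'X2.3'}{CITY}, $info{'X2.3'}{RACE}\n";    # Beijing, Chinese
```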


> each row may come from a big file of >100M (as aforementioned):
>
> CITY Beijing
> NOTE Capital
> RACE Chinese
>
> And then I have another much smaller table which contains all the IDs
> (either ID1 or ID2, maybe 100,000 records, <20M), and I just need to
> annotate this 20M file with the INFO. Hashing seems not to be a solution
> for my 32G, 8-core machine...
>
> Any advice? Or should I resort to some other language?

Try reading all the files and saving the data you want. If you run out
of memory, then think about a different approach. 32GB of memory is
quite a lot.
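As a rough sketch of that in-memory approach: build a hash from the big files, then stream the small ID file through it. The record layout here (an ID line starting each record, followed by KEY VALUE lines) and the tab-separated output are assumptions, since the post only shows fragments:

```perl
use strict;
use warnings;

# Build one list of values per ID from KEY VALUE lines. An "ID x"
# line is assumed to start each record; this layout is a guess
# based on the fragments in the post.
sub load_info {
    my ($fh) = @_;
    my ( %info, $id );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^ID\s+(\S+)/ ) {    # assumed per-record ID line
            $id = $1;
        }
        elsif ( $id && $line =~ /^\w+\s+(.*)/ ) {
            push @{ $info{$id} }, $1;      # keep the values only
        }
    }
    return \%info;
}

# Annotate the small table: one ID per input line, ID plus its
# joined info on each output line.
sub annotate {
    my ( $info, $in, $out ) = @_;
    while ( my $id = <$in> ) {
        chomp $id;
        my $note = $info->{$id} ? join( '; ', @{ $info->{$id} } ) : '';
        print {$out} "$id\t$note\n";
    }
}
```

With 100,000 IDs and records like these, the hash itself should be far below the 32GB on that machine.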

If you can't fit all of your data into memory at one time, you might
consider using a database that will store your data in files. Perl has
support for many databases. But I would first determine whether or not
you can fit everything in memory.
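If it comes to that, a minimal on-disk sketch using DBI with the DBD::SQLite driver might look like this (assuming DBD::SQLite is installed; the table and column names are made up for illustration):

```perl
use strict;
use warnings;
use DBI;

# Open (or create) an SQLite file holding an id -> note table,
# so the lookup data lives on disk instead of in memory.
sub open_store {
    my ($path) = @_;
    my $dbh = DBI->connect( "dbi:SQLite:dbname=$path", '', '',
        { RaiseError => 1, AutoCommit => 1 } );
    $dbh->do(
        'CREATE TABLE IF NOT EXISTS info (id TEXT PRIMARY KEY, note TEXT)');
    return $dbh;
}

sub put_info {
    my ( $dbh, $id, $note ) = @_;
    $dbh->do( 'INSERT OR REPLACE INTO info (id, note) VALUES (?, ?)',
        undef, $id, $note );
}

sub get_info {
    my ( $dbh, $id ) = @_;
    my ($note) = $dbh->selectrow_array(
        'SELECT note FROM info WHERE id = ?', undef, $id );
    return $note;
}
```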

Jim Gibson