Re: A challenging file to parse
- From: Eric Sosman <Eric.Sosman@xxxxxxx>
- Date: Tue, 21 Aug 2007 17:02:30 -0400
Walter Roberson wrote on 08/21/07 16:26:
> In article <1187726830.062759.145450@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
> <david.deram@xxxxxxxxx> wrote:
>> I have a group of files in a format that is tab delimited with
>> about a million columns and a thousand rows.
>> Reading this file left-to-right top-to-bottom is not a problem, but my
>> requirements are to read it top-to-bottom left-to-right (to read each
>> column in order as follows).
>> 1,4,7
>> 2,5,8
>> 3,6,9
>> It's an O(n^2) problem if I read each line for each column (it could
>> take a week for a big file). The file is too big to hold the lines in
>> memory and I see no strategy where I can hold a subset of lines in
>> memory.
Let's suppose you can store about 500MB of file data
in RAM at once. With about a thousand lines, that means
you can read the whole file, storing the leftmost 0.5MB
from each line and discarding the rest. You can then
write this portion of the data to the output file in
transposed order. Rewind the original file and make
another pass, this time ignoring the first 0.5MB from each
line, storing the next 0.5MB, and ignoring the tails.
Write that second batch out, rewind, rinse, and repeat.
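A minimal C sketch of the multi-pass idea, assuming the input is a seekable file (rewind() will not work on a pipe) and rectangular (every row has the same number of fields). One deliberate departure from the description above: each strip is sized by a column count rather than by bytes, since with fields of different lengths a fixed 0.5MB byte offset would land mid-field and at a different column on each line. The names transpose_in_passes and strip_cols are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Growable byte buffer holding one output line (one input column). */
typedef struct {
    char  *data;
    size_t len, cap;
} Buf;

static int buf_put(Buf *b, char ch)
{
    if (b->len == b->cap) {
        size_t ncap = b->cap ? 2 * b->cap : 64;
        char *p = realloc(b->data, ncap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = ncap;
    }
    b->data[b->len++] = ch;
    return 0;
}

/* Hypothetical helper: transpose a tab-delimited file, making one full
   pass over the input per strip of strip_cols columns. Each input
   column becomes one tab-delimited output line. Returns 0 or -1. */
int transpose_in_passes(FILE *in, FILE *out, size_t strip_cols)
{
    size_t first = 0;                  /* first column of current strip */

    assert(strip_cols > 0);
    for (;;) {
        Buf *cols = calloc(strip_cols, sizeof *cols);
        size_t col = 0, row = 0, max_cols = 0, k, ncols;
        int err = 0, field_start = 1, mid_row = 0;

        if (cols == NULL)
            return -1;
        rewind(in);                    /* every pass rereads the file */
        for (;;) {
            int ch = getc(in);
            int at_eof = (ch == EOF);

            if (at_eof) {
                if (!mid_row)
                    break;
                ch = '\n';             /* supply a missing final newline */
            }
            /* A field opens at column col: separate it from the
               previous row's field in the same column. */
            if (field_start) {
                if (col >= first && col - first < strip_cols && row > 0)
                    err |= buf_put(&cols[col - first], '\t');
                field_start = 0;
            }
            mid_row = 1;
            if (ch == '\t') {
                col++;
                field_start = 1;
            } else if (ch == '\n') {
                if (col + 1 > max_cols)
                    max_cols = col + 1;
                row++;
                col = 0;
                field_start = 1;
                mid_row = 0;
            } else if (col >= first && col - first < strip_cols) {
                err |= buf_put(&cols[col - first], (char)ch);
            }
            if (at_eof)
                break;
        }
        if (ferror(in))
            err = -1;
        /* Write this strip: one output line per column actually present. */
        ncols = max_cols > first ? max_cols - first : 0;
        if (ncols > strip_cols)
            ncols = strip_cols;
        for (k = 0; !err && k < ncols; k++) {
            if (cols[k].len > 0)
                fwrite(cols[k].data, 1, cols[k].len, out);
            putc('\n', out);
        }
        for (k = 0; k < strip_cols; k++)
            free(cols[k].data);
        free(cols);
        if (err)
            return -1;
        first += strip_cols;
        if (first >= max_cols)         /* no columns left to read */
            return ferror(out) ? -1 : 0;
    }
}
```

With ~11-byte items and a thousand rows, a call such as transpose_in_passes(in, out, 45000) would hold roughly 45000 x 1000 x 11 bytes, about 500MB of field data, per pass. The doubling realloc can leave up to twice that allocated, so strip_cols should be chosen with some headroom.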
Let's see: If the data items are ~10 bytes long plus
the tabs between them, each line is about 11MB and you'll
complete the job in about two dozen passes. If you've
got 1.5GB available, you can do it in eight or nine.
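The pass-count arithmetic above, written out as a small C helper (passes_needed is a hypothetical name). It takes the file size as rows x columns x (item bytes + 1 separator), divides by the memory budget, and rounds up. It uses decimal megabytes (10^6 bytes) and ignores allocator and buffering overhead, so a real run may need a pass or so more.

```c
#include <assert.h>

/* Hypothetical helper: passes needed for the strip-by-strip transpose.
   File size is rows * cols * (item_bytes + 1), the +1 being the tab or
   newline after each item; dividing by the budget and rounding up
   gives the pass count. Allocator overhead is ignored. */
unsigned long long passes_needed(unsigned long long rows,
                                 unsigned long long cols,
                                 unsigned long long item_bytes,
                                 unsigned long long budget_bytes)
{
    unsigned long long file_bytes = rows * cols * (item_bytes + 1);

    assert(budget_bytes > 0);
    return (file_bytes + budget_bytes - 1) / budget_bytes;  /* ceiling */
}
```

For 1000 rows, 1000000 columns and 10-byte items (an 11GB file), this gives 22 passes with a 500MB budget and 8 with 1.5GB.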
--
Eric.Sosman@xxxxxxx