Re: A challenging file to parse



Walter Roberson wrote On 08/21/07 16:26,:
In article <1187726830.062759.145450@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
<david.deram@xxxxxxxxx> wrote:

I have a group of files in a tab-delimited format with
about a million columns and a thousand rows.


Reading this file left-to-right, top-to-bottom is not a problem, but my
requirements are to read it top-to-bottom, left-to-right (that is, to read
each column in order, as follows).


1,4,7
2,5,8
3,6,9


It's an O(n^2) problem if I read each line for each column (it could
take a week for a big file). The file is too big to hold the lines in
memory and I see no strategy where I can hold a subset of lines in
memory.

Let's suppose you can store about 500MB of file data
in RAM at once. With about a thousand lines, that means
you can read the whole file, storing the leftmost 0.5MB
from each line and discarding the rest. You can then
write this portion of the data to the output file in
transposed order. Rewind the original file and make
another pass, this time ignoring the first 0.5MB from each
line, storing the next 0.5MB, and ignoring the tails.
Write that second batch out, rewind, rinse, and repeat.
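
A minimal sketch of that multi-pass scheme in C, under a few
assumptions of mine that are not in the original question: the
function name transpose_in_strips is made up, the strip is measured
in fields (width columns per pass) rather than in bytes, the row and
column counts are known in advance, and no field is longer than
MAX_FIELD - 1 characters (longer ones are silently truncated).
Lines end in '\n' only.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_FIELD 32  /* assumed upper bound on field length, incl. '\0' */

/* Transpose a tab-delimited file of `rows` lines by `cols` fields,
 * keeping at most rows * width fields in memory at any time.
 * `in` must be seekable, since it is rewound once per pass.
 * Returns 0 on success, -1 on failure. */
int transpose_in_strips(FILE *in, FILE *out,
                        size_t rows, size_t cols, size_t width)
{
    if (rows == 0 || width == 0)
        return -1;

    /* One strip: rows * width cells, zeroed so that a missing field
     * on a short line comes out as an empty string. */
    char (*cells)[MAX_FIELD] = calloc(rows * width, sizeof *cells);
    if (cells == NULL)
        return -1;

    for (size_t c0 = 0; c0 < cols; c0 += width) {
        size_t c1 = (c0 + width < cols) ? c0 + width : cols;

        rewind(in);  /* every pass rereads the whole input */
        for (size_t r = 0; r < rows; r++) {
            size_t col = 0, len = 0;
            for (;;) {
                int ch = getc(in);
                int wanted = (col >= c0 && col < c1);
                if (ch == '\t' || ch == '\n' || ch == EOF) {
                    if (wanted)
                        cells[r * width + (col - c0)][len] = '\0';
                    col++;
                    len = 0;
                    if (ch != '\t')
                        break;           /* end of this line */
                } else if (wanted && len < MAX_FIELD - 1) {
                    cells[r * width + (col - c0)][len++] = (char)ch;
                }                        /* everything else is discarded */
            }
        }

        /* Each input column in the strip becomes one output line. */
        for (size_t c = c0; c < c1; c++) {
            for (size_t r = 0; r < rows; r++) {
                fputs(cells[r * width + (c - c0)], out);
                putc(r + 1 < rows ? '\t' : '\n', out);
            }
        }
    }

    free(cells);
    return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

For the sizes discussed here you would pick width so that
rows * width * MAX_FIELD fits in the memory you can spare. A real
version would probably pack the fields more tightly than fixed
32-byte cells and would check rows * width for overflow.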

Let's see: If the data items are ~10 bytes long plus
the tabs between them, each line is about 11MB and you'll
complete the job in about two dozen passes. If you've
got 1.5GB available, you can do it in eight or nine.
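
The same estimate as a few lines of C, in case you want to plug in
your own numbers. The function name passes_needed is just something
I made up for illustration; it assumes every line is about the same
length and that all of the RAM budget goes to file data.

```c
/* Number of passes = ceil(line_bytes / strip_bytes), where
 * strip_bytes = ram_bytes / rows is the slice of each line
 * that can be kept in memory during one pass.
 * Returns 0 if the inputs make no sense (no rows, or less than
 * one byte of RAM per line). */
unsigned long passes_needed(unsigned long long ram_bytes,
                            unsigned long rows,
                            unsigned long long line_bytes)
{
    if (rows == 0)
        return 0;
    unsigned long long strip = ram_bytes / rows;
    if (strip == 0)
        return 0;
    return (unsigned long)((line_bytes + strip - 1) / strip);
}
```

With 500MB, a thousand rows and 11MB lines this gives 22 passes;
with 1.5GB it gives 8.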

--
Eric.Sosman@xxxxxxx