Re: How do I sort a very large dataset (> 10 GB)?



ola.mattsson@xxxxxxxxx wrote:
Hi,
I need a way to sort the content of a log file that can contain more
than 10 gigabytes of data. I am considering parsing the log file into
separate log entries with "fields" for timestamp, log level, message
etc., putting them in a database and performing the sort in the
database. But maybe there is a better way?
The major problem I have is that I can't keep the entire content of the
log file in memory and I don't know how to perform a sort on a data set
that is contained partly in memory and partly on disk. Any ideas?



http://en.wikipedia.org/wiki/Merge_sort

Read the entire article, especially where it says "stable sort, parallelizes better, and is more efficient at handling slow-to-access sequential media".

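Merge sort doesn't sort in place, so use the disk as the extra storage: sort chunks that fit in memory, write each one out as a sorted run, then merge the runs into the final file. You do have more than 10 GB free on your HD? You can also use tape storage. ;-)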
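
Here is a rough sketch of how that could look in Java. It is only an illustration, not a tuned implementation. The class name ExternalSort, the method names and the maxLinesInMemory parameter are all made up for this example. It assumes one log entry per line and that comparing whole lines as strings gives the order you want (true if every line starts with a fixed-width timestamp such as ISO 8601; otherwise swap in your own Comparator). It also opens every run file at once during the merge, so with a very large number of runs you would have to merge in several passes.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSort {

    /** One open run file plus its smallest line that has not been written yet. */
    private static final class Run {
        final BufferedReader reader;
        String head;

        Run(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }

        void advance() throws IOException {
            head = reader.readLine();
        }
    }

    /** Sorts the lines of input into output, holding at most maxLinesInMemory lines at a time. */
    public static void sort(Path input, Path output, int maxLinesInMemory) throws IOException {
        List<Path> runs = new ArrayList<>();
        try {
            // Phase 1: read a chunk that fits in memory, sort it, write it out as a run.
            try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                List<String> chunk = new ArrayList<>();
                String line;
                while ((line = in.readLine()) != null) {
                    chunk.add(line);
                    if (chunk.size() >= maxLinesInMemory) {
                        runs.add(writeRun(chunk));
                        chunk.clear();
                    }
                }
                if (!chunk.isEmpty()) {
                    runs.add(writeRun(chunk));
                }
            }
            // Phase 2: k-way merge of all sorted runs.
            mergeRuns(runs, output);
        } finally {
            for (Path run : runs) {
                Files.deleteIfExists(run);
            }
        }
    }

    private static Path writeRun(List<String> chunk) throws IOException {
        Collections.sort(chunk); // in-memory sort of one chunk
        Path run = Files.createTempFile("run-", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    private static void mergeRuns(List<Path> runs, Path output) throws IOException {
        // The heap always yields the run whose current line is smallest.
        PriorityQueue<Run> heap = new PriorityQueue<>(Comparator.comparing((Run r) -> r.head));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path path : runs) {
                BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
                readers.add(reader);
                Run run = new Run(reader);
                if (run.head != null) {
                    heap.add(run);
                }
            }
            while (!heap.isEmpty()) {
                Run smallest = heap.poll();
                out.write(smallest.head);
                out.newLine();
                smallest.advance();
                if (smallest.head != null) {
                    heap.add(smallest); // re-insert with its next line as the key
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

You would call it as something like ExternalSort.sort(Paths.get("app.log"), Paths.get("app.sorted.log"), 1_000_000), picking the line limit so that one chunk fits comfortably in your heap. The temporary run files together need about as much disk space as the input.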