Re: How do I sort a very large dataset (> 10 GB)?
- From: Igor Planinc <igoplan@xxxxxxxxxxx>
- Date: Mon, 05 Dec 2005 13:49:57 +0100
ola.mattsson@xxxxxxxxx wrote:
> Hi,
> I need a way to sort the content of a log file that can contain more
> than 10 gigabytes of data. I am considering parsing the log file into
> separate log entries with "fields" for timestamps, log level, message
> etc., putting them in a database and performing the sorting in the
> database. But maybe there is a better way?
> The major problem I have is that I can't keep the entire content of the
> log file in memory, and I don't know how to sort a data set that is
> held partly in memory and partly on disk. Any ideas?
http://en.wikipedia.org/wiki/Merge_sort
Read the entire article, especially where it says "stable sort, parallelizes better, and is more efficient at handling slow-to-access sequential media".
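To illustrate what "stable" and "sequential" mean here, a minimal sketch (the class name StableMerge and its methods are made up for illustration). The merge step walks each already-sorted run strictly front to back, and on a tie it takes from the left run first, so log entries with equal keys (say, equal timestamps) keep their original order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper showing the merge step at the heart of mergesort.
class StableMerge {

    // Merges two runs that are each already sorted according to cmp.
    // Both runs are read front to back only (sequential access).
    // On a tie the element from 'left' wins, which keeps the sort stable.
    static <T> List<T> merge(List<T> left, List<T> right, Comparator<? super T> cmp) {
        List<T> out = new ArrayList<T>(left.size() + right.size());
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            if (cmp.compare(left.get(i), right.get(j)) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        // One run is exhausted; copy whatever remains of the other.
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }

    // Top-down mergesort built on merge(); stable because merge() is.
    static <T> List<T> sort(List<T> items, Comparator<? super T> cmp) {
        if (items.size() <= 1) return new ArrayList<T>(items);
        int mid = items.size() / 2;
        return merge(sort(items.subList(0, mid), cmp),
                     sort(items.subList(mid, items.size()), cmp), cmp);
    }
}
```

For data that fits in memory you don't need this: java.util.Collections.sort is already documented as stable. The point is only that the merge never needs random access, which is what makes it workable on disk or tape.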
Mergesort doesn't sort in place, so use disk as extra storage. You do have > 10 gigs free on your HD? You can also use tape storage. ;-)
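A minimal sketch of using the disk this way (usually called an external merge sort), assuming one log entry per line and that lines order correctly as plain strings, e.g. because each one starts with an ISO 8601 timestamp. The class name ExternalSort and the maxLinesInMemory parameter are hypothetical; a real program would size chunks by bytes rather than lines. It also uses Java 7/8 APIs (try-with-resources, java.nio.file, a lambda) that postdate this thread.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical external mergesort for a text file too large for memory.
class ExternalSort {

    static void sort(File input, File output, int maxLinesInMemory) throws IOException {
        List<File> runs = new ArrayList<File>();
        // Phase 1: read chunks that fit in memory, sort each, write it as a run.
        try (BufferedReader in = Files.newBufferedReader(input.toPath(), StandardCharsets.UTF_8)) {
            List<String> chunk = new ArrayList<String>();
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() >= maxLinesInMemory) {
                    runs.add(writeRun(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) runs.add(writeRun(chunk));
        }
        // Phase 2: k-way merge of all runs, reading each run sequentially.
        mergeRuns(runs, output);
        for (File run : runs) run.delete();
    }

    private static File writeRun(List<String> chunk) throws IOException {
        Collections.sort(chunk); // stable in-memory sort of one chunk
        File run = File.createTempFile("run", ".txt");
        Files.write(run.toPath(), chunk, StandardCharsets.UTF_8);
        return run;
    }

    // Heap entry: the current line of one run plus that run's index.
    private static final class Head {
        final String line;
        final int run;
        Head(String line, int run) { this.line = line; this.run = run; }
    }

    private static void mergeRuns(List<File> runs, File output) throws IOException {
        List<BufferedReader> readers = new ArrayList<BufferedReader>();
        // Order by line; on ties prefer the earlier run, so equal keys keep input order.
        PriorityQueue<Head> heap = new PriorityQueue<Head>(Math.max(1, runs.size()),
            (a, b) -> {
                int c = a.line.compareTo(b.line);
                return c != 0 ? c : Integer.compare(a.run, b.run);
            });
        try (BufferedWriter out = Files.newBufferedWriter(output.toPath(), StandardCharsets.UTF_8)) {
            for (int i = 0; i < runs.size(); i++) {
                BufferedReader r = Files.newBufferedReader(runs.get(i).toPath(), StandardCharsets.UTF_8);
                readers.add(r);
                String first = r.readLine();
                if (first != null) heap.add(new Head(first, i));
            }
            while (!heap.isEmpty()) {
                Head h = heap.poll();            // smallest current line across all runs
                out.write(h.line);
                out.newLine();
                String next = readers.get(h.run).readLine();
                if (next != null) heap.add(new Head(next, h.run));
            }
        } finally {
            for (BufferedReader r : readers) r.close();
        }
    }
}
```

The temporary runs take roughly as much disk space as the input, and the sorted output as much again, which is why free disk space matters. If the number of runs grows into the thousands, merging them in several passes avoids keeping too many files open at once.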