Re: Efficient algorithm for finding duplicates in 56-bit number
From: gerard46 (gerard46_at_rtt.net)
Date: 01/29/05
- Next message: Siddharth Taneja: "Re: initialize in O(1)"
- Previous message: Willem: "Re: Efficient algorithm for finding duplicates in 56-bit number"
- In reply to: Willem: "Re: Efficient algorithm for finding duplicates in 56-bit number"
- Next in thread: CBFalconer: "Re: Efficient algorithm for finding duplicates in 56-bit number"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Sat, 29 Jan 2005 17:32:50 GMT
| Willem wrote:
|) gerard46 wrote:
|) As an aside, I'd be interested to see how long (wall clock and
|) processor time) it would take to sort millions of those values
|) (and I assume that 2 million is quite less than what Heiner had
|) in mind when he said millions).
|)
|) If the OP is looking for duplicates, all one has to do is just
|) sort the file and see if any of the records match the previous
|) one. No need to do a binary search. This boils the problem
|) down to having some utility do the work sorting the file, and
|) creating a new file if the old one is to be kept intact.
|) Presumably, let the utility run overnight or at some other
|) "slack" time. __________________________________________Gerard S.
| I think you're overestimating the amount of time needed to sort
| mollions of numbers. Basically, if it fits in memory, sorting time
| is in the order of minutes, or maybe even seconds.
Yes, and that'a a heck of a big IF (if it fits in memory). Millions
(let's pick ten as a "good" number) of numbers, each requiring 56 bytes
(as I understood the OP's numbers being 56-byte hex numbers) would
require 1/2 Gigabyte or so of memory. Even if you could read all those
numbers into real memory, I don't think that it can be done in an order
of minutes, and certainly not seconds. But, I could be wrong, of
course. __________________________________________________Gerard S.
| If it doesn't, you're best off with a sort that has very good locality
| of reference, or even tailor-make one that sorts in chunks (where each
| chunk fits in memory) and then merges the chunks.
| You can merge the chunks in one go, if you read those in smaller chunks
| so it fits in memory again, and use maybe a heap structure of pointers
| to chunks to always output the smallest value.
|
| I seem to remember there's a sorting challenge where you have to sort
| like billions or trillions of numbers on big hardware.
What size numbers, 32 bits ? _____________________________Gerard S.
- Next message: Siddharth Taneja: "Re: initialize in O(1)"
- Previous message: Willem: "Re: Efficient algorithm for finding duplicates in 56-bit number"
- In reply to: Willem: "Re: Efficient algorithm for finding duplicates in 56-bit number"
- Next in thread: CBFalconer: "Re: Efficient algorithm for finding duplicates in 56-bit number"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|