Re: Mergesort algorithm for linked lists
- From: cri@xxxxxxxx (Richard Harter)
- Date: Tue, 23 Jan 2007 22:50:03 GMT
On Tue, 23 Jan 2007 23:08:27 +0100, Joerg Schoen
<jo65.spammenot@xxxxxxxxx> wrote:
Ben Pfaff wrote:
cri@xxxxxxxx (Richard Harter) writes:
As a note on mergesorting linked list data, you don't need to scan the
list to get its length. Instead sort the first two elements. You now
have a sorted list of length 2. Sort the next two elements to get a
second sorted list; merge it with the first. You now have a sorted list
of length 4. Sort the next four elements and merge to get a list of
length 8. Repeat as needed. Season to taste.
That's basically what my code is doing, but it also takes advantage of
natural runs in the input! Richard Harter is right in noticing that I
should have described the algorithm more detailed. Sorry! I'll also improve
my indentation next time. Here now is a detailed explanation:
I break down the linked lists into natural runs (only if preprocessor
symbol "NATURAL" is defined). After collecting the first two runs, I merge
them into a bigger run. Then I collect the next two runs, merge them into a
bigger one. On the stack, I now have two bigger runs which I merge again.
Here is how the stack develops (separated letters denote different runs,
adjacent letters means merged runs, the comment at the end of the line
describes what happens before the next line):
# collect run
a # collect run
a b # merge
ab # collect run
ab c # collect run
ab c d # merge
ab cd # merge
abcd # collect run
abcd e # collect run
abcd e f # merge
abcd ef # collect run
abcd ef g # collect run
abcd ef g h # merge
abcd ef gh # merge
abcd efgh # merge
abcdefgh
...
The stack contains the start of the each run which is a NULL terminated
linked list and also the length of it. So there is enough information on
the stack to know the start and length of each run. Compared to other
implementations of mergesort the my implementation doesn't run through the
list a given number of steps just to find the start of the next run.
What I like about the implementation is that there is not much need for
treating special cases like ending lists when merging etc. That sounds
easy, but try to do it yourself and you see that it's a bit tricky!
Sure, that'll work, but it's somewhat brute force. I think that
a "natural" merge sort makes more sense: take the first two runs
of ascending values and merge them to produce the first output
run, and repeat that process until you've consumed the input.
Then start over from the top, until there's only a single output
run.
That is exactly the governing idea in my own implementation: Improve
mergesort by recognizing natural runs in the input.
There are two not so obvious problems with Ben's description:
One thing is that you need to call your comparison function very frequently
just to recognize runs you could already know about otherwise. Assume you
have just merged two runs of length 1000 and 2000 to a run of length 3000.
When you go over the list from the top, you need another 2999 comparisons
just to recognize the run of size 3000.
Note that this is easier in the simpler implementation cited by Richard: The
length of the runs is known to be exactly 2**n in the n-th iteration of the
outer loop.
The second problem in Ben's approach is non-locality: You have to run
repeatedly through the WHOLE input and each time increase the size of the
runs. This will render your CPU cache rather useless. In the test program
on my webpage (http://www.die-schoens.de/prg/testsort.tgz), I compare the
localized version with a non-localized one. The runtime of the localized
one is better by 35%! That's seems relevant to me for real world
application!
Of course, as said before, the algorithm is O(n * log(n)), we are just
talking about minimizing the constant factor in front of this.
Well done. Actually the algorithm I described is is closer to your
algorithm than to Ben's. IIANM Ben's is actually O(n^2) worse case if I
am not missing a fine point. An implementation of my algorithm has a
little gotcha - you have to be careful about recognizing the end of
data. I shall have to think about it a bit, but I think both can be
written nicely as recursive routines.
.
- Follow-Ups:
- Re: Mergesort algorithm for linked lists
- From: Ben Pfaff
- Re: Mergesort algorithm for linked lists
- References:
- Re: Mergesort algorithm for linked lists
- From: Richard Harter
- Re: Mergesort algorithm for linked lists
- From: Ben Pfaff
- Re: Mergesort algorithm for linked lists
- From: Joerg Schoen
- Re: Mergesort algorithm for linked lists
- Prev by Date: Re: What's wrong with this server code ?? Cant read incoming data
- Next by Date: Re: Mergesort algorithm for linked lists
- Previous by thread: Re: Mergesort algorithm for linked lists
- Next by thread: Re: Mergesort algorithm for linked lists
- Index(es):
Relevant Pages
|
Loading