Re: Mergesort Vs Quicksort
- From: seeWebInstead@xxxxxxxxxxxxxxxx (Robert Maas, http://tinyurl.com/uh3t)
- Date: Sat, 22 Nov 2008 00:11:32 -0800
Following up my previous post to make a correction or improvement:
I might not have correctly remembered my algorithm of months ago
for sorting records in an array using one auxilary (scratch) of the
same size and merge-sort as the basic algorithm. But never mind
because I think I have clarified the algorithm to be better anyway.
Here's the best new version of my two-array merge-sort algorithm:
Instead of a single recursive algorithm that writes the result
either into the *other* array or back to the same array depending
on how things turn out from lower levels of recursion, reporting to
call where the result happened to be, at the top level we have
control over whether the result will be back in the original array
(most commonly useful) or always to the other array (possible if
you want this behaviour for any perverse reason). As a result there
is a mode flag upon call of the algorithm that determines which of
the two behaviours will occur, or perhaps just two different
functions that are alternately mutually recursive. I'll describe
the algorithm as two separate functions, on the left side of the
diagram the output-back-in-same-array using aux for scratch only,
and on the right side of the diagram the source-to-other-array. At
the very start of either function we do a quick look-ahead to see
whether the number of records in the array segment to be sorted is
very small for special cases, or large enough for general
split/sortEach/merge algorithm. The following table tells what to
do in each case for the two mututally recursive functions:
in-place using-aux source-to-destination
+-------------------------------------------------------------------+
n=0 | |
| Do nothing. +--------------------------------------+
n=1 | | Move single record to destination. |
+----------------------------+--------------------------------------+
n=2 | Do nothing, or swap trick. | Merge from one array to other array. |
+----------------------------+--------------------------------------+
n>2 | General case: Split in half, |
| sort each half by calling *other* function, |
| merge from one array to other array. |
+-------------------------------------------------------------------+
The only case that needs further explanation is n=2 leftside:
Compare the two records. If they are already in correct sequence, do nothing,
else swap them in-place using whichever of these algorithms is fastest:
- Move smaller record from main array to aux array, in any
location that minimizes the number of page boundaries it
crosses, then slide (*) larger record to other end of main
array segment, then move smaller record back into main array
to fill hole in segment vacated by the slide.
- Use the modular permutation cycles algorithm I described in previous msg.
* (Caution for novices: Since the longer segment overlaps where it
was and where it will be moved to, you have to be careful which
direction you copy bytes to avoid overwriting a byte that hasn't
yet been copied to its new location. If it slides from left end
to right end, then you have to copy the rightmost byte first
then work leftward. But if it slides from right end to left end,
then you have to copy the leftmost byte first then move rightward.)
Note that Knuth's algorithm of reversing each part in-place then
reversing the concatenation in-place, or vice versa, is rejected
for sure, because it costs the most time but doesn't save any space
because we already have the second (aux) array allocated.
Reminder, calling convention (from earlier message) is clarified here:
- Starting (inclusive) index of segment to be sorted (for each function)
- Ending (exclusive) index of segment to be sorted (for each function)
(for in-place function only) (for source-to-destiation function only)
- Ptr to array to be sorted in-place - Ptr to array containing source data
- Ptr to auxilary/scratch array - Ptr to array where result will be put
The array return value (pointer to which of the two arrays has
result) is no longer needed because it's implicit in which of the
two functions was called.
As before, records are variable length, where length can be
determined by parsing from the leftmost byte towards the right. For
example, records could be prefixed by their length, or some formal
nesting syntax such as XML or s-expressions could be used for each
record separately. (Obviously prefixing with length would make
sorting much faster, but you take the data the way it's given to
you and must sort in-place or source-to-destination; you don't have
the luxury of re-formatting the data to a more efficient format
before sorting it, otherwise you'd just build a linked list of
records to sort instead of trying to sort records packed in arrays.)
Note that this algorithm is thread safe, whereby sub-tasks can be
parallelized to any degree possible, providing that we have shared
VM with thread-safe paging (or trivially if both arrays fit in
shared physical RAM), and thread-safe CPU access to different
data-allocation units to which all records are aligned. This is
because each recursive call reads and writes *only* within its own
segment of two parallel arrays, which are disjoint with any other
recursive call at any level that might be active at the same time.
(This is "divide and conquer" not just in regard to computations
but also in regard to memory access!)
.
- Follow-Ups:
- Re: Mergesort Vs Quicksort
- From: pete
- Re: Mergesort Vs Quicksort
- References:
- Re: Mergesort Vs Quicksort
- From: Ben Bacarisse
- Re: Mergesort Vs Quicksort
- From: Paul Hsieh
- Re: Mergesort Vs Quicksort
- From: Ben Bacarisse
- Re: Mergesort Vs Quicksort
- From: Tim Rentsch
- Re: Mergesort Vs Quicksort
- From: Robert Maas, http://tinyurl.com/uh3t
- Re: Mergesort Vs Quicksort
- Prev by Date: Re: Question Help About Graphs, thanks in advance.
- Next by Date: Re: Lock-free reference counting
- Previous by thread: Re: Mergesort Vs Quicksort
- Next by thread: Re: Mergesort Vs Quicksort
- Index(es):
Relevant Pages
|