Re: Fastcode poll - Should Sort include worst case situations
- From: "Stephen Howe" <SPAMstephen.howeGUARD@xxxxxxxxxxxx>
- Date: Thu, 6 Oct 2005 21:45:16 +0100
> > Not true. There still exist killer sequences but they are just very
> > rare.
>
> By my experience it is a fact.
1) Your experience is wrong (and "experience" is not a good arbiter of
mathematical truth).
Robert Sedgewick proved that unless your partition strategy selects the
median (an O(N) algorithm, not O(1)), there exists sequences that will beat
quicksort and make it O(N * N).
Furthermore, this has been done and there exists a function called
antiqsort() that you can supply as an argument to C's qsort() (if based on
quicksort and most are) and it force the most number of comparisons and
swaps.
See here
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
for Doug McIlroy's "A Killer Adversary for Quicksort"
The function in that paper proves you incorrect.
2) And to this day, one of the best versions of a hybrid-Quicksort (and used
by many compiler vendors (but not Microsoft) is "Engineering a Sort
Function" by Jon Bentley & Doug McIlroy in 1993.
See
http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol23/issue11/spe862jb.pdf
It will still do badly if fed antiqsort() but not as bad as others.
Summarising this featured
Insertion sort for arrays < 7 elements
Median-of-3 partition quicksort for arrays < 41 elements
Median-of-9 partition quicksort for arrays >= 41 elements
Fat-pivot
Fat-pivot means that Dijkstra's "Dutch National Flag" problem is solved
(If you have an array of 3 distinct elements randomised (like the 3 colours
of the Dutch Flags), what algorithm do you use to efficently partition the
array).
This is important for QuickSort, because if it worked out you had large
number of duplicates of the partition element, you would to partition and
put them into the correct position and reduce the number of elements left to
be sorted as a much smaller number than putting 1 element in place
(thin-pivot) and having duplicates to the left and right of the pivot
element. The problem is much smaller.
Basically those 3 colours are equivalent to <, = and >.
In short efficient 3-way partioning is preferred over 2-way partioning
See paper above and also
http://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pdf
(WARNING: The paper need rotating 90 degrees to view properly. There is a
button in Acrobat).
where Sedgewick discusses efficient 3-way partition and also QuickSort for
this century.
3) But on all of this, you can use Musser's Introsort which a version is
commonly used for C++ sort() implementations.
It guarantees O(N * log N) for _ALL_ sequence values (which strict quicksort
does not)
It is a hybrid quicksort.
If it detects that a portion of an array is responding badly, it switches to
heapsort for that portion of the array.
The result is guaranteed O(N * log N) behaviour.
See
http://en.wikipedia.org/wiki/Introsort
and also
http://www.cs.rpi.edu/~musser/gp/introsort.ps (you will need GhostScript to
view)
where this snippet is mentioned "Musser reported that on a median-of-3
killer sequence of 100,000 elements, introsort's running time was 1/200th
that of median-of-3 quicksort."
4) Question: So are there any more "improvements" that can be made to
comparison-based sorts (introsort or hybrid quicksort)?
Well yes.
Heapsort has considerable overhead and a well-gapped Shell Sort will beat it
for low N.
In fact it well be that Heapsort only overtakes Shell Sort once the number
of elements is absolutely huge - more than 2 ^ 32.
I don't know what this point is.
And that is more than the number of elements addressable on a bog-standard
32-bit Intel PC.
So by that criterion, Introsort should use Shell Sort in the worse case and
only Heap Sort if the number of elements is huge.
See http://en.wikipedia.org/wiki/Shell_sort for list of Sedgewick-Incerpi
and Tokuda gaps.
Not listed in the Wiki article is Gonnet & Baeza-Yates gaps after classic
"Handbook of Algorithms and Data Structures".
They report that a gap of 0.4545... ratio is most optimal (which is 5/11)
making sure a gap of 1 is ended on.
5) Further HeapSort has improved over the past 10-years.
Originally Floyd in 1964 produced "TreeSort 1"
He then made two improvments
- the array can be initially heapified in O(N) by sifting down rather
than up in O(N * log N). The 2nd step is still O(N * log N) to sort
- having swapped the largest element with bottom element, push bottom
element right to the end and do comparisons on the way up rather than on
the way down.
But the 3rd improvement is not in common use, so "TreeSort 2" is what we
know as heapsort().
Dijikstra invented (in 1970's?) some complicated version called Smoothsort.
It guaranteed O(N) if input is nearly sorted.
But noone uses this.
See http://www.cs.utexas.edu/users/EWD/ewd07xx/EWD795.PDF
In 1991, Ingo Wegerner created "Bottom-up-Heapsort". Performance is N * O(N
* log N) and beats Quicksort.
See http://ls2-www.cs.uni-dortmund.de/~wegener/
But then followed "Weak-Heapsort" by Stefan Edelkamp (original algorithm
work done by Dutton for Priority Queues)
What happens here is some auxilarly memory is required (just a "bit" for
boolean flags). So with 80,000 elements to sort, 10,000 bytes is needed.
Weak-Heapsort no longer has a binary tree. Instead the trees are lop-sided
with no left node but several right-nodes.
Run time is O(n log n + 0.1 n) in worse case.
See http://eccc.uni-trier.de/eccc-reports/1999/TR99-028/Paper.ps.
which has a nice sumary of the state-of-the-art in terms of sorting
algorithms
6) And where do things go here?
Not much really. I think over this century (2000's) the improvements to
existing algorithms will be very minor.
But any slight improvement is welcome.
Someone might one day "close" the open problem of Shell Sort and supply the
best gaps.
Stephen Howe
.
- Follow-Ups:
- Fastcode Sort Criteria - Number of records?
- From: A Programmer
- Re: Fastcode poll - Should Sort include worst case situations
- From: A Programmer
- Re: Fastcode poll - Should Sort include worst case situations
- From: Sasa Zeman
- Re: Fastcode poll - Should Sort include worst case situations
- From: Dennis
- Fastcode Sort Criteria - Number of records?
- References:
- Fastcode poll - Should Sort include worst case situations
- From: Avatar Zondertau
- Re: Fastcode poll - Should Sort include worst case situations
- From: Avatar Zondertau
- Fastcode poll - Should Sort include worst case situations
- Prev by Date: Re: Fastcode Benchmarking
- Next by Date: Re: Object pooling
- Previous by thread: Re: Fastcode poll - Should Sort include worst case situations
- Next by thread: Re: Fastcode poll - Should Sort include worst case situations
- Index(es):