Re: Quicksort
- From: user923005 <dcorbit@xxxxxxxxx>
- Date: Wed, 9 Jan 2008 00:49:17 -0800 (PST)
On Jan 9, 12:31 am, "cr88192" <cr88...@xxxxxxxxxxx> wrote:
<robertwess...@xxxxxxxxx> wrote in message
news:68b3fb1f-3b96-4f63-97d3-640a56bbd50c@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On Jan 8, 8:12 pm, "cr88192" <cr88...@xxxxxxxxxxx> wrote:
oh, this shows, something...
...
...
how about we stick to actual code (exact lang doesn't matter so much
probably...).
then we can actually debate about what will actually happen, rather than
pcode written in forms that will lead to little more than conjecture and
idle speculation wrt the underlying implementation...
abstact for abstract, concrete for concrete.
not so sensible to use the abstract and then debate about points in the
concrete.
Of course, the particular point being addressed was about the
algorithm, not a particular implementation, but whatever...
yes, ok.
now, IMO, the points of debate are this:
quicksort, in a purely recursive form, likes to overflow if nothing is
done;
quicksort, in some edge cases, gets bad performance;
quicksort, in the average case, is one of the fastest sort algos.
highest priority here: overflows.
not good to have the app crash, after all.
so, my approach: on overflow fall back to a different sort (typically a
simple but slow sort).
(...options snipped...)
While a fallback to a O(n**2) sort is not an unreasonable way to avoid
the excessive stack growth, just fixing the implementation of
quicksort to avoid the excessive stack growth is almost certainly
simpler - that's one of my major points.
yes, but, if the "fix" is simply gaining an O(n^2) worst case (rather than
an O(n^2) fallback), it is not much of a tradeoff.
also, I will argue, at least for a selection sort fallback, that the added
complexity is marginal, even vs rewriting the thing into a loop.
justification:
selsort just isn't that complicated.
I suggest that you try switching to insertion sort. It's just as
simple and it is also faster.
of course, cocktail sort is more complicated, and experimentally (for
randomly generated sets) is comming out generally slower than selection
sort.
Also, I'd argue that an unexpected 50000x increase in runtime may well
be equivalent to having the application crash (the approximate factor
if a quicksort of a million elements goes quadratic).
yes, however, large sorts like this are rare IME.
I use lots of small sorts (maybe hundreds or a few k items), but large sorts
are, over all, fairly rare...
I need to sort things in the billions. Our customers do it on a daily
basis.
I use sorts much as I use hashes. they are very common, for lots of little
things, but typically small volume, and not very reusable (often, they are
implemented and specialized for each and every use case).
Make 'em templates and reuse is comically simple.
<snip>
so, if a 15x slowdown exists, the O(n^2) algo may well be faster.
much smaller, and it may well be faster to just use something like
sel-sort.
While that's theoretically true, it's also fairly far divorced from
reality. Typical implementations of heapsort have an average runtime
of about double that of quicksort, and in the case of heapsort, that's
also approximately the maximum run time.
yes.
I was actually debating more about providing a generic library-style
wrapper, of the kind found in qsort.
If you create a template, you won't have to change your code (but you
will need to provide your array elements with comparison operators if
they do not already exist).
for example, the compiler is fairly smart, and it can do useful things for
moving integers around.
memcpy is not as smart.
test shows, however, that qsort is only about 1.6x slower than a plain
natively-compiled sort (with compiler optimizations on), and is actually
faster than a native quicksort with optimizations turned off...
McIlroy wrote a blazing quicksort. BSD uses it, I think.
so: I think the C runtime is compiled with optimizations turned on...
then again, it is possible that either qsort is making use of a special case
for 4-byte items as well (this would make sense, since these are
statistically likely to be very common, ints, pointers, and pointers to
things...).
with compiler optimizations, qsort is still about 1.3x faster than heapsort.
as such, qsort is probably implemented as a quicksort variant, or something
else similarly fast.
In any event, for 1000 elements, this is all a pretty much pointless
debate. If you only got a few thousand elements to sort, use
Shellsort - it's simpler, and at approximate O(n**1.5) worst case is
almost always good enough for such small data sets. You might even
want to consider that for the quicksort fallback.
possibly.
then again, lots of small sorts add up (much like linear searches).
for example, if one is sorting many thousands of sets of items, each with a
few hundred items, speed still matters.
40% of mainframe cycles are said to be spent on sorting. Given the
cost of a CRU, that is very significant.
likewise, even though most lists are smallish, not doing any hashing can
still kill an apps' performance, as all these masses of linked-list lookups
accumulate and choke the app...
actually, this happened unexpectedly long ago, at a time where I was unaware
of the dangers of masses of linear lookups...
sadly, my current new upper compiler (this one being built on DOM trees,
unlike the previous which was built on Lisp-style lists) faces pretty much
the same problem (though, since it is not fully written yet, I have little
idea what to expect for compile times).
But your point that performance needs to be "good enough" is correct -
which is why heapsort is very attractive. It's only marginally slower
than quick sort, and will never bite you in the posterior by going
quadratic. IOW, if it's good enough, it will alwyas be good enough,
no surprised - the same cannot be said of quicksort, even though it
may be a bit better on average.
yeah, I guess so. I guess one tends not to worry too much about the worst
case, given how rare it is.
For a general purpose routine, I think it is necessary to plan for the
worst case.
I guess, in cases where it becomming unstable and going "quadratic" is a
serious operational risk, introspective sort is probably a good idea.
after all, if one has both quicksort and heapsort at hand, beating together
introsort is pretty easy.
then again, introsort is pretty bulky, so, likely, it would make sense to
implement all this behind a vaguely-generic qsort-style interface, and use
the amazing powers of copy and paste.
Introspective sort is as simple as:
If (too_deep) do_heapsort();
if (too_few_elements) do_insertionsort();
else do_quicksort();
The value of too_deep is determined by checking to see if the current
depth is larger than the log of the original input element count.
The value of too_few_elements is usually determined heuristically. It
will depend on the efficiency of your insertion sort (or whatever else
you use for cutoffs).
or, I guess, one can just use qsort, and assume that the runtime
implementors have done a good enough job.
In the grand scheme of things, using qsort() or a sorting template
from the STL is probably the right course of action 95% of the time or
more.
.
- Follow-Ups:
- Re: Quicksort
- From: cr88192
- Re: Quicksort
- References:
- Quicksort
- From: Roman Töngi
- Re: Quicksort
- From: Adak
- Re: Quicksort
- From: robertwessel2@xxxxxxxxx
- Re: Quicksort
- From: robertwessel2@xxxxxxxxx
- Re: Quicksort
- From: CBFalconer
- Re: Quicksort
- From: robertwessel2@xxxxxxxxx
- Re: Quicksort
- From: CBFalconer
- Re: Quicksort
- From: robertwessel2@xxxxxxxxx
- Re: Quicksort
- From: cr88192
- Re: Quicksort
- From: robertwessel2@xxxxxxxxx
- Re: Quicksort
- From: cr88192
- Quicksort
- Prev by Date: Re: Quicksort
- Next by Date: Re: A note on computing thugs and coding bums
- Previous by thread: Re: Quicksort
- Next by thread: Re: Quicksort
- Index(es):