Re: Quicksort
- From: "cr88192" <cr88192@xxxxxxxxxxx>
- Date: Thu, 10 Jan 2008 10:18:25 +1000
On Jan 9, 6:49 pm, user923005 <dcor...@xxxxxxxxx> wrote:
On Jan 9, 12:31 am, "cr88192" <cr88...@xxxxxxxxxxx> wrote:
> <robertwess...@xxxxxxxxx> wrote in message
>news:68b3fb1f-3b96-4f63-97d3-640a56bbd50c@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> > On Jan 8, 8:12 pm, "cr88192" <cr88...@xxxxxxxxxxx> wrote:
> >> oh, this shows, something...
> >> ...
> >> ...
> >> how about we stick to actual code (exact lang doesn't matter so much
> >> probably...).
> >> then we can actually debate about what will actually happen, rather > >> than
> >> pcode written in forms that will lead to little more than conjecture > >> and
> >> idle speculation wrt the underlying implementation...
> >> abstact for abstract, concrete for concrete.
> >> not so sensible to use the abstract and then debate about points in > >> the
> >> concrete.
> > Of course, the particular point being addressed was about the
> > algorithm, not a particular implementation, but whatever...
> yes, ok.
> >> now, IMO, the points of debate are this:
> >> quicksort, in a purely recursive form, likes to overflow if nothing > >> is
> >> done;
> >> quicksort, in some edge cases, gets bad performance;
> >> quicksort, in the average case, is one of the fastest sort algos.
> >> highest priority here: overflows.
> >> not good to have the app crash, after all.
> >> so, my approach: on overflow fall back to a different sort (typically > >> a
> >> simple but slow sort).
> > (...options snipped...)
> > While a fallback to a O(n**2) sort is not an unreasonable way to avoid
> > the excessive stack growth, just fixing the implementation of
> > quicksort to avoid the excessive stack growth is almost certainly
> > simpler - that's one of my major points.
> yes, but, if the "fix" is simply gaining an O(n^2) worst case (rather > than
> an O(n^2) fallback), it is not much of a tradeoff.
> also, I will argue, at least for a selection sort fallback, that the > added
> complexity is marginal, even vs rewriting the thing into a loop.
> justification:
> selsort just isn't that complicated.
I suggest that you try switching to insertion sort. It's just as
simple and it is also faster.
pure/unadorned Insertion Sort is O(n^2).
as such, insertion sort is usually the simple choice for linked list sorting (much as selection sort is the simple choice for arrays).
> of course, cocktail sort is more complicated, and experimentally (for
> randomly generated sets) is comming out generally slower than selection
> sort.
> > Also, I'd argue that an unexpected 50000x increase in runtime may well
> > be equivalent to having the application crash (the approximate factor
> > if a quicksort of a million elements goes quadratic).
> yes, however, large sorts like this are rare IME.
> I use lots of small sorts (maybe hundreds or a few k items), but large > sorts
> are, over all, fairly rare...
I need to sort things in the billions. Our customers do it on a daily
basis.
I meant, in my codebase, which I will assume to be the common case in largish codebases.
I probably have, I don't know, maybe, tens or hundreds of cases where I make small-scale use of selsort or insertion sort (and even more where I use hashes, often as a quick/easy means of lookup caching).
now, I use sel sort often for really small cases, and usually switch to quicksort if the use of selsort is hurting performance.
in other cases (usually for spatial cases), I build spatial trees (for example, BSP or kd-tree variants, used for example, for quickly sorting polygons relative to the camera, or quickly eliminating objects from linetrace or volume-movement tasks), which could presumably be used for nearly any form of vector-based lookup (possibly, as an optimization even for some harder audio-processing tasks, such as optimizing auditory matching tasks, where the audio is sorted in a vector space defined, for example, by its FFT coefficients).
for example, in my garbage collector, I make use of quicksort. I use quicksort to sort an array of object pointers (used for larger objects), so that I can lookup pointers (done in-bulk in a GC) in O(log2 n) time, rather than O(n) time.
most other cases are much smaller and much more misc (take these items and sort them by a ranking hueristic, ...).
> I use sorts much as I use hashes. they are very common, for lots of > little
> things, but typically small volume, and not very reusable (often, they > are
> implemented and specialized for each and every use case).
Make 'em templates and reuse is comically simple.
sadly, only really doable for those who are using C++ (for various technical reasons, some of us stick almost purely to C).
> <snip>
> >> so, if a 15x slowdown exists, the O(n^2) algo may well be faster.
> >> much smaller, and it may well be faster to just use something like
> >> sel-sort.
> > While that's theoretically true, it's also fairly far divorced from
> > reality. Typical implementations of heapsort have an average runtime
> > of about double that of quicksort, and in the case of heapsort, that's
> > also approximately the maximum run time.
> yes.
> I was actually debating more about providing a generic library-style
> wrapper, of the kind found in qsort.
If you create a template, you won't have to change your code (but you
will need to provide your array elements with comparison operators if
they do not already exist).
as above...
absent, templates, classes, and other features, the practice becomes very different.
so, a large codebase in C is a very different beast then a large codebase in C++...
C++ tends to be very "centralizing", and tends to form very centralized and inflexible projects. the "class heirarchy" forming a kind of almost unchangable core, and attempting to change this can fundamentally break a project.
so, projects need to be highly planned out, and this plan can't really change, and if it does, very often, much of the project needs to be rewritten and reorganized.
C tends to go in a very different direction, leading to a mass of loosely coupled modules and subsystems, and abstraction is the key feature.
however, in C, the main approach for code-reuse is that of copying, pasting, and specializing.
as a result, we may see that very similar pieces of code, may be endlessly duplicated in nearly each and every subsystem.
as such, code becomes a mass of features that can be easily reused, at varying levels of complexity (code fragments, functions, function groups, source files, file groups, directories, subsystems, libraries, subprojects, and projects).
and, at each level, the ideal design, is a little different.
though, in the past, this was one reason for me to stick to C, now there is another issue:
I am running my own dynamic C compiler/VM framework, and at present, I only support C.
throwing C++ into the mix would create, effectively, a horrible mess...
though C++ is nice for some things, it is not critical IMO.
> for example, the compiler is fairly smart, and it can do useful things > for
> moving integers around.
> memcpy is not as smart.
> test shows, however, that qsort is only about 1.6x slower than a plain
> natively-compiled sort (with compiler optimizations on), and is actually
> faster than a native quicksort with optimizations turned off...
McIlroy wrote a blazing quicksort. BSD uses it, I think.
yes, ok.
> so: I think the C runtime is compiled with optimizations turned on...
> then again, it is possible that either qsort is making use of a special > case
> for 4-byte items as well (this would make sense, since these are
> statistically likely to be very common, ints, pointers, and pointers to
> things...).
> with compiler optimizations, qsort is still about 1.3x faster than > heapsort.
> as such, qsort is probably implemented as a quicksort variant, or > something
> else similarly fast.
> > In any event, for 1000 elements, this is all a pretty much pointless
> > debate. If you only got a few thousand elements to sort, use
> > Shellsort - it's simpler, and at approximate O(n**1.5) worst case is
> > almost always good enough for such small data sets. You might even
> > want to consider that for the quicksort fallback.
> possibly.
> then again, lots of small sorts add up (much like linear searches).
> for example, if one is sorting many thousands of sets of items, each > with a
> few hundred items, speed still matters.
40% of mainframe cycles are said to be spent on sorting. Given the
cost of a CRU, that is very significant.
yes.
> likewise, even though most lists are smallish, not doing any hashing can
> still kill an apps' performance, as all these masses of linked-list > lookups
> accumulate and choke the app...
> actually, this happened unexpectedly long ago, at a time where I was > unaware
> of the dangers of masses of linear lookups...
> sadly, my current new upper compiler (this one being built on DOM trees,
> unlike the previous which was built on Lisp-style lists) faces pretty > much
> the same problem (though, since it is not fully written yet, I have > little
> idea what to expect for compile times).
> > But your point that performance needs to be "good enough" is correct -
> > which is why heapsort is very attractive. It's only marginally slower
> > than quick sort, and will never bite you in the posterior by going
> > quadratic. IOW, if it's good enough, it will alwyas be good enough,
> > no surprised - the same cannot be said of quicksort, even though it
> > may be a bit better on average.
> yeah, I guess so. I guess one tends not to worry too much about the > worst
> case, given how rare it is.
For a general purpose routine, I think it is necessary to plan for the
worst case.
yes, but one has to also consider how "general purpose"...
very often, overgeneralizing can be an enemy as well...
> I guess, in cases where it becomming unstable and going "quadratic" is a
> serious operational risk, introspective sort is probably a good idea.
> after all, if one has both quicksort and heapsort at hand, beating > together
> introsort is pretty easy.
> then again, introsort is pretty bulky, so, likely, it would make sense > to
> implement all this behind a vaguely-generic qsort-style interface, and > use
> the amazing powers of copy and paste.
Introspective sort is as simple as:
If (too_deep) do_heapsort();
if (too_few_elements) do_insertionsort();
else do_quicksort();
The value of too_deep is determined by checking to see if the current
depth is larger than the log of the original input element count.
The value of too_few_elements is usually determined heuristically. It
will depend on the efficiency of your insertion sort (or whatever else
you use for cutoffs).
but, it is pulled off usually as several functions.
function groups tend to be a little bit of a hassle to work with IMO (in particular, because of implied renaming-to-avoid-clashes issues).
> or, I guess, one can just use qsort, and assume that the runtime
> implementors have done a good enough job.
In the grand scheme of things, using qsort() or a sorting template
from the STL is probably the right course of action 95% of the time or
more.
yeah.
.
- Follow-Ups:
- Re: Quicksort
- From: user923005
- Re: Quicksort
- From: Stephen Howe
- Re: Quicksort
- References:
- Quicksort
- From: Roman Töngi
- Re: Quicksort
- From: Adak
- Re: Quicksort
- From: robertwessel2@xxxxxxxxx
- Re: Quicksort
- From: robertwessel2@xxxxxxxxx
- Re: Quicksort
- From: CBFalconer
- Re: Quicksort
- From: robertwessel2@xxxxxxxxx
- Re: Quicksort
- From: CBFalconer
- Re: Quicksort
- From: robertwessel2@xxxxxxxxx
- Re: Quicksort
- From: cr88192
- Re: Quicksort
- From: robertwessel2@xxxxxxxxx
- Re: Quicksort
- From: cr88192
- Re: Quicksort
- From: user923005
- Quicksort
- Prev by Date: Re: Article on Herbert Schildt, author of C Unleashed, repaired on wikipedia
- Next by Date: Re: Article on Herbert Schildt, author of C Unleashed, repaired on wikipedia
- Previous by thread: Re: Quicksort
- Next by thread: Re: Quicksort
- Index(es):