Quicksort and Knuth [Was: Division by zero: float vs. int]

From: Chris Uppal (chris.uppal_at_metagnostic.REMOVE-THIS.org)
Date: 09/29/04


Date: Wed, 29 Sep 2004 17:26:26 +0100

Thomas G. Marshall wrote:

> > I don't think I'd want to take lessons in precise English from a guy
> > who's Quicksort implementation recurses into the larger subproblem
> > and iterates into the smaller...

I suppose I should fill out what was really just intended to be a throwaway
irrelevant (and irreverent ;-) comment.

> At first read of this, ICBW, but isn't that how quicksort is supposed to
> work? In the small, you lose far too much stack space and it just isn't
> efficient enough to recur down to the individual members.

There are two questions here.

It is indeed more efficient (in general) to stop quicksort before it gets down
to sorting the smallest sublists (switching to insert sort is the standard
approach), but that's not the issue I was talking about.

The questions is what do you do /before/ you've reached that point, when (at
each step of the algorithm) you have partitioned your input into two sublists.
You don't want to recurse into both, since that risks taking stack space that's
linear in the size of the input if you hit a pathological case. So what you do
is choose the /smallest/ sublist and recurse into that, while continuing your
(constant-space) iteration on the larger of the two sub-problems. That forces
the max stack depth to be at worst log2 N (since each recursive step must be
treating a sublist that is < 1/2 the size of the previous list).

Equivalently, you could say that you handle both sublists recursively, but you
make damn sure (hand coding it if necessary) that the recursive call to the
larger sublist is subject to tail-call optimisation.

I should say that Knuth's presentation[*] of the algorithm isn't explicitly
recursive (and neither is the one in "Numerical methods"). He uses iteration
plus an explicit stack (LIFO datastructure), rather than iteration plus the
system stack. But the error's the same in either case -- he is unecessarily
accepting a worst case O(N) tempory space consumption.

[*] This is from the second edition of Knuth Vol 3 and is taken from the text
presentation of the algorithm, not the MIX code -- I have no intention of
trying to read that!

(BTW, the reason, /I/ know about this is not because I have gone over Knuth's
work with a fine-tooth comb (I didn't even have a copy until fairly recently).
But one of my programs (not in Java) unexpectedly blew up in my face when I was
sorting some list, and when I investigated I discovered that the implementation
was broken in this way. The authors of that sort routine pointed the finger at
"Numerical methods in C" as the source of the mistake (where, incidently, the
code even includes a comment "Push pointers to larger subarray on stack,
process smaller subarray immediately", so it's not a typo). "Numerical
methods" in turn states that the implementation "closely follows [Knuth]")

    -- chris



Relevant Pages

  • Re: using MISC (post 1987 Forth hardware) opcodes
    ... words to naively sum the integers from 1 to the first element on the stack. ... Instead of "recurse" it uses the definitions name, ... Push and process. ... An environment with six cell stacks cannot traverse this data structure ...
    (comp.lang.forth)
  • Elegant code Was: Re: what kind of non microconroller app are done in forth?
    ... OVER R@ EXECUTE WHILE ... good way to do this without locals. ... can handle 3 stack items. ... /STRING R> RECURSE ...
    (comp.lang.forth)
  • Re: Elegant code Was: Re: what kind of non microconroller app are done in forth?
    ... OVER R@ EXECUTE WHILE ... good way to do this without locals. ... can handle 3 stack items. ... /STRING R> RECURSE ...
    (comp.lang.forth)
  • Re: remove all occurence of list
    ... ;;; Recurse, where natural. ... Traversing lists tail-recursively is not only an ... it's about limiting your stack space requirements. ... of reverse, every frame would hold exactly one word plus a return point. ...
    (comp.lang.scheme)
  • Re: stack error detection
    ... Resizing the stack automatically will need to do it precisely at the time where the stack overflow occurs. ... a partial check on each recursive call (direct with RECURSE or indirect with EXECUTE) is better than a simple check in outer interpreter, even if it doesn't cover all cases... ... I would call it CHECKED-EXECUTE to be more self-explanatory towards EXECUTE functionality. ...
    (comp.lang.forth)