Re: "Sorting" assignment



In article
<99cf437b-e8d3-4d3c-92cc-909fb9d85b1b@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
spinoza1111 <spinoza1111@xxxxxxxxx> writes
Bubble sorting is how not
to sort.

Here is a C example of a simple program that sorts a list of random
numbers using the method of exchanging between two bins (partition
exchange) that I mentioned in my first post.

An algorithm which most people know as "quicksort", of course.

As I said, this algorithm grabs an arbitrary member and then compares
it to every other member, throwing each compared member into bin A or
bin B depending on whether it is greater than, or less than (or equal
to) the arbitrary member (called the pivot).

Actually, it doesn't. It grabs the member at the left end of the array
to be sorted. If the array is already sorted, this means that you end up
with an N^2 algorithm instead of an N log N one. About the only worse
choice you could have made is to use the one at the right end, since the
way you've coded it means that you'd also swap every single member of
the array with itself.

It is more usual to pick the middle of the array as the pivot, or even
to pick an element at random.

It is usual to point out that quicksort has this pathological behaviour
for certain inputs.

But when A and B are one cell, this means that A <= pivot < B for all
members because the two recursive calls you see in the code cover all
the values.

This makes no sense to me. If A or B are one cell, your code doesn't
attempt to sort them (since there's nothing to do).

But that means that the entire array is sorted in rather
less time than a bubble sort,

Actually, if the array is sorted before you start, your code will be
rather slower than a bubble sort. If it's in reverse order, it will be
much slower.

which maximizes the number of times an
entry must travel to get to its final position.

In some circumstances. There are others - as we'll see - where bubble
sort is actually a sensible thing to use.

For any arbitrary
entry, the probability of its having to move reduces each time the
quicksort is recursively called.

If the array is initially random, the probability that an entry has to
move is almost exactly 0.5 *at every stage*. This is because each entry
has a 50% chance of being greater than the pivot each time.

This is what intelligent workers might do on a farm when sorting
apples.

You are a patronising git at times.

They wouldn't use bubble sort. I think a programmer invented
that time sink.

Here is the code. Examine it in a monospace font such as Courier New.


// ***** Quicksort (partition exchange) example *****

#include <stdio.h>
#include <stdlib.h>

const int SIZE = 27;

int quicksortPartition
(int intArray[],
int intLeft,
int intRight,
int intPivotIndex)

I find this style difficult to read. The type of a variable is often the
least important thing about it. I know you love Hungarian notation, but
many of us find it merely hides the essentials in a morass of
unimportant detail. For example, "intArray" tells me *NOTHING* about the
parameter that I can't already tell from its declaration (no matter what
it is called, it's an int and it's an array). I feel that it would be
far better to name that parameter on the basis of what it *does*, such
as "values" or even (though I wouldn't do it myself) "to_be_sorted".

Given your advocacy of object-oriented programming in the past, I am
extremely surprised that you force the data to be integers. It would be
far more sensible to abstract the type away. That is, at the top of the
program have "typedef int datatype" and declare the first parameter as
"datatype values []".

Furthermore, the correct type to use for an index into an array is
size_t rather than int. Of course, since you blithely subtract 1 from
the index, you'll need to trap that special case. But you won't mind
doing that - after all, you were complaining the other day that (I
think) Brian Kernigan didn't do that but relied on his data being
zero-terminated.

What is the justification for making the index of the pivot a parameter
to this function? If you were choosing the pivot via separate code it
*might* be justified, but you don't - you pivot on the left hand value.
It would make the code far clearer if the choice of pivot was done
within the partition function. It would also make it easier for the
student to experiment with different choices, or for you to do mid-array
or random selection.

{
int intPivotValue = intArray[intPivotIndex];

If this is meant to be educational code, then a comment of the form:

/* Move the pivot value out of the way at the right hand end */

would have been helpful. Comments further along would also be helpful;
for example, in the main loop you could point out how intStoreIndex and
intIndex1 partition the array into bin A, bin B, and unchecked data.

intArray[intPivotIndex] = intArray[intRight];
intArray[intRight] = intPivotValue;

I don't criticise your use of arrays and indices rather than pointers,
though many people would. In practice it's a matter of what you feel
comfortable with.

int intStoreIndex = intLeft;

You can't do that! Declaring variables after the first line of
executable code is an invention of the evil corporation-controlled
standards bodies and is not part of True C before the
military-industrial complex perverted it.

To be precise, *I* added it to the 1999 C Standard. That is, I did all
the donkey work of writing the wording to allow it to be included,
including working out some nasty boundary cases (such as the interaction
with variable length arrays).

int intIndex1;
int intExchange;
for (intIndex1 = intLeft;
intIndex1 < intRight;
intIndex1++)

If you're going to surrender and use C99 features, then why not:

for (int intIndex1 = intLeft; intIndex1 < intRight; intIndex1++)

? And just what information does the "1" provide? In fact, how does it
hurt to just call this "idx"? Or even, given the small size of the body,
"i"?

{
if (intArray[intIndex1] <= intPivotValue)
{
intExchange = intArray[intIndex1];

Why don't you declare intExchange at this scope? Doing so reduces the
chance of it being misused by accident later.

intArray[intIndex1] = intArray[intStoreIndex];
intArray[intStoreIndex] = intExchange;

If your data type is anything bigger than a machine word, it is worth
bracketing this swap with "if (intIndex1 != intStoreIndex)".

intStoreIndex += 1;

Suddenly scared of the ++ operator?

}
}

This partitioning algorithm will work and suffices for teaching.
However, in practice it's far too inefficient. For example, if the left
element of the array is the second largest (e.g. array contents 9 1 2 3
4 5 6 7 8 10), then it does a bubble pass on the array (excluding the
last element), moving every single element down by one and shifting the
10 from left to right one cell at a time.

The standard way to implement the partition phase is:
* Swap the pivot item out of the way.
* Initialize pointer pL to the left element and pR to the right element.
* While pL is left of pR:
* Move pL right until it finds an element greater than the pivot.
* Move pR left until it finds an element less than the pivot.
* Swap the elements at pL and pR.
This does require careful coding, especially to deal with the boundary
cases.

intPivotValue = intArray[intStoreIndex];
intArray[intStoreIndex] = intArray[intRight];
intArray[intRight] = intPivotValue;
return intStoreIndex;
}


void quicksort(int intArray[],
int intLeft,
int intRight)
{
if (intRight > intLeft)
{
int intPivotNewIndex = quicksortPartition
(intArray,
intLeft,
intRight,
intLeft);
quicksort(intArray,
intLeft,
intPivotNewIndex - 1);
quicksort(intArray,
intPivotNewIndex + 1,
intRight);
}
}

Again, if this is meant to be teaching code, it suffices. If it's meant
to be production code, it doesn't.

Firstly, you should determine which of the two partitions is the
smaller. That should be sorted using a recursive call, while the larger
one should be sorted in a loop.

void quicksort(int intArray[],
int intLeft,
int intRight)
{
while (intRight > intLeft)
{
int intPivotNewIndex = quicksortPartition
(intArray,
intLeft,
intRight,
intLeft);
int leftwidth = intPivotNewIndex - 1 - intLeft;
int rightwidth = intRight - (intPivotNewIndex + 1);

if (leftwidth < rightwidth)
{
quicksort(intArray,
intLeft,
intPivotNewIndex - 1);
intLeft = intPivotNewIndex + 1;
}
else
{
quicksort(intArray,
intPivotNewIndex + 1,
intRight);
intRight = intPivotNewIndex - 1;
}
}
}

Doing this guarantees that, no matter how bad things get, the depth of
recursion is never more than log N.

Secondly, when the bins get small the effort of partitioning and
recursion overwhelm the benefits of the technique. So it is common to
stop when the bin size gets down to around 5 (the exact size needs to be
found by experimentation on the platform in use). That is, the outer
test in the above code becomes:

if (intRight > intLeft + 5)
or
while (intRight > intLeft + 5)

Then, after the quicksort completely finishes, do a *BUBBLE SORT* on the
entire array. Since every element is within 5 places of where it
belongs, this will never need more than 5 passes.

int main(int argc,char *argv[])
{
int intArray[SIZE];
int intUpperBound = SIZE - 1;
int intIndex1;
for (intIndex1 = 0; intIndex1 < SIZE; intIndex1++)
{
intArray[intIndex1] =
(int)(frand() * 99);

Undeclared identifier frand.

What is the justification for the 99? Why isn't it a #define? If this is
teaching code, it needs to explain such matters.

printf("%2d ", intArray[intIndex1]);
}
printf("\n");
quicksort(intArray, 0, intUpperBound);

What is the justification for a variable intUpperBound rather than just
putting SIZE - 1?

for (intIndex1 = 0; intIndex1 < SIZE; intIndex1++)
{
printf("%2d ", intArray[intIndex1]);
}
printf("\n");
return 0;
}

--
Clive D.W. Feather | Home: <clive@xxxxxxxxxx>
Tel: +44 20 8495 6138 (work) | Web: <http://www.davros.org>
Fax: +44 870 051 9937 | Work: <clive@xxxxxxxxx>
Please reply to the Reply-To address, which is: <clive@xxxxxxxxxx>
.