Re: "Sorting" assignment



In article
<a7c93b0b-4f2a-45c4-8837-e361893f83bc@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
spinoza1111 <spinoza1111@xxxxxxxxx> writes
Actually, it doesn't. It grabs the member at the left end of the array
to be sorted. If the array is already sorted, this means that you end up
with an N^2 algorithm instead of an N log N one. About the only worse
choice you could have made is to use the one at the right end, since the
way you've coded it means that you'd also swap every single member of
the array with itself.

It is more usual to pick the middle of the array as the pivot, or even
to pick an element at random.

News flash. The array is unsorted.

How do you know? Your test data is random, but real data isn't always.
Even if it's nearly ordered, you'll run into the same problem.

This means that choosing the left
end IS picking an element at random.

Wrong, for the reason given above.

But when A and B are one cell, this means that A <= pivot < B for all
members because the two recursive calls you see in the code cover all
the values.
This makes no sense to me. If A or B are one cell, your code doesn't
attempt to sort them (since there's nothing to do).
It makes no sense to you because you don't understand it.

No, it makes no sense because it doesn't make sense.

After the partition, A <= pivot < B for all members of A and B, after
which you can make the recursive calls. This continues to apply even
when A and B contain only one cell or are empty. But that is not what
you said.

Yet. You
will. I am in the text giving a little peroration to show that that
the recursion gets down to one.

Actually, it doesn't. It gets down to two. If there's only one cell in a
block, it returns immediately. [Okay, you make the recursive call, but
only in a trivial sense.]

But that means that the entire array is sorted in rather
less time than a bubble sort,
Actually, if the array is sorted before you start, your code will be
rather slower than a bubble sort. If it's in reverse order, it will be
much slower.
True. This can however be checked.

You can check sorted or reverse sorted, yes. But how do you determine
"nearly sorted", which has the same problem?

For any arbitrary
entry, the probability of its having to move reduces each time the
quicksort is recursively called.
If the array is initially random, the probability that an entry has to
move is almost exactly 0.5 *at every stage*. This is because each entry
has a 50% chance of being greater than the pivot each time.

No comment? Do you agree or not?

int quicksortPartition
   (int intArray[],
    int intLeft,
    int intRight,
    int intPivotIndex)

I find this style difficult to read. The type of a variable is often the
least important thing about it.

"The type of a variable is often the least important thing about it"?

That's what I said.

I should say that the type of a variable is the MOST important thing
about it.

On the contrary. The type of the first parameter is *ABSOLUTELY
IRRELEVANT* to this algorithm; all we care is that it's copyable and has
a < operator. Even that last would be fairly easy to work around (as the
standard function qsort does). What goes wrong with your algorithm if I
change its type to unsigned long, or uintmax_t, or long double? Or even
to a pointer type (noting that the semantics of < require these to be
pointers into a single array of something, not to separate objects).

Similarly, all that matters for the other three are that they can be
used for subscripting the array. Any integer type would do equally well.

Given your advocacy of object-oriented programming in the past, I am
extremely surprised that you force the data to be integers. It would be
far more sensible to abstract the type away. That is, at the top of the
program have "typedef int datatype" and declare the first parameter as
"datatype values []".
That would be the version I am working on for my own further use. This
uses C Sharp. I don't think typedef is OO programming.

It isn't necessarily. However, the word I used was "abstract", which I
thought you'd approve of.

What is the justification for making the index of the pivot a parameter
to this function? If you were choosing the pivot via separate code it
*might* be justified, but you don't - you pivot on the left hand value.
It would make the code far clearer if the choice of pivot was done
within the partition function. It would also make it easier for the
student to experiment with different choices, or for you to do mid-array
or random selection.

No answer?

   int intStoreIndex = intLeft;

You can't do that! Declaring variables after the first line of
executable code is an invention of the evil corporation-controlled
standards bodies and is not part of True C before the
military-industrial complex perverted it.

Git. Wiglaf was a Geat, wasn't he?

No, he was a Scylfing.

But I merely paraphrased your stated opinion of the C Standard. Did I
misrepresent you? If so, I apologise - what *IS* your view of the C
Standard and how does it differ from my paraphrase?

   int intIndex1;
   int intExchange;
   for (intIndex1 = intLeft;
        intIndex1 < intRight;
        intIndex1++)

If you're going to surrender and use C99 features, then why not:

    for (int intIndex1 = intLeft; intIndex1 < intRight; intIndex1++)

I thought about doing this but was uncertain about the scope of
intIndex1.

The for statement, of course. As is clearly stated in the Standard:

If clause-1 is a declaration, the scope of any variables it declares
is the remainder of the declaration and the entire loop, including
the other two expressions; it is reached in the order of execution
before the first evaluation of the controlling expression.

I prefer code that looks like English because this reassures me that
someone was programming on purpose.

However, that mitigates against putting "int" on the front of every
identifier. And one must be careful: in that direction lies COBOL.

Seriously, every try talking about code over a long distance hookup
with a chap from India? If you can say the code and it uses simple
international words, the process tends to go smoother.

If that was likely to be an issue, I would still want to omit all those
leading "int"s. But if we both have the code in front of us, saying
"index" and reading "idx" isn't a problem in my experience.

The "1" means it is the first index and more may be needed.

But they weren't, so the 1 risks confusion.

   {
       if (intArray[intIndex1] <= intPivotValue)
       {
           intExchange = intArray[intIndex1];

Why don't you declare intExchange at this scope? Doing so reduces the
chance of it being misused by accident later.
I don't declare anything inside of a loop is why.

Why not? I thought you liked nesting.

           intStoreIndex += 1;
Suddenly scared of the ++ operator?
After learning about sync points, I'm terrified.

Then why use it earlier. But, in any case, it's perfectly safe if you
don't do stupid things like assigning twice to the same variable in the
same expression.

This partitioning algorithm will work and suffices for teaching.
However, in practice it's far too inefficient.

This represents a problem I have with your code. If it's intended for
teaching, it's too opaque. If it's intended for production, it's too
inefficient.

Thanks for your comments. They are illuminating. My copy of Sorting
and Searching was left in the United States when I left for Asia, and
Knuth conscientiously provides worst case information. Sounds like I
need replacement volumes of Knuth

Indeed.

I agree that quicksort sucks when the array is nearly in order.
However, I think it's more useful to start with quicksort than with
bubble sort.

Knuth starts (IIRC) by asking the reader to try sorting a small number
of elements, then explores the issues.

You also need to explain *why* bubble sort is bad in a way that the
student understands.

Secondly, when the bins get small the effort of partitioning and
recursion overwhelm the benefits of the technique. So it is common to
stop when the bin size gets down to around 5 (the exact size needs to be
found by experimentation on the platform in use). That is, the outer
test in the above code becomes:

    if (intRight > intLeft + 5)
or
    while (intRight > intLeft + 5)

Then, after the quicksort completely finishes, do a *BUBBLE SORT* on the

No, dammit. Don't do a BUBBLE SORT. Do an interchange sort:

for(i = 0; i < arraySize -1; i++)
for(j = i + 1; j < arraySize; j++) if (array[i] > array[j])
exchange(i, j);

entire array. Since every element is within 5 places of where it
belongs, this will never need more than 5 ...

..25 iterations...

No. The way you've written it, it will require N iterations and so
O(N^2) steps. Most of which is wasted.

bool bubble;

do
{
bubble = false;
for (int i = 0; i < arraySize - 1; i++)
if (array [i] > array [j])
{
exchange (i, j);
bubble = true;
}
}
while (bubble);

--
Clive D.W. Feather | Home: <clive@xxxxxxxxxx>
Tel: +44 20 8495 6138 (work) | Web: <http://www.davros.org>
Fax: +44 870 051 9937 | Work: <clive@xxxxxxxxx>
Please reply to the Reply-To address, which is: <clive@xxxxxxxxxx>
.



Relevant Pages

  • Re: "Sorting" assignment
    ... If the array is already sorted, this means that you end up ... less time than a bubble sort, ...     int intLeft, ... I don't declare anything inside of a loop is why. ...
    (comp.programming)
  • Re: most idiomatic way to iterate over an associative array?
    ...     loop ...     end loop; ... SQL> DECLARE  TYPE population_type IS TABLE OF NUMBER INDEX BY ... this will work if - and only if - your array has no gaps: ...
    (comp.databases.oracle.misc)
  • Re: Passing an array of structuresfrom a pointer?
    ... to only declare structs in headers and then define the ... the struct should be declared ... what if you have a simple array like this: ... In the header we would declare? ...
    (microsoft.public.vc.language)
  • Re: #defining an array within another in C language
    ... you do not know the number of elements of the array. ...     int var; ... C99 flexible array members or the plain old "struct hack". ...
    (comp.lang.c)
  • Re: vb.net class
    ... about fixed array lenghts or using ReDim statements. ... code ensures everything in the array is a String because you declare it ... Count can be generated from the time list, not need to store ...
    (microsoft.public.dotnet.languages.vb)