Re: General Idea how to attack this problem.




"Ben Pfaff" <blp@xxxxxxxxxxxxxxx> wrote in message
news:878xpkiaoc.fsf@xxxxxxxxxxxxxxx
"Abstract Dissonance" <Abstract.Dissonance@xxxxxxxxxxx> writes:

"Richard Heathfield" <invalid@xxxxxxxxxxxxxxx> wrote in message
news:UcSdnew7qIASF8rZRVnyvQ@xxxxxxxxx
After all, if stability matters to you, then presumably you
have a secondary key that you care about - so just include it
in the sort criteria.

diff = (p1->primary > p2->primary) - (p1->primary < p2->primary);
if(diff == 0)
{
diff = strcmp(p1->secondary, p2->secondary);
}
return diff;


um... then you are making a stable qsort from an unstable one. (You
might
be doing it in a round about way but the end effect is the same)

This sounds koan-like to me. "If an unstable sort is never
presented with equal items, is it still an unstable sort?"

lol.

Hmm... why is it that difficult to understand?


Simple scenario:

Lets suppose an urn contains a number of red balls and a number of green
balls.

You are not told in which proprotion the red balls are to the green ones.

farther, you are to choose a ball and if it is red you loose your life. Its
it green you survive and get some type of reward. You have the option of not
choosing either.

What is will you do?

Since you do not know how many red are in there YOU MUST assume they are all
red(even if in reality there are none). That is, unless you don't value your
life much.

This is totally different than if you are told the proportion. If you know
the proportion than you can make an informed decision... if you don't you
can't. Its basic probability and you can probably find this problem in any
book on probability.


Its exactly analogous to the qsort problem. If you are randomly given a
qsort routine but you do not know it is stable or unstable you must assume
it is unstable... because if you assume it is stable and it turns out to be
unstable then it won't meet your needs.

Why assume its unstable if you don't know? because you can then create a
wrapper to make it stable and not have any issues.

I.e., by assuming it is unstable you can, with probability 1, get a stable
routine. By assuming it is stable you are taking a chance(it might turn out
all implementation are stable though... but it only takes one unstable
routine to screw it up).

I'd imagine this shouldn't be a hard concept to understand for most
programmers... but this might be why there are so many bugs in software too.

If you think about it you might realize whats going on. The real issue
comes about when you are porting code to different implementations who might
not have stable qsort routines... if you programmed it and assumed as if it
were stable and it worked then it will most likely not work when ported.

Now, if the standard said all sorting routines were unstable then you would
have no problem. If it says that all sorting routines were stable then you
also would not have any problem. Since it says neither you cannot assume
one or the other. But you can turn any stable routine to any unstable one
and vice versa. This is the only way make the problem deterministic....
leaving it up to god isn't going to make your program work.


Since you still are probably having a hard time understanding this simple
concept I will give an example.

Lets suppose you are using your favorite ANSI C complier and you write some
program to do some sorting and you required you get some result. You then
end up porting this code in the future to some other compiler that is also
ANSI C compatible. You run the code and the results are different. You
can't understand why as it should be completely deterministic. Lo and
behold its cause one implementation of qsort was stable and the other was
not. The problem was assuming that the implementations of the qsort were
equivilent. Since they were not you screwed up and in reality it could
cause major problems in real software when you make similar mistakes by
assuming to much... its always safer to assume to little than to much.

To fix the problem from the start you should have written your own qsort
routine. Ofcourse that costs time and money so you could have read the ANSI
docs and found out that all implementations of qsort's stability is not
specificied. Then you could just write a simple wrapper to make it stable
or unstable depending on what you needed. Its much easier to make a stable
routine unstable than it is the other way.

Why? Well, go study some algorithmic theory. Learn about what it takes for
algorithms to be equivilent. Hell, since you had so much trouble understand
this heres how to figure out.

If two algorithms give the same output for the same inputs for all inputs
then they are said to be equivilent(even if the code that produces them are
different it won't matter).
Now, technically then there can be only one qsort routine but because people
have implemented them in different ways and added there own criteria(which
gives us unstable and stable versions) we don't really have equivilent
algorithms(which is why this own problem started in the first place).
Ofcourse we all group them under qsort algorithms.

Ok, heres the experiment. Lets suppose I have two qsort algorithms A and B.
One is stable and the other is unstable... but you don't know which. How
can you tell? YOU KNOW THE STABLE ONE ALWAYS outputs in a certain way. Hence
if A doesn't output in that way you know its unstable and hence B must be
stable(or vice versa). What if both A and B don't output in a stable way?
Then neither can be stable and you were wrong to assume they were stable in
the first place. What if A and B both acted stable? Well, you run it
enough times and in different settings then eventually one will have to act
unstable or both actually were stable(really a theoretical argument but).
Hence you could, theoretically, always find the stable algorithm.

Why? Because a stable algorithm is more defined than an unstable one. All
stable algorithms outputs are a subset of all unstable algorithms output...
but not vice versa. Hence there are atleast more unstable algorithms than
stable ones and if you had to guess randomly you best choose unstable(hence
to the original problem you should assume all algorithms are unstable given
no other information).



One more chance for you to get it:

Lets suppose you are told n was a "number" between [0,1]. Whats your guess
that its rational or irrational? If your smart you'll say its irrational.

Now, suppose the same but you had to guess rational or real. Whats your
guess? Obviously you should get real since it includes rational and you are
right either way. (although its less defined cause it includes more
possibilities)

What about an integer or real? what about 0 or 1. what about 0 and [1/2,
1]. etc... etc... etc...

The point is that if you are going to choose something you choose it wisely.
If theres any risk involved you choose the one with the least risk(even
though you could be wrong in specific instances your chance of being right
in the long run is whats important in general).

So if you think of a stable qsort as a also being an unstable one but not
vice versa then its always best to assume they are unstable unless you know
for a fact otherwise. If You think they are mutually exclusive then you are
screwed since you have to assume one then make it really stable or
unstable(by wrapping it).

If you agree with what above then you hopefully can see that there is a
solution to both possibilities(so no matter how you define it you can get a
deterministic outcome). Just assume its unstable and make a stable wrapper
and vice versa. Then you have two algorithms that you know are both
deterministically stable and unstable and you don't have to guess. (this
assumes its easy to convert one to the other. In the case of qsorts its
pretty easy).

In reality all qsorts are deterministic but what seperates stable from
unstable is how they treat the input. You can make a qsort that acts stable
half the time and unstable the other half(using the RNG). It wouldn't be
determnistic(well, technically it would) but it would still be
unstable(unless unstable requires it to be determnistic).

Anyways...


.



Relevant Pages

  • Runs on Windows, fails on Linux
    ... This is VERY perplexing. ... For a programming assignment I'm writing an integer sort routine that must ... Mhz machine with 256meg ram, my times are 42 seconds for Qsort, and 27 each ...
    (comp.os.linux.development.apps)
  • Re: smuggling data in and out of alarm handlers and the like
    ... qsort() is basically broken since the cmp ... make you realize that the sort function should be a macro, ... This allows your "cmp" to itself be a macro, ... But if you write *your own* sort routine, as Paul suggests, then the ...
    (comp.lang.c)
  • Re: smuggling data in and out of alarm handlers and the like
    ... And if you think that a sorting routine is simple enough that there's ... While the first binary search was published in 1946, ... So you think compiler vendors implement qsort() without any code? ...
    (comp.lang.c)
  • Re: Sorting Array of Structures
    ... char author; ... int hold; ... I believe comp.programming does algorithms, so if you need to implement a sort algorithm (I'm guessing that is your assignment and your reason for not using qsort) then that is the place to ask. ...
    (comp.lang.c)
  • Re: Matrix Inversion Suggestions for Fortran
    ... inversion routine. ...   Does any one know of either a general ... better fundamental algorithms can be found in Collected Algorithms ...
    (sci.math.num-analysis)