Re: Selection-Sort in C
- From: cri@xxxxxxxx (Richard Harter)
- Date: Sun, 04 Oct 2009 17:25:30 GMT
On Sun, 04 Oct 2009 13:52:48 +0100, Ben Bacarisse
<ben.usenet@xxxxxxxxx> wrote:
[snip a lot of stuff - look at the previous postings if you want
background]
That's another matter altogether. Sorting is a problem not an
algorithm.
None-the-less it makes sense to speak of order functions for
problems. Given a problem, there will be a wide variety of
algorithms that can be used to solve it. The intrinsic cost of
solving that problem is the cost of the best algorithms. Thus it
makes sense to say that the traveling salesman problem is
exponential. There is an implicit assumption here that the
algorithms run on a Turing machine equivalent.
Personally, I'd never say that sorting is O(f) for any f
since I don't know of any results of that kind for this class of
problem (I am sure there are some, I just don't know any). Whilst I
agree one can allow some of the things I listed to be unstated for an
/algorithm/[1], I don't think any can left out when discussing a
/problem/.
But they can - it's just a matter of abstraction.
BTW, I will be mildly peeved if the big assumption you think I am
making is to assume that all sorting algorithms are comparison sorts.
Your posting history suggest to me that you might have something more
interesting in mind, but I honestly have no idea what is encompassed
by your notion of "natural".
I'm wasn't but prepare to be disappointed - I expect your
reaction will be "Is that all?". You asked, quite properly, what
does n denote. And the answer is, and I quote, "data sets of
size n". But what how do you measure size? The natural
measurement of the size of a data set is the amount of storage it
occupies. For specialized data sets such as arrays of elements
it is convenient to use the number of elements as a measure of
size, but it the actual number of bits that is primary. (Don't
argue with me on this point; I'm right by definition. :-))
The thing is, sorting algorithms are developed in terms of
operations on elements so it is natural to characterize their
performance in terms of number of elements with an implict
assumption that the cost of operations on elements is O(1). It's
a convenient assumption that works in practice, because real
sorting problems are of bounded size. However the assumption
creates a blind spot - as the number of distinct elements
increases the intrinsic cost of basic operations also increases.
This increase in cost is usually ignored in analysis.
At this point you may be shaking your head and saying, "So what,
why does it matter?" For most of us, I suppose it doesn't - or
at least we think it doesn't. One place where it matters is in
large systems analysis. The issue is: Does sorting scale?
Primary operations (the things that you use for black boxes)
scale it the cost per data unit passing through the box is O(1).
(That's another definition; don't argue. :-)) So, if you are
thinking in terms of counting elements, sorting doesn't scale; if
you are thinking in terms of counting data volume it does -
perhaps.
After all, we need to establish whether sorting actually is
Theta(n) where n is the number of bits in the data set. The
lower bound is clear enough - whatever the algorithm an oracle
can force it to examine every bit. The upper bound is a bit
murkier. The simple argument runs like this:
Let m be the number of elements and n be the total data set size.
Comparison sorts require O(m * log m) operations. However if
there are m distinct elements then the average size of an element
has to be >= log m. Ergo n >= m * log m. Ergo O(n) is an upper
bound.
There are some fiddles you have to go through to account for
duplicate elements but they're trivial. More importantly, naive
comparisons have a cost of n/m >= log m. However there are
various tricks that can be used to tag the bit/byte position.
The upshot is that the comparison cost can be reduced to O(1).
Where things get murky is the question of data movement. In most
sorting algorithms all of the data is potentially moved log m
times for a n*log(m) cost. However one can avoid moving data by
working with data indices. (With the aid of some hair this can
be done in an O(1) per data bit way.) Give a subsidiary
permutation array all of the data can be moved into place as an
O(n) operation - if setting a bit in an arbitrary location is an
O(1) operation. The trouble is that for an arbitrarily large
address space this is problematic. The upshort is that sorting
isn't really O(n) but if we make assumptions that apply within
ordinary computation it is.
By the by, this little insight seems to be harder to arrive at
than I suppose it to be. When my crackpot was posting everybody
told him he was crazy and nobody caught on that he was talking
about apples whereas they were talking about oranges. Using his
n he was right - sort of. He didn't take into all of the costs
for larg n.
Richard Harter, cri@xxxxxxxx
http://home.tiac.net/~cri, http://www.varinoma.com
Kafka wasn't an author;
Kafka was a prophet!
.
- Follow-Ups:
- Re: Selection-Sort in C
- From: Ben Bacarisse
- Re: Selection-Sort in C
- From: Phil Carmody
- Re: Selection-Sort in C
- References:
- Re: Selection-Sort in C
- From: Tim Rentsch
- Re: Selection-Sort in C
- From: Keith Thompson
- Re: Selection-Sort in C
- From: Tim Rentsch
- Re: Selection-Sort in C
- From: Richard Harter
- Re: Selection-Sort in C
- From: Keith Thompson
- Re: Selection-Sort in C
- From: Richard Harter
- Re: Selection-Sort in C
- From: Keith Thompson
- Re: Selection-Sort in C
- From: Richard Harter
- Re: Selection-Sort in C
- From: Ben Bacarisse
- Re: Selection-Sort in C
- From: Richard Harter
- Re: Selection-Sort in C
- From: Ben Bacarisse
- Re: Selection-Sort in C
- Prev by Date: Re: Selection-Sort in C
- Next by Date: Re: ANN An ansic90 version of lcc-win
- Previous by thread: Re: Selection-Sort in C
- Next by thread: Re: Selection-Sort in C
- Index(es):
Relevant Pages
|