Re: Educated guesses for efficiency?



Malcolm wrote:

What people always say is that algorithmic optimisation should come first.
Actually, it is comparatively rare for a program to run too slowly because an algorithm with the wrong big-O analysis was chosen. [...]

Doesn't jibe with my experience. People tend to do
something that appears to get the job done and passes their
tiny test cases, and then when the Real World hits it ...
The programmer has a notion of "how big" things will be, and
chooses algorithms (wisely or unwisely) based on that notion.
Then when the "size" turns out to be different (maybe the
notion was wrong to begin with, maybe the program is being
put to a new use), the programmer looks dumb for having
chosen the wrong algorithm.

At a PPOE, I once investigated a horrible performance
problem reported by an irate customer. I very soon found
that the thing was spending all its time sorting a linked
list (this part of the product was written in Lisp, so a lot
of things tended to be stored in lists). Was some strange
sort algorithm being used? No: the code was calling Lisp's
built-in sort function. Was something wrong with the sort
implementation? Not really: Some of the constant factors
could have been improved, but it was a straightforward merge
sort, O(N log N). So why was it taking so long?

The culprit turned out to be the comparison function. Its
first check was to search a completely different list for the
two items being compared; if they were both present in the
reference list, their order in that list determined their
order in the sort. Well, the reference list was approximately
the same length as the list being sorted, so each pairwise
comparison took O(N) time.

Yes, folks: The result was an O(N^2 log N) sort! This holds
the record for the worst asymptotic complexity of any sort I've
ever seen in actual production. (Things like "bogosort" are worse,
of course, but I seriously doubt anyone's ever shipped them in
a paid-for product -- I, at least, have never run across them
in "real" software.)

The scary part is that the programmer achieved this horrible
result by putting together pieces that considered individually
were quite innocuous: an O(N log N) sort utility and an O(N)
list search. It's just that he happened to assemble them in such
a way as to multiply their complexities ... And, of course, he
tested his code, probably with lots of N=2 and N=3 lists that
checked the corner cases. All the tests worked, and the code
shipped, and the customer ran it with N = a few hundred ...

Programmers *will* use algorithms with high complexities.
Your program maintains a little LRU cache of something or other
and when you write it you have a mental image of a table of
maybe eight items, so you use simple linear methods to search
and maintain it. Later, something changes elsewhere in the
program to increase the "pressure" on the cache, and some other
programmer changes #define CACHESIZE 8 to #define CACHESIZE 8192.
Or your program keeps lists of the students enrolled in various
courses; there are usually only a couple dozen students per class
except for a very few big lecture sessions, so again you don't
bother wheeling out a hash table. "Cannons and canaries," you
mutter to yourself while writing `for(i = 0; i < class_size; ++i)'.
And then somebody in Alumni Records thinks of a great way to use
your program, and starts by "enrolling" every living alumnus of
the University in one giant artificial "class" ...

Programmers make trade-offs when they select algorithms.
Sometimes they make good trades, sometimes they don't. Sometimes
a trade that was good at the time turns bad later; this just means
that the programmer wasn't Nostradamus.

--
Eric Sosman
esosman@xxxxxxxxxxxxxxxxxxx
.



Relevant Pages

  • How best to teach newbie at algorithms (was: coerce for arbitrary types)
    ... the algorithm itself is essentially last-in first-out, ... newbie needs to learn is how to write useful tools that go beyond ... lists, as vectors, and as structs, to show how the same API can be ... If you want to sort a list, you have to use the result of SORT. ...
    (comp.lang.lisp)
  • Re: Detailed explanation of how a QuickSort Works
    ... Firstly, if you consider the simple "Bubble Sort" algorithm, it works by running through the entire data set, one item at a time, comparing each item to the previous item and swapping them if they are not already in the correct order. ... by simply running through the entire list just once (and splitting it into two smaller lists) you have cut the sorting time in half. ...
    (microsoft.public.vb.general.discussion)
  • Re: fastest sorted list type?
    ... implement IComparable for that class and use the default BinarySearch() ... without passing any sort of comparison to it. ... rather than the exact algorithm). ... I did have a look on google and wiki lists al the sort algorothms nicely, ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Mergesort algorithm for linked lists
    ... Instead sort the first two elements. ... I break down the linked lists into natural runs (only if preprocessor ... abcd ef g # collect run ... Of course, as said before, the algorithm is O), we are just ...
    (comp.lang.c)
  • Re: "Sorting" assignment
    ... algorithm such as tbe bubble sort should be given a free pass because ... I would tailor which algorithm to start with by how the student thinks ... If you only learn the beautiful side of programming, ...
    (comp.programming)