Re: Comparing lists



Let me begin by apologizing to Christian as I was too snippy in
my reply, and sounded even snippier than I meant to.

Christian Stapfer wrote:
"Scott David Daniels" <scott.daniels@xxxxxxx> wrote in message
news:434ba977@xxxxxxxxxxxxxxxx
a "better" set implementation will win if
it can show better performance without
related down-sides.

Is there ever such a thing? I always thought that, most of the time, there is no such thing as a free lunch in
If you look at the history of Python's sort, it has steadily gotten
better.  The list implementations has been tweaked to produce better
performance appending and popping.  There are a number of such cases.
In fact, as Python rolls along, code keeps getting improved.  Usually
the requirement is that the change not degrade current benchmarks and
provide a substantial improvement in at least some practical cases.

As to the "either now, or eventually;"  if you _must_ have performance
now, not in some abstract future, then it behooves you to _test_,
_test_, _test_!

Well, I might want to test: but *first* I want
to design, preferably in "armchair style" ... [using]
> rough complexity measures to ....

If the documentation stated the order-of-magnitude
behavior of those basic operations up front, then
I (and *anyone* else who ever wanted to use those
operations on large lists / large sets) could do
a quick order-of-magnitude estimation of how
a certain program design will behave, performance
wise.

I think this is where I started over-reacting. There are a number of requests here over time by people who state that things would be so much better for every one if only someone (never themselves) did some more work that they might not otherwise want to do. The people who implement the code often do so on their own time. I find the Python docs surprisingly good for even commercial documentation. For work that is done gratis, it is phenomenal. I hesitate to ask any more of it (although I have pointed out bugs in docs as I've found them).

And, if the proper documentation is in place, and it
says "dictionary lookup is O(N)" (and you avoid using
it for exactly that reason), how surprised will you be
to discover that the O(N) is only reached if the hash
values of the keys are all equal?

It's not that difficult to distinguish *average* case and *worst* case scenarios. It might even be a good idea to state, if that can easily be done, what the *best* case happens do be...

Oh, maybe you expect "O(N)" to really mean "\Theta(N)".
Then, if you are a dweeb like me, you will respond that
"This is not possible, a dictionary of size N must take at
least 'O(lg N)' to read the key, never mind processing it."
But, it turns out, that given a bound on the size of a
process, processing an address is "O(1)", not "O(lg N)".
Is that too practical for you, or not enough?

I do not expect a developer to expend *inordinate* amounts of work to figure out the computational complexity of what he has implemented. But, as I wrote, he *must* have thought about the matter, and is thus, by definition, in a rather good position to provide what he can frequently simply pull from memory when it comes to documenting the interface of his module.

I talked about Big-O (worst case) or Big-Theta (average case) just to point out that no simple characterization like "O(N)" tells enough of the story to be practically useful. Once you decide that isn't good enough, the burden on creating the documentation is getting substantial, especially given that you've already spent the effort to write the code and tests for it. In fact I'd hesitate to raise the documentation bar any higher -- I'd hate to think someone thought, "Oh, forget it, I'll just use this code myself."

*Experimenting* is not necessarily as easy to
do as you seem to believe.
No, "experimenting" is not always easy (though often it is
easy enough).  However, "experimenting" puts the cost on the
person who derives the benefit, and is thus likely to not be
done in a slipshod way.

I am not at all a lawyer type, as you seem to
imagine. I just want to suggest that some
(to the implementer - but not the average user)
*easily* available information about computational
complexity (especially for the most basic data types)
would be a good thing to have. Nothing more.
My point is that simple performance characterization is not
good enough to be useful, and fully accurate characterization
is an onerous documentation burden.  Something in between
will be fraught with complaints about "surprising" worst
cases.  Whether you would approach middle-ground documentation
with the spirit of "this is enough to go on" or not, rest
assured that a number of people will complain in a "language-
lawyer-like" way about any perceived imperfections.

Would you mind if the quality is proportional to
the price you paid?
Here I was too snippy.  Sorry.

You are, of course, either assuming that there's a
single implementation of Python,
or that all implementations have the same behaviour.
(Each line denied with "Of course not.")
But realistically, most users will learn the performance
characteristics once, and continue to use the trade-offs
that were characteristics of that version of Python going
forward.  Now maybe you behave differently, but most
programmers I know (and I explicitly include myself) do
have that tendency.  Lots of python code moves around
operating systems and implementations (and the number
of implementations is growing).

But it is reasonable, anyway, to ask for information
about a specific implementation (that one is *forced*
to use).
You are not _forced_ to use any implementation of Python.
What I wanted to say is ... this: *whenever* a Python
program is run, it will run on some specific
implementation of Python. That much should be obvious.
If I were still in a snippy mood, I'd point out that you
might look at your original statement and see how it might
be read by someone who accidentally read what you wrote,
rather than what wanted to write.

You are free to implement your own Python system.
Don't be ridiculous.
If Pypy gets going this may not be as unlikely as you think.
You will be able to (relatively easily) replace implementations
of parts of Python and re-generate a system.

It is you (and only you) here, who is arguing that if
perfection is not attainable (with reasonable effort),
nothing should be done at all.
I am trying to say the request you make is substantially larger
than you understand (or would appear at first blush).  Further
it is a request that asks for work from others for your (and
other's) benefit, not work that you offer to pitch in and help on.

I am as willing to live with imperfect module inferface
specification as I am willing with imperfect tools more
generally.  But that will not stop me from making
*suggestions* for improvement (not arrogant demands, mind you).
Try writing something that would fit your standards for all of
the basic types, punting where you don't know details.  If it is
enough to be useful, others will chip in and help fix it where it
is broken.

I have seen *many* books on algorithms that specify
computational complexity measures for the algorithms
described. Any really good library of algorithms will
at least try to reach that standard, too.
How many "really good libraries of algorithms" do you know of?
Could you name a couple that have these characterizations?
Practical code is rife with issues of "finite address space,"
low-order effects swamping the higher-order terms for most
practical sizes and such.  It is this collection of nasty little
details that makes characterizing libraries frustratingly hard.

I consider it the job of the implementer to ...
Am I to take it you provide this to all of your customers?
...I did not want to pose as the customer who expects nothing
> but perfection from *others* (but not himself, of course).
I am not at all like that: you are attacking
a figment of your imagination.
Actually, I was not attacking at all.  I was trying to suggest
that it is a harder task than you imagine.  I am assuming you won't
take me up on the suggestion of starting a flawed document, but I'd
be happy to be proved wrong.  If not, try writing a characterization
of the performance of a dozen or so of your own programs.  Does the
result seem useful in proportion to the work it took you to write it?

How reasonable is it to ask me, or anyone else
for that matter, to extract, experiment-wise
(if it can be done at all with reasonable effort)
information that properly belongs to the implementer
and really should have been exposed in the
documentation in the first place?

Not at all reasonable.  How reasonable is it to ask
me to provide you support information for free?

OK, here I was _very_ snippy. Sorry again.

I will assert that often the experiment results in a single bit of
information: "it is fast enough."  Experimenting will get you reliable
answers like that, and only when run-time is an issue will you need to
go into it much further.

--Scott David Daniels
scott.daniels@xxxxxxx
.



Relevant Pages

  • Re: How to except the unexpected?
    ... source are allowed to change between Python versions and across Python ... implementations. ... I've wasted many hours programming to faulty documentation when a quick peek at the source code would have saved some serious time. ... The original reference was to the internet, which seems vast in its sources of unforseen peril. ...
    (comp.lang.python)
  • Re: Finding installed package files
    ... for making installation information more easily accessible? ... >> somewhere where more information about the key parts of the package ... >> As an example I installed the latest Python on Fedora 3. ... what one would expect is a form of help documentation much more integrated ...
    (alt.os.linux.redhat)
  • Re: Pythons garbage collection was Re: Python reliability
    ... >>> Has anyone looked into using a real GC for python? ... A strategy based on PURE reference counting just ... > extensions, and with correct implementations, both refcounting and GC are ... > Lucky those existing C libraries were written to use python's refcounting! ...
    (comp.lang.python)
  • Re: Bitching about the documentation...
    ... > a language need complaints about the language? ... To give potential adopters of Python a more even handed view ... but sometimes "constructive criticism" means ... I have found Python's documentation to generally be excellent. ...
    (comp.lang.python)
  • Re: My Python annoyances
    ... incomplete documentation, a maze of little platform-specific quirks to ... Python base classes have changed behavior. ... perhaps the best introduction to the language when it was written. ... I filed a bug report. ...
    (comp.lang.python)