Re: Comparing lists
- From: Scott David Daniels <scott.daniels@xxxxxxx>
- Date: Fri, 14 Oct 2005 13:03:50 -0700
Let me begin by apologizing to Christian as I was too snippy in my reply, and sounded even snippier than I meant to.
Christian Stapfer wrote:
"Scott David Daniels" <scott.daniels@xxxxxxx> wrote in message news:434ba977@xxxxxxxxxxxxxxxxa "better" set implementation will win if it can show better performance without related down-sides.
Is there ever such a thing? I always thought that, most of the time, there is no such thing as a free lunch in
If you look at the history of Python's sort, it has steadily gotten better. The list implementations has been tweaked to produce better performance appending and popping. There are a number of such cases. In fact, as Python rolls along, code keeps getting improved. Usually the requirement is that the change not degrade current benchmarks and provide a substantial improvement in at least some practical cases.
As to the "either now, or eventually;" if you _must_ have performance now, not in some abstract future, then it behooves you to _test_, _test_, _test_!
Well, I might want to test: but *first* I want
to design, preferably in "armchair style" ... [using]
> rough complexity measures to ....
If the documentation stated the order-of-magnitude behavior of those basic operations up front, then I (and *anyone* else who ever wanted to use those operations on large lists / large sets) could do a quick order-of-magnitude estimation of how a certain program design will behave, performance wise.
I think this is where I started over-reacting. There are a number of requests here over time by people who state that things would be so much better for every one if only someone (never themselves) did some more work that they might not otherwise want to do. The people who implement the code often do so on their own time. I find the Python docs surprisingly good for even commercial documentation. For work that is done gratis, it is phenomenal. I hesitate to ask any more of it (although I have pointed out bugs in docs as I've found them).
And, if the proper documentation is in place, and it says "dictionary lookup is O(N)" (and you avoid using it for exactly that reason), how surprised will you be to discover that the O(N) is only reached if the hash values of the keys are all equal?
It's not that difficult to distinguish *average* case and *worst* case scenarios. It might even be a good idea to state, if that can easily be done, what the *best* case happens do be...
Oh, maybe you expect "O(N)" to really mean "\Theta(N)". Then, if you are a dweeb like me, you will respond that "This is not possible, a dictionary of size N must take at least 'O(lg N)' to read the key, never mind processing it." But, it turns out, that given a bound on the size of a process, processing an address is "O(1)", not "O(lg N)". Is that too practical for you, or not enough?
I do not expect a developer to expend *inordinate* amounts of work to figure out the computational complexity of what he has implemented. But, as I wrote, he *must* have thought about the matter, and is thus, by definition, in a rather good position to provide what he can frequently simply pull from memory when it comes to documenting the interface of his module.
I talked about Big-O (worst case) or Big-Theta (average case) just to point out that no simple characterization like "O(N)" tells enough of the story to be practically useful. Once you decide that isn't good enough, the burden on creating the documentation is getting substantial, especially given that you've already spent the effort to write the code and tests for it. In fact I'd hesitate to raise the documentation bar any higher -- I'd hate to think someone thought, "Oh, forget it, I'll just use this code myself."
*Experimenting* is not necessarily as easy to do as you seem to believe.
No, "experimenting" is not always easy (though often it is easy enough). However, "experimenting" puts the cost on the person who derives the benefit, and is thus likely to not be done in a slipshod way.
I am not at all a lawyer type, as you seem to imagine. I just want to suggest that some (to the implementer - but not the average user) *easily* available information about computational complexity (especially for the most basic data types) would be a good thing to have. Nothing more.
My point is that simple performance characterization is not good enough to be useful, and fully accurate characterization is an onerous documentation burden. Something in between will be fraught with complaints about "surprising" worst cases. Whether you would approach middle-ground documentation with the spirit of "this is enough to go on" or not, rest assured that a number of people will complain in a "language- lawyer-like" way about any perceived imperfections.
Would you mind if the quality is proportional to the price you paid?
Here I was too snippy. Sorry.
You are, of course, either assuming that there's a single implementation of Python, or that all implementations have the same behaviour.
(Each line denied with "Of course not.") But realistically, most users will learn the performance characteristics once, and continue to use the trade-offs that were characteristics of that version of Python going forward. Now maybe you behave differently, but most programmers I know (and I explicitly include myself) do have that tendency. Lots of python code moves around operating systems and implementations (and the number of implementations is growing).
You are not _forced_ to use any implementation of Python.But it is reasonable, anyway, to ask for information about a specific implementation (that one is *forced* to use).What I wanted to say is ... this: *whenever* a Python program is run, it will run on some specific implementation of Python. That much should be obvious.
If I were still in a snippy mood, I'd point out that you might look at your original statement and see how it might be read by someone who accidentally read what you wrote, rather than what wanted to write.
You are free to implement your own Python system.Don't be ridiculous.
If Pypy gets going this may not be as unlikely as you think. You will be able to (relatively easily) replace implementations of parts of Python and re-generate a system.
It is you (and only you) here, who is arguing that if perfection is not attainable (with reasonable effort), nothing should be done at all.
I am trying to say the request you make is substantially larger than you understand (or would appear at first blush). Further it is a request that asks for work from others for your (and other's) benefit, not work that you offer to pitch in and help on.
I am as willing to live with imperfect module inferface specification as I am willing with imperfect tools more generally. But that will not stop me from making *suggestions* for improvement (not arrogant demands, mind you).
Try writing something that would fit your standards for all of the basic types, punting where you don't know details. If it is enough to be useful, others will chip in and help fix it where it is broken.
I have seen *many* books on algorithms that specify computational complexity measures for the algorithms described. Any really good library of algorithms will at least try to reach that standard, too.
How many "really good libraries of algorithms" do you know of? Could you name a couple that have these characterizations? Practical code is rife with issues of "finite address space," low-order effects swamping the higher-order terms for most practical sizes and such. It is this collection of nasty little details that makes characterizing libraries frustratingly hard.
> but perfection from *others* (but not himself, of course)....I did not want to pose as the customer who expects nothingI consider it the job of the implementer to ...Am I to take it you provide this to all of your customers?
I am not at all like that: you are attacking a figment of your imagination.
Actually, I was not attacking at all. I was trying to suggest that it is a harder task than you imagine. I am assuming you won't take me up on the suggestion of starting a flawed document, but I'd be happy to be proved wrong. If not, try writing a characterization of the performance of a dozen or so of your own programs. Does the result seem useful in proportion to the work it took you to write it?
How reasonable is it to ask me, or anyone else for that matter, to extract, experiment-wise (if it can be done at all with reasonable effort) information that properly belongs to the implementer and really should have been exposed in the documentation in the first place?
Not at all reasonable. How reasonable is it to ask me to provide you support information for free?
OK, here I was _very_ snippy. Sorry again.
I will assert that often the experiment results in a single bit of information: "it is fast enough." Experimenting will get you reliable answers like that, and only when run-time is an issue will you need to go into it much further.
--Scott David Daniels scott.daniels@xxxxxxx .
- References:
- Re: Comparing lists
- From: Christian Stapfer
- Re: Comparing lists
- From: Christian Stapfer
- Re: Comparing lists
- From: Scott David Daniels
- Re: Comparing lists
- Prev by Date: Re: Moving to Win XP as a Python developer
- Next by Date: Re: object inheritance and default values
- Previous by thread: Re: Comparing lists
- Next by thread: One last thing about SocketServer
- Index(es):
Relevant Pages
|