Re: Populating a dictionary, fast [SOLVED]
- From: Francesc Altet <faltet@xxxxxxxxxxx>
- Date: Wed, 14 Nov 2007 12:29:49 +0100
A Tuesday 13 November 2007, Steven D'Aprano escrigué:
On Tue, 13 Nov 2007 17:27:11 +0100, Francesc Altet wrote:
I don't know exactly why do you need a dictionary for keeping the
data, but in case you want ultra-fast access to values, there is no
replacement for keeping a sorted list of keys and a list with the
original indices to values, and the proper list of values. Then,
to access a value, you only have to do a binary search on the
sorted list, another lookup in the original indices list and then
go straight to the value in the value list. This should be the
faster approach I can think of.
Maybe on Bizarro World, but not on Planet Earth.
Searching a dict takes on average a constant amount of time,
regardless of the key and the size of the dict. Because dicts are the
basis of Python's namespaces, it is optimized up the wazoo, and being
implemented in C it is extremely fast.
Using your suggestion, to get a value: you have to do a binary search
which is O(log N) instead of O(1), then ANOTHER lookup into a list of
indices, and finally a THIRD lookup to actually retrieve the value --
all implemented in relatively slow Python instead of fast C.
Well, the bisect module in Python is implemented in fast C. Apart from
that, you are basically right, see below.
And let's not even discuss the overhead of keeping these three lists
synchronized.
But don't take my word for it: measure for yourself. I'm not even
going to attempt to code your suggestion, but here's how you measure
the time it takes for dictionary lookup.
# Create a dataset to search on.
D = {}
chars = "abcdefghijklmnopqrstuvwxyz"
triplets = (a+b+c for a in chars for b in chars for c in chars)
for i, s in enumerate(triplets):
D[s] = i # D is of the form {'abc': 12345}
# Now time a successful search for an arbitrary triplet.
import timeit
min(timeit.Timer("D['foo']", "from __main__ import D").repeat(6))
On my PC, it takes about 0.2 seconds per million searches.
Oh yes. All of you are right, guys. I've implemented code (attached)
for measuring the lookup times using a a dictionary and using a binary
search approach (using numpy, mostly for space efficiency) and here are
my results:
$ python2.5 gq-dict.py
Creating the dictionary...
Time for dict creation: 89.115
Items in dict: 8191180
Timing queries...
Time for querying: 39.44
$ python2.5 gq-binary-search.py
Creating the lists...
Time for lists creation: 4.245
Sorting...
Time for sorting: 6.195
Timing queries...
Time for querying: 108.1
i.e. a dict approach proves to be almost 3x faster than a regular binary
search (5 us/lookup vs 13 us/lookup).
I think I've got messed on some benchmarks that I've done on that
subject some time ago, but either my memory is bad or I've made some
mistake on those experiments. My apologies.
Cheers,
--
0,0< Francesc Altet http://www.carabos.com/V V Cárabos Coop. V. Enjoy Data
"-"
Attachment:
gq-binary-search.py
Description: application/python
Attachment:
gq-dict.py
Description: application/python
- References:
- Re: Populating a dictionary, fast
- From: Michael Bacarella
- Re: Populating a dictionary, fast [SOLVED]
- From: Francesc Altet
- Re: Populating a dictionary, fast [SOLVED]
- From: Steven D'Aprano
- Re: Populating a dictionary, fast
- Prev by Date: Re: Using Python To Change The World :)
- Next by Date: Re: Populating a dictionary, fast [SOLVED]
- Previous by thread: Re: Populating a dictionary, fast [SOLVED]
- Next by thread: Re: Populating a dictionary, fast [SOLVED]
- Index(es):
Relevant Pages
|
|