Re: objects as mutable dictionary keys

From: John Roth (newsgroups_at_jhrothjr.com)
Date: 12/28/04


Date: Tue, 28 Dec 2004 16:19:37 -0600

I think that's a good summary. The condensed
version is that the results of both __hash__() and
__cmp__() have to remain stable for dicts to work
as one would expect. __cmp__ doesn't do that
for lists, and it isn't defined by default for user
objects.

John Roth

"Peter Maas" <peter@somewhere.com> wrote in message
news:33e3k7F3ulad0U1@individual.net...
> Peter Maas schrieb:
>> There was a huge and sometimes heated debate about tuples, lists and
>> dictionaries recently, and the mainstream opinion was that dictionary
>> keys must not be mutable, so lists are not allowed as dictionary keys.
>
> Warning, long posting (~ 100 lines)
>
> The existence of lists and tuples and the impossibility to use
> lists as dictionary keys is a constant source of irritation for
> Python newcomers. Recent threads in c.l.py have revealed that
> even programmers like me using Python for some years now often don't
> have a clear picture. To put an end to this misery I tried to learn
> from the recent discussions and to explain to myself what's going on.
> My problem was that I didn't know exactly how a dictionary and
> a dictionary lookup works:
>
> A dictionary maps key values to data values. Sketch of a dictionary
> lookup(value = dict[key]):
>
> def lookup(dict, key):
> '''dictionary lookup is done in three steps:
> 1. A hash value of the key is computed using a hash function.
> Hash functions map a large set L to a (usually) smaller
> set S.
>
> A hash function must satisfy:
> (*) for all e1, e2 in L: h(e1) != h(e2) => e1 != e2
>
> A hash function should satisfy as well as possible:
> for all e1, e2 in L: h(e1) == h(e2) => e1 == e2
>
> Violation of the latter condition means a hash collision,
> i.e. two or more elements of L have the same hash value.
> This should be highly unlikely for good hash functions.
> If not, hash lookup becomes as slow as sequential lookup.
>
> 2. The hash value addresses a location in dict.data which is
> supposed to be an array of bins. A bin contains a collision
> list of (key,value) pairs.
>
> 3. The collision list addressed by the hash value is searched
> sequentially until a pair is found with pair[0] == key. The
> return value of the lookup is then pair[1]. Equality (==) is
> defined by the function cmp(). The return value for equality is
> 0, for inequality some other value.
> '''
> h = hash(key) # step 1
> cl = dict.data[h] # step 2
> for pair in cl: # step 3
> if cmp(key, pair[0]) == 0:
> return pair[1]
> else:
> raise KeyError, "Key %s not found." % key
>
> Thus a data type must be a valid input for hash() and cmp() to be usable
> as a dictionary key. hash() and cmp() must satisfy the condition (*) in
> lookup.__doc__ for this data type.
>
> Is a list a suitable candidate for a dictionary key? hash(list) = id(list)
> and cmp(list1, list2) comparing id(list1) with id(list2) would satisfy
> condition (*). But this definition completely ignores list contents,
> causing big surprises for programmers:
>
> - Different lists with the same content would be mapped to different
> dictionary values.
>
> - Dictionary lookup with list literals would be impossible.
>
> To avoid these surprises dictionary lookup would have to use list contents
> instead of list id. Consider the (hypothetical, not working) code:
>
> >>> d = dict()
> >>> li = [1,2,3]
> >>> d[li] = "123"
> >>> d[[1,2,3]]
> "123"
>
> Now assume li is changed (e.g. li[2] = 4). There are two options to handle
> this,
> let the dictionary ignore changes
>
> >>> d[li]
> KeyError: [1,2,4] not found in dictionary. (even worse: a previously
> existing
> [1,2,4] map is returned).
>
> or let the dictionary follow changes
>
> >>> d[[1,2,3]]
> MovedError: Please address future lookups to [1,2,4] :)
>
> Both are pretty unsatisfactory. Conclusion: dictionary lookup with
> mutable types like lists is a source of unpleasant surprises for the
> programmer and therefore impossible in Python. This is the point where
> tuples come in. They have nearly the same interface as lists but cannot
> be changed once they have been created thereby avoiding the problems
> discussed for lists. Thus tuples can be used as dictionary keys.
>
> What about instances of user defined classes?
>
> User defined classes can be anything the programmer likes but yet their
> instances are usable as dictionary keys because there is a default:
> hash(object) = id(object) and cmp(object1, object2) compares id(object1)
> with id(object2). The same setting that has been discussed above for lists
> and has been found unsatisfactory. What is different here?
>
> 1. It is a default setting which can be adapted to each class by defining/
> overriding the methods __hash__ and __cmp__.
>
> 2. For objects identity is more important than for lists.
>
> That's it.
>
> Please correct me if I'm wrong. If not, a poor programmer's soul has
> found peace eventually :)
>
> --
> -------------------------------------------------------------------
> Peter Maas, M+R Infosysteme, D-52070 Aachen, Tel +49-241-93878-0
> E-mail 'cGV0ZXIubWFhc0BtcGx1c3IuZGU=\n'.decode('base64')
> -------------------------------------------------------------------



Relevant Pages

  • Re: comments on my design of a new language?
    ... > exist, plus strings and the function type, and that is it. ... indexed by a string or a regular expression, ... > dictionaries, that terminate with literals, or (possibly ... and lists. ...
    (comp.lang.misc)
  • Re: objects as mutable dictionary keys
    ... > dictionaries recently, and the mainstream opinion was that dictionary ... > keys must not be mutable, so lists are not allowed as dictionary keys. ... There are no other constraints on the class designer; ...
    (comp.lang.python)
  • Re: objects as mutable dictionary keys
    ... > dictionaries recently, and the mainstream opinion was that dictionary ... > keys must not be mutable, so lists are not allowed as dictionary keys. ... - Dictionary lookup with list literals would be impossible. ...
    (comp.lang.python)
  • Re: comments on my design of a new language?
    ... > indexed by a string or a regular expression, ... Trees are essentially nested dictionaries, ... I was originally going to treat scalars and unit lists equivalently, ... > dictionaries being indexed by strings or by regular expressions. ...
    (comp.lang.misc)
  • Re: Why are tuples immutable?
    ... but yet the fact that lists are mutable and tuples ... as dictionaries are one example. ... in the second we just allow mutables to ... > Except that Python uses dicts internally, ...
    (comp.lang.python)