objects as mutable dictionary keys

John Roth newsgroups at jhrothjr.com
Tue Dec 28 17:19:37 EST 2004


I think that's a good summary. The condensed
version is that the results of both __hash__() and
__cmp__() have to remain stable for dicts to work
as one would expect. __cmp__ doesn't do that
for lists, and it isn't defined by default for user
objects.

John Roth

"Peter Maas" <peter at somewhere.com> wrote in message 
news:33e3k7F3ulad0U1 at individual.net...
> Peter Maas schrieb:
>> There was a huge and sometimes heated debate about tuples, lists and
>> dictionaries recently, and the mainstream opinion was that dictionary
>> keys must not be mutable, so lists are not allowed as dictionary keys.
>
> Warning, long posting (~ 100 lines)
>
> The existence of lists and tuples and the impossibility to use
> lists as dictionary keys is a constant source of irritation for
> Python newcomers. Recent threads in c.l.py have revealed that
> even programmers like me using Python for some years now often don't
> have a clear picture. To put an end to this misery I tried to learn
> from the recent discussions and to explain to myself what's going on.
> My problem was that I didn't know exactly how a dictionary and
> a dictionary lookup works:
>
> A dictionary maps key values to data values. Sketch of a dictionary
> lookup(value = dict[key]):
>
> def lookup(dict, key):
>     '''dictionary lookup is done in three steps:
>        1. A hash value of the key is computed using a hash function.
>           Hash functions map a large set L to a (usually) smaller
>           set S.
>
>           A hash function must satisfy:
>           (*) for all e1, e2 in L: h(e1) != h(e2) => e1 != e2
>
>           A hash function should satisfy as well as possible:
>               for all e1, e2 in L: h(e1) == h(e2) => e1 == e2
>
>           Violation of the latter condition means a hash collision,
>           i.e. two or more elements of L have the same hash value.
>           This should be highly unlikely for good hash functions.
>           If not, hash lookup becomes as slow as sequential lookup.
>
>        2. The hash value addresses a location in dict.data which is
>           supposed to be an array of bins. A bin contains a collision
>           list of (key,value) pairs.
>
>        3. The collision list addressed by the hash value is searched
>           sequentially until a pair is found with pair[0] == key. The
>           return value of the lookup is then pair[1]. Equality (==) is
>           defined by the function cmp(). The return value for equality is
>           0, for inequality some other value.
>     '''
>     h = hash(key)                  # step 1
>     cl = dict.data[h]              # step 2
>     for pair in cl:                # step 3
>         if cmp(key, pair[0]) == 0:
>             return pair[1]
>     else:
>         raise KeyError, "Key %s not found." % key
>
> Thus a data type must be a valid input for hash() and cmp() to be usable
> as a dictionary key. hash() and cmp() must satisfy the condition (*) in
> lookup.__doc__ for this data type.
>
> Is a list a suitable candidate for a dictionary key? hash(list) = id(list)
> and cmp(list1, list2) comparing id(list1) with id(list2) would satisfy
> condition (*). But this definition completely ignores list contents,
> causing big surprises for programmers:
>
> - Different lists with the same content would be mapped to different
>   dictionary values.
>
> - Dictionary lookup with list literals would be impossible.
>
> To avoid these surprises dictionary lookup would have to use list contents
> instead of list id. Consider the (hypothetical, not working) code:
>
> >>> d = dict()
> >>> li = [1,2,3]
> >>> d[li] = "123"
> >>> d[[1,2,3]]
> "123"
>
> Now assume li is changed (e.g. li[2] = 4). There are two options to handle 
> this,
> let the dictionary ignore changes
>
> >>> d[li]
> KeyError: [1,2,4] not found in dictionary. (even worse: a previously 
> existing
> [1,2,4] map is returned).
>
> or let the dictionary follow changes
>
> >>> d[[1,2,3]]
> MovedError: Please address future lookups to [1,2,4] :)
>
> Both are pretty unsatisfactory. Conclusion: dictionary lookup with
> mutable types like lists is a source of unpleasant surprises for the
> programmer and therefore impossible in Python. This is the point where
> tuples come in. They have nearly the same interface as lists but cannot
> be changed once they have been created thereby avoiding the problems
> discussed for lists. Thus tuples can be used as dictionary keys.
>
> What about instances of user defined classes?
>
> User defined classes can be anything the programmer likes but yet their
> instances are usable as dictionary keys because there is a default:
> hash(object) = id(object) and cmp(object1, object2) compares id(object1)
> with id(object2). The same setting that has been discussed above for lists
> and has been found unsatisfactory. What is different here?
>
> 1. It is a default setting which can be adapted to each class by defining/
>    overriding the methods __hash__ and __cmp__.
>
> 2. For objects identity is more important than for lists.
>
> That's it.
>
> Please correct me if I'm wrong. If not, a poor programmer's soul has
> found peace eventually :)
>
> -- 
> -------------------------------------------------------------------
> Peter Maas,  M+R Infosysteme,  D-52070 Aachen,  Tel +49-241-93878-0
> E-mail 'cGV0ZXIubWFhc0BtcGx1c3IuZGU=\n'.decode('base64')
> ------------------------------------------------------------------- 




More information about the Python-list mailing list