objects as mutable dictionary keys

Peter Maas peter at somewhere.com
Tue Dec 28 17:03:10 EST 2004


Peter Maas schrieb:
> There was a huge and sometimes heated debate about tuples, lists and
> dictionaries recently, and the mainstream opinion was that dictionary
> keys must not be mutable, so lists are not allowed as dictionary keys.

Warning, long posting (~ 100 lines)

The existence of lists and tuples and the impossibility to use
lists as dictionary keys is a constant source of irritation for
Python newcomers. Recent threads in c.l.py have revealed that
even programmers like me using Python for some years now often don't
have a clear picture. To put an end to this misery I tried to learn
from the recent discussions and to explain to myself what's going on.
My problem was that I didn't know exactly how a dictionary and
a dictionary lookup works:

A dictionary maps key values to data values. Sketch of a dictionary
lookup(value = dict[key]):

def lookup(dict, key):
     '''dictionary lookup is done in three steps:
        1. A hash value of the key is computed using a hash function.
           Hash functions map a large set L to a (usually) smaller
           set S.

           A hash function must satisfy:
           (*) for all e1, e2 in L: h(e1) != h(e2) => e1 != e2

           A hash function should satisfy as well as possible:
               for all e1, e2 in L: h(e1) == h(e2) => e1 == e2

           Violation of the latter condition means a hash collision,
           i.e. two or more elements of L have the same hash value.
           This should be highly unlikely for good hash functions.
           If not, hash lookup becomes as slow as sequential lookup.

        2. The hash value addresses a location in dict.data which is
           supposed to be an array of bins. A bin contains a collision
           list of (key,value) pairs.

        3. The collision list addressed by the hash value is searched
           sequentially until a pair is found with pair[0] == key. The
           return value of the lookup is then pair[1]. Equality (==) is
           defined by the function cmp(). The return value for equality is
           0, for inequality some other value.
     '''
     h = hash(key)                  # step 1
     cl = dict.data[h]              # step 2
     for pair in cl:                # step 3
         if cmp(key, pair[0]) == 0:
             return pair[1]
     else:
         raise KeyError, "Key %s not found." % key

Thus a data type must be a valid input for hash() and cmp() to be usable
as a dictionary key. hash() and cmp() must satisfy the condition (*) in
lookup.__doc__ for this data type.

Is a list a suitable candidate for a dictionary key? hash(list) = id(list)
and cmp(list1, list2) comparing id(list1) with id(list2) would satisfy
condition (*). But this definition completely ignores list contents,
causing big surprises for programmers:

- Different lists with the same content would be mapped to different
   dictionary values.

- Dictionary lookup with list literals would be impossible.

To avoid these surprises dictionary lookup would have to use list contents
instead of list id. Consider the (hypothetical, not working) code:

 >>> d = dict()
 >>> li = [1,2,3]
 >>> d[li] = "123"
 >>> d[[1,2,3]]
"123"

Now assume li is changed (e.g. li[2] = 4). There are two options to handle this,
let the dictionary ignore changes

 >>> d[li]
KeyError: [1,2,4] not found in dictionary. (even worse: a previously existing
[1,2,4] map is returned).

or let the dictionary follow changes

 >>> d[[1,2,3]]
MovedError: Please address future lookups to [1,2,4] :)

Both are pretty unsatisfactory. Conclusion: dictionary lookup with
mutable types like lists is a source of unpleasant surprises for the
programmer and therefore impossible in Python. This is the point where
tuples come in. They have nearly the same interface as lists but cannot
be changed once they have been created thereby avoiding the problems
discussed for lists. Thus tuples can be used as dictionary keys.

What about instances of user defined classes?

User defined classes can be anything the programmer likes but yet their
instances are usable as dictionary keys because there is a default:
hash(object) = id(object) and cmp(object1, object2) compares id(object1)
with id(object2). The same setting that has been discussed above for lists
and has been found unsatisfactory. What is different here?

1. It is a default setting which can be adapted to each class by defining/
    overriding the methods __hash__ and __cmp__.

2. For objects identity is more important than for lists.

That's it.

Please correct me if I'm wrong. If not, a poor programmer's soul has
found peace eventually :)

-- 
-------------------------------------------------------------------
Peter Maas,  M+R Infosysteme,  D-52070 Aachen,  Tel +49-241-93878-0
E-mail 'cGV0ZXIubWFhc0BtcGx1c3IuZGU=\n'.decode('base64')
-------------------------------------------------------------------



More information about the Python-list mailing list