Set and {} comparison confusion

Alex Martelli aleaxit at yahoo.com
Thu Sep 9 03:28:32 EDT 2004


Roman Yakovenko <roman.yakovenko at actimize.com> wrote:

> Hi. Could somebody explain why sets and dict are different ?

sets.Set is build around dict, so let's concentrate on dict, the set
just follows from there.

> 
> from sets import Set as set
> 
> class A(object):
>       def __init__( self, x ):
>               self.x = x
>       def __eq__( self, other ):
>               return self.__dict__ == other.__dict__

This class is NOT correctly hashable.

A crucial semantic condition for hashing is:

    X==Y   ***MUST imply***   hash(X)==hash(Y)

But this is not ensured by this class A.  For example, as you've noticed
all A(1) instances compare equal to each other:

> print 'A(1) == A(1)', A(1) == A(1)

and yet their hashes are all different (and happen to equal their id's,
since you have not overridden __hash__):

xx = []
for i in xrange(4):
    xx.append(A(1))
    print hash(xx[-1])

emits, e.g.:

255856
256752
359376
357840

((note we're careful to keep the A(1)'s around, otherwise, if one had
been garbage collected the next one might happen to be allocated just at
the same place, thus accidentally have the same id...)).

As you are violating a semantic precondition of dict (that it be only
keyed into by _validly_ hashable keys), don't be surprised if the
behavior of such dicts is weird.  You can imagine equality comparison
starts by checking equality of hash values for keys to bail out fast in
most cases, here the hash values are different, so, poof.

Not easy to make that class A _correctly_ hashable, either (and it
really should be immutable too).  Assuming self.x is always hashable and
nobody ever reassigns it nor assigns any other attribute, you could try
adding something like...:

    def __hash__(self):
        return hash(tuple(self.__dict__.iteritems()))


Alex



More information about the Python-list mailing list