hash values and equality

Chris Rebert clp2 at rebertia.com
Fri May 20 15:00:43 EDT 2011


On Fri, May 20, 2011 at 10:56 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> Chris Rebert wrote:
>> On Thu, May 19, 2011 at 10:43 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>>> Several folk have said that objects that compare equal must hash equal,
>>> and
>>> the docs also state this
>>> http://docs.python.org/dev/reference/datamodel.html#object.__hash__
>>>
>>> I'm hoping somebody can tell me what horrible thing will happen if this
>>> isn't the case?
<snip>
>> Here's a more common/plausible "horrible" case closer to what the docs
>> writers had in mind:
>> --> class Naughty(object):
>> ...     def __init__(self, n):
>> ...         self.n = n
>> ...     def __eq__(self, other):
>> ...         return self.n == other.n
>> ...
>> --> Naughty(5) == Naughty(5)
>> True
>> --> Naughty(5) is Naughty(5)
>> False
>> --> bad = Naughty(3)
>> --> y = {bad : 'foo'}
>> --> y[bad] # just happens to work
>> 'foo'
>> --> del bad
>> --> # ok, how do we get to 'foo' now?
>> --> y[Naughty(3)] # try the obvious way
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>> KeyError: <__main__.Naughty object at 0x2a1cb0>
>> --> # We're screwed.
>>
>> Naughty instances (and similar) can't be used sensibly as hash keys
>> (unless you /only/ care about object identity; this is often not the
>> case).
>
> I tried this sequence (using Python 3, BTW -- forgot to mention that little
> tidbit -- sorry!):

Doesn't matter anyway.

> --> del two
> --> two
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> NameError: name 'two' is not defined
> --> d
> {<__main__.Wierd object at 0x00C0C950>: '3',
>  1: '1.0',
>  2: '2.0',
>  3: '3.0',
>  <__main__.Wierd object at 0x00B3AC10>: '2',
>  <__main__.Wierd object at 0x00B32E90>: '1'}
> --> new_two = Wierd(2)
> --> d[new_two]
> '2'

Right, this is why I went to the trouble of writing Naughty instead of
using Wierd. Wierd's exact problem is less common and less severe. The
"equality implies identical hash" rule is not a universal one; some
other languages instead impose the lesser requirement of "equality and
same (or related) types implies identical hash". In Ruby for instance:
irb(main):001:0> 1 == 1.0
=> true
irb(main):002:0> a = {1.0 => 'hi'} # float key
=> {1.0=>"hi"}
irb(main):003:0> a[1] = 'bye' # int key
=> "bye"
irb(main):004:0> a # notice how they don't collide:
=> {1=>"bye", 1.0=>"hi"}
(Contrast this with my earlier analogous Python example.)

Basically, Naughty is fundamentally broken [hash(Naughty(2)) !=
hash(Naughty(2))], whereas Wierd merely defies convention [hash(2) !=
hash(Wierd(2)) but hash(Wierd(2)) == hash(Wierd(2))].

Cheers,
Chris
--
http://rebertia.com



More information about the Python-list mailing list