equality & comparison by default

A.T.Hofkamp hat at se-162.se.wtb.tue.nl
Fri Jun 29 09:15:24 EDT 2007


On 2007-06-29, Gabriel Genellina <gagsl-py2 at yahoo.com.ar> wrote:
> On Thu, 28 Jun 2007 11:38:56 -0300, A.T.Hofkamp <hat at se-162.se.wtb.tue.nl>
> wrote:
>
>> The point I intended to make was that having a default __hash__ method on
>> objects gives weird results that not everybody may be aware of.
>> In addition, to get useful behavior of objects in sets one should override
>> __hash__ anyway, so what is the point of having a default object.__hash__ ?
>
> __hash__ and equality tests are used by the dictionary implementation, and  
> the default implementation is OK for immutable objects. I like the fact  

I don't understand exactly how mutability relates to this.

The default __eq__ and __hash__ implementations for classes are OK if you never
have equivalent objects. In that case, == and 'is' are exactly the same
operator, in the sense that for each pair of arguments they return the same
value.

This remains the case even if I mutate existing objects without creating
equivalent objects.

As soon as I create two equivalent instances (either by creating a duplicate at
a new address, or by mutating an existing one), the default __eq__ must be
redefined if you want these equivalent objects to report themselves as
equivalent with the == operator.
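A minimal sketch of the point above, using a hypothetical `Car` class (the class and its `number` attribute are illustrative only) that inherits the default `__eq__` and `__hash__` from `object`. With the defaults, == behaves exactly like 'is': a duplicate at a new address is not equal to the original, and mutating an object to match another does not make them equal either.

```python
class Car(object):
    """Hypothetical class that relies on the default object.__eq__/__hash__."""

    def __init__(self, number):
        self.number = number


a = Car(1)
b = Car(1)                    # an equivalent duplicate at a new address
c = Car(2)

print(a == a)                 # True: an object always equals itself
print(a == b)                 # False: default == compares identity
print((a == b) == (a is b))   # True: == and 'is' give the same answer

c.number = 1                  # mutate c so it is now equivalent to a
print(a == c)                 # False: still two different objects
```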

> that I can use almost anything as dictionary keys without much coding.

Most data types of Python have their own implementation of __eq__ and __hash__
to make this work. This is good; it makes the language easy to use. However,
for home-brewed objects (derived from object) the default implementation of
these functions may easily cause unexpected behavior, and we may be better off
without a default implementation for these functions. That would prevent using
such objects with == or in sets/dictionaries without an explicit definition of
the __eq__ and __hash__ functions, but that is not very bad, since in many
cases one has to define the proper equivalence notion anyway.
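An explicit definition of the equivalence notion might look like the sketch below. The `Car` class and the choice of `number` as the defining attribute are assumptions for illustration; the key design choice is that `__hash__` uses exactly the data that `__eq__` compares, so that (a==b) => (hash(a)==hash(b)) holds.

```python
class Car(object):
    """Hypothetical class with an explicit equivalence notion:
    two cars are equivalent when their numbers match."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        if not isinstance(other, Car):
            return NotImplemented
        return self.number == other.number

    def __ne__(self, other):
        # Python 2 does not derive != from __eq__; Python 3 does.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    def __hash__(self):
        # Hash exactly the data that __eq__ compares,
        # so that a == b implies hash(a) == hash(b).
        return hash(self.number)


print(Car(1) == Car(1))                    # True
print(len(set([Car(1), Car(1), Car(2)])))  # 2
```

Because the hash is computed from `number`, a car should not have its number changed while it is a set member or dictionary key; it would stay in the slot chosen for its old hash, and lookups by its new value would generally miss it.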

> This must always be true: (a==b) => (hash(a)==hash(b)), and the  
> documentation for __hash__ and __cmp__ warns about the requisites (but  
> __eq__ and the other rich-comparison methods are lacking the warning).

I don't know exactly what the current documentation says. One of the problems
is that not everybody reads those docs. Instead, they run a simple test like
"print set([Car(1),Car(2)])". That gives the correct result even if the
"(a==b) => (hash(a)==hash(b))" relation doesn't hold because __eq__ was
redefined but __hash__ was not (for example, because the original designer
never expected the class to be used in a set or dictionary), and the conclusion
is "it works". Then they use the incorrect implementation for months until they
discover that it doesn't quite work as expected, followed by a long debugging
session to find and correct the problem.
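The failure mode can be sketched with a hypothetical `Car` class that redefines `__eq__` but keeps the identity-based hash. In Python 2 that happens implicitly; the sketch writes `__hash__ = object.__hash__` explicitly because Python 3 sets `__hash__` to None in a class that defines `__eq__` without `__hash__`. The quick test with two different cars looks fine, but two equal cars both end up in the set. The unequal hashes rely on CPython deriving the default hash from the object's address.

```python
class Car(object):
    """Hypothetical class: __eq__ redefined, __hash__ left identity-based."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return isinstance(other, Car) and self.number == other.number

    def __ne__(self, other):
        return not self == other

    # Keep the default identity hash, as a Python 2 class would implicitly.
    __hash__ = object.__hash__

    def __repr__(self):
        return "Car(%d)" % self.number


# The quick experiment: two different cars, two set members. "It works."
print(len(set([Car(1), Car(2)])))   # 2

a, b = Car(1), Car(1)
print(a == b)                # True
print(hash(a) == hash(b))    # False in CPython: the invariant is broken
print(len(set([a, b])))      # 2, although a == b
```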

Without default __eq__ and __hash__ implementations for objects, the program
would drop dead on the first experiment. While that may be inconvenient at that
moment (getting the first experiment working takes more effort), I think it is
preferable to running an incorrect implementation for months without knowing
it. In addition, it forces a developer to think explicitly about his notion of
equivalence.
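For comparison, a sketch of what "dropping dead on the first experiment" can look like. It assumes Python 3, where a class that defines `__eq__` but not `__hash__` gets `__hash__` set to None and its instances become unhashable; the `Car` class is again hypothetical.

```python
class Car:
    """Hypothetical class that defines __eq__ but forgets __hash__."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return isinstance(other, Car) and self.number == other.number


print(Car.__hash__)          # None: the inherited hash has been removed

try:
    members = set([Car(1), Car(2)])
except TypeError as exc:
    # The very first set experiment fails with an "unhashable" TypeError.
    members = None
    print("first experiment fails:", exc)
```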

Last but not least, in the current implementation you cannot see whether a
class has an __eq__ and/or __hash__ equivalence notion: lack of an explicit
definition does not necessarily imply that there is no such notion. Without a
default implementation in object, this would be uniquely determined as well.
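Introspection can at least show whether `__eq__` and `__hash__` are inherited unchanged from `object`, though not whether the designer intended identity as the equivalence notion. The helper below is a hypothetical sketch (the name `equivalence_notion` and its return values are made up for illustration) and assumes Python 3, where `object` has its own `__eq__` in its namespace.

```python
def equivalence_notion(cls):
    """Hypothetical helper: report where cls gets __eq__ and __hash__ from.

    Returns a pair (eq_kind, hash_kind). Each entry is "identity" when the
    method is object's default, "custom" when some class in the MRO replaces
    it, and hash_kind is "unhashable" when __hash__ has been set to None.
    """
    def lookup(name):
        # Walk the MRO as attribute lookup does and return the first value found.
        for klass in cls.__mro__:
            if name in vars(klass):
                return vars(klass)[name]
        raise AttributeError(name)

    eq_value = lookup("__eq__")
    hash_value = lookup("__hash__")

    eq_kind = "identity" if eq_value is vars(object)["__eq__"] else "custom"
    if hash_value is None:
        hash_kind = "unhashable"
    elif hash_value is vars(object)["__hash__"]:
        hash_kind = "identity"
    else:
        hash_kind = "custom"
    return eq_kind, hash_kind


class Plain(object):
    pass


class Keyed(object):
    def __eq__(self, other):
        return isinstance(other, Keyed)

    def __hash__(self):
        return 0


class EqOnly(object):
    def __eq__(self, other):
        return isinstance(other, EqOnly)


print(equivalence_notion(Plain))   # ('identity', 'identity')
print(equivalence_notion(Keyed))   # ('custom', 'custom')
print(equivalence_notion(EqOnly))  # ('custom', 'unhashable')
```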


Albert
