equality & comparison by default (was Re: Too many 'self' in python.That's a big flaw in this language.)

Thu Jun 28 09:19:53 EDT 2007

On 2007-06-27, Alex Martelli <aleax at mac.com> wrote:
> A.T.Hofkamp <hat at se-162.se.wtb.tue.nl> wrote:
>
>>  I think that again now with the default implementation of the
>>  object.__eq__ and object.__hash__ methods. I believe these methods should
>>  not exist until the programmer explicitly defines them with a suitable
>>  notion of equivalence.
>> 
>>   Anybody have a good argument against that? :-)
>
> It's very common and practical (though not ideologically pure!) to want
> each instance of a class to "stand for itself", be equal only to itself:
> this lets me place instances in a set, etc, without fuss.

Convenience is the big counterargument, and I have thought about that.
I concluded that the convenience advantage is not big enough, and the problem
seems to be what exactly "itself" means.

In object-oriented programming, objects are representations of values, and the
system shouldn't care about how many instances there are of some value, just
like numbers in math. Every instance with a certain value is the same as every
other instance with the same value.

You can also see this in the singleton concept. The fact that it is a pattern
implies that it is special, something not delivered by default in object
oriented programming.

This object-oriented notion of "itself" is not what Python delivers.

Python 2.4.4 (#1, Dec 15 2006, 13:51:44)
[GCC 3.4.4 20050721 (Red Hat 3.4.4-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class Car(object):
...   def __init__(self, number):
...     self.number = number
...   def __repr__(self):
...     return "Car(%r)" % self.number
...

>>> 12345 == 12345
True
>>> Car(123) == Car(123)
False

So in Python, the default equivalence notion for numbers is based on values,
and the default equivalence notion for objects assumes singleton objects, which
is weird from an object-oriented point of view.

Therefore, I concluded that we are better off without a default __eq__.

The default existence of __hash__ gives other nasty surprises:

>>> class Car2(object):
...   def __init__(self, number):
...     self.number = number
...   def __repr__(self):
...     return "Car2(%r)" % self.number
...   def __eq__(self, other):
...     return self.number == other.number
...

Above I have fixed Car to use value equivalence (albeit not very robustly).
Now if I throw these objects naively in a set:

>>> a = Car2(123)
>>> b = Car2(123)
>>> a == b
True
>>> set([a,b])
set([Car2(123), Car2(123)])

I get a set with two equal cars, something that, as my math teacher once told
me, never happens with a set.

Of course, I should have defined an appropriate __hash__ method together with
the __eq__ method. Unfortunately, not every Python programmer has always had
enough coffee to think about that when he is programming a class. Even worse, I
may get a class such as the above from somebody else and decide that I need a
set of such objects, something the original designer never intended.
The problem is then that something like "set([Car2(123), Car2(124)])" does the
right thing for the wrong reason without telling me.
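
For comparison, here is a minimal sketch of what the class would need in order
to behave in a set. The name Car3 is hypothetical (it is not part of the
session above), and the isinstance check and the explicit __ne__ are
assumptions about what a more robust version might look like, not the only
way to do it:

```python
class Car3(object):
    """Hypothetical variant of Car2 with value-based equality AND hashing."""

    def __init__(self, number):
        self.number = number

    def __repr__(self):
        return "Car3(%r)" % self.number

    def __eq__(self, other):
        # Only compare with other Car3 instances; otherwise let Python
        # try the other operand.
        if not isinstance(other, Car3):
            return NotImplemented
        return self.number == other.number

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    def __hash__(self):
        # Equal objects must have equal hashes, so hash the compared value.
        return hash(self.number)


a = Car3(123)
b = Car3(123)
cars = set([a, b])  # the two equal cars collapse into a single element
```

The point of the sketch is only that __eq__ and __hash__ are derived from the
same value (self.number), so they cannot disagree.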

Without a default __hash__ I'd at least get an error that I cannot put Car2
objects in a set. In that setup, I can still construct a broken set, but I'd
have to write a broken __hash__ function explicitly rather than implicitly
inheriting it from object.
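
The error I am asking for can be sketched as follows. This does not work in
the 2.4 session shown above: it relies on a class being able to set
__hash__ = None to declare itself unhashable, which Python supports from
version 2.6 on (Python 3 does this automatically for a class that defines
__eq__ without __hash__). The name Car4 is hypothetical:

```python
class Car4(object):
    """Hypothetical variant of Car2 that refuses to be hashed."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return self.number == other.number

    # Explicitly mark instances as unhashable (honoured from Python 2.6 on;
    # Python 3 sets this implicitly when only __eq__ is defined).
    __hash__ = None


try:
    set([Car4(123), Car4(123)])
    error = None
except TypeError as exc:
    # Raised because Car4 instances cannot be hashed.
    error = exc
```

With such a class, the attempt to build the set fails loudly instead of
quietly producing a set with two equal members.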

> I don't want, in order to get that often-useful behavior, to have to
> code a lot of boilerplate such as
>     def __hash__(self): return hash(id(self))
> and the like -- so, I like the fact that object does it for me.  I'd

I understand that you'd like to have less typing to do. I'd like that too, if
only it worked without major accidents caused by simple omission, such as the
one demonstrated in the set example.

Another question is whether your coding style is correct here.

Since you apparently want to have singleton objects (since that is what you get
and you are happy with them), shouldn't you be using "is" rather than "=="?
Then you get the equivalence notion you want, you don't need __eq__, and you
write explicitly that you have singleton objects.

In the same way, sets have very little value for singleton objects; you may as
well use lists instead of sets, since duplicate **values** are not filtered.
For lists, you don't need __hash__ either.

The only exception would be to filter multiple inclusions of the same object
(that is what sets are doing by default). I don't know whether that would be
really important for singleton objects **in general**.
(i.e., wouldn't it be better to explicitly write a __hash__ based on identity
for those cases?)

> have no objection if there were two "variants" of object (object itself
> and politically_correct_object), inheriting from each other either way
> 'round, one of which kept the current practical approach while the other
> made __hash__ and comparisons abstract.

Or you define your own base object class "class Myobject(object)" and add a
default __eq__ and __hash__ method. This at least gives an explicit definition
of the equivalence notion for your application.
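
A minimal sketch of such a base class, assuming identity is the notion of
equivalence you want. The __hash__ body is the one-liner from the quoted
text; the Wheel subclass and the __ne__ method are hypothetical additions for
illustration:

```python
class Myobject(object):
    """Base class that spells out identity-based equality and hashing."""

    def __eq__(self, other):
        # An instance is equal only to itself.
        return self is other

    def __ne__(self, other):
        # Keep != consistent with == (Python 2 does not derive it).
        return self is not other

    def __hash__(self):
        # Hash on identity, matching the identity-based __eq__ above.
        return hash(id(self))


class Wheel(Myobject):  # hypothetical application class
    def __init__(self, size):
        self.size = size


w1 = Wheel(17)
w2 = Wheel(17)
wheels = set([w1, w2, w1])  # w1 is filtered once; w2 stays, despite equal size
```

Anyone reading Wheel can now follow the inheritance chain to Myobject and see
which notion of equivalence the application has chosen.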

> In Python 3000, ordering comparisons will not exist by default (sigh, a
> modest loss of practicality on the altar of purity -- ah well, saw it
> coming, ever since complex numbers lost ordering comparisons), but
> equality and hashing should remain just like now (yay!).

I didn't try that, but it seems like a good decision. Ordering based on
identity may change with each invocation of the program!

Albert