[Python-Dev] For Python 3k, drop default/implicit hash, and comparison

Noam Raphael noamraph at gmail.com
Sun Nov 27 00:11:36 CET 2005


Three weeks ago, I read this and thought, "well, you have two options
for a default comparison, one based on identity and one on value, both
are useful sometimes and Guido prefers identity, and it's OK." But
today I realized that I still think otherwise.

In two sentences: sometimes you wish to compare objects according to
"identity", and sometimes you wish to compare objects according to
"values". Identity-based comparison is done by the "is" operator;
value-based comparison should be done by the == operator.

Let's take the car example, and expand it a bit. Let's say wheels have
attributes - say, diameter and manufacturer. Let's say those can't
change (which is reasonable), to make wheels hashable. There are two
ways to compare wheels: by value and by identity. Two wheels may have
the same value, that is, they have the same diameter and were created
by the same manufacturer. Two wheels may have the same identity, that
is, they are actually the same wheel.

We may want to compare wheels based on value, for example to make sure
that all the car's wheels fit together nicely: assert car.wheel1 ==
car.wheel2 == car.wheel3 == car.wheel4. We may want to compare wheels
based on identity, for example to make sure that we actually bought
four wheels in order to assemble the car: assert car.wheel1 is not
car.wheel2 and car.wheel3 is not car.wheel1 and car.wheel3 is not
car.wheel2...
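
To make the two kinds of comparison concrete, here is a minimal sketch.
The Wheel class, its attribute names, and the "Acme" manufacturer are
made up for illustration; the only assumption is that a wheel's
diameter and manufacturer never change after construction.

```python
class Wheel(object):
    """Hypothetical immutable wheel, compared and hashed by value."""

    def __init__(self, diameter, manufacturer):
        self._diameter = diameter
        self._manufacturer = manufacturer

    def _key(self):
        # The tuple of attributes that defines the wheel's value.
        return (self._diameter, self._manufacturer)

    def __eq__(self, other):
        if not isinstance(other, Wheel):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        return hash(self._key())


wheels = [Wheel(16, "Acme") for _ in range(4)]

# Value check: all four wheels fit together.
assert wheels[0] == wheels[1] == wheels[2] == wheels[3]

# Identity check: we really bought four distinct wheels.
assert len(set(id(w) for w in wheels)) == 4
```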

We may want to associate values with wheels based on their values. For
example, it's reasonable to suppose that the price of every wheel of
the same model is the same. In that case, we'll write: price[wheel] =
25. We may want to associate values with wheels based on their
identities. For example, we may want to note that a specific wheel is
broken. For this, I'll first define a general class (I defined it
before in one of the discussions, because I believe it's useful):

class Ref(object):
    def __init__(self, obj):
        self._obj = obj
    def __call__(self):
        return self._obj
    def __eq__(self, other):
        return isinstance(other, Ref) and self._obj is other._obj
    def __hash__(self):
        return id(self._obj) ^ 0xBEEF

Now again, how will we say that a specific wheel is broken? Like this:

broken[Ref(wheel)] = True

Note that the Ref class also allows us to group wheels of the same
kind in a set, regardless of their __hash__ method.
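
Here is a small sketch that puts the two kinds of association side by
side. The namedtuple-based Wheel is only a stand-in for any
value-compared, hashable wheel class, and the names front and rear are
invented for the example. Ref is repeated so that the snippet runs on
its own.

```python
from collections import namedtuple

# Stand-in for a value-compared, hashable wheel (hypothetical).
Wheel = namedtuple("Wheel", ["diameter", "manufacturer"])


class Ref(object):
    """Wraps an object so that it compares and hashes by identity."""
    def __init__(self, obj):
        self._obj = obj
    def __call__(self):
        return self._obj
    def __eq__(self, other):
        return isinstance(other, Ref) and self._obj is other._obj
    def __ne__(self, other):
        return not self.__eq__(other)
    def __hash__(self):
        return id(self._obj) ^ 0xBEEF


front = Wheel(16, "Acme")
rear = Wheel(16, "Acme")      # same value, different object

price = {front: 25}           # keyed by value
broken = {Ref(front): True}   # keyed by identity

assert price[rear] == 25          # every wheel of this model has the price
assert Ref(front) in broken
assert Ref(rear) not in broken    # only the specific wheel is broken

# A set of Refs keeps both wheels, although they are equal by value.
assert len(set([Ref(front), Ref(rear)])) == 2
assert len(set([front, rear])) == 1
```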

I think that most objects, especially most user-defined objects, have
a *value*. I don't have an exact definition, but a hint is that two
objects that were created in the same way have the same value.
Sometimes we wish to compare objects based on their identity - in
those cases we use the "is" operator. Sometimes we wish to compare
objects based on their value - and that's what the == operator is for.
Sometimes we wish to use the value of objects as a dictionary key or
as a set member, and that's easy. Sometimes we wish to use the
identity of objects as a dictionary key or as a set member - and I
claim that we should do that by using the Ref class, whose *value* is
the object's *identity*, or by using a dict/set subclass, and not by
misusing the __hash__ and __eq__ methods.
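
As a rough idea of what the dict-like alternative could look like, here
is a sketch of a mapping keyed by identity. The name IdentityDict is
hypothetical, and for brevity it is built on
collections.abc.MutableMapping (Python 3) instead of subclassing dict
directly.

```python
from collections.abc import MutableMapping


class IdentityDict(MutableMapping):
    """Hypothetical mapping that keys entries by object identity."""

    def __init__(self):
        # id(key) -> (key, value). Storing the key keeps it alive, so
        # its id cannot be reused by another object in the meantime.
        self._data = {}

    def __getitem__(self, key):
        try:
            return self._data[id(key)][1]
        except KeyError:
            raise KeyError(key)

    def __setitem__(self, key, value):
        self._data[id(key)] = (key, value)

    def __delitem__(self, key):
        try:
            del self._data[id(key)]
        except KeyError:
            raise KeyError(key)

    def __iter__(self):
        return (key for key, _ in self._data.values())

    def __len__(self):
        return len(self._data)


# Lists are equal by value and not hashable, yet they work as keys here.
a = [1, 2]
b = [1, 2]
broken = IdentityDict()
broken[a] = True

assert a in broken
assert b not in broken
```

The objects themselves keep their value-based __eq__ and __hash__; the
container is what chooses identity.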

I think that whenever value-based comparison is meaningful, the __eq__
and __hash__ should be value-based. Treating objects by identity
should be done explicitly, by the one who uses the objects, by using
the "is" operator or the Ref class. It should not be the job of the
object to decide which method (value or identity) is more useful - it
should allow the user to use both methods, by defining __eq__ and
__hash__ based on value.

Please give me examples which prove me wrong. I currently think that
the only objects for which value-based comparison is not meaningful
are objects which represent entities which are "outside" of the
process, or in other words, entities which are not "computational".
This includes files, sockets, possibly user-interface objects,
loggers, etc. I think that objects that represent pure "data" have
a "value" by which they can be compared. Even wheels that
don't have any attributes are simply equal to other wheels, and not
equal to other objects. Since user-defined classes can interact with
the "environment" only through other objects or functions, it is
reasonable to suggest that they should get a value-based equality
operator. Many times the value is defined by the __dict__ and
__slots__ members, so it seems to me a reasonable default.
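
A sketch of what such a default could look like, written as a base
class. ValueObject, Wheel and Bolt are hypothetical names; the sketch
only looks at __dict__ (it ignores __slots__) and assumes that
attribute values are hashable and are not changed after construction.

```python
class ValueObject(object):
    """Hypothetical base class: equality and hash derived from __dict__."""

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented
        return self.__dict__ == other.__dict__

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Only meaningful while the attributes stay unchanged.
        return hash((type(self), tuple(sorted(self.__dict__.items()))))


class Wheel(ValueObject):
    def __init__(self, diameter, manufacturer):
        self.diameter = diameter
        self.manufacturer = manufacturer


class Bolt(ValueObject):
    pass


assert Wheel(16, "Acme") == Wheel(16, "Acme")
assert Wheel(16, "Acme") != Wheel(17, "Acme")
assert Bolt() == Bolt()              # no attributes: equal to other bolts
assert Bolt() != Wheel(16, "Acme")   # but not equal to other objects
```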

I would greatly appreciate replies that find a tiny bit of reason in
what I said (even if they don't agree), rather than dismissing it all
as a complete load of rubbish.

Thanks,
Noam

