sameness/identity

Mon Oct 1 02:59:19 EDT 2001

[Greg Weeks]
> ...
> Consider:
>
>     x = 6.02
>     y = 6.02
>
> Are x and y identical?  They are identical as *numbers* (which I'll call
> conceptual identity) but not as Python *implementations* of numbers (since
> id(x) != id(y)).

You can't know that, though.  Here:

>>> x = 6.02  # this
>>> y = 6.02  # and this
>>> id(x) == id(y)  #  just happen to be different objects today
0
>>> def f():
...     x = 6.02   # but this
...     y = 6.02   # and this
...     return id(x) == id(y) # just happen to be the same object today
...
>>> f()
1
>>>

Python guarantees nothing about identity wrt floats (or strings, or ints,
longs, complexes or tuples), except for the trivial id(x) == id(x).  Objects
of these types that compare equal may or may not be the same object, at the
implementation's discretion.  You have no control over it, and don't need
any.
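
A small sketch of what that means in practice, written for a current Python 3 interpreter (an assumption; the session above is from 2.x, where comparisons print 0 and 1).  The helper name same_value is hypothetical.  The idea is to compare such values with "==" and to treat the result of "is" as unspecified:

```python
def same_value(a, b):
    """Compare by value; never rely on identity for floats, strings, ints or tuples."""
    return a == b

x = 6.02
y = 6.02

# Guaranteed: equal values compare equal, and an object is identical to itself.
assert same_value(x, y)
assert id(x) == id(x)

# Not guaranteed either way: the implementation may or may not share one object.
shared = x is y
print("x and y happen to be the same object here:", shared)
```

Whatever the last line prints, correct code shouldn't depend on it.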

For the builtin objects of other types, Python does guarantee that

    id(C(...))

will be unique among all id()s of currently living objects, where C(...) is
a constructor for the type.  So it's an accident of optimization that

    "abc" + "" is "abc"

today, but it's deliberate and guaranteed that

    [] is not []
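
To make the contrast concrete, here is a minimal sketch (Python 3 assumed) that asserts only the guaranteed part and merely reports the accidental part:

```python
a = []
b = []

# Guaranteed: two separately constructed lists that are both alive are
# distinct objects with distinct ids.
assert a is not b
assert id(a) != id(b)

# They still compare equal by value.
assert a == b

# Distinctness is what makes it safe to mutate one without touching the other.
a.append(1)
assert b == []

# Not guaranteed: whether concatenating "" hands back the same string object.
s = "abc"
t = s + ""
print("s + '' is s:", t is s)
```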

> Good programming style requires us to think of numbers primarily
> as numbers and not as their implementations.

I expect this depends on the app, but Python does favor value semantics for
numbers, even to the extent that (e.g.)

    hash(i) == hash(long(i)) == hash(float(i)) == hash(complex(i))

for all integers i (at least on 32-bit boxes, so that float(i) never loses
precision).
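
A sketch of the same property under Python 3 (an assumption: there is no separate long type there, so only int, float and complex are checked).  The sample values are arbitrary and chosen small enough that float(i) is exact:

```python
# Integers small enough that float(i) is exact on IEEE-754 doubles.
samples = [0, 1, -1, 2, 42, -1000, 2**31 - 1, 2**40]

for i in samples:
    assert i == float(i) == complex(i)
    assert hash(i) == hash(float(i)) == hash(complex(i))

# Consequence: equal numbers of different types act as the same dict key.
d = {1: "one"}
d[1.0] = "uno"           # replaces the value stored under the key 1
d[complex(1)] = "eins"   # and again
assert len(d) == 1
```

The dict ends up with a single entry because key lookup uses the hash plus "==", never the type or the identity of the number.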

> So we are more interested in conceptual identity than in implementation
> identity.

Well, Python doesn't model either of those concepts.  For objects of builtin
types, it has (deep) value equality and object identity instead, varying by
type according to which seemed more useful for objects of that type.  Object
identity is near trivial:  "x is y" iff x and y name the same object.  You
view that as "implementation identity", but in a real sense that's
backwards:  object identity is the primary concept, and the implementation
strives to implement object identity faithfully.  It's not really that "is"
just happens to expose whatever the heck the implementation does; "is" is an
implementation of object identity.
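
As a minimal illustration of "x and y name the same object" (Python 3 assumed; the variable names are arbitrary):

```python
x = [1, 2]
y = x          # a second name for the same object
z = list(x)    # a new object with equal contents

assert x is y and id(x) == id(y)
assert x is not z
assert x == z

y.append(3)              # a change made through one name...
assert x == [1, 2, 3]    # ...is visible through the other
assert z == [1, 2]       # the copy is unaffected
```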

> Now consider a simple Bank_account class:
>
>     class Bank_account:
>         def __init__(me, initial_balance):
>             me.balance = initial_balance
> ...
>     my_bank_account = Bank_account(100)
>     your_bank_account = Bank_account(100)
>
> Are the two bank accounts the same bank account?  No.  (How would
> you feel, for example, if I withdrew money from your bank account?)  x
> and y are conceptually the same, but my_bank_account and
> your_bank_account are not.

This is why instances compare by object identity by default, while Python
*allows* you to override __eq__ and __cmp__ in case that's not suitable
for your objects -- "most useful most often".  In contrast,

    class Set:
        def __init__(self, sequence):
            # keep the first occurrence of each element
            self.data = data = []
            for x in sequence:
                if x not in data:
                    data.append(x)

In almost all apps, it would be insane if Set([1, 2]) != Set([2, 1]), so the
ability to override __eq__ is crucial.
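
One way such an override might look, as a sketch only: the __eq__ below is an assumed definition (same elements, order ignored), not something spelled out above, and it is written for Python 3, where "!=" is derived from __eq__ automatically (in 2.x you would define __ne__ too).

```python
class Set:
    def __init__(self, sequence):
        self.data = data = []
        for x in sequence:
            if x not in data:
                data.append(x)

    def __eq__(self, other):
        # Assumed definition: same elements, order ignored.
        if not isinstance(other, Set):
            return NotImplemented
        # data holds no duplicates, so equal length plus containment
        # in one direction implies the same elements.
        return (len(self.data) == len(other.data)
                and all(x in other.data for x in self.data))

a = Set([1, 2])
b = Set([2, 1, 1])
assert a == b          # value equality, as overridden
assert a is not b      # still two distinct objects
assert a != Set([1, 3])
```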

> ...
> So, we've encountered three kinds of identity in this discussion:
>
>     A.  conceptual identity
>     B.  implementation identity
>     C.  state identity
>
> In Python, B corresponds to "is".

"is" implements object identity.

> And to many Python programmers, C corresponds to "==".

It depends on the programmer and the app, and they can make "==" mean
anything their app needs.

> But that leaves A without an operator.

Feel free to overload "<<" if you're afraid of named methods <wink>.  Note
that several other notions of equality can be useful for container types
too:  "C1 == C2" may be most useful as simple object identity at top level,
or object identity of all the contained objects (and then with or without
regard to ordering), or value equality of the contained objects (likewise),
or mixtures depending on the types of the contained objects and/or the depth
of the search.
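
Two of those notions spelled as named functions, as a rough sketch (Python 3 assumed; both function names are hypothetical):

```python
def same_objects_in_order(c1, c2):
    """Object identity of the contained objects, with regard to ordering."""
    return len(c1) == len(c2) and all(a is b for a, b in zip(c1, c2))

def same_values_any_order(c1, c2):
    """Value equality of the contained objects, without regard to ordering."""
    rest = list(c2)
    for a in c1:
        if a not in rest:
            return False
        rest.remove(a)   # consume one matching element
    return not rest

p, q = [1], [2]
c1 = [p, q]
c2 = [q, p]        # same objects, different order
c3 = [[1], [2]]    # equal values, different objects

assert c1 == c3 and not same_objects_in_order(c1, c3)
assert c1 != c2 and same_values_any_order(c1, c2)
```

Neither function is "the" right meaning for "=="; which one an app wants depends on the app.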

> I'm unable to swallow that.  So, I use "==" for A; I never use "is";

You should:  object identity is a powerful tool, in some apps.

> and if I need C -- which hasn't happened yet -- I write a method for it.

Cool.  I take it you haven't yet written a rational-number, Set or Date
class <wink -- but today is today whether you got here from yesterday or
moved back in time from tomorrow>.

> This works pretty well.  It works for all the immutable types.  It also
> works for bank accounts *as defined above*.  That's why you won't find me
> adding __cmp__ and __hash__ methods to bank accounts.

An irony is that "is" does exactly what you say you need for bank accounts.
Refusing to use "is" for its intended purpose doesn't seem a particularly
good idea.
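
A sketch using the Bank_account class quoted above (Python 3 assumed; the name alias and the withdrawal amount are made up for illustration):

```python
class Bank_account:
    def __init__(me, initial_balance):
        me.balance = initial_balance

my_bank_account = Bank_account(100)
your_bank_account = Bank_account(100)
alias = my_bank_account            # another name for my account

# "is" answers the "same account?" question directly.
assert my_bank_account is not your_bank_account
assert alias is my_bank_account

# With no __eq__ override, "==" falls back to the same identity test.
assert my_bank_account != your_bank_account
assert my_bank_account == alias

alias.balance -= 30                # a withdrawal through the alias
assert my_bank_account.balance == 70
assert your_bank_account.balance == 100

# Default hashing goes with default equality, so accounts work as dict keys.
owners = {my_bank_account: "me", your_bank_account: "you"}
assert len(owners) == 2
```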

> Note, by the way, that the above bank accounts are suitable hash table
> keys (and I have found that useful on occasion).

That gets closer to a real (IMO) problem:  the overload of __eq__ most
useful for a class isn't always the best idea for dict keys.  But then you
can write a wrapper for the dict-key use, redefining its __eq__ and __hash__
as needed.
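
Such a wrapper might look like the following sketch (Python 3 assumed; IdentityKey is a hypothetical name, and identity is just one possible choice for the redefined __eq__ and __hash__):

```python
class IdentityKey:
    """Wrap any object so that dict lookups use object identity."""

    def __init__(self, obj):
        self.obj = obj   # keeps obj alive, so its id stays valid

    def __eq__(self, other):
        return isinstance(other, IdentityKey) and self.obj is other.obj

    def __hash__(self):
        return id(self.obj)

a = [1, 2]
b = [1, 2]   # equal to a by value, but a different object

# Plain lists can't be dict keys at all; wrapped, they are keyed by identity.
notes = {IdentityKey(a): "first", IdentityKey(b): "second"}
assert len(notes) == 2
assert notes[IdentityKey(a)] == "first"
assert notes[IdentityKey(b)] == "second"
```

The wrapped class keeps whatever __eq__ is most useful for ordinary comparisons; only the dict-key use sees the wrapper's definition.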

> Unfortunately, "==" does not represent conceptual identity for lists and
> dictionaries.  (Lists and dictionaries are like bank accounts.  They are
> not conceptually identical even if they happen to have the same state.)
> That's what I mean when I say Python "got it wrong".

Eh -- value comparison for lists and dicts isn't compelling, but neither is
identity comparison (I've used languages with both, and they both suck for
*some* apps).  Going back to the Set() example, when using dicts to
represent sets, value comparison of dicts is exactly what's desired for
__eq__.  Ditto using, e.g., bisect.bisect to maintain a priority queue of
list-based info.  There are many examples where one is more useful than the
other.  But since "is" already does identity comparison for lists and dicts,
having __eq__ do that too seems a waste.
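
Both uses in a short sketch (Python 3 assumed; the task names and priorities are invented, and bisect.insort stands in for the bisect.bisect-plus-insert combination):

```python
import bisect

# Dicts used as sets: value comparison ignores insertion order.
s1 = {"a": None, "b": None}
s2 = {"b": None, "a": None}
assert s1 == s2 and s1 is not s2

# A priority queue of [priority, task] lists, kept sorted by value comparison.
queue = []
bisect.insort(queue, [5, "write report"])
bisect.insort(queue, [1, "fix build"])
bisect.insort(queue, [3, "review patch"])
assert queue == [[1, "fix build"], [3, "review patch"], [5, "write report"]]

priority, task = queue.pop(0)   # lowest priority number comes out first
assert (priority, task) == (1, "fix build")
```

If lists compared by identity, the insort calls would have no useful ordering to work with.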

Note that in 2.2 you can subclass from dictionary (or list) and provide your
own idea of what __eq__ means; e.g., from a 2.2a4 shell session:

>>> class identity_list(list):
...     def __eq__(self, other):
...         return self is other
...
>>> x = identity_list([1])
>>> x.append(2)
>>> y = x[:]
>>> print x, y
[1, 2] [1, 2]
>>> x == y
0
>>>

Note that since I didn't override the slicing operator, the "y = x[:]" is
actually performed by the base (list) type, and yields an instance of list
(not of identity_list).  But I did override __eq__, so it's
identity_list.__eq__ that performed the comparison.

>>> list.__eq__(x, y)
1
>>>

That is, list's __eq__ still considers them to be equal.

> I can't *prove* that it is wrong.  I *prefer* an operator for conceptual
> identity.  And I have a *hunch* that programs are better with an operator
> for conceptual identity.  But -- as all of you know -- 95% of my hunch is
> really just my preference.

I think it's nice that Python gives different *default* meanings for __eq__
depending on type -- by your own account, it "does the right thing" by
default for both numbers and bank accounts, despite that it does different
things in those cases.  And I'm not sorry it doesn't have 13 different ways
to spell "equality-like operator".  Two ways may be one too many already, to
judge from the amount of confusion out there over "is".

it's-the-simplest-things-that-provoke-the-most-confusion-ly y'rs  - tim