'is not' or '!='

Tue Aug 19 14:26:50 EDT 2014

Marko Rauhamaa wrote:

> Skip Montanaro <skip at pobox.com>:
> 
>> The use of "is" or "is not" is the right thing to do when the object
>> of the comparison is known to be a singleton.
> 
> Object identity stirs a lot of passions on this forum. I'm guessing the
> reason is that it is not defined very clearly (<URL:
> https://docs.python.org/3/library/functions.html#id>):

Identity is defined very clearly. As you just quoted:

>    id(object)
> 
>        Return the “identity” of an object. This is an integer which is
>        guaranteed to be unique and constant for this object during its
>        lifetime. Two objects with non-overlapping lifetimes may have the
>        same id() value.

Python identity is represented by an integer, and it is guaranteed to be
unique and constant for the lifetime of the object. It may or may not be
reused once the object no longer exists. That's all you need to know about
identity; that's *all there is to know* about identity in Python.

[Actually, to be pedantic, one also needs to state that the objects have to
be part of a single Python process. Object X in one process, and object Y
in another process, may have the same id() but still be considered
distinct.]
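As a rough sketch of those two guarantees (the names a, b and items are made up for illustration; nothing here relies on CPython's address-based ids):

```python
# Two distinct objects that are alive at the same time never share an id.
a = object()
b = object()
assert a is not b
assert id(a) != id(b)

# The id stays constant for the lifetime of the object, even if the
# object itself is mutated.
items = []
before = id(items)
items.append(1)
assert id(items) == before
```

Nothing above tests whether an id is reused after an object dies, because the definition explicitly leaves that open.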

>        CPython implementation detail: This is the address of the object
>        in memory.

I really wish CPython didn't do that, or at least not admit to it. It does
nothing but confuse people.

> The "is" relation can be defined trivially through the id() function:
> 
>    X is Y iff id(X) == id(Y)

Except that id() is a built-in function and can be shadowed or
monkey-patched, while the `is` operator is a keyword and cannot be. But
apart from that minor point, I agree.
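To illustrate the shadowing point, here is a small sketch. The function compare and the bogus replacement id are hypothetical, invented only for this demonstration:

```python
import builtins

def compare(x, y):
    # Inside this function the name "id" is rebound to a bogus function,
    # so it no longer refers to the builtin.
    id = lambda obj: 42
    return id(x) == id(y), x is y

x, y = object(), object()
ids_equal, same_object = compare(x, y)

assert ids_equal          # the shadowed id() claims the objects match
assert not same_object    # "is" still reports two distinct objects
assert builtins.id(x) != builtins.id(y)   # the real builtin is untouched
```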

> What remains is the characterization of the (total) id() function. For
> example, we can stipulate that:
> 
>    X = Y
>    assert(id(X) == id(Y))
>    # assignment preserves identity

That's not a property of identity. That's a property of *assignment*. So you
cannot use that fact to define identity in Python, since there could be
another language with the *exact* same definition of identity that copies
on assignment instead.
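A sketch of the difference, using copy.copy() to stand in for what a hypothetical copy-on-assignment language would do (the names X, Y and Z are only for illustration):

```python
import copy

X = [1, 2, 3]

Y = X              # Python: assignment binds another name to the same object
Z = copy.copy(X)   # stand-in for a language that copies on assignment

assert Y is X                  # name binding preserves identity
assert Z == X and Z is not X   # a copy is equal but is a different object
```

The meaning of identity is the same on both lines; only the behaviour of assignment differs.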

> (assuming X and Y are not modified in other threads or signal handlers).
> 
> We know further that:
> 
>    i = id(X)
>    time.sleep(T)
>    assert(i == id(X))
>    # the identity does not change over time

That would be the part of the definition that says the identity is constant.

>    def f(x, y):
>        return id(x) == id(y)
>    assert(f(X, X))
>    # parameter passing preserves the identity

Again, that's not a property of identity. There could be a language just
like Python in all respects, including identity, except that parameters are
passed by value.

[snip more examples of things which tell us nothing about identity]

> The nice thing about these kinds of formal definitions is 
> that they make no metaphysical reference to "objects" or "lifetime" (or
> the CPython implementation).

They are not metaphysical. They are concrete. You cannot understand the
semantics of identity in Python without understanding Python's execution
model. Python's execution model contains objects (which are not
metaphysical woo, but a concrete computer science data structure), and
identity in Python is defined in terms of objects. Not values, or names, or
namespaces, but objects.

> They can also be converted into
> implementation conformance statements and test cases.

True, but in most of the examples you show, they will be tests of some other
aspect of Python, e.g. that assignment (name binding) preserves identity.
They don't help us understand identity, because there are other models for
assignment, and there are an infinite number of things which could also be
preserved by assignment but aren't identity.

E.g. "the number of zero bits in the object struct".

> A much tougher task is to define when id(X) != id(Y). After all, all of
> the above would be satisfied by this implementation of id():
> 
>    def id(x):
>        return 0

That fails the definition given, that identities are *unique*.
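The uniqueness requirement can itself be written as a check. This is only a sketch; violates_uniqueness is a made-up helper, not anything from the standard library:

```python
def violates_uniqueness(id_func, objects):
    """Return True if two distinct live objects share an id under id_func."""
    seen = {}
    for obj in objects:
        key = id_func(obj)
        if key in seen and seen[key] is not obj:
            return True
        seen[key] = obj
    return False

# The list keeps all three objects alive at the same time.
live = [object() for _ in range(3)]

assert violates_uniqueness(lambda x: 0, live)   # the constant "id" fails
assert not violates_uniqueness(id, live)        # the builtin id() passes
```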

Defining non-identity for objects that exist simultaneously is simple:

id(X) != id(Y) iff not (X is Y)

We don't have good notation for discussing objects which exist at different
times, but we can fake it with the rule:

"if either X or Y or both raise NameError, then we deem `X is Y` to be
false"

> The nonidentity will probably have to be defined separately for each
> builtin datatype. 

That is incorrect. See below.

> For example, for integers and strings we know only  
> that:
> 
>    assert(X == Y or id(X) != id(Y))
>    # inequality implies nonidentity

That tells us something about string equality. It tells us nothing about
identity.

Python's concept of identity applies equally to all types. Read the
definition again: it refers to objects, but without caring about the type
of objects. Let's just consider two types:

str:
    assert (X == Y or id(X) != id(Y)) always passes

float:
    assert (X == Y or id(X) != id(Y)) sometimes fails

Proof of the second case:

py> X = Y = float('nan')
py> assert (X == Y or id(X) != id(Y))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

This demonstrates that the condition

    (X == Y or id(X) != id(Y))

fails to tell us anything useful about identity, since it is sometimes true
and sometimes false even when X and Y are the very same object.
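Spelled out as a sketch (condition is a hypothetical helper wrapping the expression above):

```python
def condition(X, Y):
    return X == Y or id(X) != id(Y)

nan = float('nan')
s = "spam"

# Same object in both cases, yet the condition gives different answers.
assert nan is nan and condition(nan, nan) is False
assert s is s and condition(s, s) is True

# Two different objects: the condition is true here as well.
one, two = 1.0, 2.0
assert one is not two and condition(one, two) is True
```

So a true result is compatible with both identity and non-identity, and a false result can occur for an object compared with itself.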

You cannot understand identity from first principles, precisely because it
is not a metaphysical concept in Python. In Python it is defined by and in
terms of the concrete programming model of the language.

-- 
Steven