difference in binding between strings and tuples?

Tue May 13 17:42:04 EDT 2003

On Tue, 2003-05-13 at 13:12, Iwan van der Kleyn wrote:

>  >>> x = 'test'
>  >>> y = 'test'

>  >>> x += 's'
>  >>> x == y
> 1

What???  This must be a typo.

What I would expect is that after this operation:

x == 'tests'
y == 'test'

thus,

x != y
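
A minimal sketch of what I would expect, written here for a current Python 3 interpreter (the print() calls are only there to show the values). Strings are immutable, so the += has to build a new string and rebind x; y is left alone:

```python
x = 'test'
y = 'test'

# Strings are immutable, so += builds a new string and rebinds the name x.
# The object that y refers to is not touched.
x += 's'

print(x)        # tests
print(y)        # test
print(x == y)   # False
```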

To answer your question, though, the fact is that Python can, and
sometimes does, cache immutable objects, thus making the identity test a
little more complicated.  These are really implementation details, and
they are one reason you should be careful about testing identity.

For example:

x = y = 1

Here, the '1' integer object is created, and the names x and y both
refer to it.  Thus it makes sense that x is y.

x = 1
y = 1

Here, the '1' integer object is (implicitly) created twice.  It is just
that, under the hood, Python knows to really create it once, since it is
immutable (you can't change the one object into a two object) and it makes
things faster and more memory efficient.  So x is y is also true, but it
is not strictly mandated by the language, and is only done for some
immutable objects (like small integers, which are actually
pre-allocated).
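
The two cases side by side, as a rough sketch in Python 3 syntax (the names a and b are only for illustration). Only the first identity result is something you can rely on; the last line depends on the interpreter's caching:

```python
x = y = 1
print(x is y)   # True: one object, two names

a = 1
b = 1
print(a == b)   # True: the values are equal, whatever the caching policy

# This usually prints True in CPython, because small integers are cached,
# but the language does not promise it.
print(a is b)
```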

Consider this:

>>> a = 1L
>>> b = 1L
>>> a is b
0

Python does NOT cache long integer objects, and thus each '1L' object
is, in fact, a separate object (this is an implementation detail,
subject to change with different versions of the interpreter).
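
Current Python 3 interpreters no longer accept the 'L' suffix (there is a single int type), so the session above cannot be typed in as-is. A comparable sketch, assuming Python 3, builds two equal big integers at run time instead; the identity result is still an implementation detail, so no output is promised for it:

```python
# Build two equal big integers at run time, so the compiler cannot
# merge them into one shared constant.
a = int('1' + '0' * 30)
b = int('1' + '0' * 30)

print(a == b)   # True: same value
print(a is b)   # implementation dependent; typically False in CPython
```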

The long and short of it is that if you have a mutable object, it
sometimes makes sense to test for identity (i.e. using 'is'), to see if
different names are referring to the same object (since mutating the
object will change it for both names).  With immutable objects, it is
always better to test equality, for the reasons you discovered and that
are (hopefully) explained above.
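
A small illustration of the mutable case, sketched with lists in Python 3 syntax (the names a, b and c are made up for the example). Here 'is' tells you something that '==' cannot:

```python
a = [1, 2]
b = a          # second name for the same list
c = [1, 2]     # a separate list that merely compares equal

print(a is b)  # True
print(a is c)  # False
print(a == c)  # True

b.append(3)    # mutate through one name ...
print(a)       # [1, 2, 3]  ... and the change shows through the other
print(c)       # [1, 2]
```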

Note that this is almost orthogonal to the notion of name binding.  In
either case, a name is bound to an object (or rather, objects can be
referred to with multiple names, from multiple scopes, etc.)  The only
confusion is whether creating an object creates something with a unique
id or not.

One final note: object ids are unique at any instant in time, but as
objects are destroyed and recreated, the memory allocator is free to
reuse that memory location, and thus the id number.  The result is that
if an object has a certain id at some point in time, and then an object
has that id at another point in time, there is no language guarantee
that they are the same object, or even equivalent objects.

So, in theory, identity should ONLY be used to establish that two names
refer to different objects, and never to establish that they refer to
the same object.  In practice, this rule is often ignored for the None
object, because it is special.  Also, it is unlikely to ever bite you in
practice for any object.  Still, I prefer equivalence checking to
identity checking (the counter-argument is that equivalence checking can
be overridden, identity checking cannot).
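
To make the counter-argument concrete, here is a sketch in Python 3 syntax using a deliberately silly, hypothetical class (AlwaysEqual is invented for this example). Its '==' claims equality with everything, including None, while 'is' cannot be fooled:

```python
class AlwaysEqual:
    """Hypothetical class whose == claims equality with anything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

print(obj == None)   # True: the overridden __eq__ answers
print(obj is None)   # False: identity cannot be overridden
```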

# Short program showing objects that are not equivalent, but have been
# given the same id as previously deleted objects due to reuse of
# memory.  Tested on Linux Python 2.2.2
d = {}
a = 1L
for i in range( 100000 ):
    a += 1L
    key = id(a)

    if d.has_key( key ):
        print "found reused id(): key=%s  object=%s " % (key, a)
        break
    else:
        d[key] = key
    print key, a
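
The program above is Python 2 and will not run unchanged on a current interpreter. A rough Python 3 port might look like the sketch below; the function name find_reused_id, the limit parameter and the starting value are my own choices for illustration. Whether and when an id is reused depends entirely on the interpreter's memory allocator, so the function may well return None on some implementations:

```python
def find_reused_id(limit=100000):
    """Return (id, value) for the first int whose id() was seen before, else None."""
    seen = set()
    a = 10 ** 30          # start well above any cached small integers
    for _ in range(limit):
        a += 1            # new int object; the old one becomes garbage
        key = id(a)
        if key in seen:
            return key, a
        seen.add(key)
    return None


result = find_reused_id()
if result is None:
    print("no id() reuse observed")
else:
    print("found reused id(): key=%s  object=%s" % result)
```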

-- 

Chad Netzer
(any opinion expressed is my own and not NASA's or my employer's)