Optimization of Tuples and Strings

Michael Robin me at mikerobin.com
Mon Jan 7 14:13:18 EST 2002


I think we need to be careful extending your experiment to all cases.
What happens at the interpreter prompt is not necessarily
the same as what happens when you run your program.

The idea of 'same object' is really program- and language-implementation dependent.

Even atomic objects can be tricky:
---------------------
>>> 12345678 is 12345678
1
>>> a = 12345678
>>> b = 12345678
>>> a is b
0

>>> 3 is 3
1
>>> a = 3
>>> b = 3
>>> a is b
1
--------------------

Small integers are optimized, so they will have the same object pointers,
but in general numbers aren't guaranteed object identity.
Note that the result of '12345678 is 12345678' (a one-liner) did not even
carry over to the version with separate assignments at the interpreter prompt.
And of course, with floats you're not even sure about object equality...
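
The same point can be sketched as a script. This assumes a current
CPython 3 rather than the 2.2 used in the sessions above (where 'is'
printed 1 or 0). The numbers are built at run time so that no
compile-time constant sharing applies. Only the equality results are
dependable; the identity results are implementation details, so they
are printed without an expected value.

```python
# Build the numbers at run time so no compile-time constant sharing applies.
big_a = int("12345678")
big_b = int("12345678")
small_a = int("3")
small_b = int("3")

# Equality is what you can depend on for equal numbers.
print(big_a == big_b)      # True
print(small_a == small_b)  # True

# Identity is an implementation detail; do not rely on either result.
print(big_a is big_b)
print(small_a is small_b)

# Floats: two computations you expect to match may not even be equal.
total = 0.1 + 0.2
print(total == 0.3)        # False with IEEE 754 doubles
```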

As for strings, consider:
-----------------
>>> 
>>> a = "abc1"
>>> b = "abc1"
>>> a is b
1

>>> "a b c 1" is "a b c 1"
1

>>> a = "a b c 1"
>>> b = "a b c 1"
>>> a is b
0
------------------

This is because, as an optimization, the Python reader treats all
string constants that look like valid identifiers specially: they are
interned. (That is, they are added to a special dictionary so that
equal strings share one object and have the same id, speeding up
comparisons later.) So in this case, the reader is affecting
object identity.
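
A minimal sketch of interning done by hand, assuming Python 3, where
the function is spelled sys.intern (in the 2.x line of this thread it
is the builtin intern()). The strings are joined at run time so the
reader never sees them as constants; the variable names are only for
illustration.

```python
import sys

# Build two equal strings at run time, so they are never seen as literals.
parts = ["a b", " c 1"]
s1 = "".join(parts)
s2 = "".join(parts)

print(s1 == s2)   # True: equal contents
print(s1 is s2)   # implementation detail; not something to rely on

# Interning maps equal strings to one shared object.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)   # True
```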

Python is not akin to a "hashed LISP" - so, in general,
the answer to:
> > since they are immutable, does it just "point" multiple
> > references to an identical tuple to the same object?  

is 'no, it does not', unless you know that:

(a) they are "smallints"
(b) they are identifier-like strings interned by the reader,
    or other objects the reader decides to cache.
(c) they are strings you interned yourself
(d) they are objects that you looked up in a mapping yourself
    and arranged to point to the same object
(e) your algorithm is designed to point like-objects to like-objects
    by some other design.
(f) ??? (am I missing something?)

In general, if your objects are computed at run-time rather than
read by the reader, don't expect automagic object identity except in
rare cases.
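
Case (d) can be sketched as follows. The helper name canonical and
the module-level dictionary are hypothetical, chosen only for this
example; the idea is simply to keep the first copy of each value in a
mapping and hand that copy back for every later equal value.

```python
# Hypothetical helper: keep one canonical copy of each equal tuple.
_canonical = {}

def canonical(value):
    """Return the first object stored that is equal to value."""
    return _canonical.setdefault(value, value)

# Two equal tuples computed at run time.
x = canonical(tuple([1, 2]))
y = canonical(tuple([1, 2]))

print(x == y)  # True
print(x is y)  # True: both names refer to the stored copy
```

This only works for hashable values, and the mapping keeps every
stored object alive for as long as the mapping itself exists.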

As for lists, they are mutable, and the way default args work
(e.g., def fn(arg=[0]) ), with one list object shared by every call,
is at least one reason why the reader can't hash them.
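
A short sketch of why identity matters for lists: the default list is
one object that every call mutates, while two separately written
lists must stay independent. The body of fn here is made up for the
illustration.

```python
def fn(arg=[0]):
    # The default list is created once, when the def statement runs.
    arg.append(len(arg))
    return arg

first = fn()
second = fn()
print(first is second)  # True: every call shares the one default list
print(second)           # [0, 1, 2]

# Two separate list displays always give distinct objects.
a = [0]
b = [0]
print(a is b)           # False
a.append(99)
print(b)                # [0]: b is unaffected because it is a different object
```

If the reader merged equal lists the way it merges some constants,
appending to a would also change b.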

This comes up often enough - maybe 'When are object ids guaranteed
to be equal' should go in the docs or FAQ? (I looked - but maybe
I missed it.)

thanks,
mike



Carl Caulkett <carlca at dircon.co.uk> wrote in message news:<MPG.16a362fd907a289c9896d2 at newnews.dircon.co.uk>...
> In article <a1bik0$9b$1 at slb4.atl.mindspring.net>, 
> mgerrans at mindspring.com says...
> > Anyone here know whether Python treats strings and tuples the way Java does
> > Strings?   That is, since they are immutable, does it just "point" multiple
> > references to an identical tuple to the same object?  For example:
> > 
> > x = (1,2)
> > y = (1,2)
> > 
> > Are x and y really referring to the same object?
>  
> >>> x = (1,2)
> >>> y = (1,2)
> >>> id(x)
>  15769248
> >>> id(y)
>  15793536
> >>> 
>  
> >>> a = '123'
> >>> b = '123'
> >>> id(a)
>  19919952
> >>> id(b)
>  19919952
> >>> 
> 
> Tuples no, Strings yes, apparently. (using Python 2.2)


