id functions of ints, floats and strings

Thu Apr 3 20:50:03 EDT 2008

On Thu, 03 Apr 2008 19:27:47 -0300, <zillow10 at googlemail.com> wrote:

> Hi all,
>
> I've been playing around with the identity function id() for different
> types of objects, and I think I understand its behaviour when it comes
> to objects like lists and tuples in which case an assignment r2 = r1
> (r1 refers to an existing object) creates an alias r2 that refers to
> the same object as r1. In this case id(r1) == id(r2)  (or, if you
> like: r1 is r2). However for r1, r2 assigned as follows: r1 = [1, 2,
> 3] and r2 = [1, 2, 3], (r1 is r2) is False, even if r1==r2,
> etc. ...this is all very well. Therefore, it seems that id(r) can be
> interpreted as the address of the object that 'r' refers to.
>
> My observations of its behaviour when comparing ints, floats and
> strings have raised some questions in my mind, though. Consider the
> following examples:
>
> #########################################################################
>
> # (1) turns out to be true
> a = 10
> b = 10
> print a is b

...only because CPython happens to cache small integers and always returns
the same object. Try again with 10000 at the interactive prompt. This is
just an optimization, and the actual range of cached integers, or whether
they are cached at all, is implementation (and version) dependent.
(Because integers are immutable, the optimization *can* be done, but that
doesn't mean that all immutable objects are always shared.)
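
A minimal sketch of that caching, assuming CPython and Python 3 syntax. The integers are built at run time with int() so that the compiler gets no chance to share equal literals. Only equality is asserted; the identity results are printed rather than relied upon, because they are implementation details.

```python
# Build the integers at run time so the compiler cannot share equal literals.
small_a = int("10")
small_b = int("10")
big_a = int("10000")
big_b = int("10000")

# Equality is guaranteed by the language.
assert small_a == small_b
assert big_a == big_b

# Identity is not guaranteed. CPython typically prints True for the small
# pair (cached) and False for the large pair; other implementations or
# versions may print something else.
print(small_a is small_b)
print(big_a is big_b)
```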

> # (2) turns out to be false
> f = 10.0
> g = 10.0
> print f is g

Because the above optimization isn't used for floats.
The `is` operator checks object identity: whether both operands are the
very same object (*not* a copy, and not merely equal: the *same* object).
("Identity" is a primitive concept.)
The only way to guarantee that you are talking about the same object is
to use a reference to a previously created object. That is:

a = some_arbitrary_object
b = a
assert a is b

The name `b` now refers to the same object as name `a`; the assertion  
holds for whatever object it is.
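
As a small check of that guarantee, the sketch below (Python 3 syntax; the sample values are arbitrary) aliases several kinds of objects and then contrasts that with two separately built lists, which are equal but always distinct objects.

```python
# Aliasing guarantees identity for any object, mutable or not.
for some_arbitrary_object in (10000, 10.0, "abc", [1, 2, 3], (1,), object()):
    alias = some_arbitrary_object
    assert alias is some_arbitrary_object
    assert id(alias) == id(some_arbitrary_object)

# Two separately constructed lists are equal but never the same object.
r1 = [1, 2, 3]
r2 = [1, 2, 3]
assert r1 == r2
assert r1 is not r2
```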

In other cases, like (1) and (2) above, the literals are just handy
constructors for int and float objects. Each assignment evaluates its own
literal (a and b, f and g). Whether the resulting objects are identical or
not is not defined; they might be the same, or not, depending on unknown
factors that might include the moon phase; both alternatives are valid Python.

> # (3) checking if ids of all list elements are the same for different
> cases:
>
> a = 3*[1]; areAllElementsEqual([id(i) for i in a]) # True
> b = [1, 1, 1]; areAllElementsEqual([id(i) for i in b]) # True
> f = 3*[1.0]; areAllElementsEqual([id(i) for i in f]) # True
> g = [1.0, 1.0, 1.0]; areAllElementsEqual([id(i) for i in g]) # True
> g1 = [1.0, 1.0, 0.5+0.5]; areAllElementsEqual([id(i) for i in g1]) #
> False

Again, this is implementation dependent. If you try with a different  
Python version or a different implementation you may get other results -  
and that doesn't mean that any of them is broken.
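
One part of (3) is guaranteed, though: list repetition copies references, so every element of 3*[x] is the same object x. A rough sketch follows; all_same_object is a hypothetical helper standing in for the poster's areAllElementsEqual, whose definition was not shown.

```python
def all_same_object(items):
    """Return True if every element of items is the very same object."""
    return len(set(id(item) for item in items)) <= 1

value = float("1.0")       # one float object, built at run time
repeated = 3 * [value]     # repetition copies the reference three times

# Guaranteed: one object, three references to it.
assert all_same_object(repeated)

# Three separate constructions: equal values, but whether they share one
# object is implementation dependent, so the result is only printed.
separate = [float("1.0"), float("1.0"), float("1.0")]
print(all_same_object(separate))
```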

> # (4) two equal floats defined inside a function body behave
> differently than case (1):
>
> def func():
> 	f = 10.0
> 	g = 10.0
> 	return f is g
>
> print func() # True

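Another implementation detail, this time related to co_consts: when CPython
compiles a function body, equal constants can end up as a single entry in
the code object's co_consts, so both names refer to that one object. You
shouldn't rely on it.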
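
To look at this yourself, you can inspect the constants stored in the function's code object. This is a CPython-specific sketch in Python 3 syntax; the exact contents of co_consts vary between versions, so treat the printed values as illustrative only.

```python
def func():
    f = 10.0
    g = 10.0
    return f is g

# The constants the compiler stored for this function body (CPython detail).
consts = func.__code__.co_consts
print(consts)

# How many times 10.0 appears among them; CPython typically stores it once.
print(consts.count(10.0))

# The result follows from the sharing above; not something to rely on.
print(func())
```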

> I didn't mention any examples with strings; they behaved like ints
> with respect to their id properties for all the cases I tried.

You didn't try hard enough :)

py> x = "abc"
py> y = ''.join(x)
py> x == y
True
py> x is y
False

Long strings behave like big integers: they aren't cached:

py> x = "a rather long string, full of garbage. No, this isn't garbage, just nonsense text to fill space."
py> y = "a rather long string, full of garbage. No, this isn't garbage, just nonsense text to fill space."
py> x == y
True
py> x is y
False

As always: you have two statements constructing two objects. Whether they
return the same object or not is not defined.
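
If you ever do need equal strings to be one shared object, you have to ask for it explicitly. The sketch below assumes Python 3, where the function is sys.intern (in Python 2 it is the builtin intern); interning is shown only to illustrate the difference between "happens to be shared" and "guaranteed to be shared".

```python
import sys

x = "abc"
y = "".join(["a", "b", "c"])   # equal text, built at run time

assert x == y                  # guaranteed
print(x is y)                  # not defined; typically False in CPython

# Interning maps equal strings to one shared object, so identity holds.
x_interned = sys.intern(x)
y_interned = sys.intern(y)
assert x_interned is y_interned
```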

> While I have no particular qualms about the behaviour, I have the
> following questions:
>
> 1) Which of the above behaviours are reliable? For example, does a1 =
> a2 for ints and strings always imply that a1 is a2?

If you mean:

a1 = something
a2 = a1
a1 is a2

then, from my comments above, you should be able to answer: yes, always,  
not restricted to ints and strings.

If you mean:

a1 = someliteral
a2 = someliteral
a1 is a2

then: no, it isn't guaranteed at all, not even for small integers or
strings.

> 2) From the programmer's perspective, are ids of ints, floats and
> string of any practical significance at all (since these types are
> immutable)?

The same significance as id() of any other object... mostly, none, except  
for debugging purposes.

> 3) Does the behaviour of ids for lists and tuples of the same element
> (of type int, string and sometimes even float), imply that the tuple a
> = (1,) takes (nearly) the same storage space as a = 10000*(1,)? (What
> about a list, where elements can be changed at will?)

That's a different thing. A tuple maintains only references to its
elements (as does any other container in Python). The memory required for
a tuple (I'm talking about CPython exclusively) is: (a small header) + n *
sizeof(pointer). So the expression 10000*(anything,) will take more space
than the singleton (anything,) because the former requires space for 10000
pointers and the latter for just one.

You also have to take into account the memory for the elements themselves,
but in both cases there is a *single* object referenced, so it makes no
difference. Note that it doesn't matter whether that single element is an
integer, a string, or a mutable or immutable object: it's always the same
object, already existing, and creating that 10000-tuple just increments its
reference count by 10000.

The situation is similar for lists, except that being mutable containers,  
they're over-allocated (to have room for future expansion). So the list  
[anything]*10000 has a size somewhat larger than 10000*sizeof(pointer);  
its (only) element increments its reference count by 10000.
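
These sizes can be checked roughly with sys.getsizeof and sys.getrefcount. The sketch assumes CPython and Python 3 syntax; both functions report implementation details, and the name `anything` is just a placeholder object created for the measurement.

```python
import struct
import sys

pointer_size = struct.calcsize("P")   # size of a C pointer on this platform
anything = object()                   # a fresh object to reference

refs_before = sys.getrefcount(anything)
one = (anything,)
many = 10000 * (anything,)
refs_after = sys.getrefcount(anything)

# Extra space taken by the long tuple: room for 9999 additional pointers.
extra_bytes = sys.getsizeof(many) - sys.getsizeof(one)
print(extra_bytes, 9999 * pointer_size)

# References added by the two tuples together (1 + 10000 in CPython).
print(refs_after - refs_before)

# The list needs at least one pointer per element, plus its own header.
as_list = [anything] * 10000
print(sys.getsizeof(as_list) >= 10000 * pointer_size)
```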

-- 
Gabriel Genellina