id functions of ints, floats and strings

Sat Apr 5 22:30:24 EDT 2008

Gabriel Genellina wrote:
> On Thu, 03 Apr 2008 19:27:47 -0300, <zillow10 at googlemail.com> wrote:
> 
>> Hi all,
>>
>> I've been playing around with the identity function id() for different
>> types of objects, and I think I understand its behaviour when it comes
>> to objects like lists and tuples in which case an assignment r2 = r1
>> (r1 refers to an existing object) creates an alias r2 that refers to
>> the same object as r1. In this case id(r1) == id(r2)  (or, if you
>> like: r1 is r2). However for r1, r2 assigned as follows: r1 = [1, 2,
>> 3] and r2 = [1, 2, 3], (r1 is r2) is False, even if r1==r2,
>> etc. ...this is all very well. Therefore, it seems that id(r) can be
>> interpreted as the address of the object that 'r' refers to.
>>
>> My observations of its behaviour when comparing ints, floats and
>> strings have raised some questions in my mind, though. Consider the
>> following examples:
>>
>> #########################################################################
>>
>> # (1) turns out to be true
>> a = 10
>> b = 10
>> print a is b
> 
> ...only because CPython happens to cache small integers and always returns
> the same object for them. Try again with 10000. This is just an
> optimization, and the actual range of cached integers, or whether they are
> cached at all, is implementation (and version) dependent.
> (As integers are immutable, the optimization *can* be done, but that
> doesn't mean that all immutable objects are always shared).
> 
>> # (2) turns out to be false
>> f = 10.0
>> g = 10.0
>> print f is g
> 
> Because the above optimization isn't used for floats.
> The `is` operator checks object identity: whether both operands are the
> very same object (*not* a copy, nor merely equal: the *same* object).
> ("Identity" is a primitive concept.)
> The only way to guarantee that you are talking about the same object is
> to use a reference to a previously created object. That is:
> 
> a = some_arbitrary_object
> b = a
> assert a is b
> 
> The name `b` now refers to the same object as name `a`; the assertion  
> holds for whatever object it is.
> 
> In other cases, like (1) and (2) above, the literals are just handy  
> constructors for int and float objects. You have two objects constructed  
> (a and b, f and g). Whether they are identical or not is not defined; they  
> might be the same, or not, depending on unknown factors that might include  
> the moon phase; both alternatives are valid Python.
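
A minimal sketch of the distinction, written in Python 3 syntax (the post itself uses the Python 2 print statement). The names r1, r2 and r3 are only illustrative. The alias case is guaranteed; and because lists are mutable, two list displays always build two distinct objects:

```python
r1 = [1, 2, 3]
r2 = r1          # alias: a second name for the same list object
r3 = [1, 2, 3]   # a separately constructed list with equal contents

alias_same = r1 is r2    # guaranteed: both names refer to one object
equal_only = r1 == r3    # equal contents
built_same = r1 is r3    # each list display creates a new list

print(alias_same, equal_only, built_same)
```

For immutable literals such as 10 or 10.0 no such guarantee exists either way, which is why the sketch sticks to lists.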
> 
>> # (3) checking if ids of all list elements are the same for different
>> # cases:
>>
>> a = 3*[1]; areAllElementsEqual([id(i) for i in a]) # True
>> b = [1, 1, 1]; areAllElementsEqual([id(i) for i in b]) # True
>> f = 3*[1.0]; areAllElementsEqual([id(i) for i in f]) # True
>> g = [1.0, 1.0, 1.0]; areAllElementsEqual([id(i) for i in g]) # True
>> g1 = [1.0, 1.0, 0.5+0.5]; areAllElementsEqual([id(i) for i in g1]) # False
> 
> Again, this is implementation dependent. If you try with a different  
> Python version or a different implementation you may get other results -  
> and that doesn't mean that any of them is broken.
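
The helper areAllElementsEqual is never defined in the post. The version below is only a guess at what it might have looked like, in Python 3 syntax. It also shows the one case in (3) that does not depend on the implementation: n*[x] copies references, so every element of the result is the same object as x.

```python
def areAllElementsEqual(items):
    """Return True if every element of items equals the first one."""
    items = list(items)
    return all(item == items[0] for item in items)

x = 1.0
f = 3 * [x]   # three references to the one float object bound to x
repeated_same = areAllElementsEqual([id(i) for i in f])
print(repeated_same)
```

For the literal forms such as [1.0, 1.0, 1.0], whether the ids agree is up to the implementation.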
> 
>> # (4) two equal floats defined inside a function body behave
>> # differently than case (2):
>>
>> def func():
>> 	f = 10.0
>> 	g = 10.0
>> 	return f is g
>>
>> print func() # True
> 
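> Another implementation detail, this time related to co_consts: in CPython,
> equal constants within one code object may be stored only once, so both
> names can end up bound to the same float object. You shouldn't rely on it.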
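
One way to look at this, as a sketch in Python 3 syntax (the `__code__` attribute was spelled `func_code` in older Python 2 releases). What the tuple contains, and in which order, is a CPython detail that may differ between versions:

```python
def func():
    f = 10.0
    g = 10.0
    return f is g

# Constants used by the function body, as stored on its code object.
consts = func.__code__.co_consts
print(consts)

# Result depends on the implementation; not guaranteed by the language.
print(func())
```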
> 
>> I didn't mention any examples with strings; they behaved like ints
>> with respect to their id properties for all the cases I tried.
> 
> You didn't try hard enough :)
> 
> py> x = "abc"
> py> y = ''.join(x)
> py> x == y
> True
> py> x is y
> False
> 
> Long strings behave like big integers; they aren't cached:
> 
> py> x = "a rather long string, full of garbage. No, this isn't garbage, just nonsense text to fill space."
> py> y = "a rather long string, full of garbage. No, this isn't garbage, just nonsense text to fill space."
> py> x == y
> True
> py> x is y
> False
> 
> As always: you have two statements constructing two objects. Whether they
> return the same object or not is not defined.
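
If identity of equal strings is ever really needed, interning is the documented way to get it. A small sketch, assuming Python 3, where the function lives in the sys module (in Python 2 it was the builtin intern); the variable names are illustrative:

```python
import sys

x = "abc"
y = "".join(["a", "b", "c"])   # an equal string built at run time

same_value = x == y                              # compare values with ==
interned_same = sys.intern(x) is sys.intern(y)   # interned copies are shared

print(same_value, interned_same)
```

Whether `x is y` holds before interning is left to the implementation, so the sketch does not test it.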
> 
>> While I have no particular qualms about the behaviour, I have the
>> following questions:
>>
>> 1) Which of the above behaviours are reliable? For example, does a1 =
>> a2 for ints and strings always imply that a1 is a2?
> 
> If you mean:
> 
> a1 = something
> a2 = a1
> a1 is a2
> 
> then, from my comments above, you should be able to answer: yes, always,  
> not restricted to ints and strings.
> 
> If you mean:
> 
> a1 = someliteral
> a2 = someliteral
> a1 is a2
> 
> then: no, it isn't guaranteed at all, not even for small integers or
> strings.
> 
>> 2) From the programmer's perspective, are ids of ints, floats and
>> string of any practical significance at all (since these types are
>> immutable)?
> 
> The same significance as id() of any other object... mostly, none, except  
> for debugging purposes.
> 
>> 3) Does the behaviour of ids for lists and tuples of the same element
>> (of type int, string and sometimes even float), imply that the tuple a
>> = (1,) takes (nearly) the same storage space as a = 10000*(1,)? (What
>> about a list, where elements can be changed at will?)
> 
> That's a different thing. A tuple maintains only references to its  
> elements (as any other object in Python). The memory required for a tuple  
> (I'm talking of CPython exclusively) is: (a small header) + n *  
> sizeof(pointer). So the expression 10000*(anything,) will take more space  
> than the singleton (anything,) because the former requires space for 10000  
> pointers and the latter just one.
> 
> You have to take into account the memory for the elements themselves; but  
> in both cases there is a *single* object referenced, so it doesn't matter.  
> Note that it doesn't matter whether that single element is an integer, a  
> string, mutable or immutable object: it's always the same object, already  
> existing, and creating that 10000-tuple just increments its reference count
> by 10000.
> 
> The situation is similar for lists, except that being mutable containers,  
> they're over-allocated (to have room for future expansion). So the list  
> [anything]*10000 has a size somewhat larger than 10000*sizeof(pointer);  
> its (only) element increments its reference count by 10000.
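
The size and reference-count claims can be checked roughly as follows. This is a CPython-only sketch: sys.getsizeof and sys.getrefcount report implementation details, sys.getsizeof did not exist yet in the Python releases current when this thread was written, and the exact byte counts vary by version and platform. The name element is just a placeholder object:

```python
import sys

element = object()

before = sys.getrefcount(element)
big = 10000 * (element,)
added_refs = sys.getrefcount(element) - before   # references held by big

small = (element,)
small_size = sys.getsizeof(small)   # header + 1 pointer
big_size = sys.getsizeof(big)       # header + 10000 pointers

# Every slot of the big tuple refers to the one existing object.
all_same = all(item is element for item in big)

print(small_size, big_size, added_refs, all_same)
```

Note that sys.getsizeof counts only the container itself, not the objects it refers to.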
> 
In fact, all you can truthfully say is that

   a is b --> a == b

The converse is definitely not true.
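
A short sketch of both directions, in Python 3 syntax with illustrative names. The first half shows the converse failing. The second half is a caveat not raised in the thread: even the forward implication depends on how the type defines ==, and a float NaN compares unequal to itself.

```python
a = 1
b = 1.0
# Equal values, but an int and a float can never be the same object.
equal_not_identical = (a == b) and (a is not b)

nan = float("nan")
# The very same object, yet == reports False for NaN.
identical_not_equal = (nan is nan) and (nan != nan)

print(equal_not_identical, identical_not_equal)
```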

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/