id functions of ints, floats and strings
Steve Holden
steve at holdenweb.com
Sat Apr 5 22:30:24 EDT 2008
Gabriel Genellina wrote:
> On Thu, 03 Apr 2008 19:27:47 -0300, <zillow10 at googlemail.com> wrote:
>
>> Hi all,
>>
>> I've been playing around with the identity function id() for different
>> types of objects, and I think I understand its behaviour when it comes
>> to objects like lists and tuples in which case an assignment r2 = r1
>> (r1 refers to an existing object) creates an alias r2 that refers to
>> the same object as r1. In this case id(r1) == id(r2) (or, if you
>> like: r1 is r2). However for r1, r2 assigned as follows: r1 = [1, 2,
>> 3] and r2 = [1, 2, 3], (r1 is r2) is False, even if r1==r2,
>> etc. ...this is all very well. Therefore, it seems that id(r) can be
>> interpreted as the address of the object that 'r' refers to.
>>
>> My observations of its behaviour when comparing ints, floats and
>> strings have raised some questions in my mind, though. Consider the
>> following examples:
>>
>> #########################################################################
>>
>> # (1) turns out to be true
>> a = 10
>> b = 10
>> print a is b
>
> ...only because CPython happens to cache small integers and always returns
> the same object for them. Try again with 10000. This is just an
> optimization, and the actual range of cached integers, or whether they are
> cached at all, is implementation (and version) dependent.
> (As integers are immutable, the optimization *can* be done, but that
> doesn't mean that all immutable objects are always shared.)
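A minimal sketch of the point above, written in Python 3 syntax (an assumption; the thread itself uses Python 2). The names `small_a`, `big_a` and so on are illustrative only. The integers are built at run time so that the compiler has no chance to merge identical literals.

```python
# Build equal integers at run time so the compiler cannot merge the literals.
small_a, small_b = int("10"), int("10")
big_a, big_b = int("10000"), int("10000")

# Equality is guaranteed by the language.
print(small_a == small_b, big_a == big_b)

# Identity is not guaranteed: the result depends on the implementation
# and version, so treat whatever is printed here as incidental.
print(small_a is small_b, big_a is big_b)
```

Only the first line of output is something a program may depend on.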
>
>> # (2) turns out to be false
>> f = 10.0
>> g = 10.0
>> print f is g
>
> Because the above optimization isn't used for floats.
> The `is` operator checks object identity: whether both operands are the
> very same object (*not* a copy, and not merely equal: the *same* object).
> ("Identity" is a primitive concept.)
> The only way to guarantee that you are talking about the same object is
> to use a reference to a previously created object. That is:
>
> a = some_arbitrary_object
> b = a
> assert a is b
>
> The name `b` now refers to the same object as name `a`; the assertion
> holds for whatever object it is.
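The guarantee described above can be checked for several kinds of object at once. This is only a sketch; `alias_is_same` and `samples` are hypothetical names introduced for illustration.

```python
def alias_is_same(obj):
    """Bind a second name to obj and report whether both names share it."""
    alias = obj  # no copy is made; alias names the very same object
    return alias is obj

# Mutable or immutable, cached or not: the alias is always the same object.
samples = [10, 10.0, "abc", [1, 2, 3], (1,), object()]
print(all(alias_is_same(s) for s in samples))  # True
```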
>
> In other cases, like (1) and (2) above, the literals are just handy
> constructors for int and float objects. You have two objects constructed
> (a and b, f and g). Whether they are identical or not is not defined; they
> might be the same, or not, depending on unknown factors that might include
> the moon phase; both alternatives are valid Python.
>
>> # (3) checking if ids of all list elements are the same for different cases:
>>
>> a = 3*[1]; areAllElementsEqual([id(i) for i in a]) # True
>> b = [1, 1, 1]; areAllElementsEqual([id(i) for i in b]) # True
>> f = 3*[1.0]; areAllElementsEqual([id(i) for i in f]) # True
>> g = [1.0, 1.0, 1.0]; areAllElementsEqual([id(i) for i in g]) # True
>> g1 = [1.0, 1.0, 0.5+0.5]; areAllElementsEqual([id(i) for i in g1]) # False
>
> Again, this is implementation dependent. If you try with a different
> Python version or a different implementation you may get other results -
> and that doesn't mean that any of them is broken.
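The original post never shows `areAllElementsEqual`, so the definition below is an assumption about what it probably does, not the poster's actual code. The sketch separates the one case that is guaranteed (a list built by repetition holds the same object in every slot) from a comparison of values, which is what a program should normally rely on.

```python
def areAllElementsEqual(items):
    """Return True if every item equals the first one (True for an empty list)."""
    return all(item == items[0] for item in items)

shared = 3 * [1.0]  # one float object referenced three times
print(areAllElementsEqual([id(x) for x in shared]))  # True: same object in each slot

built = [float("1.0"), 0.5 + 0.5]  # equal values produced separately
print(areAllElementsEqual(built))  # True: this compares values, not identities
```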
>
>> # (4) two equal floats defined inside a function body behave differently than case (1):
>>
>> def func():
>>     f = 10.0
>>     g = 10.0
>>     return f is g
>>
>> print func() # True
>
> Another implementation detail related to co_consts. You shouldn't rely on
> it.
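For the curious, the constants a compiled function carries can be inspected directly. This sketch assumes a reasonably recent Python, where the code object is reached through `__code__` (older Python 2 releases spelled it `func_code`). What the tuple contains besides the float, and whether equal literals share one entry, is up to the implementation.

```python
def func():
    f = 10.0
    g = 10.0
    return f is g

# co_consts holds the constants the compiler stored for this function.
consts = func.__code__.co_consts
print(consts)

# Implementation dependent; do not rely on the result.
print(func())
```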
>
>> I didn't mention any examples with strings; they behaved like ints
>> with respect to their id properties for all the cases I tried.
>
> You didn't try hard enough :)
>
> py> x = "abc"
> py> y = ''.join(x)
> py> x == y
> True
> py> x is y
> False
>
> Long strings behave like big integers: they aren't cached:
>
> py> x = "a rather long string, full of garbage. No, this isn't garbage, just nonsense text to fill space."
> py> y = "a rather long string, full of garbage. No, this isn't garbage, just nonsense text to fill space."
> py> x == y
> True
> py> x is y
> False
>
> As always: you have two statements constructing two objects. Whether they
> return the same object or not is not defined.
>
>> While I have no particular qualms about the behaviour, I have the
>> following questions:
>>
>> 1) Which of the above behaviours are reliable? For example, does a1 =
>> a2 for ints and strings always imply that a1 is a2?
>
> If you mean:
>
> a1 = something
> a2 = a1
> a1 is a2
>
> then, from my comments above, you should be able to answer: yes, always,
> not restricted to ints and strings.
>
> If you mean:
>
> a1 = someliteral
> a2 = someliteral
> a1 is a2
>
> then: no, it isn't guaranteed at all, not even for small integers or
> strings.
>
>> 2) From the programmer's perspective, are ids of ints, floats and
>> strings of any practical significance at all (since these types are
>> immutable)?
>
> The same significance as id() of any other object... mostly, none, except
> for debugging purposes.
>
>> 3) Does the behaviour of ids for lists and tuples of the same element
>> (of type int, string and sometimes even float), imply that the tuple a
>> = (1,) takes (nearly) the same storage space as a = 10000*(1,)? (What
>> about a list, where elements can be changed at will?)
>
> That's a different thing. A tuple maintains only references to its
> elements (as any other object in Python). The memory required for a tuple
> (I'm talking of CPython exclusively) is: (a small header) + n *
> sizeof(pointer). So the expression 10000*(anything,) will take more space
> than the singleton (anything,) because the former requires space for 10000
> pointers and the latter just one.
>
> You have to take into account the memory for the elements themselves; but
> in both cases there is a *single* object referenced, so it doesn't matter.
> Note that it doesn't matter whether that single element is an integer, a
> string, mutable or immutable object: it's always the same object, already
> existing, and creating that 10000-tuple just increments its reference count
> by 10000.
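A rough way to observe this in CPython, assuming a version that provides `sys.getsizeof` and `sys.getrefcount`. The names `element`, `single` and `many` are illustrative. A fresh `object()` is used as the element because some interpreters treat the reference counts of small integers specially.

```python
import sys

element = object()  # a fresh object, so its reference count is not special
before = sys.getrefcount(element)

single = (element,)
many = 10000 * (element,)

# The big tuple needs room for 10000 pointers, the small one for just one.
print(sys.getsizeof(single), sys.getsizeof(many))

# Every slot refers to the same object; no copies of the element are made.
print(all(item is element for item in many))  # True

# CPython-specific: references held by the two tuples show up in the count.
print(sys.getrefcount(element) - before)
```

Note that `sys.getsizeof` reports the container only, not the objects it refers to, which matches the "header plus pointers" accounting described above.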
>
> The situation is similar for lists, except that being mutable containers,
> they're over-allocated (to have room for future expansion). So the list
> [anything]*10000 has a size somewhat larger than 10000*sizeof(pointer);
> its (only) element increments its reference count by 10000.
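Over-allocation is easiest to see while a list grows by appending. The sketch below is CPython-specific and the exact numbers vary by version and platform; `sizes_while_appending` is a hypothetical helper written for this illustration.

```python
import sys

def sizes_while_appending(count):
    """Record sys.getsizeof() of a list after each of `count` appends."""
    items, sizes = [], []
    for _ in range(count):
        items.append(None)
        sizes.append(sys.getsizeof(items))
    return sizes

sizes = sizes_while_appending(100)

# With over-allocation the size changes in occasional jumps, not on every append.
print(len(set(sizes)), "distinct sizes over", len(sizes), "appends")
```

If the list reserved exactly one slot per append, every recorded size would be different.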
>
In fact, all you can truthfully say is that
a is b --> a == b
The converse is definitely not true.
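A short illustration using the lists from the original question; the names `r1`, `r2` and `r3` follow that post. One caveat worth knowing: the implication depends on the type's own notion of equality, and a float NaN is the usual counterexample, since it is identical to itself yet never compares equal.

```python
r1 = [1, 2, 3]
r2 = r1          # alias: same object
r3 = [1, 2, 3]   # a second list with equal contents

print(r2 is r1, r2 == r1)  # True True: identical, and therefore equal
print(r3 is r1, r3 == r1)  # False True: equal, but not identical

nan = float("nan")
print(nan is nan, nan == nan)  # True False: NaN never compares equal
```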
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/