id functions of ints, floats and strings
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Thu Apr 3 20:50:03 EDT 2008
On Thu, 03 Apr 2008 19:27:47 -0300, <zillow10 at googlemail.com> wrote:
> Hi all,
>
> I've been playing around with the identity function id() for different
> types of objects, and I think I understand its behaviour when it comes
> to objects like lists and tuples in which case an assignment r2 = r1
> (r1 refers to an existing object) creates an alias r2 that refers to
> the same object as r1. In this case id(r1) == id(r2) (or, if you
> like: r1 is r2). However for r1, r2 assigned as follows: r1 = [1, 2,
> 3] and r2 = [1, 2, 3], (r1 is r2) is False, even if r1==r2,
> etc. ...this is all very well. Therefore, it seems that id(r) can be
> interpreted as the address of the object that 'r' refers to.
>
> My observations of its behaviour when comparing ints, floats and
> strings have raised some questions in my mind, though. Consider the
> following examples:
>
> #########################################################################
>
> # (1) turns out to be true
> a = 10
> b = 10
> print a is b
...only because CPython happens to cache small integers and always returns
the same object. Try again with 10000. This is just an optimization, and
the actual range of cached integers, or whether they are cached at all, is
implementation (and version) dependent.
(As integers are immutable, the optimization *can* be done, but that
doesn't mean that all immutable objects are always shared).
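A minimal sketch of the point above, written for Python 3 (which this thread predates). The helper name `compare` is made up for illustration. The integers are built at run time with `int()` so that the compiler cannot merge equal literals into one constant; what `is` reports for them is still up to the implementation, so the code prints the result rather than relying on it.

```python
def compare(a, b):
    """Return (equal, identical) for two objects."""
    return (a == b, a is b)


# Build the integers at run time so the compiler cannot fold equal
# literals into a single constant.
small_a, small_b = int("10"), int("10")
large_a, large_b = int("10000"), int("10000")

# Equality is guaranteed by the language; identity is not.
print("small:", compare(small_a, small_b))
print("large:", compare(large_a, large_b))
```

Whatever the second element of each pair turns out to be on a given interpreter, code that compares numbers should use `==`.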
> # (2) turns out to be false
> f = 10.0
> g = 10.0
> print f is g
Because the above optimization isn't used for floats.
The `is` operator checks object identity: whether both operands are the
very same object (*not* a copy, and *not* merely an equal object: the
*same* object). ("Identity" is a primitive concept.)
The only way to guarantee that you are talking about the same object is
to use a reference to a previously created object. That is:
a = some_arbitrary_object
b = a
assert a is b
The name `b` now refers to the same object as name `a`; the assertion
holds for whatever object it is.
In other cases, like (1) and (2) above, the literals are just handy
constructors for int and float objects. You have two objects constructed
(a and b, f and g). Whether they are identical or not is not defined; they
might be the same, or not, depending on unknown factors that might include
the moon phase; both alternatives are valid Python.
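The guaranteed case can be checked for several kinds of objects at once. This is a small Python 3 sketch; the helper name `is_alias` and the sample values are assumptions chosen for illustration only.

```python
def is_alias(obj):
    """Bind a second name to obj and report whether both names refer to it."""
    other = obj          # plain assignment: no copy is made
    return other is obj


samples = [10, 10000, 10.0, "abc", [1, 2, 3], (1, 2), object()]

# Assignment never copies, so this holds for every object, mutable or not.
assert all(is_alias(obj) for obj in samples)

# A mutable object makes the sharing visible: a change made through one
# name is seen through the other.
r1 = [1, 2, 3]
r2 = r1
r2.append(4)
print(r1)  # [1, 2, 3, 4]
```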
> # (3) checking if ids of all list elements are the same for different cases:
>
> a = 3*[1]; areAllElementsEqual([id(i) for i in a]) # True
> b = [1, 1, 1]; areAllElementsEqual([id(i) for i in b]) # True
> f = 3*[1.0]; areAllElementsEqual([id(i) for i in f]) # True
> g = [1.0, 1.0, 1.0]; areAllElementsEqual([id(i) for i in g]) # True
> g1 = [1.0, 1.0, 0.5+0.5]; areAllElementsEqual([id(i) for i in g1]) # False
Again, this is implementation dependent. If you try with a different
Python version or a different implementation you may get other results -
and that doesn't mean that any of them is broken.
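The helper `areAllElementsEqual` is not shown in the question, so the version below is only a guess at what it does. The sketch (Python 3) separates the one case the language does guarantee, since sequence repetition copies references and `3 * [x]` therefore holds the same object three times, from the cases where the elements come from separate literals or expressions.

```python
def are_all_elements_equal(items):
    """Guess at the poster's helper: True if all items compare equal."""
    return all(item == items[0] for item in items)


marker = object()
repeated = 3 * [marker]

# Repetition stores three references to one object, so the ids must match.
print(are_all_elements_equal([id(i) for i in repeated]))

# Elements computed separately are equal in value; whether they are also
# the same object depends on the implementation and version.
g1 = [1.0, 1.0, 0.5 + 0.5]
print(are_all_elements_equal(g1))                    # compares values
print(are_all_elements_equal([id(i) for i in g1]))   # unspecified result
```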
> # (4) two equal floats defined inside a function body behave differently than case (1):
>
> def func():
>     f = 10.0
>     g = 10.0
>     return f is g
>
> print func() # True
Another implementation detail related to co_consts. You shouldn't rely on
it.
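For the curious, the constants a function was compiled with can be inspected. This sketch assumes CPython and Python 3, where the code object is reachable as `func.__code__` (older Python 2 releases spelled it `func.func_code`). It is meant for poking around, not as something to depend on.

```python
def func():
    f = 10.0
    g = 10.0
    return f is g


# co_consts holds the constants the compiler stored for this function.
# If both literals were merged into one constant, f and g end up bound to
# the same float object, which would explain the result in case (4).
consts = func.__code__.co_consts
print(consts)
print("result of func():", func())
```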
> I didn't mention any examples with strings; they behaved like ints
> with respect to their id properties for all the cases I tried.
You didn't try hard enough :)
py> x = "abc"
py> y = ''.join(x)
py> x == y
True
py> x is y
False
Long strings behave like big integers: they aren't cached:
py> x = "a rather long string, full of garbage. No, this isn't garbage, just nonsense text to fill space."
py> y = "a rather long string, full of garbage. No, this isn't garbage, just nonsense text to fill space."
py> x == y
True
py> x is y
False
As always: you have two statements constructing two objects. Whether they
return the same object or not is not defined.
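If identical string objects are really wanted, Python 3 offers `sys.intern`, which returns one canonical object per distinct string value. That function is not mentioned in this thread (in Python 2 it was the builtin `intern`); the sketch below is only meant to contrast an explicit request for sharing with the unspecified behaviour of separately built strings.

```python
import sys

x = "abc"
y = "".join(["a", "b", "c"])   # equal to x, built at run time

print(x == y)    # True: same value
print(x is y)    # not defined by the language; do not rely on it

# sys.intern returns one canonical object per distinct string value, so
# after interning both names refer to the same object.
xi = sys.intern(x)
yi = sys.intern(y)
print(xi is yi)  # True
```

Even so, ordinary code should compare strings with `==`; interning is an optimization, not a substitute for equality.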
> While I have no particular qualms about the behaviour, I have the
> following questions:
>
> 1) Which of the above behaviours are reliable? For example, does a1 =
> a2 for ints and strings always imply that a1 is a2?
If you mean:
a1 = something
a2 = a1
a1 is a2
then, from my comments above, you should be able to answer: yes, always,
not restricted to ints and strings.
If you mean:
a1 = someliteral
a2 = someliteral
a1 is a2
then: no, it isn't guaranteed at all, not even for small integers or
strings.
> 2) From the programmer's perspective, are ids of ints, floats and
> string of any practical significance at all (since these types are
> immutable)?
The same significance as id() of any other object... mostly, none, except
for debugging purposes.
> 3) Does the behaviour of ids for lists and tuples of the same element
> (of type int, string and sometimes even float), imply that the tuple a
> = (1,) takes (nearly) the same storage space as a = 10000*(1,)? (What
> about a list, where elements can be changed at will?)
That's a different thing. A tuple maintains only references to its
elements (as any other object in Python). The memory required for a tuple
(I'm talking about CPython exclusively) is: (a small header) + n *
sizeof(pointer). So the expression 10000*(anything,) will take more space
than the singleton (anything,) because the former requires space for 10000
pointers and the latter just one.
You have to take into account the memory for the elements themselves; but
in both cases there is a *single* object referenced, so it doesn't matter.
Note that it doesn't matter whether that single element is an integer, a
string, mutable or immutable object: it's always the same object, already
existing, and creating that 10000-tuple just increments its reference count
by 10000.
The situation is similar for lists, except that being mutable containers,
they're over-allocated (to have room for future expansion). So the list
[anything]*10000 has a size somewhat larger than 10000*sizeof(pointer);
its (only) element increments its reference count by 10000.
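The size arithmetic above can be observed directly. This is a rough Python 3 sketch that assumes CPython: `sys.getsizeof` and `sys.getrefcount` report CPython's own bookkeeping, and the exact numbers vary by version and platform. The variable names are made up for illustration.

```python
import struct
import sys

elem = object()                       # any object will do
pointer_size = struct.calcsize("P")   # size of a C pointer on this build

refs_before = sys.getrefcount(elem)
single = (elem,)
many = 10000 * (elem,)
refs_added = sys.getrefcount(elem) - refs_before

# Sizes of the containers alone; the element itself is not included.
print("pointer size:", pointer_size)
print("single tuple:", sys.getsizeof(single))
print("big tuple:   ", sys.getsizeof(many))
print("difference:  ", sys.getsizeof(many) - sys.getsizeof(single))

# Every slot refers to the very same element.
print(all(item is elem for item in many))
print("references added by both tuples:", refs_added)

# The list version stores the same number of references.
many_list = 10000 * [elem]
print("big list:    ", sys.getsizeof(many_list))
```

The difference between the two tuple sizes should come out as 9999 pointers, while the element itself exists only once.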
--
Gabriel Genellina