Reference Tracking

Alex Martelli aleax at aleax.it
Tue Apr 15 03:17:23 EDT 2003


Ganesan R wrote:

>>>>>> "Alex" == Alex Martelli <aleax at aleax.it> writes:
> 
>> Note that any Python function call always passes VALUES -- i.e.
>> there is no difference between your call to sys.getrefcount in
>> the above snippet and one that directly does sys.getrefcount(1).
>> In either case, it's the VALUE (1) that you're asking info about;
>> it makes no difference HOW you obtain the value to pass (I guess
>> this is probably already clear to you, but, just in case...).
> 
> Then, I find this confusing
> 
> 
> ===========================================================================
> Python 2.2.1 (#1, Jul 29 2002, 23:15:49)
> [GCC 2.95.4 20011002 (Debian prerelease)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import gc
>>>> a = "hello world"
>>>> gc.get_referrers(a)
> [{'__builtins__': <module '__builtin__' (built-in)>, '__name__':
> '__main__', 'gc': <module 'gc' (built-in)>, '__doc__': None, 'a': 'hello
> world'}]
>>>> b = "hello world"
>>>> gc.get_referrers(a)
> [{'a': 'hello world', 'b': 'hello world', 'gc': <module 'gc' (built-in)>,
> '__builtins__': <module '__builtin__' (built-in)>, '__name__':
> '__main__', '__doc__': None}]
>>>> gc.get_referrers("hello world")
> [('hello world', None)]
>>>> 
> 
> ===========================================================================
> 
> Why am I getting a different list for the last call?

Because you're asking about a distinct (even though equal) value,
AKA object.  Python MAY but doesn't HAVE TO reuse the same object
when you use several literals indicating equal immutables -- in
practice, currently, it does for small integers, but (in Python
2.2) not for such strings as you used here.  Python can always
change this in future releases, since this is strictly an issue of
implementation -- optimizing one way or another, no more than that.
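As a minimal sketch in modern Python 3 (not the 2.2/2.3 interpreters used in
this thread), equal strings built at run time may well be distinct objects,
while sys.intern (the builtin intern in Python 2) is the documented way to ask
for one shared object per string value.  The helper make_greeting is a
hypothetical name used only for illustration.

```python
import sys

def make_greeting():
    # Hypothetical helper: build the string at run time so no compiled
    # constant can be shared; in CPython each call returns a new str object.
    return " ".join(["hello", "world"])

a = make_greeting()
b = make_greeting()

# Equality of the two values is guaranteed...
print(a == b)      # True
# ...but identity is an implementation detail.
print(a is b)      # typically False in CPython

# sys.intern returns the single interned copy of an equal string,
# so interned equal strings are guaranteed to be one and the same object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)    # True
```

Interning is an explicit request; without it, whether equal literals share an
object is left to the implementation.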

See:

[alex at lancelot ox]$ python2.2
Python 2.2.2 (#1, Oct 24 2002, 11:43:01)
[GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a='hello world'
>>> b='hello world'
>>> id(a), id(b), id('hello world')
(135683376, 135688960, 135689328)
>>>

[alex at lancelot ox]$ python2.3
Python 2.3a2+ (#11, Mar 28 2003, 12:17:11)
[GCC 3.2 (Mandrake Linux 9.0 3.2-1mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a='hello world'
>>> b='hello world'
>>> id(a), id(b), id('hello world')
(1076642784, 1076642864, 1076643144)
>>>

In 2.2, we see Python using a different copy of the literal string
value in each of the three cases, and the 2.3 session above likewise
shows three distinct ids.  Whether equal literals end up sharing one
value (object, identity, copy, call it as you wish) is strictly a
matter of optimization strategy on the part of each Python
implementation, no more.


Keep in mind that EQUALITY and IDENTITY are different concepts.  For
immutables, equality is what you generally care about; identity is
shifty and implementation-dependent, since such optimizations are
permissible.  Thus, you should test neither "a is b" nor "a is 'hello world'";
rather, use the == (equality check) operator instead of the is (identity
check) operator for such tests.
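A small Python 3 sketch of the recommended test; GREETING, is_greeting and
is_same_object_as_greeting are hypothetical names.  The == check matches any
equal string, whichever object you happen to hold, while "is" only tells you
whether two names refer to the very same object.  (Recent Python 3 releases
even emit a SyntaxWarning when "is" is used against a literal.)

```python
GREETING = "hello world"

def is_greeting(s):
    # Recommended: equality compares values, so any equal str matches.
    return s == GREETING

def is_same_object_as_greeting(s):
    # Identity check: True only if s is the very object bound to GREETING.
    # For equal immutables the answer depends on caching/interning details.
    return s is GREETING

# Built at run time, so it is equal to GREETING but may be another object.
runtime = " ".join(["hello", "world"])

print(is_greeting(runtime))                 # True
print(is_same_object_as_greeting(runtime))  # implementation-dependent
```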

References are to specific values (specific objects, specific identities,
specific copies, call them as you wish), thus all considerations about
reference tracking need to keep this concept in mind.
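To illustrate that reference tracking follows specific objects, here is a
Python 3 sketch using gc.get_referrers: a list holding one of two equal
strings shows up among the referrers of that particular object only.  The
function name referrer_demo is hypothetical; note that gc.get_referrers only
sees containers tracked by the garbage collector and is meant for debugging.

```python
import gc

def referrer_demo():
    # Two equal strings built at run time; in CPython they are distinct objects.
    s1 = " ".join(["hello", "world"])
    s2 = " ".join(["hello", "world"])
    holder = [s1]  # a gc-tracked container that refers to s1 only

    # Look for our particular list among each object's referrers.
    held_s1 = any(r is holder for r in gc.get_referrers(s1))
    held_s2 = any(r is holder for r in gc.get_referrers(s2))
    return s1 == s2, s1 is s2, held_s1, held_s2

equal, same, held_s1, held_s2 = referrer_demo()
print(equal, held_s1, held_s2)
```

The list is found for s2 only if s1 and s2 happen to be the same object, which
is exactly the equality-versus-identity distinction above.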


One last note (it plays no role here, but you're sure to run into it
as you play with id and the like...): id(...) is unique only for as
long as an object exists; as soon as the object is destroyed, its id
may be reused for a new object.  So, don't get tricked by such artefacts
as the following:

[alex at lancelot ox]$ python2.2
Python 2.2.2 (#1, Oct 24 2002, 11:43:01)
[GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> id('hello world')
135683352
>>> id('hello world')
135682152
>>> id('hello world')
135683192
>>> id('hello world')
135683352
>>> id('hello world')
135682736
>>>

It may SEEM that the first and fourth literals use the same value, distinct
from the 2nd, 3rd and 5th, but they don't: the first object has already been
destroyed by the time you create the 4th, and its id simply happens to get
recycled.  (In other builds you'll no doubt get different results for this;
in 2.2, the allocator used is more system-dependent than in 2.3.)
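A Python 3 sketch of the same pitfall; the helper names are hypothetical.
Ids taken while all the objects are kept alive are guaranteed distinct, while
ids recorded from short-lived temporaries may repeat, because a freed object's
id can be handed to a new one.  How often repeats show up depends on the
implementation and its allocator.

```python
def ids_of_live_objects(n):
    # Keep every object alive while taking ids, so all ids must differ.
    objs = [object() for _ in range(n)]
    return [id(o) for o in objs]

def ids_of_temporaries(n):
    # Each object() dies right after id() is taken; its id may be reused.
    return [id(object()) for _ in range(n)]

live = ids_of_live_objects(1000)
temp = ids_of_temporaries(1000)
print(len(set(live)))  # 1000
print(len(set(temp)))  # often far fewer than 1000 in CPython
```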


Morals: don't care too deeply about identity of immutables...


Alex




