Question About When Objects Are Destroyed

Steve D'Aprano steve+python at pearwood.info
Fri Aug 4 21:19:09 EDT 2017


On Sat, 5 Aug 2017 09:11 am, Jon Forrest wrote:

> Consider the following Python shell session (Python 3.6.2, Win64):
> 
>  >>> def givemetwo():
> ...         x = 'two'
> ...         print(id(x))
> ...
>  >>> givemetwo()
> 1578505988392
> 
> So far fine. My understanding of object existence made me
> think that the object referred to by x would be deleted when
> the givemetwo() function returned, like a local variable in C.

Not necessarily.

Objects are destroyed only once nothing references them any longer. That may
happen when the function exits, but it may not. For example, if you return x,
and the caller assigns it to a name, then the object will still be referenced.

However, when you exit the function, what is guaranteed is that all local
variables will go out of scope and clear *their* references to whatever objects
they are bound to. Not necessarily *all* references, but just the ones from
local variables.
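Here is a minimal sketch of that difference. The names Token, make_and_drop and
make_and_return are made up for illustration, and I use a small class rather
than a string because plain str objects cannot be weakly referenced. The
gc.collect() calls should not be needed on CPython, which frees objects as soon
as their reference count drops to zero, but other interpreters may differ:

```python
import gc
import weakref


class Token:
    """Hypothetical stand-in object; str instances do not support weak references."""


def make_and_drop():
    x = Token()
    # Hand back only a weak reference, which does not keep the object alive.
    return weakref.ref(x)


def make_and_return():
    x = Token()
    # Hand back the object itself, so the caller can keep it alive.
    return x


dropped = make_and_drop()
gc.collect()  # harmless on CPython, may matter on other interpreters
print(dropped() is None)   # the local x held the only reference

kept = make_and_return()
probe = weakref.ref(kept)
gc.collect()
print(probe() is kept)     # the name "kept" still refers to the object
```

In both functions the local x goes away on exit. Only in the second one does
some other reference survive it.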

Consider the object "spam and eggs", a string. If I say:

s = "spam and eggs"  # first reference

def func():
    t = s  # t is a local variable, so now two refs
    u = t  # third ref
    return None

func()

While func() is executing, there are three references to the object: s, a
global, and t and u, locals. When the function exits, the *names* (variables) t
and u go out of scope and those references cease to exist, but the s reference
still exists and so the object (string "spam and eggs") still exists.
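If you want to watch the count change, something like the following sketch
should do on CPython. It uses sys.getrefcount, which is CPython-specific, and a
fresh object() instead of a string literal so that nothing else in the
interpreter holds a reference to it. The helper count_refs is a made-up name:

```python
import sys

s = object()  # a fresh object, so nothing else refers to it


def count_refs():
    # sys.getrefcount reports CPython's reference count; the number
    # includes a temporary reference made by the call itself.
    return sys.getrefcount(s)


def func():
    t = s  # second reference
    u = t  # third reference
    return count_refs()


before = count_refs()
during = func()
after = count_refs()
print(before, during, after)
```

The absolute numbers are not meaningful. What I would expect is that the middle
one is higher than the other two, and that the count drops back once func() has
returned and t and u are gone.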

If you then re-bind the name s to something else:

s = "foo bar"

or delete the name:

del s

that will remove the last reference to the object and it can be garbage
collected.
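A rough way to see both cases, again with a made-up class (Spam) standing in
for the string, is to register a callback with weakref.finalize. The callback
runs when the object is destroyed:

```python
import gc
import weakref


class Spam:
    """Hypothetical stand-in for the string object."""


events = []

s = Spam()
weakref.finalize(s, events.append, "first object destroyed")

s = Spam()  # re-binding drops the only reference to the first object
gc.collect()
print(events)

weakref.finalize(s, events.append, "second object destroyed")
del s       # deleting the name drops the only reference to the second
gc.collect()
print(events)
```

On CPython the callbacks fire immediately. On other interpreters they may only
fire once the collector gets around to it, which is why gc.collect() is there.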


> However, this isn't true, as shown by the following in the same
> session:
> 
>  >>> import ctypes
>  >>> print (ctypes.cast(1578505988392, ctypes.py_object).value)
> two
> 
> This shows that the object still exists, which was a surprise.

You may be right about the object still existing, but for the wrong reasons.

The Python interpreter is free to cache objects that it thinks have a good
chance of being re-created. This is obviously implementation dependent: it will
depend on the specific interpreter (CPython, Jython, IronPython, PyPy,
Stackless, Nuitka), the specific version, and potentially any other factor the
interpreter wishes to take into account, up to and including the phase of the
moon.

In this case, CPython caches short strings that look like identifiers. It does
this because variables are implemented as string keys in dicts, so when you
create a variable

two = 2

the interpreter creates a string object 'two' to use as a key in the globals()
dict. Since object creation is relatively costly, the interpreter caches that
string and keeps it around.

So your test has accidentally hit an implementation-dependent feature of
CPython. If you had used a string that didn't look like an identifier, say

"two not three, okay?"

you may have seen different results.

Or maybe not. This is all implementation dependent.
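The documented way to ask for this kind of sharing is sys.intern. The sketch
below only illustrates the mechanism in general. It does not prove what the
interpreter did with 'two' in your session, and whether two equal strings built
at run time happen to be the same object is exactly the kind of thing you
should not rely on:

```python
import sys

# Build two equal strings at run time so they start out as separate objects.
a = "".join(["tw", "o"])
b = "".join(["t", "wo"])
print(a == b)   # equal values
print(a is b)   # identity is an implementation detail, so no promise here

# sys.intern() returns the one shared copy for a given string value.
print(sys.intern(a) is sys.intern(b))

# Variable names really are str keys in a namespace dict.
namespace = {}
exec("two = 2", namespace)
print("two" in namespace)
```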

By using ctypes, you are poking around in the internals of the Python
interpreter. You aren't looking at what *Python the language* guarantees, but
merely at whatever this specific version of CPython happens to do, today.

For example, you are already on shaky ground by using the ID of an object as if
it were a memory address. Object IDs aren't memory addresses, they are opaque
integers.

It just happens that, for speed, the CPython interpreter uses the address of
the object as its ID. It can only do that because the CPython garbage collector
is very simple and never relocates objects from place to place. Jython and
IronPython have more sophisticated garbage collectors which do move objects, so
they use a different scheme and hand out consecutive integers as IDs.
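What you can rely on is much smaller than "ID equals address". A short sketch
of the portable part:

```python
import platform

a = object()
b = object()

# While both objects are alive, their IDs are guaranteed to differ.
print(id(a) != id(b))

# Equal IDs for live objects mean "the same object", nothing more.
c = a
print(id(c) == id(a), c is a)

# What the number means is up to the interpreter.
print(platform.python_implementation(), hex(id(a)))
```

Once an object is gone, its ID may be handed to a different object, so an ID
you saved earlier tells you nothing about what lives there now.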

So... you've effectively grabbed an arbitrary address in memory, which may have
previously contained a certain string object. You use ctypes to interpret that
chunk of memory as an object. Since CPython doesn't move objects around in
memory, you might be okay:

- either that address actually does point to a live object, and you're safe;

- or it points to what *was* a live object, but the memory hasn't been reused,
  and so all the fields are still intact, and you're safe;

but you might not be. What if some other object has re-used that piece of
memory? You might now be jumping halfway into some other object, and trying to
interpret that as the start of an object.

You can segfault CPython with ctypes.
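If you must experiment with this, the only case I would call safe is the first
one, where you hold a real reference to the object for as long as you use its
address. A sketch, guarded so that it only tries the cast on CPython:

```python
import ctypes
import platform

obj = ["still", "alive"]   # the name obj keeps this list alive throughout
address = id(obj)          # an address only because this is CPython

if platform.python_implementation() == "CPython":
    recovered = ctypes.cast(address, ctypes.py_object).value
    print(recovered is obj)
else:
    recovered = None
    print("id() is not an address here; do not try this")
```

Of course, if you already hold obj you have no need for the cast, which is
rather the point.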


> Will this object ever be deleted?

The specific object 'two'? Maybe, maybe not. It might be cached for the lifetime
of this interpreter session. Or there may be circumstances where cached objects
age-out and are deleted. It depends on the implementation of the cache. That
isn't a question about Python the language.


> I'm learning about function 
> decorators which, as my early studies tell me, depend on calling
> a function defined inside another function. This suggests that
> objects defined inside functions are never deleted, otherwise
> function decorators would never work. (I'm not 100% sure
> my understanding of function decorators is correct since I'm
> still learning about them).

Nope, that's wrong. Objects defined inside functions will be garbage collected
the same as objects defined outside of functions: when there are no references
to them left.

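Decorators work because they create long-lasting references to the object: the
decorated name is bound to the function the decorator returns, and that
function in turn refers to the one it wraps. But if you delete those
references, say by re-binding the name to something else, then the objects will
be deleted same as anything else.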
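A small sketch of that chain of references, using a made-up decorator called
shout. The inner function keeps the original alive through its closure, and
both go away once the name is re-bound:

```python
import gc
import weakref


def shout(func):
    def wrapper(*args, **kwargs):
        # wrapper refers to func, so func stays alive as long as wrapper does
        return func(*args, **kwargs).upper()
    return wrapper


@shout
def greet(name):
    return "hello " + name


print(greet("world"))

# The original greet is reachable only through wrapper's closure.
original = weakref.ref(greet.__closure__[0].cell_contents)
print(original().__name__)

greet = None   # re-bind the name: nothing refers to wrapper any more
gc.collect()
print(original() is None)
```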


> What's the right way to think about this?

99% of the time, the right way to think about this is not to.

Python has a garbage collector so you don't have to worry about object
lifetimes. So long as something refers to an object, it will be there if you
need it. And when nothing refers to it, as far as you are concerned it will be
garbage collected.

There are a few tricky corner cases, such as:

- interpreter caches (short version: "Ignore them")

- object destructor method __del__ (short version: "Don't use it")

- reference cycles (short version: "Don't worry, they're taken care of")

but even there, 99% of the time you just don't need to care.
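For the curious, here is roughly what "taken care of" means for reference
cycles. Node is a made-up class. The two objects refer to each other, so their
reference counts never reach zero on their own, and it is the cycle detector in
the gc module that cleans them up:

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.other = None


a = Node()
b = Node()
a.other = b   # a refers to b
b.other = a   # b refers to a: a reference cycle

probe = weakref.ref(a)
del a, b      # no names left, but the two objects still refer to each other

gc.collect()  # run the cycle detector now instead of waiting for it
print(probe() is None)
```

In normal code you never call gc.collect() yourself. The collector runs on its
own schedule.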



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.




More information about the Python-list mailing list