[Python-Dev] Text about cycle GC
A.M. Kuchling
akuchlin@mems-exchange.org
Tue, 27 Jun 2000 22:04:53 -0400
I've written some text for the What's New article on the GCing of
cycles, but it wasn't a topic I paid close attention to at the time,
so I'd like corrections. Here's what I've got; please send me
comments privately.
6. Optional Collection of Cycles
The C implementation of Python uses reference counting to implement
garbage collection. Every Python object maintains a count of the
number of references pointing to itself, and adjusts the count as
references are created or destroyed. Once the reference count reaches
zero, the object is no longer accessible, since you need to have a
reference to an object to access it, and if the count is zero, no
references exist any longer.
Reference counting has some pleasant properties: it's easy to
understand and implement, and the resulting implementation is
portable, fairly fast, and reacts well with other libraries that
implement their own memory handling schemes. The major problem with
reference counting is that it sometimes doesn't realise that objects
are no longer accessible, resulting in a memory leak. This happens
when there are cycles of references.
Consider the simplest possible cycle, a class instance which has a
reference to itself:
instance = SomeClass()
instance.myself = instance
After the above two lines of code have been executed, the reference
count of instance is 2; one reference is from the variable named
"'instance'", and the other is from the "myself"attribute of the
instance.
If the next line of code is del instance, what happens? The reference
count of instance is decreased by 1, so it has a reference count of 1;
the reference in the "myself" attribute still exists. Yet the instance
is no longer accessible through Python code, and it could be deleted.
Several objects can participate in a cycle if they have references to
each other, causing all of the objects to be leaked.
An experimental step has been made toward fixing this problem. When
compiling Python, the -with-cycle-gc (XXX correct option flag?) option
can be specified. This causes a cycle detection algorithm to be
periodically executed, which looks for inaccessible cycles and deletes
the objects involved.
Why isn't this enabled by default? Running the cycle detection
algorithm takes some time, and some tuning will be required to
minimize the overhead cost. It's not yet obvious how much performance
is lost, because benchmarking this is tricky and depends sensitively
on how often the program creates and destroys objects. XXX is this
actually the correct reason? Or is it fear of breaking software that
runs happily while leaving garbage?
Several people worked on this problem. Early versions were written by
XXX1, XXX2. (I vaguely remember several people writing first cuts at
this. Anyone recall who?) The implementation that's in Python 1.6 is a
rewritten version, this time done by Neil Schemenauer. Lots of other
people offered suggestions along the way, such as (in alphabetical
order) Marc-André Lemburg, Tim Peters, Greg Stein, Eric Tiedemann.
(XXX other names? If you feel you should be added, e-mail me). The
March 2000 archives of the python-dev mailing list contain most of the
relevant discussion, especially in the threads titled ``Reference
cycle collection for Python'' and ``Finalization again''.