[Python-Dev] Text about cycle GC

A.M. Kuchling akuchlin@mems-exchange.org
Tue, 27 Jun 2000 22:04:53 -0400


I've written some text for the What's New article on the GCing of
cycles, but it wasn't a topic I paid close attention to at the time,
so I'd like corrections.  Here's what I've got; please send me
comments privately. 

                       6. Optional Collection of Cycles

   The C implementation of Python uses reference counting to implement
   garbage collection. Every Python object maintains a count of the
   number of references pointing to itself, and adjusts the count as
   references are created or destroyed. Once the reference count reaches
   zero, the object is no longer accessible, since you need to have a
   reference to an object to access it, and if the count is zero, no
   references exist any longer.

   Reference counting has some pleasant properties: it's easy to
   understand and implement, and the resulting implementation is
   portable, fairly fast, and reacts well with other libraries that
   implement their own memory handling schemes. The major problem with
   reference counting is that it sometimes doesn't realise that objects
   are no longer accessible, resulting in a memory leak. This happens
   when there are cycles of references.

   Consider the simplest possible cycle, a class instance which has a
   reference to itself:

instance = SomeClass()
instance.myself = instance

   After the above two lines of code have been executed, the reference
   count of instance is 2; one reference is from the variable named
   "'instance'", and the other is from the "myself"attribute of the
   instance.

   If the next line of code is del instance, what happens? The reference
   count of instance is decreased by 1, so it has a reference count of 1;
   the reference in the "myself" attribute still exists. Yet the instance
   is no longer accessible through Python code, and it could be deleted.
   Several objects can participate in a cycle if they have references to
   each other, causing all of the objects to be leaked.

   An experimental step has been made toward fixing this problem. When
   compiling Python, the -with-cycle-gc (XXX correct option flag?) option
   can be specified. This causes a cycle detection algorithm to be
   periodically executed, which looks for inaccessible cycles and deletes
   the objects involved.

   Why isn't this enabled by default? Running the cycle detection
   algorithm takes some time, and some tuning will be required to
   minimize the overhead cost. It's not yet obvious how much performance
   is lost, because benchmarking this is tricky and depends sensitively
   on how often the program creates and destroys objects. XXX is this
   actually the correct reason? Or is it fear of breaking software that
   runs happily while leaving garbage?

   Several people worked on this problem. Early versions were written by
   XXX1, XXX2. (I vaguely remember several people writing first cuts at
   this. Anyone recall who?) The implementation that's in Python 1.6 is a
   rewritten version, this time done by Neil Schemenauer. Lots of other
   people offered suggestions along the way, such as (in alphabetical
   order) Marc-André Lemburg, Tim Peters, Greg Stein, Eric Tiedemann. 
   (XXX other names?  If you feel you should be added, e-mail me).  The
   March 2000 archives of the python-dev mailing list contain most of the
   relevant discussion, especially in the threads titled ``Reference
   cycle collection for Python'' and ``Finalization again''.