garbage collection / cyclic references

Aaron Brady castironpi at gmail.com
Sat Mar 21 19:54:09 EDT 2009


On Mar 21, 1:04 pm, John Nagle <na... at animats.com> wrote:
> Aaron Brady wrote:
> > Hello,
>
> > I was reading and Googling about garbage collection, reference
> > counting, and the problem of cyclic references.
>
> > Python's garbage collection module claims to be able to detect and
> > break cyclic garbage.  Some other languages merely prohibit it.  Is
> > this the place to ask about its technique?
>
> > I understand that the disadvantage is a non-deterministic order of
> > deletion/finalization.  
>
>      Garbage collection and destructors or "finalizers" don't play well
> together.  It's a fundamental problem.  Calling finalizers from the
> garbage collector is painful.  It introduces concurrency where the
> user may not have expected it.  Consider what happens if a finalizer
> tries to lock something.  What if GC runs while that lock is locked?
> This can create a deadlock situation.  Calling finalizers from the
> garbage collector can result in intermittent, very hard to find bugs.

As I understand it, 'obj.decref()' can call 'other.decref()', which
can try to access its reference to 'obj', which has already begun
cleanup.  At that point, 'obj' is in an inconsistent state.  Its own
methods are secretly called during its '__del__'.

One step would be to serialize this process, so that 'other.decref()'
gets deferred until 'obj.decref()' has completed.  Then attributes are
in an all-or-nothing state, which is at least consistent.  However,
that means every external reference in a '__del__' method has to be
wrapped in an exception handler, one at a time, because the object /
might/ already be in a reference cycle.  (Non-container types are
excepted.)  The remaining reference merely needs to change its class
to a ReclaimedObject type.  That's acceptable if documented.  I also
believe it solves the potential for deadlock.

> (Look up "re-animation"
> in Microsoft Managed C++ literature.  It's not pretty.)

Pass!

>      Python actually has reference counting backed up by garbage collection.
> Most objects are destroyed as soon as all references to them disappear.
> Garbage collection is only needed to deal with cycles.
>
>      Python has "weak references", which won't keep an object around
> once all the regular references are deleted.  These are useful in
> some situations.  In a tree, for example, pointers towards the leaves
> should be strong pointers, while back-pointers towards the root should
> be weak pointers.
snip
>
>     Personally, I'd argue that the right answer is to prohibit cycles of
> strong pointers.  That should be considered a programming error, and
> detected at run time, at least by debugging tools.  With weak pointers,
> you don't really need cycles of strong pointers.

Reference cycles can be detected anyway with debug tools, even prior
to destruction.  My objection is that would complicate control flow
severely:

for x in y:
    z.append( x )

becomes:

for x in y:
    if cyclic_ref( x ):
        z.append( weakref.ref( x ) )
    else:
        z.append( x )

And worse, every attribute access has to be wrapped.

for x in z:
    if isinstance( x, __builtins__.weakref ):
        if x() is not None:
            print( x() )
    else:
        print( x )

In other words, it interferes with uniform access to attributes and
container members.  However, in the case where you know a structure a
priori, it's a good technique, as your example showed.  I observe that
my proposal has the same weakness!

If you make the case that you usually do know the structure your data
have, I won't be able to disprove it.  The example would come from a
peer-to-peer representation of something, or storage of relational
data.

Regardless, the group has responded to most of my original post.  I
don't think I emphasized however that I'm designing an allocation
system that can contain reference cycles; and I was asking if such
special methods, '__gc_cycle__( self, attr )' or '__gc_clear__
( self )' would be right for me.  I'm also interested in feedback
about the serialization method of ref. counting earlier in this post.

>     The advantage of this is a clean order of destruction.  This is useful
> in window widget systems, where you have objects with pointers going in many
> directions, yet object destruction has substantial side effects.
>
>     Python originally had only reference counting, and didn't have weak pointers.
> If weak pointers had gone in before the garbage collector, Python might have
> gone in this direction.
>
>                                 John Nagle




More information about the Python-list mailing list