[Python-Dev] Re: weakref gc semantics

Tim Peters tim.peters at gmail.com
Thu Nov 4 06:25:32 CET 2004


[Tim]
> This example bothers me a little, under current Python CVS:
>
> """
> import gc, weakref
> 
> def C_gone(ignored):
>    print "An object of type C went away."
>
> class C:
>    def __init__(self):
>        self.wr = weakref.ref(self, C_gone)
>
> print "creating non-cyclic C instance"
> c = C()
> 
> print "creating cyclic C instance and destroying old one"
> c = C()  # triggers the print
> c.loop = c
> 
> print "getting rid of the cyclic one"
> c = None  # doesn't trigger the print
> gc.collect()  # ditto
> """

Recapping, it bothers me because the callback doesn't trigger now when
c is part of cyclic trash, but if c.wr were an instance with a __del__
method instead, the __del__ method *would* trigger.

There are (at least) two easy-to-code and pretty-easy-to-explain ways
to get that callback invoked safely.

One I sketched last time, and is in the attached patch-callback.txt. 
(Patches here just have gcmodule.c code changes, no corresponding
changes to comments.)  The difference is that it invokes a callback on
a weakref to trash iff the callback is reachable, instead of invoking
a callback on a weakref to trash iff the weakref it's attached to is
reachable.  All the tests we've accumulated still pass with this
change, and the example above does trigger a second print.

Another approach is in the attached patch-finalizer.txt.  This is more
(I think) like what Neil was aiming at last week, and amounts to
changing has_finalizer() to return true when it sees a weakref with a
callback.  Then the weakref, and everything reachable from it, gets
moved to the reachable set.  After that, no callback can access an
unreachable object directly, so callbacks are safe to call on that
count.  We still (and in the other patch too) clear all weakrefs to
trash before breaking cycles, so callbacks can't resurrect trash via
active weakrefs either.  But in the patch-finalizer case, the
callbacks attached to weakrefs moved to the reachable set no longer
get invoked "artificially" (forced by gc) -- they trigger iff their
referents eventually go away.

Some current tests fail with the patch-finalizer approach, but for
shallow reasons:  fewer callbacks get invoked, and some of the tests
currently use "global" (reachable) weakref callbacks as a way to
verify that gc really did clean up cyclic trash.  Those stop getting
invoked.  Some of the tests use bound method objects as callbacks, and
when a weakref contains a callback that's a bound method object, and
the weakref gets moved to the reachable set, its callback gets moved
there too, and so the instance wrt which the callback is a bound
method becomes reachable too, and so the class of that instance also
becomes reachable.  As a result, gc doesn't get rid of any of that
stuff, so the "external" callbacks never trigger.  Instead, the
weakrefs with the bound-method callbacks end up in gc.garbage.

I don't like that, because if it's possible to get rid of __del__
methods, I'd really like to see gc.garbage go away too.  Nothing we've
done with weakrefs so far can add anything to gc.garbage, so I like
them better on that count.  Pragmatically, sticking a weakref in
gc.garbage isn't of much use to the end-user now either, because a
weakref's callback is neither clearable nor even discoverable now at
the Python level.  But that could be changed too.

What I'm missing is a pile of practical <heh> use cases to distinguish these.

Let's look at weak value dicts, for reasons that will become clear. 
An entry in a WVD d now is of the form

   key: W

where W is an instance of a subclass of weakref.ref, and W has the
associated key as an attribute.  Every value in d shares the same
callback, set up by WVD.__init__():

   def __init__(self, *args, **kw):
       ...
       def remove(wr, selfref=ref(self)):
           self = selfref()
           if self is not None:
               del self.data[wr.key]
       self._remove = remove

So the callback has a weakref to d, and when it's invoked extracts the
specific key from the weakref instance and uses that to delete the
appropriate

   key: W

entry from d via the shared callback's own weakref to d.

Now it's not clear *why* the callback has a weakref to d.  It *could*
have been written like this instead:

       def remove(wr):
           del self.data[wr.key]

In fact, at one time it was written in that kind of way.  Rev 1.13 of
weakref.py changed it to use a weakref to self, and the motivations in
the patch that started this are feeble:

   http://www.python.org/sf/417795

"It's bad to rely on the garbage collector" and "this could lead to
uncollectable cycles if a subclass of Weak*Dictionary defines
__del__".  The proper responses were "no, it isn't" and "insane code
deserves whatever it gets" <0.5 wink>.

Anyway, if WVD *had* been left simple(r), it would be relevant to the
choices above:  if WVD d became part of cyclic trash, patch-finalizer
would move all its weakrefs to the reachable set, and then d itself
would be reachable too (via the shared callback's strong reference to
self).  As a result, none of the callbacks would trigger, and d would
"leak" (it wouldn't get cleaned up, and wouldn't show up in gc.garbage
either -- it would end up in an older generation, clogging gc forever
after because it would remain forever after reachable from its
(resurrected) weakrefs in gc.garbage).  Under patch-callback, d would
still go away (the simplest way to understand why is just that d *is*
in cyclic trash, and patch-callback never moves anything to the
reachable set because of weakrefs or weakref callbacks).

The *current* implementation of WVD doesn't have this problem under
patch-finalizer.  If d is part of cyclic trash, all its weakrefs get
moved to the reachable set, but d isn't directly reachable from their
shared callback, so d is not moved to the reachable set.  d's
tp_clear() gets invoked by gc in its cycle-breaking phase, and that
decrefs all of d's weakrefs that were moved to the unreachable set,
and they go away then (without triggering their callbacks) via the
normal refcount route.

The point is that this is all pretty subtle, and it bothers me that
"the obvious" way to implement WVD (the pre-patch-417795 way) would
lead to a worse outcome under patch-finalizer, so I also fear that
Python's current weakref users may be doing similar things now that
would also suffer under patch-finalizer.

OTOH, there's something conceptually nice about patch-finalizer -- it
is the easiest approach to explain (well, except for approaches that
allow segfaults to occur ...).  On the third hand, gc.garbage appears
poorly understood by users now, and patch-finalizer is actually easy
to explain only to those who already understand gc.garbage.
-------------- next part --------------
Index: Modules/gcmodule.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Modules/gcmodule.c,v
retrieving revision 2.80
diff -u -r2.80 gcmodule.c
--- Modules/gcmodule.c	1 Nov 2004 16:39:57 -0000	2.80
+++ Modules/gcmodule.c	4 Nov 2004 03:37:50 -0000
@@ -566,9 +566,8 @@
 	 * to imagine how calling it later could create a problem for us.  wr
 	 * is moved to wrcb_to_call in this case.
 	 */
-	 		if (IS_TENTATIVELY_UNREACHABLE(wr))
+	 		if (IS_TENTATIVELY_UNREACHABLE(wr->wr_callback))
 	 			continue;
-			assert(IS_REACHABLE(wr));
 
 			/* Create a new reference so that wr can't go away
 			 * before we can process it again.
@@ -577,9 +576,8 @@
 
 			/* Move wr to wrcb_to_call, for the next pass. */
 			wrasgc = AS_GC(wr);
-			assert(wrasgc != next); /* wrasgc is reachable, but
-			                           next isn't, so they can't
-			                           be the same */
+			if (wrasgc == next)
+				next = wrasgc->gc.gc_next;
 			gc_list_move(wrasgc, &wrcb_to_call);
 		}
 	}
@@ -593,7 +591,6 @@
 
 		gc = wrcb_to_call.gc.gc_next;
 		op = FROM_GC(gc);
-		assert(IS_REACHABLE(op));
 		assert(PyWeakref_Check(op));
 		wr = (PyWeakReference *)op;
 		callback = wr->wr_callback;
@@ -621,6 +618,7 @@
 		if (wrcb_to_call.gc.gc_next == gc) {
 			/* object is still alive -- move it */
 			gc_list_move(gc, old);
+			gc->gc.gc_refs = GC_REACHABLE;
 		}
 		else
 			++num_freed;












-------------- next part --------------
Index: Modules/gcmodule.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Modules/gcmodule.c,v
retrieving revision 2.80
diff -u -r2.80 gcmodule.c
--- Modules/gcmodule.c	1 Nov 2004 16:39:57 -0000	2.80
+++ Modules/gcmodule.c	4 Nov 2004 03:27:10 -0000
@@ -413,6 +413,9 @@
 		assert(delstr != NULL);
 		return _PyInstance_Lookup(op, delstr) != NULL;
 	}
+	else if (PyWeakref_Check(op) &&
+			((PyWeakReference *)op)->wr_callback != NULL)
+		return 1;
 	else if (PyType_HasFeature(op->ob_type, Py_TPFLAGS_HEAPTYPE))
 		return op->ob_type->tp_del != NULL;
 	else
@@ -566,10 +569,6 @@
 	 * to imagine how calling it later could create a problem for us.  wr
 	 * is moved to wrcb_to_call in this case.
 	 */
-	 		if (IS_TENTATIVELY_UNREACHABLE(wr))
-	 			continue;
-			assert(IS_REACHABLE(wr));
-
 			/* Create a new reference so that wr can't go away
 			 * before we can process it again.
 			 */














More information about the Python-Dev mailing list