[Python-Dev] subtype_dealloc needs rethinking

Tim Peters tim at zope.com
Thu Nov 13 13:03:19 EST 2003


We've got multiple segfault problems associated with weakref callbacks, and
multiple problems of that kind coming from subtype_dealloc alone.  Here's a
piece of test_weakref.py in my 2.4 checkout; the first part of the test got
fixed yesterday; the second part has not been checked in yet, because it
still fails (in a release build it corrupts memory and that may not be
visible; in a debug build it reliably segfaults, due to double
deallocation):

"""
    def test_sf_bug_840829(self):
        # "weakref callbacks and gc corrupt memory"
        # subtype_dealloc erroneously exposed a new-style instance
        # already in the process of getting deallocated to gc,
        # causing double-deallocation if the instance had a weakref
        # callback that triggered gc.
        # If the bug exists, there probably won't be an obvious symptom
        # in a release build.  In a debug build, a segfault will occur
        # when the second attempt to remove the instance from the "list
        # of all objects" occurs.

        import gc

        class C(object):
            pass

        c = C()
        wr = weakref.ref(c, lambda ignore: gc.collect())
        del c

        # There endeth the first part.  It gets worse.
        del wr

        c1 = C()
        c1.i = C()
        wr = weakref.ref(c1.i, lambda ignore: gc.collect())

        c2 = C()
        c2.c1 = c1
        del c1  # still alive because c2 points to it

        # Now when subtype_dealloc gets called on c2, it's not enough just
        # that c2 is immune from gc while the weakref callbacks associated
        # with c2 execute (there are none in this 2nd half of the test,
btw).
        # subtype_dealloc goes on to call the base classes' deallocs too,
        # so any gc triggered by weakref callbacks associated with anything
        # torn down by a base class dealloc can also trigger double
        # deallocation of c2.
        del c2
"""

There are two identifiable (so far) problems in subtype_dealloc (note that
these have nothing to do with Jim's current woes -- those are a different
problem with weakref callbacks, and he hasn't yet hit the problems I'm
talking about here -- but he will, eventually).

1. A weakref callback can resurrect self, but the code isn't aware of
   that now.  It's not *easy* to resurrect self, and we probably thought
   it wasn't possible, but it is:  if self is in a dead cycle, and the
   weakref callback invokes a method of an object in that cycle, self
   is visible to the callback (following the cycle links), and so self
   can get resurrected by the callback.  The callback doesn't have to
   specifically try to resurrect self, it can happen as a side effect of
   resurrecting anything in the cycle from which self is reachable.

2. Unlike other dealloc routines, subtype_delloc leaves the object,
   with refcnt 0, tracked by gc.  That's the cause of the now seemingly
   endless sequence of ways to provoke double deallocation:  when a
   weakref callback is invoked at any time while subtype_dealloc is
   executing (whether the callback is associated with self, or with
   anything that dies as a result of any base class cleanup calls), and
   if gc happens to trigger while the callback is executing, and self
   happens to be in a generation gc is collecting, then the tracked
   refcount=0 self looks like garbage to gc, so gc does

      incref
      call tp_clear
      decref

   on it, and the decref knocks the refcount back down to 0 again
   thus triggering another deallocation (while the original deallocation
   is still in progress).

To avoid #2, one of these two must be true:

A. self is untracked at the time gc happens.

B. self has a refcount > 0 at the time gc happens (e.g., the usual
   "temporarily resurrect" trick).

I checked in a 0-byte change yesterday that repaired the first half of the
test case, using #A (I simply moved the line that retracks self below the
*immediate* weakref callback).  But that same approach can't work for the
rest of subtype_dealloc, for reasons you explained in a comment at the end
of the function.

Doing something of the #B flavor appears so far to work (meaning it fixes
the rest of the test case, and hasn't triggered a new problem yet):

"""
Index: typeobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/typeobject.c,v
retrieving revision 2.251
diff -u -r2.251 typeobject.c
--- typeobject.c        12 Nov 2003 20:43:28 -0000      2.251
+++ typeobject.c        13 Nov 2003 17:57:07 -0000
@@ -667,6 +667,17 @@
                        goto endlabel;
        }

+       /* We're still not out of the woods:  anything torn down by slots
+        * or a base class dealloc may also trigger gc in a weakref
callback.
+        * For reasons explained at the end of the function, we have to
+        * keep self tracked now.  The only other way to make gc harmless
+        * is to temporarily resurrect self.  We couldn't do that before
+        * calling PyObject_ClearWeakRefs because that function raises
+        * an exception if its argument doesn't have a refcount of 0.
+        */
+       assert(self->ob_refcnt == 0);
+       self->ob_refcnt = 1;
+
        /*  Clear slots up to the nearest base with a different tp_dealloc
*/
        base = type;
        while ((basedealloc = base->tp_dealloc) == subtype_dealloc) {
@@ -693,6 +704,8 @@
                _PyObject_GC_UNTRACK(self);

        /* Call the base tp_dealloc() */
+       assert(self->ob_refcnt == 1);
+       self->ob_refcnt = 0;
        assert(basedealloc);
        basedealloc(self);
"""

I'm not sure those asserts *can't* trigger, though (well, actually, I'm sure
they can, if a weakref callback resurrects self -- but that's a different
problem), and the code is getting <heh> obscure.  Maybe that comes with the
territory.  So fresh eyeballs would help.

The problems with resurrection are related to Jim's problem, in that
tp_clear can leave behind insane objects, and those can kill us whether a
callback provokes the insanity directly (as in Jim's case), or a resurrected
insane object gets provoked sometime later.  I sketched a different scheme
for solving those in a long msg yesterday (it doesn't involve
subtype_dealloc; it involves changing gc to be much more aware of the
problems weakref callbacks can create).




More information about the Python-Dev mailing list