[Python-checkins] peps: New PEP: Safe object finalization

antoine.pitrou python-checkins at python.org
Fri May 17 23:20:21 CEST 2013


http://hg.python.org/peps/rev/0763444c74b3
changeset:   4890:0763444c74b3
user:        Antoine Pitrou <solipsis at pitrou.net>
date:        Fri May 17 23:17:31 2013 +0200
summary:
  New PEP: Safe object finalization

files:
  pep-0442.txt |  284 +++++++++++++++++++++++++++++++++++++++
  1 files changed, 284 insertions(+), 0 deletions(-)


diff --git a/pep-0442.txt b/pep-0442.txt
new file mode 100644
--- /dev/null
+++ b/pep-0442.txt
@@ -0,0 +1,284 @@
+PEP: 442
+Title: Safe object finalization
+Version: $Revision$
+Last-Modified: $Date$
+Author: Antoine Pitrou <solipsis at pitrou.net>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 2013-04-18
+Python-Version: 3.4
+Post-History:
+Resolution: TBD
+
+
+Abstract
+========
+
+This PEP proposes to deal with the current limitations of object
+finalization.  The goal is to be able to define and run finalizers
+for any object, regardless of its position in the object graph.
+
+This PEP doesn't call for any change in Python code.  Objects
+with existing finalizers will benefit automatically.
+
+
+Definitions
+===========
+
+Reference
+    A directional link from an object to another.  The target of the
+    reference is kept alive by the reference, as long as the source is
+    itself alive and the reference isn't cleared.
+
+Weak reference
+    A directional link from an object to another, which doesn't keep
+    its target alive.  This PEP focuses on non-weak references.
+
+Reference cycle
+    A cyclic subgraph of directional links between objects, which keeps
+    those objects from being collected in a pure reference-counting
+    scheme.
+
+Cyclic isolate (CI)
+    A reference cycle in which no object is referenced from outside the
+    cycle *and* whose objects are still in a usable, non-broken state:
+    they can access each other from their respective finalizers.
+
+Cyclic garbage collector (GC)
+    A device able to detect cyclic isolates and turn them into cyclic
+    trash.  Objects in cyclic trash are eventually disposed of by
+    the natural effect of the references being cleared and their
+    reference counts dropping to zero.
+
+Cyclic trash (CT)
+    A reference cycle, or former reference cycle, in which no object
+    is referenced from outside the cycle *and* whose objects have
+    started being cleared by the GC.  Objects in cyclic trash are potential
+    zombies; if they are accessed by Python code, the symptoms can vary
+    from weird AttributeErrors to crashes.
+
+Zombie / broken object
+    An object that is part of cyclic trash.  The term stresses that the
+    object is not safe: its outgoing references may have been cleared, or
+    one of the objects it references may itself be a zombie.  Therefore,
+    it should not be accessed by arbitrary code (such as finalizers).
+
+Finalizer
+    A function or method called when an object is intended to be
+    disposed of.  The finalizer can access the object and release any
+    resource held by the object (for example mutexes or file descriptors).
+    An example is a ``__del__`` method.
+
+Resurrection
+    The process by which a finalizer creates a new reference to an
+    object in a CI.  This can happen as a quirky but supported side-effect
+    of ``__del__`` methods.
+
+
+Impact
+======
+
+While this PEP discusses CPython-specific implementation details, the
+change in finalization semantics is expected to affect the Python
+ecosystem as a whole.  In particular, this PEP obsoletes the current
+guideline that "objects with a __del__ method should not be part of a
+reference cycle".
+
+
+Benefits
+========
+
+The primary benefits of this PEP concern objects with finalizers, such
+as objects with a ``__del__`` method and generators with a ``finally``
+block.  Those objects can now be reclaimed when they are part of a
+reference cycle.
+
+The PEP also paves the way for further benefits:
+
+* The module shutdown procedure may not need to set global variables to
+  None anymore.  This could solve a well-known class of irritating issues.
+
+The PEP doesn't change the semantics of:
+
+* Weak references caught in reference cycles.
+
+* C extension types with a custom ``tp_dealloc`` function.
+
+
+Description
+===========
+
+Reference-counted disposal
+--------------------------
+
+In normal reference-counted disposal, an object's finalizer is called
+just before the object is deallocated.  If the finalizer resurrects
+the object, deallocation is aborted.
+
+*However*, if the object was already finalized, then the finalizer isn't
+called.  This prevents us from finalizing zombies (see below).
+
+Disposal of cyclic isolates
+---------------------------
+
+Cyclic isolates are first detected by the garbage collector, and then
+disposed of.  The detection phase doesn't change and won't be described here.
+Disposal of a CI traditionally works in the following order:
+
+1. Weakrefs to CI objects are cleared, and their callbacks called. At this
+   point, the objects are still safe to use.
+
+2. The CI becomes a CT as the GC systematically breaks all
+   known references inside it (using the ``tp_clear`` function).
+
+3. Nothing.  All CT objects should have been disposed of in step 2
+   (as a side-effect of clearing references); this collection is finished.
+
+This PEP proposes to turn CI disposal into the following sequence (new
+steps are in bold):
+
+1. Weakrefs to CI objects are cleared, and their callbacks called. At this
+   point, the objects are still safe to use.
+
+2. **The finalizers of all CI objects are called.**
+
+3. **The CI is traversed again to determine if it is still isolated.
+   If it is determined that at least one object in the CI is now reachable
+   from outside the CI, this collection is aborted and the whole CI
+   is resurrected.  Otherwise, proceed.**
+
+4. The CI becomes a CT as the GC systematically breaks all
+   known references inside it (using the ``tp_clear`` function).
+
+5. Nothing.  All CT objects should have been disposed of in step 4
+   (as a side-effect of clearing references); this collection is finished.
+
+
+C-level changes
+===============
+
+Type objects get a new ``tp_finalize`` slot to which ``__del__`` methods
+are bound.  Generators are also modified to use this slot, rather than
+``tp_del``.  At the C level, a ``tp_finalize`` function is a normal
+function which will be called with a regular, alive object as its only
+argument.  It should not attempt to revive or collect the object.
+
+For compatibility, ``tp_del`` is kept in the type structure.  Handling
+of objects with a non-NULL ``tp_del`` is unchanged: when part of a CI,
+they are not finalized and end up in ``gc.garbage``.  However, a non-NULL
+``tp_del`` is not encountered anymore in the CPython source tree (except
+for testing purposes).
+
+
+Discussion
+==========
+
+Predictability
+--------------
+
+Following this scheme, an object's finalizer is always called exactly
+once.  The only exception is if an object is resurrected: the finalizer
+will be called again later.
+
+For CI objects, the order in which finalizers are called (step 2 above)
+is undefined.
+
+Safety
+------
+
+It is important to explain why the proposed change is safe.  There
+are two aspects to be discussed:
+
+* Can a finalizer access zombie objects (including the object being
+  finalized)?
+
+* What happens if a finalizer mutates the object graph so as to impact
+  the CI?
+
+Let's discuss the first issue.  We will divide possible cases into two
+categories:
+
+* If the object being finalized is part of the CI: by construction, no
+  objects in the CI are zombies yet, since CI finalizers are called before
+  any reference breaking is done.  Therefore, the finalizer cannot
+  access zombie objects, which don't exist.
+
+* If the object being finalized is not part of the CI/CT: by definition,
+  objects in the CI/CT don't have any references pointing to them from
+  outside the CI/CT.  Therefore, the finalizer cannot reach any zombie
+  object (that is, even if the object being finalized was itself
+  referenced from a zombie object).
+
+Now for the second issue.  There are three potential cases:
+
+* The finalizer clears an existing reference to a CI object.  The CI
+  object may be disposed of before the GC tries to break it, which
+  is fine (the GC simply has to be aware of this possibility).
+
+* The finalizer creates a new reference to a CI object.  This can only
+  happen from a CI object's finalizer (see above for why).  Therefore, the
+  new reference will be detected by the GC after all CI finalizers are
+  called (step 3 above), and collection will be aborted without any
+  objects being broken.
+
+* The finalizer clears or creates a reference to a non-CI object.  By
+  construction, this is not a problem.
+
+
+Implementation
+==============
+
+An implementation is available in branch ``finalize`` of the repository
+at http://hg.python.org/features/finalize/.
+
+
+Validation
+==========
+
+Besides running the normal Python test suite, the implementation adds
+test cases for various finalization possibilities including reference cycles,
+object resurrection and legacy ``tp_del`` slots.
+
+The implementation has also been checked to not produce any regressions on
+the following test suites:
+
+* `Tulip <http://code.google.com/p/tulip/>`_, which makes extensive
+  use of generators
+
+* `Tornado <http://www.tornadoweb.org>`_
+
+* `SQLAlchemy <http://www.sqlalchemy.org/>`_
+
+* `Django <https://www.djangoproject.com/>`_
+
+* `zope.interface <http://pypi.python.org/pypi/zope.interface>`_
+
+
+References
+==========
+
+Notes about reference cycle collection and weak reference callbacks:
+http://hg.python.org/cpython/file/4e687d53b645/Modules/gc_weakref.txt
+
+Generator memory leak: http://bugs.python.org/issue17468
+
+Allow objects to decide if they can be collected by GC:
+http://bugs.python.org/issue9141
+
+Module shutdown procedure based on GC:
+http://bugs.python.org/issue812369
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:
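
The new disposal sequence listed under "Disposal of cyclic isolates"
can be observed from Python code.  The following is a minimal sketch,
not part of the changeset, and it assumes an interpreter that implements
this PEP (CPython 3.4 or later).  The ``Node`` class and the
``make_cycle`` helper are hypothetical names chosen for illustration.

```python
import gc


class Node:
    """Hypothetical object with a finalizer, used only for illustration."""

    finalized = []  # records (own name, partner name) for each finalizer call

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        # Finalizers run before any reference is cleared, so the partner
        # is still a fully usable object here.
        Node.finalized.append((self.name, self.partner.name))


def make_cycle():
    """Create a two-object reference cycle and drop all outside references."""
    a, b = Node("a"), Node("b")
    a.partner, b.partner = b, a


make_cycle()
gc.collect()  # detect the cyclic isolate, finalize it, then clear it

# Both finalizers ran (in an unspecified order) and saw intact partners.
print(sorted(Node.finalized))
# No Node instance was left uncollectable in gc.garbage.
print([obj for obj in gc.garbage if isinstance(obj, Node)])
```

Under the traditional scheme described in the PEP, objects like these
would not have been finalized when caught in a cycle and would have
ended up in ``gc.garbage`` instead.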

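Step 3 of the proposed sequence (aborting the collection when a
finalizer resurrects an object) can be sketched in the same way.  This
is again only an illustration under the assumption of an interpreter
implementing this PEP; ``Phoenix``, ``survivors`` and ``make_cycle`` are
hypothetical names, and the sketch makes no claim about how many times a
finalizer runs over an object's lifetime.

```python
import gc

survivors = []  # module-level list used by the finalizer to resurrect objects


class Phoenix:
    """Hypothetical object whose finalizer may resurrect it."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        if self.name == "a":
            # Creating a new reference from outside the cycle resurrects
            # this object, and with it everything it still references.
            survivors.append(self)


def make_cycle():
    """Create a two-object reference cycle and drop all outside references."""
    a, b = Phoenix("a"), Phoenix("b")
    a.partner, b.partner = b, a


make_cycle()
gc.collect()

a = survivors[0]
# The cycle was not broken: both objects still reference each other.
print(a.name, a.partner.name, a.partner.partner is a)
```

Because the new reference is detected before any ``tp_clear`` call, the
resurrected objects stay in a usable state rather than becoming zombies.
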
-- 
Repository URL: http://hg.python.org/peps