[Python-checkins] cpython (3.2): Issue #14775: Fix a potential quadratic dict build-up due to the garbage

antoine.pitrou python-checkins at python.org
Mon May 28 22:31:23 CEST 2012


http://hg.python.org/cpython/rev/47e6217d0e84
changeset:   77210:47e6217d0e84
branch:      3.2
parent:      77208:a2e140b083e0
user:        Antoine Pitrou <solipsis at pitrou.net>
date:        Mon May 28 22:22:34 2012 +0200
summary:
  Issue #14775: Fix a potential quadratic dict build-up due to the garbage collector repeatedly trying to untrack dicts.
Additional comments by Tim Silk.

files:
  Misc/ACKS          |   1 +
  Misc/NEWS          |   3 +
  Modules/gcmodule.c |  60 ++++++++++++++++++++++++++++++++-
  3 files changed, 61 insertions(+), 3 deletions(-)


diff --git a/Misc/ACKS b/Misc/ACKS
--- a/Misc/ACKS
+++ b/Misc/ACKS
@@ -866,6 +866,7 @@
 Itamar Shtull-Trauring
 Eric Siegerman
 Paul Sijben
+Tim Silk
 Kirill Simonov
 Nathan Paul Simons
 Janne Sinkkonen
diff --git a/Misc/NEWS b/Misc/NEWS
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -10,6 +10,9 @@
 Core and Builtins
 -----------------
 
+- Issue #14775: Fix a potential quadratic dict build-up due to the garbage
+  collector repeatedly trying to untrack dicts.
+
 - Issue #14494: Fix __future__.py and its documentation to note that
   absolute imports are the default behavior in 3.0 instead of 2.7.
   Patch by Sven Marnach.
diff --git a/Modules/gcmodule.c b/Modules/gcmodule.c
--- a/Modules/gcmodule.c
+++ b/Modules/gcmodule.c
@@ -116,6 +116,46 @@
     http://mail.python.org/pipermail/python-dev/2008-June/080579.html
 */
 
+/*
+   NOTE: about untracking of mutable objects.
+   
+   Certain types of container cannot participate in a reference cycle, and
+   so do not need to be tracked by the garbage collector. Untracking these
+   objects reduces the cost of garbage collections. However, determining
+   which objects may be untracked is not free, and the costs must be
+   weighed against the benefits for garbage collection.
+
+   There are two possible strategies for when to untrack a container:
+
+   i) When the container is created.
+   ii) When the container is examined by the garbage collector.
+
+   Tuples containing only immutable objects (integers, strings etc, and
+   recursively, tuples of immutable objects) do not need to be tracked.
+   The interpreter creates a large number of tuples, many of which will
+   not survive until garbage collection. It is therefore not worthwhile
+   to untrack eligible tuples at creation time.
+
+   Instead, all tuples except the empty tuple are tracked when created. 
+   During garbage collection it is determined whether any surviving tuples 
+   can be untracked. A tuple can be untracked if all of its contents are 
+   already not tracked. Tuples are examined for untracking in all garbage 
+   collection cycles. It may take more than one cycle to untrack a tuple.
+
+   Dictionaries containing only immutable objects also do not need to be
+   tracked. Dictionaries are untracked when created. If a tracked item is
+   inserted into a dictionary (either as a key or value), the dictionary
+   becomes tracked. During a full garbage collection (all generations),
+   the collector will untrack any dictionaries whose contents are not
+   tracked.
+
+   The module provides the python function is_tracked(obj), which returns
+   the CURRENT tracking status of the object. Subsequent garbage
+   collections may change the tracking status of the object.
+   
+   Untracking of certain containers was introduced in issue #4688, and 
+   the algorithm was refined in response to issue #14775.
+*/
 
 /* set for debugging information */
 #define DEBUG_STATS             (1<<0) /* print collection statistics */
@@ -437,9 +477,6 @@
             if (PyTuple_CheckExact(op)) {
                 _PyTuple_MaybeUntrack(op);
             }
-            else if (PyDict_CheckExact(op)) {
-                _PyDict_MaybeUntrack(op);
-            }
         }
         else {
             /* This *may* be unreachable.  To make progress,
@@ -457,6 +494,20 @@
     }
 }
 
+/* Try to untrack all currently tracked dictionaries */
+static void
+untrack_dicts(PyGC_Head *head)
+{
+    PyGC_Head *next, *gc = head->gc.gc_next;
+    while (gc != head) {
+        PyObject *op = FROM_GC(gc);
+        next = gc->gc.gc_next;
+        if (PyDict_CheckExact(op))
+            _PyDict_MaybeUntrack(op);
+        gc = next;
+    }
+}
+
 /* Return true if object has a finalization method. */
 static int
 has_finalizer(PyObject *op)
@@ -857,6 +908,9 @@
         gc_list_merge(young, old);
     }
     else {
+        /* We only untrack dicts in full collections, to avoid quadratic
+           dict build-up. See issue #14775. */
+        untrack_dicts(young);
         long_lived_pending = 0;
         long_lived_total = gc_list_size(young);
     }

-- 
Repository URL: http://hg.python.org/cpython


More information about the Python-checkins mailing list