[pypy-svn] r76441 - pypy/extradoc/planning
agaynor at codespeak.net
Mon Aug 2 21:56:56 CEST 2010
Author: agaynor
Date: Mon Aug 2 21:56:54 2010
New Revision: 76441
Added:
pypy/extradoc/planning/gc.txt
Log:
Added some notes on card-marking GC.
Added: pypy/extradoc/planning/gc.txt
==============================================================================
--- (empty file)
+++ pypy/extradoc/planning/gc.txt Mon Aug 2 21:56:54 2010
@@ -0,0 +1,46 @@
+Card marking GC for PyPy
+========================
+
+With a generational GC one needs to keep track of references from the old
+generation to the young one using a write barrier. Currently this is
+implemented with a list of old objects that contain pointers to young objects.
+Then, when the nursery is collected each remembered old object is scanned for
+any pointers it contains to young objects, and those are promoted out of the
+nursery. The problem with this approach is exceptionally large old objects
+that contain pointers to young objects:
+
+    def main():
+        objs = []
+        for i in xrange(10000000):
+            objs.append(i)
+            # A bunch of stuff here.
+
+If that loop does enough things on each iteration to force a garbage collection
+then every time through the loop ``objs`` will contain a pointer to a young
+object (``i``) and will be scanned, in its entirety, for young pointers. This
+results in the loop taking quadratic time, where it ought to be linear. The
+solution to this is a strategy named card marking.
+
+In a card marking generational collector, instead of storing a list of old
+objects, the heap is partitioned into sections, called cards (generally these
+are smaller than a page of memory), and the write barrier marks the card for
+the region of memory that contains the pointer. Then, during collection
+instead of scanning objects for pointers, the regions of memory identified by
+marked cards are scanned. This bounds the amount of space that must be scanned
+by the size of a card, rather than by the size of an object (which is variable,
+and can grow arbitrarily large).
+
+In principle this sounds nice, but there are a few issues:
+
+ * We have pre-built constants (PBCs), which don't live in our contiguous heap.
+   Since the write barrier is supposed to be very cheap, we don't want it to
+   need to special-case PBCs. There are two proposed solutions:
+    * Copy all PBCs to the heap on startup. This requires double the memory for
+      PBCs. (According to fijal, this is what the JVM does.)
+    * Don't pre-build PBCs; instead, allocate them fresh on startup. The OO type
+      backends do this, and it results in very slow startup time (5-6 seconds),
+      though there are likely other factors involved.
+ * Currently the hybrid GC allocates large objects externally to the heap,
+   which causes the same problem as PBCs.
+    * The JVM apparently also allocates large objects externally, and pays a
+      similar price for it.
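
The card-marking scheme described in the notes can be illustrated with a
small toy model. This is only a sketch under stated assumptions, not PyPy's
GC code: the heap is modelled as a flat array of pointer slots, and the names
``CardTable``, ``write_barrier``, ``collect_dirty_ranges`` and the value of
``CARD_SIZE`` are all hypothetical. It is written for Python 3 (``range``
rather than ``xrange``).

```python
CARD_SIZE = 128  # hypothetical: slots per card (real collectors count bytes)


class CardTable:
    """Toy card table over a flat 'heap' of pointer slots (an assumption)."""

    def __init__(self, heap_size, card_size=CARD_SIZE):
        self.heap_size = heap_size
        self.card_size = card_size
        num_cards = (heap_size + card_size - 1) // card_size  # round up
        self.marked = bytearray(num_cards)  # one byte per card, 0 = clean

    def write_barrier(self, slot_index):
        # Mark the card covering the written slot. There is no per-object
        # bookkeeping here, which is what keeps the barrier cheap.
        self.marked[slot_index // self.card_size] = 1

    def collect_dirty_ranges(self):
        # Return the (start, end) slot ranges a minor collection would scan,
        # clearing each mark as it goes. The last card is clipped to the heap.
        ranges = []
        for card, dirty in enumerate(self.marked):
            if dirty:
                start = card * self.card_size
                end = min(start + self.card_size, self.heap_size)
                ranges.append((start, end))
                self.marked[card] = 0
        return ranges


def slots_scanned(heap_size, written_slots, card_size=CARD_SIZE):
    """Count the slots scanned at the next collection after the given writes."""
    table = CardTable(heap_size, card_size)
    for slot in written_slots:
        table.write_barrier(slot)
    return sum(end - start for start, end in table.collect_dirty_ranges())
```

In this model, appending to the end of a ten-million-slot list dirties a
single card, so the next collection scans at most ``CARD_SIZE`` slots of that
list rather than all of it. The model still walks every card to find the
marked ones, and it ignores the PBC and external large-object issues listed
above.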