[pypy-commit] extradoc extradoc: Nitty gritty details: starting. May become too long

arigo noreply at buildbot.pypy.org
Mon Oct 14 20:08:21 CEST 2013


Author: Armin Rigo <arigo at tunes.org>
Branch: extradoc
Changeset: r5077:e25ed3d1866c
Date: 2013-10-14 20:08 +0200
http://bitbucket.org/pypy/extradoc/changeset/e25ed3d1866c/

Log:	Nitty gritty details: starting. May become too long

diff --git a/blog/draft/incremental-gc.rst b/blog/draft/incremental-gc.rst
--- a/blog/draft/incremental-gc.rst
+++ b/blog/draft/incremental-gc.rst
@@ -49,4 +49,69 @@
 Nitty gritty details
 ====================
 
+This was done as a patch to "minimark", our current GC, and called
+"incminimark" for now.  The former is a generational stop-the-world GC.
+New objects are allocated "young", i.e. in the nursery, a special zone
+of a few MB of memory.  When it is full, a "minor collection" step moves
+the surviving objects out of the nursery.  This can be done quickly (a
+few millisecond at most) because we only need to walk through the young
+objects that survive --- usually a small fraction of all young objects.
+From time to time, this minor collection is followed by a "major
+collection": in that step, we walk *all* objects to classify which ones
+are still alive and which ones are now dead (*marking*) and free the
+memory occupied by the dead ones (*speeding*).
 
+This "major collection" is what gives the long GC pauses.  To fix this
+problem we made the GC incremental: instead of running one complete
+major collection, we split its work into a variable number of pieces
+and run each piece after every minor collection for a while, until there
+are no more pieces.  The pieces are each doing a fraction of marking, or
+a fraction of sweeping.
+
+The main issue is that splitting the major collections means that the
+main program is actually running between the pieces, and so can change
+the pointers in the objects to point to other objects.  This is not
+a problem for sweeping: dead objects will remain dead whatever the main
+program does.  However, it is a problem for marking.  Let us see why.
+
+In terms of the incremental GC literature, objects are either "white",
+"gray" or "black".  They start as "white", become "gray" when they are
+found to be alive, and become "black" when they have been fully
+traversed --- at which point the objects that it points to have
+themselves been marked gray, or maybe are already black.  The gray
+objects are the "frontier" between the black objects that we have found
+to be reachable, and the white objects that represent the unknown part
+of the world.  When there are no more gray objects, the process is
+finished: all remaining white objects are unreachable and can be freed
+(by the following sweeping phase).
+
+In this model, the important part is that a black object can never point
+to a white object: if the latter remains white until the end, it will be
+freed, which is incorrect because the black object itself can still be
+reached.
+
+The trick we used in PyPy is to consider minor collections as part of
+the whole, rather than focus only on major collections.  The existing
+minimark GC had always used a "write barrier" to do its job, like any
+generational GC.  This write barrier is used to detect when an old
+object (outside the nursery) is modified to point to a young object
+(inside the nursery), which is essential information for minor
+collections.  Actually, although this was the goal, the actual write
+barrier code was simpler: it just recorded all old objects into which we
+wrote *any* pointer --- to a young or old object.  It is actually a
+performance improvement, because we don't need to check over and over
+again if the written pointer points to a young object or not.
+
+This *unmodified* write barrier works for incminimark too.  Imagine that
+we are in the middle of the marking phase, running the main program.
+The write barrier will record all old objects that are being modified.
+Then at the next minor collection, all surviving young objects will be
+moved out of the nursery.  At this point, as we're about to continue
+running the major collection's marking phase, we simply add to the list
+of pending gray objects all the objects that we consider --- both the
+objects listed as "old objects that are being modified", and the objects
+that we just moved out of the nursery.  A fraction of the former list
+are turned back from the black to the gray color.  This technique
+implements nicely, if indirectly, what is called a "backward write
+barrier" in the literature: the backwardness is about the color that
+occasionally progresses backward from black to gray.


More information about the pypy-commit mailing list