[pypy-commit] extradoc extradoc: Nitty gritty details: starting. May become too long
arigo
noreply at buildbot.pypy.org
Mon Oct 14 20:08:21 CEST 2013
Author: Armin Rigo <arigo at tunes.org>
Branch: extradoc
Changeset: r5077:e25ed3d1866c
Date: 2013-10-14 20:08 +0200
http://bitbucket.org/pypy/extradoc/changeset/e25ed3d1866c/
Log: Nitty gritty details: starting. May become too long
diff --git a/blog/draft/incremental-gc.rst b/blog/draft/incremental-gc.rst
--- a/blog/draft/incremental-gc.rst
+++ b/blog/draft/incremental-gc.rst
@@ -49,4 +49,69 @@
Nitty gritty details
====================
+This was done as a patch to "minimark", our current GC, and called
+"incminimark" for now. The former is a generational stop-the-world GC.
+New objects are allocated "young", i.e. in the nursery, a special zone
+of a few MB of memory. When it is full, a "minor collection" step moves
+the surviving objects out of the nursery. This can be done quickly (a
+few milliseconds at most) because we only need to walk through the young
+objects that survive --- usually a small fraction of all young objects.
+From time to time, this minor collection is followed by a "major
+collection": in that step, we walk *all* objects to classify which ones
+are still alive and which ones are now dead (*marking*) and free the
+memory occupied by the dead ones (*sweeping*).
+This "major collection" is what gives the long GC pauses. To fix this
+problem we made the GC incremental: instead of running one complete
+major collection, we split its work into a variable number of pieces
+and run each piece after every minor collection for a while, until there
+are no more pieces. The pieces are each doing a fraction of marking, or
+a fraction of sweeping.
+
+The main issue is that splitting the major collections means that the
+main program is actually running between the pieces, and so can change
+the pointers in the objects to point to other objects. This is not
+a problem for sweeping: dead objects will remain dead whatever the main
+program does. However, it is a problem for marking. Let us see why.
+
+In terms of the incremental GC literature, objects are either "white",
+"gray" or "black". They start as "white", become "gray" when they are
+found to be alive, and become "black" when they have been fully
+traversed, at which point the objects that they point to have
+themselves been marked gray, or maybe are already black. The gray
+objects are the "frontier" between the black objects that we have found
+to be reachable, and the white objects that represent the unknown part
+of the world. When there are no more gray objects, the process is
+finished: all remaining white objects are unreachable and can be freed
+(by the following sweeping phase).
+
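The coloring scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PyPy's actual code: the names `Obj`, `start_marking`, `mark_step` and the `budget` parameter are hypothetical, and one "piece" of marking is modeled as processing a bounded number of gray objects.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

class Obj:
    """Hypothetical heap object: a color plus outgoing references."""
    def __init__(self, name):
        self.name = name
        self.color = WHITE
        self.refs = []

def start_marking(roots):
    """Turn the roots gray and return the list of pending gray objects."""
    gray = []
    for r in roots:
        if r.color == WHITE:
            r.color = GRAY
            gray.append(r)
    return gray

def mark_step(gray, budget):
    """Process at most `budget` gray objects; return True once none are left."""
    while gray and budget > 0:
        obj = gray.pop()
        for child in obj.refs:
            if child.color == WHITE:
                child.color = GRAY      # found alive, not yet traversed
                gray.append(child)
        obj.color = BLACK               # fully traversed
        budget -= 1
    return not gray

# a -> b -> c is reachable from the root a; d is garbage.
a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs = [b]
b.refs = [c]

gray = start_marking([a])
steps = 1
while not mark_step(gray, budget=1):
    steps += 1
```

When the loop ends, `a`, `b` and `c` are black and `d` is still white, so only `d` would be handed to the sweeping phase.
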
+In this model, the important part is that a black object can never point
+to a white object: if the latter remains white until the end, it will be
+freed, which is incorrect because the black object itself can still be
+reached.
+
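To make the danger concrete, here is a small hypothetical scenario in Python (again a sketch, not PyPy code; all names are made up for illustration). The main program runs between two pieces of marking and moves the only reference to an object from a gray object into a black one, with no write barrier in place.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

class Obj:
    """Hypothetical heap object: a color plus outgoing references."""
    def __init__(self):
        self.color = WHITE
        self.refs = []

def mark_all(gray):
    """Run the rest of the marking phase to completion."""
    while gray:
        obj = gray.pop()
        for child in obj.refs:
            if child.color == WHITE:
                child.color = GRAY
                gray.append(child)
        obj.color = BLACK

root, holder, victim = Obj(), Obj(), Obj()
root.refs = [holder]
holder.refs = [victim]

# State after a first piece of marking: root was fully traversed (black),
# holder was discovered but not traversed yet (gray).
root.color = BLACK
holder.color = GRAY
gray = [holder]

# The main program runs between two pieces, without any write barrier:
# the only reference to victim moves from the gray object to the black one.
root.refs.append(victim)
holder.refs.remove(victim)

# Marking resumes and finishes without ever visiting victim.
mark_all(gray)
lost = victim.color == WHITE and victim in root.refs
```

Here `lost` ends up true: `victim` is still reachable from `root`, yet it stayed white and would be freed by the sweep.
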
+The trick we used in PyPy is to consider minor collections as part of
+the whole, rather than focus only on major collections. The existing
+minimark GC had always used a "write barrier" to do its job, like any
+generational GC. This write barrier is used to detect when an old
+object (outside the nursery) is modified to point to a young object
+(inside the nursery), which is essential information for minor
+collections. Although this was the goal, the actual write
+barrier code was simpler: it just recorded all old objects into which we
+wrote *any* pointer, whether to a young or an old object. This is in fact a
+performance improvement, because we don't need to check over and over
+again if the written pointer points to a young object or not.
+
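A rough Python model of such a write barrier might look as follows. It is a sketch under assumptions: the `recorded` flag used to avoid listing the same object twice, and the names `write_barrier`, `setfield` and `modified_old_objects`, are illustrative and not taken from the minimark sources.

```python
class Obj:
    """Hypothetical heap object; `young` means it still lives in the nursery."""
    def __init__(self, young=True):
        self.young = young
        self.fields = {}
        self.recorded = False   # assumed flag: already in the list below?

# Old objects that had a pointer written into them since the last
# minor collection.
modified_old_objects = []

def write_barrier(obj):
    """Record an old object the first time any pointer is written into it."""
    # Note what is NOT checked: whether the written value is young or old.
    if not obj.young and not obj.recorded:
        obj.recorded = True
        modified_old_objects.append(obj)

def setfield(obj, name, value):
    """Every pointer store goes through the barrier first."""
    write_barrier(obj)
    obj.fields[name] = value

old = Obj(young=False)
other_old = Obj(young=False)
young = Obj()

setfield(old, "x", other_old)   # old -> old: recorded anyway
setfield(old, "y", young)       # old -> young: already recorded
setfield(young, "z", old)       # writes into young objects are not recorded
```

After these three stores, `modified_old_objects` contains `old` exactly once.
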
+This *unmodified* write barrier works for incminimark too. Imagine that
+we are in the middle of the marking phase, running the main program.
+The write barrier will record all old objects that are being modified.
+Then at the next minor collection, all surviving young objects will be
+moved out of the nursery. At this point, as we're about to continue
+running the major collection's marking phase, we simply add to the list
+of pending gray objects all the objects that we consider --- both the
+objects listed as "old objects that are being modified", and the objects
+that we just moved out of the nursery. Some of the objects in the former
+list are thereby turned back from black to gray. This technique
+implements nicely, if indirectly, what is called a "backward write
+barrier" in the literature: the backwardness is about the color that
+occasionally progresses backward from black to gray.
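
The step performed after each minor collection can be sketched like this in Python. It is a simplified model, not the incminimark implementation: `regray_after_minor_collection`, `survivors` and `modified_old` are hypothetical names, and the survivors of the nursery are passed in directly instead of being computed.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

class Obj:
    """Hypothetical heap object: a color plus outgoing references."""
    def __init__(self, color=WHITE):
        self.color = color
        self.refs = []

def regray_after_minor_collection(survivors, modified_old, gray):
    """Add all objects considered by the minor collection to the gray list."""
    for obj in survivors + modified_old:
        if obj.color != GRAY:
            obj.color = GRAY        # black -> gray is the "backward" move
            gray.append(obj)
    modified_old.clear()

def finish_marking(gray):
    """Run the rest of the marking phase to completion."""
    while gray:
        obj = gray.pop()
        for child in obj.refs:
            if child.color == WHITE:
                child.color = GRAY
                gray.append(child)
        obj.color = BLACK

# parent is old and already black; the main program then stores a pointer
# to a freshly allocated child into it, so the write barrier records parent.
parent = Obj(color=BLACK)
child = Obj()
parent.refs.append(child)
modified_old = [parent]
gray = []

# The next minor collection moves child out of the nursery; afterwards both
# lists are added to the pending gray objects.
regray_after_minor_collection([child], modified_old, gray)
colors_after_regray = (parent.color, child.color)

finish_marking(gray)
```

Both `parent` and `child` are gray right after the minor collection and black once marking finishes, so the child is not lost.
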