[Python-Dev] GC Proposal

Adam Olsen rhamph at gmail.com
Fri Jun 27 02:53:15 CEST 2008


We need two counters: one is the total number of traceable objects
(those we would inspect if we did a full collection) and a number of
"pending" trace operations.  Every time an object is moved into the
last generation, we increase "pending" by two - once for itself and
once for an older object.  Once pending equals the total number of
traceable objects we do a full collection (and reset "pending" to 0).

This behaves well under both extremes.  First, if allocating in a
burst, it waits until the total number of objects has doubled before
doing a collection, which works out to about 2 passes per object.

Second, a long running program may allocate an object, get it into the
oldest generation (adding to "pending"), but then delete it through
normal means.  The pending trace is effectively borrowed by the
remaining older objects, which ensures they will get scanned again
eventually.  However, we already payed for it when the young object
was allocated, so the net cost is still a constant factor.

Various refinements are possible, such as only adding 1.5 to
"pending", or taking off some of the 2 if a young object is deleted
without getting traced.  We could also tweak when in the object
life-cycle "pending" is increased.

The first two generations would remain as they are today.

-- 
Adam Olsen, aka Rhamphoryncus


More information about the Python-Dev mailing list