[Python-Dev] Proposal: Run GC less often

"Martin v. Löwis" martin at v.loewis.de
Sat Jun 21 22:23:16 CEST 2008


Here is my proposal for making the GC run less often.
The objective is to avoid the quadratic-time behavior that
currently occurs when many objects are created and none of
them becomes garbage.

The youngest generation is still collected after 700
allocations, and the middle generation after 10 young
collections. Therefore, a full GC is still considered every
7000 allocations, i.e. at every middle collection.
The two youngest generations don't cause quadratic behavior,
as the number of objects in each generation is bounded.
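
For reference, the current triggering logic looks roughly like the
following (a simplified Python model of what Modules/gcmodule.c does;
the real code is C and differs in details such as the exact off-by-one
handling of the counters):

# Simplified model of the current trigger logic (thresholds 700, 10, 10).
# Illustration only -- the real logic lives in Modules/gcmodule.c.

THRESHOLDS = (700, 10, 10)
counts = [0, 0, 0]  # allocations / young collections / middle collections

def collect(generation):
    # Placeholder: the real collector traverses all generations up to
    # and including `generation` and frees unreachable cycles.
    pass

def object_allocated():
    counts[0] += 1
    if counts[0] < THRESHOLDS[0]:
        return
    # Pick the oldest generation whose counter reached its threshold.
    if counts[2] >= THRESHOLDS[2]:
        gen = 2    # full collection, every 10 middle collections
    elif counts[1] >= THRESHOLDS[1]:
        gen = 1    # middle collection, every 10 young collections
    else:
        gen = 0    # young collection, every 700 allocations
    collect(gen)
    if gen < 2:
        counts[gen + 1] += 1
    for g in range(gen + 1):
        counts[g] = 0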

Currently, a full GC is invoked every 10 middle collections.
Under my proposal, a full GC runs only once 10 middle
collections have passed AND the number of objects that have
survived the middle generation since the last full collection
exceeds 10% of the number of objects in the oldest
generation.
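
In pseudo-code, the change to the full-collection decision is roughly
this (the names are mine, picked for the example, not a patch):

# Sketch of the proposed rule.  `middle_collections` is the number of
# middle collections since the last full GC, `survivors` the number of
# objects that survived into the oldest generation since then, and
# `oldest_size` the size of the oldest generation.

FULL_EVERY = 10            # middle collections, unchanged from today
SURVIVOR_FRACTION = 0.10   # the new, additional condition

def should_collect_full(middle_collections, survivors, oldest_size):
    return (middle_collections >= FULL_EVERY                   # old rule
            and survivors > SURVIVOR_FRACTION * oldest_size)   # new rule

The first conjunct alone is the current rule; the second conjunct is
what makes full collections progressively rarer as the oldest
generation grows.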

As a consequence, full collections become less frequent as
the number of objects on the heap grows, resulting in an
amortized O(N) overhead for garbage collection (under the
allocation pattern stated above).
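
To see why this is linear: between two full collections the oldest
generation has to grow by at least 10%, so the heap sizes seen by
successive full collections form a geometric series, and their sum
stays within a constant factor of the final heap size. A quick
back-of-the-envelope check (my own sketch, not the attached comp.py):

# If a full collection only runs once the heap has grown by 10% since
# the previous one, the total number of objects ever inspected by full
# collections is linear in the final heap size.

def total_full_gc_work(final_size, first_full=10, growth=1.10):
    inspected = 0
    heap = first_full
    while heap <= final_size:
        inspected += heap   # a full GC looks at every live object
        heap *= growth      # next full GC only after 10% growth
    return inspected

for n in (1000, 10000, 100000):
    print("%8d %9d %5.2f" % (n, total_full_gc_work(n),
                             total_full_gc_work(n) / float(n)))
# The ratio hovers around growth/(growth-1), i.e. about ten, which is
# also roughly where the table below ends up.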

To determine the number of survivors, counts are taken
during the collection itself. Objects that are later
deallocated through reference counting remain counted as
survivors (and likewise remain included in the size of the
oldest generation).
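
In terms of the sketch above, this amounts to two counters that are
only updated while the collector runs (names again illustrative):

# Survivor bookkeeping, updated only during collections.  Objects that
# are later freed by plain reference counting are deliberately NOT
# subtracted from either counter, as described above.

oldest_size = 0   # oldest generation size, measured at the last full GC
survivors = 0     # objects promoted into it since then

def after_middle_collection(promoted):
    global survivors
    survivors += promoted            # counted during the collection

def after_full_collection(objects_in_oldest):
    global oldest_size, survivors
    oldest_size = objects_in_oldest  # re-measured during the full GC
    survivors = 0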

I made a simulation assuming an initially empty heap and an
invocation of the collector every M=7000 allocations. The
table below is in units of M and shows the total number of
objects allocated, and the cumulative number of objects
inspected by full collections, for the old and the new
scheme (an asterisk indicates that a full collection took
place under that scheme; lines on which neither scheme
collects are omitted):

 total old_inspected new_inspected
   10         10*         10*
   20         30*         30*
   30         60*         60*
   40        100*        100*
   50        150*        150*
   60        210*        210*
   70        280*        280*
   80        360*        360*
   90        450*        450*
  100        550*        450
  102        550         552*
  110        660*        552
  115        660         667*
  120        780*        667
  129        780         796*
  130        910*        796
  140       1050*        796
...
  940      44650*       7724
  942      44650        8666*
  950      45600*       8666
  960      46560*       8666
  970      47530*       8666
  980      48510*       8666
  990      49500*       8666
 1000      50500*       8666
...
 9990    4995000*      95887

As you can see, the two schemes start off identically until
100 M (i.e. 700,000) objects have been allocated. The table
shows how the cumulative work of the old scheme grows
quadratically, while under the new scheme the total number
of objects inspected eventually settles at roughly ten times
the number of objects allocated.

Applications with a small number of objects will see no
change in behavior; likewise, applications that create lots
of small cycles will continue to see those collected in the
youngest or middle generations.

Please let me know what you think.

Regards,
Martin

P.S. I don't plan to implement this myself, so if you like
the idea, go ahead.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: comp.py
Type: text/x-python
Size: 719 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20080621/6bf90fed/attachment.py>

