[Python-Dev] Proposal: Run GC less often

Leif Walsh adlaiff6 at gmail.com
Sat Jun 21 23:33:15 CEST 2008


If you can get me a version of the interpreter with this change made
(I wouldn't know what to change), I can run a very
allocation/deallocation-heavy application I have lying around, and get
you some good benchmarks.

On Sat, Jun 21, 2008 at 1:23 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
- Show quoted text -
> Here is my proposal for making the GC run less often.
> The objective is to avoid the quadratic-time behavior
> if many objects are created and none of them is garbage.
>
> The youngest generation remains collected after 700
> allocations, the middle generation after 10 young collections.
> Therefore, full GC remains considered every 7000 allocations.
> The two youngest generations don't cause quadratic behavior,
> as the number of objects in each generation is bounded.
>
> Currently, full GC is invoked every 10 middle collections.
> Under my proposal, 10 middle collections must have passed,
> PLUS the number of survivor objects from the middle generation
> must exceed 10% of the number of objects in the oldest
> generation.
>
> As a consequence, garbage collection becomes less frequent
> as the number of objects on the heap grows, resulting in
> an amortized O(N) overhead for garbage collection (under
> the access pattern stated above).
>
> To determine the number of survivor objects, counts are
> taken during the collection. Objects deallocated through
> reference counting still remain accounted for as survivors
> (likewise for determining the size of the oldest generation).
>
> I made a simulation assuming an initially-empty heap,
> and invocations of the collector every M=7000 objects.
> The table below is in units of M and shows the number
> of objects allocated, and the number of objects inspected
> since the start of the simulation, for the old and the
> new scheme (the asterisk indicates that a collection
> took place; lines with no collection are dropped):
>
>  total old_inspected new_inspected
>   10         10*         10*
>   20         30*         30*
>   30         60*         60*
>   40        100*        100*
>   50        150*        150*
>   60        210*        210*
>   70        280*        280*
>   80        360*        360*
>   90        450*        450*
>  100        550*        450
>  102        550         552*
>  110        660*        552
>  115        660         667*
>  120        780*        667
>  129        780         796*
>  130        910*        796
>  140       1050*        796
> ...
>  940      44650*       7724
>  942      44650        8666*
>  950      45600*       8666
>  960      46560*       8666
>  970      47530*       8666
>  980      48510*       8666
>  990      49500*       8666
>  1000      50500*       8666
> ...
>  9990    4995000*      95887
>
> As you can see, the old and the new scheme would start
> of equally until 100M objects have been allocated. The
> table shows how the old scheme grows quadratically, and
> the new scheme eventually approaches roughly a factor
> of ten between the number of objects and the number of
> times that GC considers an object.
>
> Applications with a small number of objects will see no
> change in behavior, also, applications that create
> lots of small cycles will continue to see them collected
> in the youngest or middle generations.
>
> Please let me know what you think.
>
> Regards,
> Martin
>
> P.S. I don't plan to implement this myself, so if you like
> the idea, go ahead.
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/adlaiff6%40gmail.com
>
>



--
Cheers,
Leif


More information about the Python-Dev mailing list