[Python-Dev] GC Proposal

Adam Olsen rhamph at gmail.com
Sun Jun 29 04:31:37 CEST 2008


On Sat, Jun 28, 2008 at 2:42 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> The effect is similar for the "batch allocation" case, but opposite
>> for the "long-running program" case.
>
> I don't understand. Where is the difference?
>
>> My proposal can be made equivalent to Martin's proposal by removing
>> all of its pending traces when an untraced object is deleted.  We
>> could even change this at runtime, by adding a counter for pending
>> objects.
>
> What is a "pending trace"?
>
>> Come to think of it, I think Martin's proposal needs to be implemented
>> as mine.  He wants the middle generation to be 10% larger than the
>> oldest generation
>
> Not exactly: 10% of the oldest generation, not 10% larger. So if the
> oldest generation has 1,000,000 objects, I want to collect when the
> survivors from the middle generation reach 100,000, not when they reach
> 1,100,000.
>
>> but to find out the size you need to either iterate
>> it (reintroducing the original problem), or keep some counters.  With
>> counters, his middle generation size is my "pending traces".
>
> Yes, I have two counters: one for the number of objects in the oldest
> generation (established at the last collection, and unchanged
> afterwards), and one for the number of survivors from the middle
> collection (increased every time objects move to the oldest
> generation).
>
> So it seems there are minor differences (such as whether a counter
> for the total number of traceable objects is maintained, which you
> seem to be suggesting), but otherwise, I still think the proposals
> are essentially the same.

They are definitely quite close to equivalent.  The terminology
doesn't quite match up, so let me rearrange things and compare:

old * 0.1 <= survivors    # Martin
old <= survivors * 2.0    # Adam

Looks about equivalent, but "survivors" may mean two different things,
depending on whether deleted survivors are subtracted from the count
or not.  Splitting that up, we get this form:

old <= survivors * 2.0 + deleted * 1.0

The deleted count ensures that stable memory loads will still
eventually trigger full collections.  Since our GC isn't
incremental/concurrent/realtime, we probably don't want the full
collection pauses except after big allocation bursts, which is
trivially arranged by setting the deleted factor to 0.0.  My original
proposal assumed a non-zero deleted factor, while yours (and the
existing codebase) assumed it would be zero - that's how our two
proposals differed.
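
To make that concrete, here's a minimal sketch of the generalized
trigger.  The class and counter names are mine, purely for
illustration (nothing like this exists in gcmodule.c); your 10% rule
corresponds to survivor_factor=10.0 with deleted_factor=0.0, while
the combined form above is survivor_factor=2.0 with
deleted_factor=1.0:

class FullCollectPolicy:
    """Decide when to collect the oldest generation (sketch only)."""

    def __init__(self, survivor_factor=2.0, deleted_factor=1.0):
        self.survivor_factor = survivor_factor  # 10.0 with deleted 0.0 ~= Martin
        self.deleted_factor = deleted_factor
        self.old = 0        # oldest-generation size, fixed at last full collect
        self.survivors = 0  # promoted into the oldest generation since then
        self.deleted = 0    # traced objects deallocated since then

    def note_promotion(self, n):
        self.survivors += n

    def note_deletion(self, n=1):
        self.deleted += n

    def should_full_collect(self):
        return self.old <= (self.survivors * self.survivor_factor
                            + self.deleted * self.deleted_factor)

    def after_full_collect(self, live_count):
        self.old = live_count
        self.survivors = 0
        self.deleted = 0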

(My "pending traces" is merely a running total of survivors * 2.0 +
deleted * 1.0.  It looks much easier to keep separate counts though.)
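
As a single running total, that might look like this (again, just a
sketch with made-up names):

pending = 0.0   # my "pending traces"

def note_promotion(n):
    global pending
    pending += n * 2.0   # survivors weighted at 2.0

def note_deletion(n=1):
    global pending
    pending += n * 1.0   # deletions weighted at 1.0

def should_full_collect(old):
    return old <= pending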

-- 
Adam Olsen, aka Rhamphoryncus

