[Python-Dev] GC Proposal
Adam Olsen
rhamph at gmail.com
Sun Jun 29 04:31:37 CEST 2008
On Sat, Jun 28, 2008 at 2:42 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> The effect is similar for the "batch allocation" case, but opposite
>> for the "long-running program" case.
>
> I don't understand. Where is the difference?
>
>> My proposal can be made equivalent to Martin's proposal by removing
>> all of its pending traces when an untraced object is deleted. We
>> could even change this at runtime, by adding a counter for pending
>> objects.
>
> What is a "pending trace"?
>
>> Come to think of it, I think Martin's proposal needs to be implemented
>> as mine. He wants the middle generation to be 10% larger than the
>> oldest generation
>
> Not exactly: 10% of the oldest generation, not 10% larger. So if the
> oldest generation has 1,000,000 objects, I want to collect when the
> survivors from the middle generation reach 100,000, not when they reach
> 1,100,000.
>
>> but to find out the size you need to either iterate
>> it (reintroducing the original problem), or keep some counters. With
>> counters, his middle generation size is my "pending traces".
>
> Yes, I have two counters: one for the number of objects in the oldest
> generation (established at the last collection, and unchanged
> afterwards), and one for the number of survivors from the middle
> collection (increased every time objects move to the oldest
> generation).
>
> So it seems there are minor difference (such as whether a counter
> for the total number of traceable objects is maintained, which you
> seem to be suggesting), but otherwise, I still think the proposals
> are essentially the same.
They are definitely quite close to equivalent. The terminology
doesn't quite match up, so let me rearrange things and compare:
old * 0.1 <= survivors # Martin
old <= survivors * 2.0 # Adam
Looks about equivalent, but "survivors" may mean two different things
depending on if it removes deleted survivors or not. Splitting that
up, we get this form:
old <= survivors * 2.0 + deleted * 1.0
The deleted count ensures stable memory loads will still eventually
cause full collections. Since our GC isn't
incremental/concurrent/realtime, we'd probably don't want the full
collection pauses except from big bursts, which is trivially done by
making the deleted factor 0.0. My original proposal was assuming a
non-zero deleted factor, while yours (and the existing codebase)
assumed it'd be zero - this is how our two proposals differed.
(My "pending traces" is merely a running total of survivors * 2.0 +
deleted * 1.0. It looks much easier to keep separate counts though.)
--
Adam Olsen, aka Rhamphoryncus
More information about the Python-Dev
mailing list