[Python-Dev] iterzip()
Jeremy Hylton
jeremy@zope.com
Mon, 29 Apr 2002 18:31:57 -0400
>>>>> "GvR" == Guido van Rossum <guido@python.org> writes:
>> I was imagining a scheme like this: Count increfs and decrefs.
>> Set two thresholds. A collection occurs when both thresholds are
>> exceeded. Perhaps 100 decrefs and 1000 increfs.
GvR> I expect you can't literally count increfs and decrefs. These
GvR> are macros that need to be super-fast, and I think we can't
GvR> really afford to increment a counter on eacn macro invocation.
GvR> The current thresholds are used to count the number of
GvR> allocations.
I was being sloppy. I meant allocations and deallactions.
GvR> Adding the number of deallocations to mix seems dangerous: when
GvR> (nearly) all data is tied up in cycles, there may not be any
GvR> deallocations.
Probably right, although function calls should provide a steady stream
of deallocations. Frame, locals, &c. are deallocated on exit. So
unless the code is blocked waiting for those cycles to be collected,
it ought to eventually make progress.
GvR> It seems hard to distinguish between these two cases:
GvR> (a) lots of memory is allocated and kept alive for real by
GvR> containers
GvR> (b) lots of memory is allocated and kept alive accidentally by
GvR> cycles
GvR> The zip example is a case of (a), but the same allocation
GvR> behavior could ensue from case (b). Only running the allocator
GvR> can determine which case we're seeing. I like Tim's idea of
GvR> adjusting the thresholds base upon the effectiveness of recent
GvR> collections.
I agree that this sounds interesting.
>> How does this come into play in the benchmark in question? It
>> seems like we should have gotten a lot of quick collections, but
>> it was still quite slow.
GvR> The benchmark has a list with a million elements, and during
GvR> execution more and more tuples are added to it. I expect that
GvR> somehow all these tuples are visited every 700 allocations,
GvR> which gives quadratic behavior. I guess the generational part
GvR> of the collector doesn't affect this --
I guess this is a question for Neil. I assumed that the generational
part would affect this.
GvR> the only way to reduce the work would be if the big list
GvR> somehow was skipped once it was known to be "old". But since
GvR> it contains references to "new" object (the 700 tuples
GvR> allocated last), it probably ends up being visited anyway.
I thought something was labelled old after it survived some number of
collections. Why would the age of the objects it contains have any
affect?
Jeremy