[Python-Dev] iterzip()

Jeremy Hylton jeremy@zope.com
Mon, 29 Apr 2002 18:31:57 -0400


>>>>> "GvR" == Guido van Rossum <guido@python.org> writes:

  >> I was imagining a scheme like this: Count increfs and decrefs.
  >> Set two thresholds.  A collection occurs when both thresholds are
  >> exceeded.  Perhaps 100 decrefs and 1000 increfs.

  GvR> I expect you can't literally count increfs and decrefs.  These
  GvR> are macros that need to be super-fast, and I think we can't
  GvR> really afford to increment a counter on eacn macro invocation.

  GvR> The current thresholds are used to count the number of
  GvR> allocations.

I was being sloppy.  I meant allocations and deallactions.

  GvR> Adding the number of deallocations to mix seems dangerous: when
  GvR> (nearly) all data is tied up in cycles, there may not be any
  GvR> deallocations.

Probably right, although function calls should provide a steady stream
of deallocations.  Frame, locals, &c. are deallocated on exit.  So
unless the code is blocked waiting for those cycles to be collected,
it ought to eventually make progress.

  GvR> It seems hard to distinguish between these two cases:

  GvR> (a) lots of memory is allocated and kept alive for real by
  GvR>     containers

  GvR> (b) lots of memory is allocated and kept alive accidentally by
  GvR>     cycles

  GvR> The zip example is a case of (a), but the same allocation
  GvR> behavior could ensue from case (b).  Only running the allocator
  GvR> can determine which case we're seeing.  I like Tim's idea of
  GvR> adjusting the thresholds base upon the effectiveness of recent
  GvR> collections.

I agree that this sounds interesting.

  >> How does this come into play in the benchmark in question?  It
  >> seems like we should have gotten a lot of quick collections, but
  >> it was still quite slow.

  GvR> The benchmark has a list with a million elements, and during
  GvR> execution more and more tuples are added to it.  I expect that
  GvR> somehow all these tuples are visited every 700 allocations,
  GvR> which gives quadratic behavior.  I guess the generational part
  GvR> of the collector doesn't affect this --

I guess this is a question for Neil.  I assumed that the generational
part would affect this.

  GvR> the only way to reduce the work would be if the big list
  GvR> somehow was skipped once it was known to be "old".  But since
  GvR> it contains references to "new" object (the 700 tuples
  GvR> allocated last), it probably ends up being visited anyway.

I thought something was labelled old after it survived some number of
collections.  Why would the age of the objects it contains have any
affect?

Jeremy