[Python-Dev] Lazily GC tracking tuples

Wed, 8 May 2002 08:32:52 -0700 (PDT)

(replying to Neil & Tim)

--- Tim Peters <tim_one@email.msn.com> wrote:
> [Neil Schemenauer, on Daniel Dunbar's interesting tuple-untrack scheme]
> > I'm not sure I like this approach.  Slowing down PyTuple_SET_ITEM to
> > optimize a micro-benchmark seems dubious.  However, dropping the number
> > of tracked objects by 50% _does_ seem worthwhile.  Couldn't we get the

That was basically the opinion I reached as well - changing 
PyTuple_SET_ITEM seems nasty in principle, and the performance 
results were dubious at best... the 50% drop in tracked objects
also seemed like the most interesting thing to me, however, its 
not clear that that percentage would remain under the dynamic 
conditions of a real application.

> > same effect by checking and untracking the appropriate tuples after
> > finishing a gen1 collection?  Each tuple would be checked once and short
> > lived tuples will probably not be checked at all.

The other thing to note is that untracking such tuples is a
very small win regardless - if they were not untracked the
tuple visit routine will quickly scan down the tuple (which
I guess is likely to be small to start with), and the GC
visitor will quickly reject adding any of the contained
objects because they are not PyObject_IS_GC().

That suggests that the only case it would make a difference
is a) when there are soooo many untrackable tuples that
simply traversing them takes a large amount of time (ie.
the iterzip case), or b) when there is a massive nesting
of untrackable tuples - an large AST or binary-tree might
benefit from this, but there are perhaps better solutions
to that problem (see below)

< snip - Tim talks about issues with removing tuples in the collector >

I still don't understand how you 'check the tuple' and then
untrack it - even if it has no NULL slots it still isn't safe 
to remove.

I don't think I agree that special casing tuples in the
collector is a good idea - this whole issue just came
up because of the iterzip test, where it was noticed
that the collector really didn't have to check all
the objects, which is just a special case solution to 
the problem that many of the objects in the gen*
lists are not going to be participating in a cycle.

A solution to that general problem would be much more 
beneficial, although presumably much more work ;)

If the collector could somehow prioritize objects,
it seems like various heuristics could be used to
gauge the likelyhood of an object participating in
a cycle vs. the 'good' of collecting that cycle.

As an aside: are there some nice established tests
for estimating the benefit of modifications to
the collector? Something that needs the collector
(so getting cycles early keeps the memory consumption
down, improving performance) and also has dynamic 
object allocation properties that mimic some ideal
of real-applications?

=====
daniel dunbar
evilzr@yahoo.com

__________________________________________________
Do You Yahoo!?
Yahoo! Health - your guide to health and wellness
http://health.yahoo.com