[Python-Dev] finalization again

Wed, 08 Mar 2000 09:06:53 -0500

> A trash cycle without a finalizer isn't a problem, right?  In that case,
> topsort rules have no visible consquence so it doesn't matter in what order
> you merely reclaim the memory.

When we have a pile of garbage, we don't know whether it's all
connected or whether it's lots of little cycles.  So if we find
[objects with -- I'm going to omit this] finalizers, we have to put
those on a third list and put everything reachable from them on that
list as well (the algorithm I described before).

What's left on the first list then consists of finalizer-free garbage.
We dispose of this garbage by clearing dicts and lists.  Hopefully
this makes the refcount of some of the finalizers go to zero -- those
are finalized in the normal way.

And now we have to deal with the inevitable: finalizers that are part
of cycles.  It makes sense to reduce the graph of objects to a graph
of finalizers only.  Example:

  A <=> b -> C <=> d

A and C have finalizers.  C is part of a cycle (C-d) that contains no
other finalizers, but C is also reachable from A.  A is part of a
cycle (A-b) that keeps it alive.  The interesting thing here is that
if we only look at the finalizers, there are no cycles!

If we reduce the graph to only finalizers (setting aside for now the
problem of how to do that -- we may need to allocate more memory to
hold the reduced greaph), we get:

  A -> C

We can now finalize A (even though its refcount is nonzero!).  And
that's really all we can do!  A could break its own cycle, thereby
disposing of itself and b.  It could also break C's cycle, disposing
of C and d.  It could do nothing.  Or it could resurrect A, thereby
resurrecting all of A, b, C, and d.

This leads to (there's that weird echo again :-) Boehm's solution:
Call A's finalizer and leave the rest to the next time the garbage
collection runs.

Note that we're now calling finalizers on objects with a non-zero
refcount.  At some point (probably as a result of finalizing A) its
refcount will go to zero.  We should not finalize it again -- this
would serve no purpose.  Possible solution:

  INCREF(A);
  A->__del__();
  if (A->ob_refcnt == 1)
      A->__class__ = NULL; /* Make a finalizer-less */
  DECREF(A);

This avoids finalizing twice if the first finalization broke all
cycles in which A is involved.  But if it doesn't, A is still cyclical
garbage with a finalizer!  Even if it didn't resurrect itself.

Instead of the code fragment above, we could mark A as "just
finalized" and when it shows up at the head of the tree (of finalizers
in cyclical trash) again on the next garbage collection, to discard it
without calling the finalizer again (because this clearly means that
it didn't resurrect itself -- at least not for a very long time).

I would be happier if we could still have a rule that says that a
finalizer is called only once by magic -- even if we have two forms of
magic: refcount zero or root of the tree.  Tim: I don't know if you
object against this rule as a matter of principle (for the sake of
finalizers that resurrect the object) or if your objection is really
against the unordered calling of finalizers legitimized by Java's
rules.  I hope the latter, since I think it that this rule (__del__
called only once by magic) by itself is easy to understand and easy to
deal with, and I believe it may be necessary to guarantee progress for
the garbage collector.

The problem is that the collector can't easily tell whether A has
resurrected itself.  Sure, if the refcount is 1 after the finalizer
run, I know it didn't resurrect itself.  But even if it's higher than
before, that doesn't mean it's resurrected: it could have linked to
itself.  Without doing a full collection I can't tell the difference.
If I wait until a full collection happens again naturally, and look at
the "just finalized flag", I can't tell the difference between the
case whereby the object resurrected itself but died again before the
next collection, and the case where it was dead already.  So I don't
know how many times it was expecting the "last rites" to be performed,
and the object can't know whether to expect them again or not.  This
seems worse than the only-once rule to me.

Even if someone once found a good use for resurrecting inside __del__,
against all recommendations, I don't mind breaking their code, if it's
for a good cause.  The Java rules aren't a good cause.  But top-sorted
finalizer calls seem a worthy cause.

So now we get to discuss what to do with multi-finalizer cycles, like:

  A <=> b <=> C

Here the reduced graph is:

  A <=> C

About this case you say:

> If it has an object with a finalizer, though, at the very worst you can let
> it leak, and  make the collection of leaked objects available for
> inspection.  Even that much is a *huge* "improvement" over what they have
> today:  most cycles won't have a finalizer and so will get reclaimed, and
> for the rest they'll finally have a simple way to identify exactly where the
> problem is, and a simple criterion for predicting when it will happen.  If
> that's not "good enough", then without abandoning principle the user needs
> to have some way to reduce such a cycle *to* a topsort case themself.
> 
> > I find having a separate __cleanup__ protocol cumbersome.
> 
> Same here, but if you're not comfortable leaking, and you agree Python is
> not in the business of guesing in inherently ambiguous situations, maybe
> that's what it takes!  MAL and GregS both gravitated to this kind of thing
> at once, and that's at least suggestive; and MAL has actually been using his
> approach.  It's explicit, and that's Pythonic on the face of it.
> 
> > I think that the "finalizer only called once by magic" rule is reasonable.
> 
> If it weren't for its specific use in emulating Java's scheme, would you
> still be in favor of that?  It's a little suspicious that it never came up
> before <wink>.

Suspicious or not, it still comes up.  I still like it.  I still think
that playing games with resurrection is evil.  (Maybe my spiritual
beliefs shine through here -- I'm a convinced atheist. :-)

Anyway, once-only rule aside, we still need a protocol to deal with
cyclical dependencies between finalizers.  The __cleanup__ approach is
one solution, but it also has a problem: we have a set of finalizers.
Whose __cleanup__ do we call?  Any?  All?  Suggestions?

Note that I'd like some implementation freedom: I may not want to
bother with the graph reduction algorithm at first (which seems very
hairy) so I'd like to have the right to use the __cleanup__ API
as soon as I see finalizers in cyclical trash.  I don't mind disposing
of finalizer-free cycles first, but once I have more than one
finalizer left in the remaining cycles, I'd like the right not to
reduce the graph for topsort reasons -- that algorithm seems hard.

So we're back to the __cleanup__ design.  Strawman proposal: for all
finalizers in a trash cycle, call their __cleanup__ method, in
arbitrary order.  After all __cleanup__ calls are done, if the objects
haven't all disposed of themselves, they are all garbage-collected
without calling __del__.  (This seems to require another garbage
colelction cycle -- so perhaps there should also be a once-only rule
for __cleanup__?)

Separate question: what if there is no __cleanup__?  This should
probably be reported: "You have cycles with finalizers, buddy!  What
do you want to do about them?"  This same warning could be given when
there is a __cleanup__ but it doesn't break all cycles.

--Guido van Rossum (home page: http://www.python.org/~guido/)