[Python-Dev] finalization again

Tue, 07 Mar 2000 08:52:00 -0500

Warning: long message.  If you're not interested in reading all this,
please skip to "Conclusion" at the end.

At Tim's recommendation I had a look at what section 12.6 of the Java
language spec says about finalizers. The stuff there is sure seductive
for language designers...

Have a look at te diagram at
http://java.sun.com/docs/books/jls/html/12.doc.html#48746. In all its
(seeming) complexity, it helped me understand some of the issues of
finalization better. Rather than the complex 8-state state machine
that it appears to be, think of it as a simple 3x3 table. The three
rows represent the categories reachable, finalizer-reachable
(abbreviated in the diagram as f-reachable), and unreachable. These
categories correspond directly to categories of objects that the
Schemenauer-Tiedemann cycle-reclamation scheme deals with: after
moving all the reachable objects to the second list (first the roots
and then the objects reachable from the roots), the first list is left
with the unreachable and finalizer-reachable objects.

If we want to distinguish between unreachable and finalizer-reachable
at this point, a straightforward application of the same algorithm
will work well: Create a third list (this will contain the
finalizer-reachable objects). Start by filling it with all the objects
from the first list (which contains the potential garbage at this
point) that have a finalizer. We can look for objects that have
__del__ or __clean__ or for which tp_clean(CARE_EXEC)==true, it
doesn't matter here.(*) Then walk through the third list, following
each object's references, and move all referenced objects that are
still in the first list to the third list. Now, we have:

List 1: truly unreachable objects. These have no finalizers and can be
discarded right away.

List 2: truly reachable objects. (Roots and objects reachable from
roots.) Leave them alone.

List 3: finalizer-reachable objects. This contains objects that are
unreachable but have a finalizer, and objects that are only reachable
through those.

We now have to decide on a policy for invoking finalizers. Java
suggests the following: Remember the "roots" of the third list -- the
nodes that were moved there directly from the first list because they
have a finalizer. These objects are marked *finalizable* (a category
corresponding to the second *column* of the Java diagram). The Java
spec allows the Java garbage collector to call all of these finalizers
in any order -- even simultaneously in separate threads. Java never
allows an object to go back from the finalizable to the unfinalized
state (there are no arrows pointing left in the diagram). The first
finalizer that is called could make its object reachable again (up
arrow), thereby possibly making other finalizable objects reachable
too. But this does not cancel their scheduled finalization! The
conclusion is that Java can sometimes call finalization on unreachable
objects -- but only if those objects have gone through a phase in
their life where they were unreachable or at least
finalizer-unreachable.

I agree that this is the best that Java can do: if there are cycles
containing multiple objects with finalizers, there is no way (short of
asking the programmer(s)) to decide which object to finalize first. We
could pick one at random, run its finalizer, and start garbage
collection all over -- if the finalizer doesn't resurrect anything,
this will give us the same set of unreachable objects, from which we
could pick the next finalizable object, and so on. That looks very
inefficient, might not terminate (the same object could repeatedly
show up as the candidate for finalization), and it's still arbitrary:
the programmer(s) still can't predict which finalizer in a cycle with
multiple finalizers will be called first. Assuming the recommended
characteristics of finalizers (brief and robust), it won't make much
difference if we call all finalizers (of the now-finalizeable objects)
"without looking back". Sure, some objects may find themselves in a
perfectly reachable position with their finalizer called -- but they
did go through a "near-death experience". I don't find this
objectionable, and I don't see how Java could possibly do better for
cycles with multiple finalizers.

Now let's look again at the rule that an object's finalizer will be
called at most once automatically by the garbage collector. The
transitions between the colums of the Java diagram enforce this: the
columns are labeled from left to right with unfinalized, finalizable,
and finalized, and there are no transition arrows pointing left. (In
my description above, already finalized objects are considered not to
have a finalizer.) I think this rule makes a lot of sense given Java's
multi-threaded garbage collection: the invocation of finalizers could
run concurreltly with another garbage collection, and we don't want
this to find some of the same finalizable objects and call their
finalizers again!

We could mark them with a "finalization in progress" flag only while
their finalizer is running, but in a cycle with multiple finalizers it
seems we should keep this flag set until *all* finalizers for objects
in the cycle have run. But we don't actually know exactly what the
cycles are: all we know is "these objects are involved in trash
cycles". More detailed knowledge would require yet another sweep, plus
a more hairy two-dimensional data structure (a list of separate
cycles).  And for what? as soon as we run finalizers from two separate
cycles, those cycles could be merged again (e.g. the first finalizer
could resurrect its cycle, and the second one could link to it). Now
we have a pool of objects that are marked "finalization in progress"
until all their finalizations terminate. For an incremental concurrent
garbage collector, this seems a pain, since it may continue to find
new finalizable objects and add them to the pile. Java takes the
logical conclusion: the "finalization in progress" flag is never
cleared -- and renamed to "finalized".

Conclusion
----------

Are the Java rules complex? Yes. Are there better rules possible? I'm
not so sure, given the requirement of allowing concurrent incremental
garbage collection algorithms that haven't even been invented
yet. (Plus the implied requirement that finalizers in trash cycles
should be invoked.) Are the Java rules difficult for the user? Only
for users who think they can trick finalizers into doing things for
them that they were not designed to do. I would think the following
guidelines should do nicely for the rest of us:

1. Avoid finalizers if you can; use them only to release *external*
(e.g. OS) resources.

2. Write your finalizer as robust as you can, with as little use of
other objects as you can.

3. Your only get one chance. Use it.

Unlike Scheme guardians or the proposed __cleanup__ mechanism, you
don't have to know whether your object is involved in a cycle -- your
finalizer will still be called.

I am reconsidering to use the __del__ method as the finalizer. As a
compromise to those who want their __del__ to run whenever the
reference count reaches zero, the finalized flag can be cleared
explicitly. I am considering to use the following implementation:
after retrieving the __del__ method, but before calling it,
self.__del__ is set to None (better, self.__dict__['__del__'] = None,
to avoid confusing __setattr__ hooks). The object call remove
self.__del__ to clear the finalized flag. I think I'll use the same
mechanism to prevent __del__ from being called upon a failed
initialization.

Final note: the semantics "__del__ is called whenever the reference
count reaches zero" cannot be defended in the light of a migration to
different forms of garbage collection (e.g. JPython).  There may not
be a reference count.

--Guido van Rossum (home page: http://www.python.org/~guido/)

____
(*) Footnote: there's one complication: to ask a Python class instance
if it has a finalizer, we have to use PyObject_Getattr(obj, ...). If
the object's class has a __getattr__ hook, this can invoke arbitrary
Python code -- even if the answer to the question is "no"! This can
make the object reachable again (in the Java diagram, arrows pointing
up or up and right). We could either use instance_getattr1(), which
avoids the __getattr__ hook, or mark all class instances as
finalizable until proven innocent.