[Python-Dev] finalization again

Tim Peters tim_one@email.msn.com
Wed, 8 Mar 2000 01:25:56 -0500


[Guido]
> Granted.  I can read Java code and sometimes I write some, but I'm not
> a Java programmer by any measure, and I wasn't aware that finalize()
> has a general bad rep.

It does, albeit often for bad reasons.

1. C++ programmers seeking to emulate techniques based on C++'s
   rigid specification of the order and timing of destruction of autos.

2. People pushing the limits (as in the URL I happened to post).

3. People trying to do anything <wink>.  Java's finalization semantics
   are very weak, and s-l-o-w too (under most current implementations).

Now I haven't used Java for real in about two years, and avoided finalizers
completely when I did use it.  I can't recall any essential use of __del__ I
make in Python code, either.  So what Python does here makes no personal
difference to me.  However, I frequently respond to complaints & questions
on c.l.py, and don't want to get stuck trying to justify Java's uniquely
baroque rules outside of comp.lang.java <0.9 wink>.

>> [Tim, passes on the first relevant URL he finds:
>>  http://www.quoininc.com/quoininc/Design_Java0197.html]

> It seems the authors make one big mistake: they recommend calling
> finalize() explicitly.  This may be par for the Java course: the
> quality of the materials is often poor, and that has to be taken into
> account when certain features have gotten a bad rep.

Well, in "The Java Programming Language", Gosling recommends that you:

a) Add a close() method that tolerates being called multiple times.

b) Write a finalize() method whose body calls close().

People tended to do that at first, but used a bunch of names other than
"close" too.  I guess people eventually got weary of having two methods that
did the same thing, so decided to just use the single name Java guaranteed
would make sense.
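
The same pattern translates directly to Python (the class and the
resource here are invented for illustration):

    class LogFile:
        def __init__(self, path):
            self.fp = open(path, "w")

        def close(self):
            # Safe to call any number of times; only the first does work.
            if self.fp is not None:
                self.fp.close()
                self.fp = None

        def __del__(self):
            # Last-chance backstop -- an explicit close() is still preferred.
            self.close()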

> (These authors also go on at length about the problems of GC in a real-
> time situation -- attempts to use Java in situations for which it is
> inappropriate are also par for the course, inspired by all the hype.)

I could have picked any number of other URLs, but don't regret picking this
one:  you can't judge a ship in smooth waters, and people will push *all*
features beyond their original intents.  Doing so exposes weaknesses.
Besides, Sun won't come out & say Java is unsuitable for real-time, no
matter how obvious it is <wink>.

> Note that e.g. Bruce Eckel in "Thinking in Java" makes it clear that
> you should never call finalize() explicitly (except that you should
> always call super.finalize() in your finalize() method).

You'll find lots of conflicting advice here, be it about Java or C++.  Java
may be unique, though, in the universality of the conclusion Bruce draws
here:

> (Bruce goes on at length explaining that there aren't a lot of things
> you should use finalize() for -- except to observe the garbage
> collector. :-)

Frankly, I think Java would be better off without finalizers.  Python could
do fine without __del__ too -- if you and I were the only users <0.6 wink>.

[on Java's lack of ordering promises]
> True, but note that Python won't have the ordering problem, at least
> not as long as we stick to reference counting as the primary means of
> GC.  The ordering problem in Python will only happen when there are
> cycles, and there you really can't blame the poor GC design!

I cannot.  Nor do I intend to.  The cyclic ordering problem isn't GC's
fault, it's the program's; but GC's *response* to it is entirely GC's
responsibility.

>> ... The Java spec is unhelpful here too:
>>
>>  Therefore, we recommend that the design of finalize methods be kept
>>  simple and that they be programmed defensively, so that they will
>>  work in all cases.
>>
>> Mom and apple pie, but what does it mean, exactly?  The spec realizes
>> that you're going to be tempted to try things that won't work, but
>> can't really explain what those are in terms simpler than the full set
>> of implementation consequences.  As a result, users hate it -- but
>> don't take my word for that!  If you look & don't find that Java's
>> finalization rules are widely viewed as "a problem to be wormed around"
>> by serious Java programmers, fine -- then you've got a much better
>> search engine than mine <wink>.

> Hm.  Of course programmers hate finalizers.

Oh no!  C++ programmers *love* destructors!  I mean it, they're absolutely
gaga over them.  I haven't detected signs that CPython programmers hate
__del__ either, except at shutdown time.  Regardless of language, they love
them when they're predictable and work as expected, they hate them when
they're unpredictable and confusing.  C++ auto destructors are extremely
predictable (e.g., after "{SomeClass a, b; ...}", b is destructed before a,
and both destructions are guaranteed before leaving the block they're
declared in, regardless of whether via return, exception, goto or falling
off the end).  CPython's __del__ is largely predictable (modulo shutdown,
cycles, and sometimes exceptions).  The unhappiness in the Java world comes
from Java finalizers' unpredictability and consequent all-around uselessness
in messy real life.
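
CPython's flavor of predictability, for contrast (toy class, invented
just for illustration):

    class Noisy:
        def __init__(self, name):
            self.name = name
        def __del__(self):
            print "finalizing", self.name

    a = Noisy("a")
    a = None    # the refcount hits zero right here, so __del__ runs right here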

> They hate GC as well.

Yes, when it's unpredictable and confusing <wink>.

> But they hate even more not to have it (witness the relentless
> complaints about Python's "lack of GC" -- and Java's GC is often
> touted as one of the reasons for its superiority over C++).

Back when JimF & I were looking at gc, we may have talked each other into
really believing that paying careful attention to RC issues leads to cleaner
and more robust designs.  In fact, I still believe that, and have never
clamored for "real gc" in Python.  Jim now may even be opposed to "real gc".
But Jim and I and you all think a lot about the art of programming, and most
users just don't have time or inclination for that -- the slowly changing
nature of c.l.py is also clear evidence of this.  I'm afraid this makes
growing "real GC" a genuine necessity for Python's continued growth.  It's
not a *bad* thing in any case.  Think of it as a marketing requirement <0.7
wink>.

> I think this stuff is just hard!  (Otherwise why would we be here
> having this argument?)

Honest to Guido, I think it's because you're sorely tempted to go down an
un-Pythonic path here, and I'm fighting that.  I said early on there are no
thoroughly good answers (yes, it's hard), but that's nothing new for Python!
We're having this argument solely because you're confusing Python with some
other language <wink>.

[a 2nd or 3rd plug for taking topsort seriously]
> Maybe we have a disconnect?

Not in the technical analysis, but in what conclusions to take from it.

> We *are* using topsort -- for non-cyclical data structures.  Reference
> counting ensures that.  Nothing in my design changes that.

And it's great!  Everyone understands the RC rules pretty quickly, lots of
people like them a whole lot, and if it weren't for cyclic trash everything
would be peachy.
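
Concretely, in acyclic structures RC always finalizes a referrer before
the things it refers to -- exactly topsort order (toy class again):

    class Link:
        def __init__(self, name, next=None):
            self.name, self.next = name, next
        def __del__(self):
            print "finalizing", self.name

    chain = Link("outer", Link("inner"))
    chain = None    # prints "finalizing outer", then "finalizing inner"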

> The issue at hand is what to do with *cyclical* data structures, where
> topsort doesn't help.  Boehm, on
> http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html,
> says: "Cycles involving one or more finalizable objects are never
> finalized."

This is like some weird echo chamber, where the third time I shout something
the first one comes back without any distortion at all <wink>.  Yes, Boehm's
first rule is "Do No Harm".  It's a great rule.  Python follows the same
rule all over the place; e.g., when you see

    x = "4" + 2

you can't possibly know what was intended, so you refuse to guess:  you
would rather *kill* the program than make a blind guess!  I see cycles with
finalizers as much the same:  it's plain wrong to guess when you can't
possibly know what was intended.  Because topsort is the only principled way
to decide order of finalization, and they've *created* a situation where a
topsort doesn't exist, what they're handing you is no less ambiguous than
trying to add a string to an int.  This isn't the time to abandon topsort as
inconvenient, it's the time to defend it as an inviolate principle!
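
For the record, CPython's actual response to that line (the exact message
varies across versions):

    >>> x = "4" + 2
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: cannot concatenate 'str' and 'int' objects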

The only thoroughly rational response is "you know, this doesn't make
sense -- since I can't know what you want here, I refuse to pretend that I
can".  Since that's "the right" response everywhere else in Python, what the
heck is so special about this case?  It's like you decided Python *had* to
allow adding strings to ints, and now we're going to argue about whether
Perl, Awk or Tcl makes the best unprincipled guess <wink>.

> The question remains, what to do with trash cycles?

A trash cycle without a finalizer isn't a problem, right?  In that case,
topsort rules have no visible consequence, so it doesn't matter in what order
you merely reclaim the memory.
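
For concreteness, the smallest such cycle (names invented):

    class C:
        pass

    a = C(); b = C()
    a.peer = b; b.peer = a    # a two-object cycle, no finalizers anywhere
    a = b = None              # pure RC can't reclaim these; a cycle collector
                              # can, and in whatever order it pleases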

If it has an object with a finalizer, though, at the very worst you can let
it leak, and make the collection of leaked objects available for
inspection.  Even that much is a *huge* "improvement" over what they have
today:  most cycles won't have a finalizer and so will get reclaimed, and
for the rest they'll finally have a simple way to identify exactly where the
problem is, and a simple criterion for predicting when it will happen.  If
that's not "good enough", then without abandoning principle the user needs
to have some way to reduce such a cycle *to* a topsort case themselves.
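
A sketch of what the leak-and-inspect scheme could look like from the
user's end; the collector interface here (a gc module with a collect()
call and a garbage list) is invented for illustration, not anything
that exists today:

    import gc    # hypothetical cycle-collector module

    class Resource:
        def __del__(self):
            print "releasing", id(self)

    a = Resource(); b = Resource()
    a.peer = b; b.peer = a    # a cycle whose members have finalizers
    a = b = None

    gc.collect()              # refuses to guess a finalization order, so...
    for obj in gc.garbage:    # ...the leaked cycle is left here for inspection
        print "leaked:", obj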

> I find having a separate __cleanup__ protocol cumbersome.

Same here, but if you're not comfortable leaking, and you agree Python is
not in the business of guessing in inherently ambiguous situations, maybe
that's what it takes!  MAL and GregS both gravitated to this kind of thing
at once, and that's at least suggestive; and MAL has actually been using his
approach.  It's explicit, and that's Pythonic on the face of it.
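
My rough reconstruction of that flavor of protocol -- the __cleanup__
name is from this thread, everything else here is guesswork:

    class Node:
        def __init__(self):
            self.children = []    # may come to participate in cycles

        def __cleanup__(self):
            # Explicitly break the links; after this, plain RC (and its
            # topsort ordering) can finalize everything the usual way.
            self.children = None

    # The application (or the collector) calls __cleanup__ on each member
    # of a doomed cycle; only then is the memory reclaimed.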

> I think that the "finalizer only called once by magic" rule is reasonable.

If it weren't for its specific use in emulating Java's scheme, would you
still be in favor of that?  It's a little suspicious that it never came up
before <wink>.
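
For reference, the rule only matters when a finalizer resurrects its
object (toy sketch of the proposed semantics, not of current behavior):

    graveyard = []

    class Lazarus:
        def __del__(self):
            print "__del__ running"
            graveyard.append(self)    # resurrection: the object is live again

    x = Lazarus()
    x = None            # __del__ runs; the object survives in graveyard
    del graveyard[:]    # trash again, but under "called once by magic"
                        # __del__ is *not* run a second time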

> I believe that the ordering problems will be much less than in Java,
> because we use topsort whenever we can.

No argument here, except that I believe there's never sufficient reason to
abandon topsort ordering.  Note that BDW's adamant refusal to yield on this
hasn't stopped "why doesn't Python use BDW?" from becoming a FAQ <wink>.

a-case-where-i-expect-adhering-to-principle-is-more-pragmatic-
    in-the-end-ly y'rs  - tim