A couple garbage collector questions

Tue Apr 3 00:33:07 EDT 2001

kragen at dnaco.net (Kragen Sitaker) writes:

> Douglas Alan  <nessus at mit.edu> wrote:

> >The fact that Python uses referencing counting for its front-line GC
> >has some very desirable properties.  Two come immediately to mind:

> >   (1) Predictable destructor semantics.  This is in contrast to Java,
> >       in which destructors are useless because you don't know if and
> >       when they will ever be called.

> I mostly disagree.  If the timing of destructor calls was really
> predictable, you'd just call them yourself instead of depending on
> the GC to do it for you.

No I wouldn't.  I don't like writing code when the language can do
things for me automatically and save me the work.

> If you know you're done with a file, for example,
> you can close it;

I could.  But I'd prefer the language to close it for me.

> if no part of your program knows when to close it (because more than
> one part of your program is using it, and they stop using it in an
> indeterminate order) then you don't really know when the destructor
> will be called, either.

I know that the destructor will be called as soon as the object is no
longer needed.  That is all the predictability that I require.

> It *is* more deterministic than a real GC, but it's still a long way
> from "predictable" by my lights.

It's predictable enough to do the job the way that I want it to be
done, while with a "real GC", it isn't.

> >   (2) No long GC pauses.

> Again, I mostly disagree.  You get long GC pauses when the last
> reference to any large, complex data structure (or a data structure
> with slow destructors) gets overwritten or goes out of scope.

When such pauses occur is much more predictable than the pauses that
occur with a non-realtime GC, and such pauses are typically much
shorter.  Realtime GC's are cool, but my impression is that they are
fairly expensive and hairy to implement.

> >You might be able to make a "real garbage collector" that has these
> >properties, but it would, I believe, be pretty complicated.

> You can't even make a reference-counting GC that has these
> properties --- you need C++-style (Limbo-style?) auto_ptrs.

auto_ptrs are not remotely like GC, so I don't see how this is
relevant.  Reference counting gives you most of the benefits of a real
GC, while preserving the usefulness of destructors.  auto_ptrs are
merely a shorthand so that you don't have to explicitly free storage.

> > Since referencing-counting typically gets rid of 99.9% of all garbage,

> Reference-counting typically gets rid of 100% of all garbage.

Let's not quibble over .1%?

> But sometimes it gets rid of 0% of all garbage, or (more often)
> somewhere in between.  Which garbage it will actually collect is a
> global property of your program and cannot always be deduced from
> looking at the code that handles that garbage.

The operative words are "cannot *always* be deduced".  This is
consistent with "can *often* be deduced".  The vast majority of useful
programs do not require a real GC.  When you need one, however, it is
*very* nice to have a real GC.  Consequently, I am very happy that
they finally added one to Python.

> > backing it up with a simple mark & sweep GC seems adequate for
> > most purposes.

> Reference-counting exacts very heavy performance costs, no matter
> what you back it up with.

What's so heavy about its performance cost?  Will a mark & sweep GC
perform better on average?  I'm skeptical.  A compacting, copying
garbage collector probably will, but it can double the space cost of
your program.

|>oug