Queue cleanup

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sat Aug 28 07:25:05 EDT 2010


On Fri, 27 Aug 2010 18:06:19 -0700, Paul Rubin wrote:

> Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:
>> I've repeatedly asked, both here and elsewhere, why reference counting
>> isn't "real" garbage collection. Nobody has been able to give me a
>> satisfactory answer. As far as I can tell, it's a bit of
>> pretentiousness with no basis in objective fact.
> 
> Well, it's a bit of a subjective matter.  I'd say it's not real gc
> because 1) it's unsound (misses reference cycles), 

You can add cycle detection to a reference count gc, at the cost of more 
complexity.

If you read the Wikipedia article I linked to, tracing algorithms can 
also be unsound:

    Some collectors running in a particular environment can 
    correctly identify all pointers (references) in an object; 
    these are called "precise" (also "exact" or "accurate") 
    collectors, the opposite being a "conservative" or "partly
    conservative" collector. Conservative collectors have to 
    assume that any bit pattern in memory could be a pointer if 
    (when interpreted as a pointer) it would point into any 
    allocated object. Thus, conservative collectors may have 
    some false negatives, where storage is not released because 
    of accidental fake pointers...

http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)



> and 2) it requires
> constant attention from the mutator to incr and decr the reference
> counts.

Yes. And?


> So developing modules for the CPython API means endlessly
> finding and fixing refcount bugs that lead to either crashes/security
> failures, or memory leaks.  If you program the Java JNI or a typical
> Lisp FFI, you'll find that real gc is a lot simpler to use since you
> avoid all the refcount maintenance hassles.  You allocate memory and
> shut your eyes, and the gc takes care of freeing it when it figures out
> that you are done.  Refcounting is basically a form of manual memory
> management, while gc is automatic.


That's a problem with the CPython API, not reference counting. The 
problem is that the CPython API is written at too low a level, beneath 
that at which the garbage collector exists, so naturally you have to 
manually manage memory.


> Someone said here recently that as a program gets larger, saying "this
> will work as long as we do X every time without fail" becomes equal to
> saying "this won't work".  Substitute "properly maintain all ref counts"
> for X and you can see the problem.  I've seen released "production"
> "tested" Python C modules with subtle refcount bugs on more than one
> occasion.  In gc'd systems there are fewer places for the code to go
> wrong.

On the other hand, tracing gcs have their own set of problems too, mostly 
due to the use of finalizers and attempts to make garbage collection run 
more predictably. See here:

http://publib.boulder.ibm.com/infocenter/javasdk/v1r4m2/topic/com.ibm.java.doc.diagnostics.142j9/html/coexistwithgc.html

Quote:

    For tidying Java resources, think about the use of a clean 
    up routine. When you have finished with an object, call the 
    routine to null out all references, deregister listeners, 
    clear out hash tables, and so on. This is far more efficient 
    than using a finalizer and has the useful side-benefit of 
    speeding up garbage collection. The Garbage Collector does 
    not have so many object references to chase in the next 
    garbage collection cycle.


Translated: "Rather than relying on the garbage collector to clean up 
resources after you, do it yourself, manually, so the garbage collector 
has less work to do."

Tracing garbage collectors aren't a panacea. They're software themselves, 
and complex software, which means they're subject to bugs like the one 
which plagued Flash plugin 9:

http://gskinner.com/blog/archives/2008/04/failure_to_unlo.html

The more complicated the garbage collector, the more scope you have for 
some interaction between your high-level code and the gc leading to 
memory not be reclaimed or extreme slowdown. Like this:

http://tech.puredanger.com/2009/02/11/linkedblockingqueue-garbagecollection/





-- 
Steven



More information about the Python-list mailing list