[Python-Dev] C API for gc.enable() and gc.disable()

Terry Reedy tjreedy at udel.edu
Sat Jun 21 20:40:16 CEST 2008



Kevin Jacobs <jacobs at bioinformed.com> wrote:

> I can say with complete certainty that of the 20+ programmers I've had 
> working for me, many who have used Python for 3+ years, not a single one 
> would think to question the garbage collector if they observed the kind 
> of quadratic time complexity I've demonstrated.  This is not because 
> they are stupid, but because they have only a vague idea that Python 
> even has a garbage collector, never mind that it could be behaving badly 
> for such innocuous looking code.

As I understand it, gc is needed now more than ever because new-style 
classes make reference cycles more common.  On the other hand, greatly 
increased RAM size (from some years ago) makes megaobject bursts 
possible.  Such large bursts move the hidden quadratic do-nothing drag 
out of the relatively flat part of the curve (total time just double or 
triple what it should be) to where it can really bite.  Leaving aside 
what you do for your local group, can we better warn Python programmers 
now, for the upcoming 2.5, 2.6, and 3.0 releases?

Paragraph 3 of the Reference Manual chapter on the Data Model (3.0 version) says:
"Objects are never explicitly destroyed; however, when they become 
unreachable they may be garbage-collected. An implementation is allowed 
to postpone garbage collection or omit it altogether — it is a matter of 
implementation quality how garbage collection is implemented, as long as 
no objects are collected that are still reachable. (Implementation note: 
the current implementation uses a reference-counting scheme with 
(optional) delayed detection of cyclically linked garbage, which 
collects most objects as soon as they become unreachable, but is not 
guaranteed to collect garbage containing circular references. See the 
documentation of the gc module for information on controlling the 
collection of cyclic garbage.)"
I am not sure what to add here (especially for those who do not read it ;-).
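
For readers who want to see what that implementation note means in practice, here is a minimal sketch. It assumes CPython (the behavior is implementation-specific, as the quoted text says), and the Node class and demo() function are names made up for this illustration, not anything from the manual.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this illustration."""


def demo():
    """Return (freed_by_refcount, cycle_survives_del, cycle_freed_by_gc)."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic cyclic collection out of the way
    try:
        # No cycle: the reference count drops to zero when the name is deleted.
        plain = Node()
        plain_ref = weakref.ref(plain)
        del plain
        freed_by_refcount = plain_ref() is None

        # Self-referencing object: reference counting alone cannot free it.
        cyclic = Node()
        cyclic.me = cyclic
        cyclic_ref = weakref.ref(cyclic)
        del cyclic
        cycle_survives_del = cyclic_ref() is not None

        # An explicit collection finds and frees the unreachable cycle.
        gc.collect()
        cycle_freed_by_gc = cyclic_ref() is None
    finally:
        if was_enabled:
            gc.enable()  # restore the caller's setting
    return freed_by_refcount, cycle_survives_del, cycle_freed_by_gc
```

On CPython I would expect all three flags to come back true; other implementations may postpone reclamation, which the Reference Manual explicitly allows.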

The Library Manual gc section says "Since the collector supplements the 
reference counting already used in Python, you can disable the collector 
if you are sure your program does not create reference cycles."  Perhaps 
it should also say "You should disable it when creating millions of 
objects without cycles".
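
A rough sketch of what such advice could look like in user code follows. The names gc_paused and build_rows are hypothetical helpers invented for this example; only gc.isenabled(), gc.disable(), and gc.enable() come from the gc module itself. It assumes the objects built inside the block contain no reference cycles.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily disable the cyclic collector, then restore its prior state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Only re-enable if it was on before, so nested use stays correct.
        if was_enabled:
            gc.enable()


def build_rows(n):
    """Create n small cycle-free objects in one burst with the collector off."""
    with gc_paused():
        return [(i, str(i)) for i in range(n)]
```

Restoring the previous state, rather than unconditionally calling gc.enable(), is a design choice so that code which had already disabled the collector is not surprised on exit. Reference counting still frees the acyclic objects as usual while the collector is off.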

The installed documentation set (on Windows, at least) includes some 
Python HOWTOs.  If one were added on Space Management (implementations, 
problems, and solutions), would your developers read it?

> Maybe we should consider more carefully before declaring the status quo 
> sufficient.  Average developers do allocate millions of objects in 
> bursts and super-linear time complexity for such operations is not 
> acceptable.  Thankfully I am around to help my programmers work around 
> such issues or else they'd be pushing to switch to Java, Ruby, C#, or 
> whatever since Python was inexplicably "too slow" for "real work".  This 
> being open source, I'm certainly willing to help in the effort to do so, 
> but not if potential solutions will be ruled out as being unnecessary.

To me, 'sufficient' (time-dependent) and 'necessary' are either too 
vague or too strict to bring about what you want -- change.  This is 
the third thread I have read (here + c.l.p) on default-mode gc problems 
(but all in the last couple of years or so).  So, especially with the 
nice table someone posted recently, on time with and without gc, and 
considering that installed RAM continues to grow, I am persuaded that 
an improvement to the default behavior that does not negatively impact 
the vast majority would be desirable.

Terry Jan Reedy


