[Python-ideas] [Python-Dev] GC Changes

Adam Olsen rhamph at gmail.com
Mon Oct 1 17:54:43 CEST 2007


[This should be on python-ideas, so I'm replying there instead of on python-dev]

On 10/1/07, Justin Tulloss <tulloss2 at uiuc.edu> wrote:
> Hello,
>
> I've been doing some tests on removing the GIL, and it's becoming clear that
> some basic changes to the garbage collector may be needed in order for this
> to happen efficiently. Reference counting as it stands today is not very
> scalable.
>
> I've been looking into a few options, and I'm leaning towards
> implementing IBM's Recycler GC (
> http://www.research.ibm.com/people/d/dfb/recycler-publications.html
> ) since it is very similar to what is in place now from the users'
> perspective. However, I haven't been around the list long enough to really
> understand the feeling in the community on GC in the future of the
> interpreter. It seems that a full GC might have a lot of benefits in terms
> of performance and scalability, and I think that the current gc module is of
> the mark-and-sweep variety. Is the trend going to be to move away from
> reference counting and towards the mark-and-sweep implementation that
> currently exists, or is reference counting a firmly ingrained tradition?

Refcounting is fairly firmly ingrained in CPython, but there are
conservative GCs for C that mostly work, and other Python
implementations aren't tied to refcounting.
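
To make the split between refcounting and the gc module concrete, here
is a minimal sketch, assuming CPython (the behaviour is
implementation-specific).  The Node class and both function names are
hypothetical, made up for this illustration.  It shows that an
unreferenced object is freed as soon as its refcount hits zero, while a
reference cycle survives until the cycle collector runs.

```python
import gc
import weakref


class Node:
    """Hypothetical class, used only for this illustration."""

    def __init__(self):
        self.other = None


def refcount_frees_immediately():
    """Return True if the object is gone right after its last reference dies."""
    n = Node()
    probe = weakref.ref(n)   # a weak reference does not keep n alive
    del n                    # refcount drops to zero here
    return probe() is None


def cycle_needs_collector():
    """Return (alive_before_collect, alive_after_collect) for a two-node cycle."""
    was_enabled = gc.isenabled()
    gc.disable()             # keep the automatic collector out of the way
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a      # a <-> b reference cycle
        probe = weakref.ref(a)
        del a, b                     # refcounts stay above zero: the cycle holds them
        alive_before = probe() is not None
        gc.collect()                 # explicit pass of the cycle collector
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(refcount_frees_immediately())
    print(cycle_needs_collector())
```

On CPython the first call should report that the object was freed
immediately, and the second that the cycle was alive before the
explicit collection and gone after it.  Other implementations may
behave differently.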

The problem with Python is that it produces a *lot* of garbage.
pystone on my box creates around a million objects per second and, if
nothing is freed, fills up available RAM in about 10 seconds.  Not only
do you need to collect often enough to avoid running out of RAM, but
for *good* performance you need to collect often enough to keep your
L1 cache hot (so freed memory is reused while it is still cached).
That would seem to demand at least a generational GC.
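
For a rough feel of the allocation rate on your own machine, here is a
small sketch.  It is not pystone: the function name allocation_rate and
its parameters are made up for this example, and it counts only the
short-lived lists it creates explicitly, so treat the result as an
order-of-magnitude hint rather than a benchmark.

```python
import time


def allocation_rate(duration=0.2, batch=1000):
    """Return an approximate count of short-lived lists created per second."""
    created = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        for i in range(batch):
            tmp = [i]        # one new list per iteration; the previous one becomes garbage
        created += batch
    elapsed = time.perf_counter() - start
    return created / elapsed


if __name__ == "__main__":
    print(f"{allocation_rate():,.0f} lists per second")
```

Under refcounting each of those lists is reclaimed as soon as tmp is
rebound, which is why memory use stays flat here; a tracing collector
would have to keep pace with that rate on its own.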

You might as well assume a tracing GC will be more expensive than
refcounting[1].  The real advantage would be in scalability, although
concurrent and parallel GCs are still an active field of research.  If
you're really interested, you should research conservative GCs aimed at
C in general, and keep the interaction with CPython to a minimum (such
as disabling its custom allocators).

A good starting point is The Memory Management Reference (although
it looks like it hasn't been updated in the last few years).  If some
of my terms are unfamiliar to you, go start reading. ;)
http://www.memorymanagement.org/



[1] This statement is only in the context of CPython, of course.
There are certainly many situations where a tracing GC performs
better.

-- 
Adam Olsen, aka Rhamphoryncus


