[Patches] Re: Garbage collection patches for Python

Vladimir Marangozov Vladimir.Marangozov@inrialpes.fr
Thu, 10 Feb 2000 12:06:05 +0100 (CET)


nascheme@enme.ucalgary.ca wrote:
> 
> On Wed, Feb 09, 2000 at 03:59:48PM +0100, Vladimir Marangozov wrote:
> > The goal is to remove Python's dependency on the standard POSIX interface
> > (malloc/realloc/free) so that we can cleanly and easily plug in the future
> > a "proprietary" mem manager, other than the one in the C library. For this
> > purpose, the Python core should be patched and "cleaned" to use one or more
> > of the following APIs:
> [...]
> 
> Shouldn't all these be based on the same malloc?  We could
> define everything in terms of PyMem_MALLOC, PyMem_REALLOC, and
> PyMem_FREE if that makes things clearer.

Yes, everything should be built on top of, and defined in terms of,
PyMem_MALLOC, PyMem_REALLOC and PyMem_FREE, including the PyMem_NEW
family. That's exactly what I was trying to say.

Thus switching to another malloc would amount to changing three macros.
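
To make the "three macros" point concrete, here is a minimal sketch of such a
layering. The macro bodies are assumptions for illustration (the core macros
simply map to libc here) and are not the definitions in the Python sources;
demo_sum() is a hypothetical helper that only exists to exercise the macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Core macros: the only place that names the underlying allocator.
   Mapping them to libc is an assumption made for this sketch. */
#define PyMem_MALLOC(n)      malloc(n)
#define PyMem_REALLOC(p, n)  realloc((p), (n))
#define PyMem_FREE(p)        free(p)

/* Typed convenience family, defined purely in terms of the core macros. */
#define PyMem_NEW(type, n)   ((type *) PyMem_MALLOC((n) * sizeof(type)))
#define PyMem_DEL(p)         PyMem_FREE(p)

/* Hypothetical helper: allocate n longs, fill them with 0..n-1,
   sum them, and release the buffer through the same family. */
static long demo_sum(size_t n)
{
    long total = 0;
    long *buf;
    size_t i;

    assert(n > 0);
    buf = PyMem_NEW(long, n);
    if (buf == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        buf[i] = (long) i;
        total += buf[i];
    }
    PyMem_DEL(buf);
    return total;
}
```

With this layout, plugging in another memory manager means editing only the
three core #define lines; callers of PyMem_NEW and PyMem_DEL are untouched.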

> 
> > Every chunk of memory must be manipulated via the same malloc family.
> 
> Yes, and this is where things get tricky.  Extension modules can
> use malloc to allocate objects and pass them to the Python core.
> Python uses PyMem_FREE or similar and *boom*, memory corruption
> (if they are different mallocs).

No! This should not happen if the extension writer knows what she is doing!

Two cases:

a) If Python allocates a chunk, it's Python's job to release it.
b) If a library allocates a chunk, it's the library's job to release it.

Corollaries:

c) If a library allocates a chunk for Python, it's not Python's job
   to release it.
d) If Python allocates a chunk for a library, it's not the library's job
   to release it.

IOW, modules must manage *only* their own kids.
If c) and d) happen to be false, this is a sure sign of bad design.

This means that if Python gets some new mem from an extension, e.g.
through "tree *t = new_tree()", and this tree has to be released,
the extension must export a "del_tree(t)" function that Python should call.

If it does not, then "it is implicitly assumed that the extension uses
libc's malloc() for allocating any new mem, and the exposed function for
releasing it by others is libc's free()".

I agree that this may be tricky sometimes, but the principles remain the
same.
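
The new_tree()/del_tree() pairing could look roughly like the sketch below.
The struct layout and the live_trees counter are assumptions added for
illustration only; the point is that the library which allocates the tree
also exports the function that releases it, so the caller never needs to
know which allocator was used.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree type owned by the extension library. */
typedef struct tree {
    int value;
    struct tree *left;
    struct tree *right;
} tree;

/* Demonstration-only counter of nodes currently allocated. */
static int live_trees = 0;

/* The library allocates with its own allocator (libc malloc here). */
tree *new_tree(void)
{
    tree *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->value = 0;
    t->left = NULL;
    t->right = NULL;
    live_trees++;
    return t;
}

/* ... and exports the matching release function for its callers. */
void del_tree(tree *t)
{
    if (t == NULL)
        return;
    del_tree(t->left);
    del_tree(t->right);
    assert(live_trees > 0);
    free(t);
    live_trees--;
}
```

A caller such as the Python core would then write "tree *t = new_tree();"
and later "del_tree(t);", never free(t) or PyMem_FREE(t).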

BTW, you'll note that for all Python objects, Guido has respected this
rule: the code for allocating and releasing an object is always in the
same file (e.g., PyInt_New and int_dealloc), and this is done systematically
for all objects; i.e., every object implementation exports its own
malloc/free.

> 
> > That is, if one gets some piece of mem through PyMem_MALLOC (1),
> > s/he must release that memory with PyMem_FREE (1). Accordingly, if one
> > gets a chunk via PyMem_MALLOC (1), that chunk *should not* be released
> > with PyMem_DEL (2) (which is what Neil's patch does, not to mention
> > that (2) is not defined in terms of (1)).
> 
> I don't think we want more than one malloc within Python.

I don't think we need that either.

> IMHO, it would be impossible to keep the calls straight.  Why would we
> want PyMem_MALLOC to use a different malloc than PyMem_NEW?

We don't want that. Do we? I don't think I've said anything like that.
I insisted on the principle of not mixing the APIs. But all APIs are
defined in terms of PyMem_MALLOC.

> 
> > 1) pypcre.c
> > 
> 
> The fact that you are sure this is buggy shows how tricky this
> business is.  Of course, I could always be wrong too.  :)

Ok. I admit that I never insisted on fully understanding this code.
So free() should be PyMem_FREE, right?

> 
> > 3) readline.c
> > 
> > Neil, what's this? Could you elaborate on this one?
> 
> It's very ugly.  Readline returns memory allocated by malloc.
> That memory is eventually freed from within the interpreter by
> PyMem_FREE.

How so? This shouldn't happen! At least not with PyMem_FREE! See my
plea above. If Python has to free mem allocated by readline (i.e.
by malloc() implicitly), we have to use free().

> We can't change the PyMem_FREE call to free because
> it also frees memory allocated by PyMem_MALLOC.

Ah! This is another story. Can't we manage to separate the two things
when freeing? If we managed to mix them (malloc() + PyMem_MALLOC), we
should be able to separate them and use free() + PyMem_FREE.
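
One possible way to separate the two, sketched under assumptions: at the
boundary where the libc-allocated string enters Python, copy it into a
PyMem_MALLOC'ed buffer and release the original with free(). After that,
everything downstream holds only PyMem memory and can use PyMem_FREE
uniformly. The function name adopt_libc_string() is hypothetical, the macros
are stand-ins mapped to libc, and this is not a description of what
readline.c actually does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the Python macros; mapping them to libc is an assumption. */
#define PyMem_MALLOC(n)  malloc(n)
#define PyMem_FREE(p)    free(p)

/* Hypothetical boundary helper: takes ownership of a string that was
   allocated with libc malloc(), returns a copy owned by the PyMem family,
   and releases the original with libc free().
   Returns NULL if the copy cannot be allocated. */
char *adopt_libc_string(char *libc_str)
{
    size_t len;
    char *copy;

    assert(libc_str != NULL);
    len = strlen(libc_str) + 1;           /* include the trailing NUL */
    copy = (char *) PyMem_MALLOC(len);
    if (copy != NULL)
        memcpy(copy, libc_str, len);
    free(libc_str);                       /* malloc'ed, so free() it */
    return copy;                          /* caller uses PyMem_FREE */
}
```

The cost is one extra copy per line read, which buys a single, unambiguous
release path for the rest of the interpreter.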

About GC: let's discuss that later and stay focused on the patch.
It's not very exciting as a task, but it needs to be done and it will
pay off later. FWIW, your collective progress on GC sounds really good!

-- 
       Vladimir MARANGOZOV          | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252