[Python-Dev] pymalloc killer

Tim Peters tim.one@comcast.net
Fri, 29 Mar 2002 20:03:02 -0500


[Tim]
>> Note that I am *not* proposing to change PyMem_{MALLOC, Malloc)

[martin@v.loewis.de]
> I understand that (I think). Neither do I.

I think you have to under this scheme, although I'm not clear on what "the
problem" is you're trying to solve.  I'm trying to cater to allocating via
PyMem_{MALLOC, Malloc} then free'ing via _PyMalloc_Free.  That latter is
because mixing PyObject_New with PyMem_{DEL, FREE, Del, Free} is frequent in
real-life code now.  If Guido is to get his wish that we keep the
PyObject_XXX interface and not introduce a new PyMalloc_XXX object
interface, and I'm to get my wish that we don't break mountains of existing
works-fine code, that means PyObject_XXX *and* PyMem_{DEL, FREE, Del, Free}
have to funnel thru the pymalloc package.  Else existing "get object" and
"free object" mixtures will break.  That in turn means memory allocated by
PyMem_{Malloc, MALLOC, Realloc, REALLOC, Resize, RESIZE, New, NEW} *also*
has to be suitable for feeding into pymalloc's free.  But I'm not proposing
to change PyMem_{Malloc, MALLOC, Realloc, REALLOC, Resize, RESIZE, New, NEW}
at all, meaning they'll continue to call platform malloc() and realloc()
directly without any tricky wrappers.

>> That's the killer.  We have no control over "the system
>> allocator", by which I mean libc's malloc() and realloc().

> We sure do. In _PyMalloc_Malloc, replace the line

This can't affect anything obtained via PyMem_{MALLOC, Malloc}, because they
call the platform malloc() directly.  We don't "wrap" them in any
significant way.

> 	return (void *)PyMem_MALLOC(nbytes);
>
> with
>
>         void **result;
>
>         result = (void**)PyMem_MALLOC(nbytes+8);
>         result[1] = NULL;
> 	return (result+2);

Yes, _PyMalloc_Malloc can do anything at all, but PyMem_{MALLOC, Malloc}
don't go thru this code.

> Likewise, replace
>
> 	offset = (off_t )p & POOL_SIZE_MASK;
> 	pool = (poolp )((block *)p - offset);
> 	if (pool->pooladdr != pool || pool->magic != (uint )POOL_MAGIC) {
> 		PyMem_FREE(p);
> 		return;
> 	}
>
> with
>
>         void **block = (void**)p;
>         if(block[-1] == NULL){
> 		PyMem_FREE(block-2);
> 		return;
> 	}

Something allocated via PyMem_{MALLOC, Malloc} that goes through this code
will read random gibberish (or carefully constructed hostile gibberish) at
block[-1].  So it's not safe for what I'm trying to accomplish.

>> As above, trying to hijack them is unattractive due to thread
>> issues, even for just the PyMem_{Malloc, MALLOC, Realloc, REALLOC,
>> New, NEW, Resize, RESIZE} spellings.

> I can't see how this approach would add thread issues.

That's because it seems you're not trying to make PyMem_{MALLOC, Malloc}
safe for mixed use at all.  If you did, I suppose they could funnel through
a wrapper other than _PyMalloc_Malloc (at the cost of other complications).

>> *If* we take over the system malloc, we need also to mimic the platform
>> malloc's alignment.

> I'm not sure what you mean by "take over".

I mean stop PyMem_XXX from calling platform malloc/realloc/free directly, so
that they can be made safe for mixing with _PyMalloc_Free.

> It will still be used as an allocator for arenas, and for large requests.

> ...
> Indeed, this approach might be slightly more expensive. In return, it
> is safe and (time-)efficient.

It's safe under some set of assumptions I'm not sure about, but not safe
enough under the assumptions I am sure about <wink>.

>> Primarily that hijacking the system malloc is fraught with new
>> difficulties.

> That might be, but I'm not proposing this (or I don't understand the
> work "hijacking").

It means "take over" <wink>.  If a scheme can't handle free'ing memory
safely when that memory was returned directly from the platform
malloc/realloc, it can't handle the sum of the things Guido and I want here
(he's keener to preserve PyObject_XXX; I'm keener not to break existing
code, even code that doesn't play by what some subset of Python-Dev'vers
thinks might be "the rules").

> ...
> That safe-guard isn't needed, IMO. Returning memory allocated directly
> through malloc (and not via _PyMalloc_Malloc) to _PyMalloc_Free is
> clearly a bug,

Then you either have to change PyMem_{Malloc, MALLOC, etc etc etc} too, or
you have to tell extension authors that every extension type they wrote for
1.5.2 will now fail in horrible ways (data-dependent and intermittent
segfaults, data corruption, etc), and tell users that every "old" extension
they pick up on the web may suffer such problems too even if it compiles
cleanly for 2.3.

> and there are more efficient ways to detect this kind of bug (like your
> pymalloc debug mechanism). In addition, there are so many other kinds of
> bugs that this doesn't protect against (like not taking the GC header
> into account when releasing memory) that this alone isn't convincing.

The difference is that I don't consider mixing PyObject_XXX with PyMem_XXX
to be "a bug".  In 1.5.2 such mixing was necessary, and still works fine in
2.2.  I worry about extension authors who *do* bother to fix code that works
fine <wink>.