[Python-Dev] pymalloc killer

Tim Peters tim.one@comcast.net
Fri, 29 Mar 2002 18:21:57 -0500


[Guido, points out a bunch of memory mgmt problems with myreadline.c
 and PyOS_StdioReadline, which I might understand if I devoted April to
 it <0.9 wink>]

I believe the last suggestion I posted gets much closer to allowing any
mixture of PyMem_XXX with platform malloc/realloc/free without thread
issues.  Seems the only trouble can come if a PyObject_{New, NewVar, Malloc,
Realloc} needs to allocate a new arena *and* needs to grow the vector of
arena base addresses, at the same time some jackass calls PyMem_{DEL, Del,
FREE, Free} in another thread without holding the GIL.  There's a tiny
window of vulnerability then in the latter guy, as the base-address vector
may shift in memory while the vector is growing, leaving the latter guy
indexing into suddenly-stale memory.  This would have to happen in a window
of about 3 machine instructions to do any harm, at the same time the
base-address vector is moving.  Yuck.
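
To make the window concrete, here is a minimal single-threaded sketch in C.
It is not the pymalloc source: arena_bases, n_arenas, max_arenas, add_arena,
address_in_arena and the ARENA_SIZE value are all hypothetical names and
numbers chosen for illustration.  The sketch only marks where the window
sits; it does not close it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative arena size; an assumption, not taken from the post. */
#define ARENA_SIZE ((uintptr_t)256 * 1024)

/* Hypothetical stand-ins for the allocator's global state. */
static uintptr_t *arena_bases = NULL;  /* vector of arena base addresses */
static size_t n_arenas = 0;            /* entries in use */
static size_t max_arenas = 0;          /* entries allocated */

/* Record a new arena base, growing the vector when it is full.
   Returns 0 on success, -1 if the vector cannot be grown. */
static int add_arena(uintptr_t base)
{
    if (n_arenas == max_arenas) {
        size_t new_max = max_arenas ? 2 * max_arenas : 4;
        uintptr_t *bigger = malloc(new_max * sizeof *bigger);
        if (bigger == NULL)
            return -1;
        if (n_arenas > 0)
            memcpy(bigger, arena_bases, n_arenas * sizeof *bigger);
        uintptr_t *old = arena_bases;
        arena_bases = bigger;   /* the vector now lives somewhere else */
        max_arenas = new_max;
        free(old);              /* a pointer loaded earlier is now stale */
    }
    arena_bases[n_arenas++] = base;
    return 0;
}

/* Does address p lie inside the arena recorded at index i? */
static int address_in_arena(uintptr_t p, size_t i)
{
    uintptr_t *vec = arena_bases;   /* step 1: load the vector's address */
    /* Window: if another thread ran add_arena() right here and the
       vector moved, vec would point at freed memory in step 2. */
    return i < n_arenas && p - vec[i] < ARENA_SIZE;   /* step 2: index it */
}
```

Run from a single thread this always behaves: adding ten arenas moves the
vector a few times, and address_in_arena() still answers correctly because
it reloads arena_bases on every call.  The trouble needs a second thread
sitting between step 1 and step 2 at the moment the vector moves.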

> I know of two places that call PyMem_Malloc outside the GIL:
> PyOS_StdioReadline in Parser/myreadline.c and call_readline() in
> Modules/readline.c.  The memory thus allocated is returned by
> PyOS_Readline() (which calls either function through a hook pointer),
> and both its callers (the tokenizer and raw_input()) free the result
> using PyMem_DEL or PyMem_FREE (these two seem to be used
> synonymously).  The tokenizer code may also apply PyMem_RESIZE to it
> (it incorporates the input line in its buffer structure).
>
> This would have to be fixed by changing the allocations to use
> malloc() (as they did up to 1.5.2 :-) and by changing the consumers to
> use free() and realloc().  (An alternative would be to let
> PyOS_Readline() copy the memory to PyMem_Malloc'ed memory.)
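
The copying alternative in the parenthetical above might look roughly like
the following C sketch.  It is an illustration under stated assumptions, not
code from Python: gil_malloc stands in for PyMem_Malloc, raw_readline_hook
stands in for whatever the hook pointer calls, and readline_copying stands
in for PyOS_Readline.  The comments only mark where a real implementation
would release and reacquire the GIL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyMem_Malloc: pretend it needs the GIL. */
static void *gil_malloc(size_t n)
{
    return malloc(n);
}

/* Hypothetical hook: runs without the GIL, so it touches only the
   platform malloc.  It returns a canned line instead of reading stdin. */
static char *raw_readline_hook(const char *prompt)
{
    (void)prompt;
    const char *line = "print 42\n";
    char *buf = malloc(strlen(line) + 1);
    if (buf != NULL)
        strcpy(buf, line);
    return buf;
}

/* Hypothetical wrapper, entered with the GIL held.  Callers get memory
   from gil_malloc and never see the hook's platform-malloc'ed buffer. */
static char *readline_copying(const char *prompt)
{
    /* real code would release the GIL here */
    char *raw = raw_readline_hook(prompt);
    /* real code would reacquire the GIL here */
    if (raw == NULL)
        return NULL;
    size_t n = strlen(raw) + 1;
    char *copy = gil_malloc(n);
    if (copy != NULL)
        memcpy(copy, raw, n);
    free(raw);   /* platform free() matches the hook's platform malloc() */
    return copy;
}
```

The cost is one extra copy per input line; the benefit is that each
allocator family is only ever used on one side of the GIL boundary.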

This is the kind of heroic effort I don't want to impose on users:  you have
encyclopedic knowledge of how the Python implementation may be abusing this
stuff, and you *invented* the rules <wink>.  Random extension authors are
going to have a much harder time of it -- as far as they're concerned,
PyMem_{DEL, FREE, Del, Free} are all just ways to spell "platform free(),
but I'm not supposed to call free() directly for some reason I don't
understand -- I think it might have had something to do with DLLs on
Windows".

> This is part of a "hook" API that allows 3rd parties to provide their
> own alternative to PyOS_Readline.  This was put in at the request of
> some folks at LLNL who were providing their own GUI that fed into
> Python and who had some problems with sending it to stdin.  I don't
> think anybody else has used this.  There is not a single mention of
> PyOS_Readline in the entire set of Python documentation.

Well, neither is there any mention of possibly abusive functions in hundreds
of extension modules we've never heard of.

> Given the alternatives:
>
> 1. introduce new APIs PyMalloc_{New,Del}Object and tell all extension
>    writers that they have to change their extensions once again to
>    use these brand new APIs if they want to benefit from pymalloc; or
>
> 2. fix the issues with PyOS_Readline, make PyMem_{DEL,FREE,Del,Free}
>    synonyms for Tim's clever PyMem_NukeIt, and continue to promote
>    using PyObject_{New,Del} for use by extension writers;
>
> I'm all for #2.

You're presenting #1 as user-hostile and #2 as user-friendly.  But if part
of #2 is also saying that it's now definitely illegal to call PyMem_{Free,
FREE, Del, DEL} without holding the GIL, and horrible things may start to
happen in 2.3 if you're doing so, then it's also user-hostile in that
respect.  #1 is user-friendly in the "nothing breaks" sense.  I haven't
given up on combining the best of both, but I am getting close <wink>.