[Python-Dev] RFC: PEP 445: Add new APIs to customize Python memory allocators

Wed Jun 19 17:24:21 CEST 2013

2013/6/19 Antoine Pitrou <solipsis at pitrou.net>:
> On Tue, 18 Jun 2013 22:40:49 +0200,
> Victor Stinner <victor.stinner at gmail.com> wrote:
>>
>> Other changes
>> -------------
>>
> [...]
>>
>> * Configure external libraries like zlib or OpenSSL to allocate memory
>>   using ``PyMem_RawMalloc()``
>
> Why so, and is it done by default?

(Oh, I realized that PyMem_Malloc() may be used instead of
PyMem_RawMalloc() if we are sure that the library will only be used
when the GIL is held.)

"is it done by default?"

First, it would be safer to reuse the PyMem_RawMalloc() allocator only
if PyMem_SetRawAllocator() was called, to avoid regressions in Python
3.4.

Then, it depends on the library: if the allocator can be replaced per
library object (e.g. expat supports this), it can always be replaced,
since only the objects created by Python are affected. Otherwise, we
should only replace the library allocator if Python is a standalone
program (don't replace the library allocator if Python is embedded).
That's why I asked if it is possible to check whether Python is
embedded or not.

"Why so,"

For the "track memory usage" use case, it is important to track memory
allocated in external libraries to have accurate reports, because
these allocations may be huge.
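
To make the "track memory usage" use case concrete, here is a minimal
sketch of a tracking hook in plain C. It is not code from the PEP or
from the implementation: the names track_malloc(), track_free(),
track_current() and track_peak() are hypothetical, and the hook simply
wraps malloc()/free() and stores the requested size in front of each
block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header stored in front of every tracked block. Only
   `size` is used; the other members force a conservative alignment so
   that the pointer handed to the caller stays usable for any type. */
typedef union {
    size_t size;
    long double ld;
    long long ll;
    void *p;
} track_header;

static size_t tracked_bytes = 0;   /* bytes currently allocated */
static size_t tracked_peak = 0;    /* highest value of tracked_bytes */

/* Allocate `size` bytes and account for them. */
void *track_malloc(size_t size)
{
    track_header *h;

    if (size > SIZE_MAX - sizeof(track_header))
        return NULL;               /* header + size would overflow */
    h = malloc(sizeof(track_header) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    tracked_bytes += size;
    if (tracked_bytes > tracked_peak)
        tracked_peak = tracked_bytes;
    return h + 1;                  /* user data starts after the header */
}

/* Release a block obtained from track_malloc(). */
void track_free(void *ptr)
{
    track_header *h;

    if (ptr == NULL)
        return;
    h = (track_header *)ptr - 1;
    assert(tracked_bytes >= h->size);
    tracked_bytes -= h->size;
    free(h);
}

size_t track_current(void) { return tracked_bytes; }
size_t track_peak(void) { return tracked_peak; }
```

A library that accepts allocation callbacks (zlib does, through the
zalloc and zfree fields of z_stream) could be pointed at thin wrappers
around functions like these. Blocks that the library allocates by
calling malloc() directly never reach the hook, which is exactly why
they are missing from the reports.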

>> Only one get/set function for block allocators
>> ----------------------------------------------
>>
>> Replace the 6 functions:
>>
>> * ``void PyMem_GetRawAllocator(PyMemBlockAllocator *allocator)``
>> * ``void PyMem_GetAllocator(PyMemBlockAllocator *allocator)``
>> * ``void PyObject_GetAllocator(PyMemBlockAllocator *allocator)``
>> * ``void PyMem_SetRawAllocator(PyMemBlockAllocator *allocator)``
>> * ``void PyMem_SetAllocator(PyMemBlockAllocator *allocator)``
>> * ``void PyObject_SetAllocator(PyMemBlockAllocator *allocator)``
>>
>> with 2 functions with an additional *domain* argument:
>>
>> * ``int PyMem_GetBlockAllocator(int domain, PyMemBlockAllocator
>> *allocator)``
>> * ``int PyMem_SetBlockAllocator(int domain, PyMemBlockAllocator
>> *allocator)``
>
> I would much prefer this solution.

I don't have a strong preference between these two choices.

Oh, one argument in favor of one generic function is that code using
these functions would be simpler. Excerpt from the unit test of the
implementation (_testcapi.c):

+    if (api == 'o')
+        PyObject_SetAllocator(&hook.alloc);
+    else if (api == 'r')
+        PyMem_SetRawAllocator(&hook.alloc);
+    else
+        PyMem_SetAllocator(&hook.alloc);

With a generic function, this block can be replaced with a single
function call.

>> Drawback: the caller has to check if the result is 0, or handle the
>> error.
>
> Or you can just call Py_FatalError() if the domain is invalid.

I don't like Py_FatalError(), especially when Python is embedded. It's
safer to return -1 and expect the caller to check for the error case.
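
The two points above (one generic call taking a *domain*, and -1 for an
invalid domain instead of Py_FatalError()) can be sketched as follows.
This is only an illustration that compiles without the Python headers:
block_allocator, the DOMAIN_* constants, set_block_allocator() and
get_block_allocator() are hypothetical stand-ins for PyMemBlockAllocator
and the proposed PyMem_SetBlockAllocator()/PyMem_GetBlockAllocator(),
and the struct layout is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PyMemBlockAllocator; the fields are assumed. */
typedef struct {
    void *ctx;
    void *(*malloc_fn)(void *ctx, size_t size);
    void *(*realloc_fn)(void *ctx, void *ptr, size_t size);
    void (*free_fn)(void *ctx, void *ptr);
} block_allocator;

/* Hypothetical domain identifiers. */
enum { DOMAIN_RAW = 0, DOMAIN_MEM = 1, DOMAIN_OBJ = 2, DOMAIN_COUNT = 3 };

/* One slot per domain. */
static block_allocator allocators[DOMAIN_COUNT];

/* Install `allocator` for `domain`. Return 0 on success, -1 if the
   domain is unknown or the pointer is NULL; never abort the process. */
int set_block_allocator(int domain, const block_allocator *allocator)
{
    if (domain < 0 || domain >= DOMAIN_COUNT || allocator == NULL)
        return -1;
    allocators[domain] = *allocator;
    return 0;
}

/* Copy the allocator of `domain` into `allocator`. Same return values. */
int get_block_allocator(int domain, block_allocator *allocator)
{
    if (domain < 0 || domain >= DOMAIN_COUNT || allocator == NULL)
        return -1;
    *allocator = allocators[domain];
    return 0;
}

/* Usage: a single call per domain replaces the if/else chain, and the
   caller decides what to do when the call fails. */
void demo(void)
{
    block_allocator hook = {0};

    assert(set_block_allocator(DOMAIN_OBJ, &hook) == 0);
    assert(set_block_allocator(42, &hook) == -1);
}
```

The cost is the one mentioned in the drawback: every caller has to
check the result. The benefit is that an embedding application keeps
control when it passes a bad domain.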

>> If a hook is used to track memory usage, the ``malloc()`` memory
>> will not be seen. Remaining ``malloc()`` may allocate a lot of memory
>> and so would be missed in reports.
>
> A lot of memory? In main()?

Not in main(). The Python expat and zlib modules call malloc()
directly and may allocate large blocks. External libraries like
OpenSSL or bz2 may also allocate large blocks.

See issues #18203 and #18227.

Victor