[Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies

eliaskoromilas elias.koromilas at gmail.com
Thu May 13 12:06:02 EDT 2021


Eric Wieser wrote
>> Yes, sorry, had been a while since I had looked it up:
>>
>> https://docs.python.org/3/c-api/memory.html#c.PyMemAllocatorEx
> 
> That `PyMemAllocatorEx` looks almost exactly like one of the two variants I
> was proposing. Is there a reason for wanting to define our own structure vs
> just using that one?
> I think the NEP should at least offer a brief comparison to that structure,
> even if we ultimately end up not using it.
> 
>> I have to say it feels a bit
>> like exposing things publicly, that are really mainly used internally,
>> but not sure...  Presumably Python uses the `ctx` for something though.
> 
> I'd argue `ctx` / `baton` / `user_data` arguments are an essential part of
> any C callback API.
> I can't find any particularly good reference for this right now, but I have
> been bitten multiple times by C APIs that forget to add this argument.
> 
>> If someone wants a different strategy (i.e. different alignment) they
>> create a new policy
> 
> The crux of the problem here is that without very nasty hacks, C and C++ do
> not allow new functions to be created at runtime.
> This makes it very awkward to write a parameterizable allocator. If you
> want to create two aligned allocators with different alignments, and you
> don't have a `ctx` argument to plumb through that alignment information,
> you're forced to write the entire thing twice.

The `PyMemAllocatorEx` memory API would allow closure-like (lambda-style)
definitions of the data mem routines. That is the main idea behind the `ctx`
argument: it is a big deal and should enable practically any allocation
scenario.

In my opinion, the rest of the proposals (PyObjects, PyCapsules, etc.) are
secondary and could be considered out-of-scope. I would suggest letting
people use this before hiding it behind a strict API.

Let me also give you some insight into how we plan to use it, since we are
the first to integrate this into production code. Treating this NEP as a
primitive API, I developed a new project to address our requirements:

1. Provide a Python-native way to define a new numpy allocator
2. Accept data mem routine symbols (function pointers) from open dynamic
libraries
3. Allow local-scoped allocation, e.g. inside a `with` statement
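
As a rough sketch of how requirements 2 and 3 can fit together (this is not
the project's actual API): the `_current` dictionary below is a hypothetical
stand-in for NumPy's current-allocator slot, and `use_allocator` and
`symbol_address` are made-up names. The routines are taken from
`ctypes.pythonapi` only because that library is always loaded; any other
`CDLL` would work the same way.

```python
import contextlib
import ctypes

# Hypothetical stand-in for the process-wide "current allocator" slot.
_current = {"malloc": None, "free": None}


def symbol_address(func):
    """Return the raw address behind a ctypes function pointer."""
    return ctypes.cast(func, ctypes.c_void_p).value


@contextlib.contextmanager
def use_allocator(malloc, free):
    """Install the given routines for the duration of a `with` block."""
    saved = dict(_current)
    _current.update(malloc=symbol_address(malloc), free=symbol_address(free))
    try:
        yield
    finally:
        # Restore the previous routines even if the block raised.
        _current.update(saved)


# Requirement 2: symbols looked up in an already loaded dynamic library.
raw_malloc = ctypes.pythonapi.PyMem_RawMalloc
raw_free = ctypes.pythonapi.PyMem_RawFree

# Requirement 3: the allocator is only active inside the `with` block.
with use_allocator(raw_malloc, raw_free):
    inside = dict(_current)
outside = dict(_current)
```

After the block, `outside` holds the original (empty) routines again, while
`inside` holds the two symbol addresses.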

But since there was not much fun in these, I thought it would be nice if we
could exploit `ctypes` callback functions, to allow developers to hook into
such routines natively (e.g. for debugging/monitoring), or even write them
entirely in Python (of course there has to be an underlying memory
allocation API).

For example, the idea is to be able to define a page-aligned allocator in
~30 lines of Python code, like this:

https://github.com/inaccel/numpy-allocator/blob/master/test/aligned_allocator.py

---

While experimenting with this project I spotted the following two issues:

1. Thread-locality
My biggest concern is the global scope of the numpy `current_allocator`
variable. Currently, an allocator change is applied globally, affecting every
thread. This behavior breaks the local-scoped allocation promise of my
project. Imagine, for example, the implications of allocating pinned
(page-locked) memory (since you mention this use-case a lot) for random
glue-code ndarrays in background threads.

2. Allocator context (already discussed)
I found a bug when I tried to use a Python callback (`ctypes.CFUNCTYPE`)
for the `PyDataMem_FreeFunc` routine. Since there are cases in which the
`free` routine is invoked after a PyErr has occurred (to clean up internal
arrays, for example), `ctypes` messes with the exception state badly. This
problem can be resolved with the use of a `ctx` (allocator context) that
will allow the routines to run clean of errors, wrapping them like this:

```
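#include <Python.h>

/* Assumption: PyDataMem_Context is a struct holding the user-supplied
 * routines; only its `free` member is used here. */
static void wrapped_free(void *ptr, size_t size, void *ctx) {
	PyObject *type;
	PyObject *value;
	PyObject *traceback;
	/* Stash any pending exception so the callback runs with a clean state. */
	PyErr_Fetch(&type, &value, &traceback);
	((PyDataMem_Context *) ctx)->free(ptr, size);
	/* Put the original exception back for the caller to handle. */
	PyErr_Restore(type, value, traceback);
}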
```

Note: This bug doesn't affect `CDLL` members (CFuncPtr objects), since they
are pure `dlsym` pointers.

Of course, this is a simple case of how a `ctx` could be useful for an
allocation policy. I guess people can become very creative with this in
general.

Elias





