[Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

Julian Taylor jtaylor.debian at googlemail.com
Tue Apr 15 10:08:45 EDT 2014


On Tue, Apr 15, 2014 at 3:07 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On Tue, Apr 15, 2014 at 12:06 PM, Julian Taylor
> <jtaylor.debian at googlemail.com> wrote:
>>> Good news, though! python-dev is in favor of adding calloc() to the
>>> core allocation interfaces, which will let numpy join the party. See
>>> python-dev thread:
>>>   https://mail.python.org/pipermail/python-dev/2014-April/133985.html
>>>
>>> It would be especially nice if we could get this into 3.5, since it
>>> seems likely that lots of numpy users will be switching to 3.5 when it
>>> comes out, and having a good memory tracing infrastructure there
>>> waiting for them make it even more awesome.
>>>
>>> Anyone interested in picking this up?
>>>   http://bugs.python.org/issue21233
>>
>> Hi,
>> I think it would be a better idea to instead of API functions for one
>> different type of allocator we get access to use the python hooks
>> directly with whatever allocator we want to use.
>
> Unfortunately, that's not how the API works. The way that third-party
> tracers register a 'hook' is by providing a new implementation of
> malloc/free/etc. So there's no general way to say "please pretend to
> have done a malloc".
>
> I guess we could potentially request the addition of
> fake_malloc/fake_free functions.

Unfortunately, looking at the PEP it seems you can either have a custom
allocator or have tracing, but not both (unless you do the tracing
yourself).
This seems like quite a limitation.
Maybe it would have been more flexible if Python instead provided
three functions:

PyMem_RegisterAlloc(size);
PyMem_RegisterReAlloc(size);
PyMem_RegisterFree(size);
+ possibly nogil variants
These functions call into registered tracing functions (registered
e.g. by tracemalloc.start()) or do nothing.

Our allocator (and Python's) would then simply always call these
functions and continue doing its own thing.
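
To make the idea concrete, here is a rough sketch of what I have in
mind (purely hypothetical -- none of these functions exist in CPython
today, the names and signatures are just placeholders):

#include <stddef.h>
#include <stdlib.h>

/* tracer callbacks, registered e.g. by tracemalloc.start();
 * NULL means tracing is off and the Register* calls are no-ops */
static void (*trace_alloc)(size_t) = NULL;
static void (*trace_free)(size_t) = NULL;

void PyMem_RegisterAlloc(size_t size) { if (trace_alloc) trace_alloc(size); }
void PyMem_RegisterFree(size_t size)  { if (trace_free)  trace_free(size);  }

/* numpy's allocator (aligned or otherwise) just notifies the tracer
 * and otherwise keeps doing its own thing */
void *npy_traced_malloc(size_t size)
{
    void *p = malloc(size);   /* stand-in for whatever allocator we use */
    if (p != NULL)
        PyMem_RegisterAlloc(size);
    return p;
}

void npy_traced_free(void *p, size_t size)
{
    if (p != NULL) {
        PyMem_RegisterFree(size);
        free(p);
    }
}

That way the tracer sees every numpy allocation without caring which
allocator actually provided the memory.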

>
>> This would allow as to for example use aligned memory allocators which
>> might be relevant for the new cpu instruction sets with up to 64 byte
>> wide registers
>
> I think we might have had this conversation before, but I don't
> remember how it went... did you have some explanation about how this
> could matter in principle? We have to write code to handle unaligned
> (or imperfectly aligned) arrays regardless, so aligned allocation
> doesn't affect maintainability. And regarding speed, I can't see how
> an extra instruction here or there could make a big difference on
> small arrays, since the effect should be overwhelmed by interpreter
> overhead and memory stalls (not much time for prefetch to come into
> play on small arrays), but OTOH large arrays are usually page-aligned
> in practice, and if not then any extra start-up overhead will be
> amortized out by their size.

Yes, we already had this conversation :)
If you have two or more arrays that are not aligned the same way, you
can only align one of them via peeling; the others will always have to
be accessed unaligned.
But it probably does not matter much anymore with newer CPUs, I should
probably just throw out my old Core 2 where it does :)
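
To illustrate what I mean by peeling (a toy example, not actual numpy
code): with SSE you want 16-byte aligned loads/stores, and a scalar
prologue can fix up the alignment of at most one of the pointers
involved:

#include <stddef.h>
#include <stdint.h>

/* add two arrays; peel scalar iterations until `out` is 16-byte
 * aligned, so the stores in the main loop can be aligned */
void add(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
    while (i < n && ((uintptr_t)(out + i) & 15) != 0) {
        out[i] = a[i] + b[i];
        i++;
    }
    /* main (vectorizable) loop: `out + i` is now aligned, but if
     * `a` or `b` started at a different offset modulo 16 they
     * still have to be read with unaligned loads */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}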


