[Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

Tue Apr 15 09:07:25 EDT 2014

On Tue, Apr 15, 2014 at 12:06 PM, Julian Taylor
<jtaylor.debian at googlemail.com> wrote:
>> Good news, though! python-dev is in favor of adding calloc() to the
>> core allocation interfaces, which will let numpy join the party. See
>> python-dev thread:
>>   https://mail.python.org/pipermail/python-dev/2014-April/133985.html
>>
>> It would be especially nice if we could get this into 3.5, since it
>> seems likely that lots of numpy users will be switching to 3.5 when it
>> comes out, and having a good memory tracing infrastructure there
>> waiting for them would make it even more awesome.
>>
>> Anyone interested in picking this up?
>>   http://bugs.python.org/issue21233
>
> Hi,
> I think it would be better if, instead of adding API functions for
> one more type of allocator, we got access to the python hooks
> directly, so we could use them with whatever allocator we want.

Unfortunately, that's not how the API works. The way that third-party
tracers register a 'hook' is by providing a new implementation of
malloc/free/etc. So there's no general way to say "please pretend to
have done a malloc".
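
To make that constraint concrete, here is a toy model in Python. It is
only a sketch: RawAllocator and TracingAllocator are hypothetical names
made up for illustration, not the CPython C API (where the real
mechanism is PyMem_GetAllocator/PyMem_SetAllocator from PEP 445). The
point it shows is that the tracer *is* the allocator, so memory obtained
any other way never passes through it.

```python
# Toy model only: these classes are hypothetical, not the CPython C API.

class RawAllocator:
    """Stands in for the underlying system malloc/free."""

    def __init__(self):
        self._next = 0x1000  # fake address counter

    def malloc(self, size):
        addr = self._next
        self._next += max(size, 1)
        return addr

    def free(self, addr):
        pass  # nothing to do in the toy model


class TracingAllocator:
    """A 'hook': a replacement malloc/free that records, then delegates."""

    def __init__(self, inner):
        self.inner = inner
        self.live = {}  # address -> size of every traced live block

    def malloc(self, size):
        addr = self.inner.malloc(size)
        self.live[addr] = size
        return addr

    def free(self, addr):
        self.live.pop(addr, None)
        self.inner.free(addr)

    def traced_bytes(self):
        return sum(self.live.values())


raw = RawAllocator()
hooked = TracingAllocator(raw)  # "installing a hook" wraps the old allocator

a = hooked.malloc(100)  # goes through the hook, so it is traced
b = raw.malloc(4096)    # bypasses the hook (like a direct calloc): invisible
```

In this model there is no entry point that means "record this block
without allocating it", which is the gap a fake_malloc/fake_free pair
would have to fill.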

I guess we could potentially request the addition of
fake_malloc/fake_free functions.

> This would allow us to, for example, use aligned memory allocators,
> which might be relevant for the new cpu instruction sets with
> registers up to 64 bytes wide

I think we might have had this conversation before, but I don't
remember how it went... did you have some explanation about how this
could matter in principle? We have to write code to handle unaligned
(or imperfectly aligned) arrays regardless, so aligned allocation
doesn't affect maintainability. And regarding speed, I can't see how
an extra instruction here or there could make a big difference on
small arrays, since the effect should be overwhelmed by interpreter
overhead and memory stalls (not much time for prefetch to come into
play on small arrays), but OTOH large arrays are usually page-aligned
in practice, and if not then any extra start-up overhead will be
amortized out by their size.
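
For anyone who wants to poke at alignment directly, the following is a
minimal sketch of the usual over-allocate-and-offset trick, using only
the standard library. The helper names aligned_view and address_of are
made up for this example; with numpy the same arithmetic would apply to
the buffer address (arr.ctypes.data). It says nothing about whether the
alignment actually buys any speed.

```python
import ctypes


def address_of(buf):
    """Return the address of the first byte of a writable buffer."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def aligned_view(nbytes, alignment=64):
    """Return (backing, view) with view starting on an aligned address.

    Over-allocates by `alignment` bytes, then slices off the leading
    bytes needed to reach the next multiple of `alignment`.
    """
    backing = bytearray(nbytes + alignment)
    base = address_of(backing)
    offset = (-base) % alignment  # bytes to skip to reach alignment
    view = memoryview(backing)[offset:offset + nbytes]
    return backing, view


backing, view = aligned_view(1000, alignment=64)
print(address_of(backing) % 64, address_of(view) % 64)
```

The cost is at most `alignment` wasted bytes per buffer, which is
negligible for large arrays and proportionally large for tiny ones.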

> (it would be great if someone with avx512 hardware
> could provide some benchmarks for unaligned, 16 byte aligned and 64
> byte aligned memory to judge if this is actually required)

That would be great, but given that the processors won't be in general
distribution until next year, do we have any idea of how to find such
a person? :-)

> On the other hand memory tracing is a debugging feature and you might
> not care about performance, so we could use the python allocators in
> debug mode and the aligned unhooked allocators in normal mode?

I don't think we can change which memory allocator is in use once we
have any live ndarray objects in memory (because eventually we will
need to free them, and we'll need to know which allocator was used for
which). But a major use case for this is stuff like [1]:

  %memit my_func()

and it would be really user-unfriendly to force a restart every time
you want to use %memit.
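
As a rough illustration of the kind of on-demand measurement at stake,
here is a minimal sketch built on the standard-library tracemalloc
module (Python 3.4+). It is not how %memit is implemented, and
peak_traced_bytes is a name made up for this example. It only sees
memory that goes through Python's traced allocation interfaces, so
whether numpy array buffers are counted depends on numpy routing its
allocations through them.

```python
import tracemalloc


def peak_traced_bytes(func, *args, **kwargs):
    """Run func and return the peak number of bytes tracemalloc saw."""
    tracemalloc.start()  # can be turned on in a live session, no restart
    try:
        func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def my_func():
    # Hypothetical workload: a 10 MiB buffer from Python's own allocator.
    data = bytearray(10 * 1024 * 1024)
    return len(data)


print(peak_traced_bytes(my_func))
```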

-n

[1] http://scikit-learn.org/stable/developers/performance.html#memory-usage-profiling
-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org