[Numpy-discussion] allocated memory cache for numpy

Tue Feb 18 10:21:44 EST 2014

On Mon, Feb 17, 2014 at 9:42 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On 17 Feb 2014 15:17, "Sturla Molden" <sturla.molden at gmail.com> wrote:
>>
>> Julian Taylor <jtaylor.debian at googlemail.com> wrote:
>>
>> > When an array is created it tries to get its memory from the cache and
>> > when it's deallocated it returns it to the cache.
>>
...
>
> Another optimization we should consider that might help a lot in the same
> situations where this would help: for code called from the cpython eval
> loop, it's afaict possible to determine which inputs are temporaries by
> checking their refcnt. In the second call to __add__ in '(a + b) + c', the
> temporary will have refcnt 1, while the other arrays will all have
> refcnt >1. In such cases (subject to various sanity checks on shape, dtype, etc) we
> could elide temporaries by reusing the input array for the output. The risk
> is that there may be some code out there that calls these operations
> directly from C with non-temp arrays that nonetheless have refcnt 1, but we
> should at least investigate the feasibility. E.g. maybe we can do the
> optimization for tp_add but not PyArray_Add.
>

This seems to be a really good idea. I experimented a bit and it
solves the temporary problem for this type of arithmetic nicely.
It's simple to implement: just switch to the in-place operation in the
array_{add,sub,mul,div} handlers for the Python slots when an input
has refcount 1. Doing so does not break the numpy, scipy, and pandas
test suites, so it seems safe.
Performance-wise, besides the simple benchmarks limited by page
zeroing (a+b+c), it also brings the laplace out-of-place benchmark to
the same speed as the in-place benchmark [0]. This is very nice, as
the in-place variant is significantly harder to read.
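
To make the effect concrete, here is a rough Python-level sketch of
what the elision amounts to for (a + b) + c. It is only an
illustration, not the patch itself: the function names are made up for
this example, and the real change would live in the C slot handlers.
The second function does by hand what the refcount == 1 check would do
automatically, and it is only valid because the temporary already has
the shape and dtype of the final result.

```python
import numpy as np

def add3_out_of_place(a, b, c):
    # Two allocations: one for the temporary (a + b) and one for the
    # final result.
    return (a + b) + c

def add3_reuse_temporary(a, b, c):
    # One allocation: nothing else refers to the temporary, so the
    # second sum can be written back into it.
    tmp = a + b
    np.add(tmp, c, out=tmp)
    return tmp

a = np.arange(5, dtype=np.float64)
b = np.ones(5)
c = np.full(5, 2.0)

expected = add3_out_of_place(a, b, c)
result = add3_reuse_temporary(a, b, c)
print(np.array_equal(result, expected))
```

The inputs a, b and c are left untouched in both variants; only the
intermediate buffer is reused.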

Does anyone see any issue we might be overlooking in this refcount ==
1 optimization for the Python API?
I'll post a PR with the change shortly.

Regardless of this change, caching memory blocks might still be
worthwhile for fancy indexing and other operations which require
allocations.
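
For readers new to the thread, the caching idea can be sketched as a
toy free list keyed by block size. This is pure Python and purely
illustrative: BlockCache, acquire, release and max_per_size are
hypothetical names for this example, and the actual cache would sit in
numpy's C allocation path, not in Python.

```python
class BlockCache:
    """Toy size-keyed cache of byte buffers (hypothetical, for illustration)."""

    def __init__(self, max_per_size=4):
        self._free = {}                  # nbytes -> list of free blocks
        self.max_per_size = max_per_size

    def acquire(self, nbytes):
        # Reuse a cached block of exactly this size if one is available,
        # otherwise fall back to a fresh allocation.
        bucket = self._free.get(nbytes)
        if bucket:
            return bucket.pop()
        return bytearray(nbytes)

    def release(self, block):
        # Keep the block for later reuse unless the bucket is already full.
        bucket = self._free.setdefault(len(block), [])
        if len(bucket) < self.max_per_size:
            bucket.append(block)

cache = BlockCache()
first = cache.acquire(1024)
cache.release(first)
second = cache.acquire(1024)   # served from the cache
other = cache.acquire(2048)    # no cached block of this size, so a new one
print(second is first, other is first)
```

A reused block keeps its old contents, so the caller has to treat it as
uninitialized memory.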

[0] http://yarikoptic.github.io/numpy-vbench/vb_vb_app.html#laplace-normal