[Numpy-discussion] allocated memory cache for numpy

Sturla Molden sturla.molden at gmail.com
Tue Feb 18 10:58:45 EST 2014


I am cross-posting this to the Cython user group to make sure they see this.
Sturla


Nathaniel Smith <njs at pobox.com> wrote:
> On 18 Feb 2014 10:21, "Julian Taylor" <jtaylor.debian at googlemail.com> wrote:
> 
> On Mon, Feb 17, 2014 at 9:42 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On 17 Feb 2014 15:17, "Sturla Molden" <sturla.molden at gmail.com> wrote:
> 
> Julian Taylor <jtaylor.debian at googlemail.com> wrote:
> 
> When an array is created it tries to get its memory from the cache and
> when it's deallocated it returns it to the cache.
> 
> ...
> 
> Another optimization we should consider that might help a lot in the same
> situations where this would help: for code called from the CPython eval
> loop, it's afaict possible to determine which inputs are temporaries by
> checking their refcnt. In the second call to __add__ in '(a + b) + c', the
> temporary will have refcnt 1, while the other arrays will all have refcnt
> greater than 1. In such cases (subject to various sanity checks on shape,
> dtype, etc) we could elide temporaries by reusing the input array for the
> output. The risk is that there may be some code out there that calls these
> operations directly from C with non-temp arrays that nonetheless have
> refcnt 1, but we should at least investigate the feasibility. E.g. maybe
> we can do the optimization for tp_add but not PyArray_Add.
> 
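
To make the refcount distinction above concrete, here is a minimal pure-Python sketch. It is not NumPy code: the `Probe` class and its `log` list are hypothetical names invented for illustration, and `sys.getrefcount` is only a rough stand-in for the `ob_refcnt` a C handler would inspect. The absolute numbers include references held by the call machinery and are a CPython implementation detail, so only the difference between the two readings is meaningful.

```python
import sys


class Probe:
    """Toy operand whose __add__ records the refcount of its left input."""

    log = []  # (label, refcount) pairs appended by __add__

    def __init__(self, label):
        self.label = label

    def __add__(self, other):
        # The reading includes references from the call itself, so it is
        # larger than the count a C-level tp_add handler would see.
        Probe.log.append((self.label, sys.getrefcount(self)))
        return Probe("temporary")


# Keep the operands in a list so each one is owned by something other than
# the evaluation stack, like an array that is bound to a name.
operands = [Probe("a"), Probe("b"), Probe("c")]

result = (operands[0] + operands[1]) + operands[2]

(first_label, named_rc), (second_label, temp_rc) = Probe.log
print(first_label, named_rc)    # left input of the first add: owned by the list
print(second_label, temp_rc)    # left input of the second add: the temporary
```

On CPython the temporary should report a lower count than the list-owned operand, which is the signal the proposed optimization would key on.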
> this seems to be a really good idea, I experimented a bit and it solves
> the temporary problem for these types of arithmetic nicely. It's simple to
> implement, just change to inplace in the array_{add,sub,mul,div} handlers
> for the python slots. Doing so does not fail the numpy, scipy and pandas
> testsuites so it seems safe. Performance wise, besides the simple page
> zeroing limited benchmarks (a+b+c), it also brings the laplace out of
> place benchmark to the same speed as the inplace benchmark [0]. This is
> very nice as the inplace variant is significantly harder to read.
> 
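
A rough illustration of what switching the second addition to in-place saves, using a hypothetical `Vec` class with an allocation counter as a stand-in for an array (an assumption for illustration, not NumPy's implementation). The explicit `+=` on the temporary plays the role of what the handler would do automatically when it sees a refcnt 1 input.

```python
class Vec:
    """Hypothetical stand-in for an array; counts allocated buffers."""

    allocations = 0

    def __init__(self, data):
        Vec.allocations += 1
        self.data = list(data)

    def __add__(self, other):
        # Out-of-place: every call allocates a fresh result buffer.
        return Vec(x + y for x, y in zip(self.data, other.data))

    def __iadd__(self, other):
        # In-place: write the result into the left operand's buffer.
        for i, y in enumerate(other.data):
            self.data[i] += y
        return self


a, b, c = Vec([1, 2, 3]), Vec([10, 20, 30]), Vec([100, 200, 300])

Vec.allocations = 0
out_of_place = (a + b) + c      # allocates the temporary and the final result
allocs_out_of_place = Vec.allocations

Vec.allocations = 0
reused = a + b                  # allocates only the temporary
reused += c                     # second add reuses the temporary's buffer
allocs_reused = Vec.allocations

print(out_of_place.data, allocs_out_of_place)
print(reused.data, allocs_reused)
```

Both variants compute the same values and leave `a`, `b` and `c` untouched; the second one simply skips one buffer allocation.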
> Sweet.
> 
> Does anyone see any issue we might be overlooking in this refcount == 1
> optimization for the python api? I'll post a PR with the change shortly.
> 
> It occurs belatedly that Cython code like
>   a = np.arange(10)
>   b = np.arange(10)
>   c = a + b
> might end up calling tp_add with refcnt 1 arrays. Ditto for
> same with cdef np.ndarray or cdef object added. We should check...
> 
> -n
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion



