[Numpy-discussion] allocated memory cache for numpy

Nathaniel Smith njs at pobox.com
Tue Feb 18 10:35:09 EST 2014


On 18 Feb 2014 10:21, "Julian Taylor" <jtaylor.debian at googlemail.com> wrote:
>
> On Mon, Feb 17, 2014 at 9:42 PM, Nathaniel Smith <njs at pobox.com> wrote:
> > On 17 Feb 2014 15:17, "Sturla Molden" <sturla.molden at gmail.com> wrote:
> >>
> >> Julian Taylor <jtaylor.debian at googlemail.com> wrote:
> >>
> >> > When an array is created it tries to get its memory from the cache
> >> > and when it's deallocated it returns it to the cache.
> >>
> ...
> >
> > Another optimization we should consider that might help a lot in the
> > same situations where this would help: for code called from the cpython
> > eval loop, it's afaict possible to determine which inputs are temporaries
> > by checking their refcnt. In the second call to __add__ in '(a + b) + c',
> > the temporary will have refcnt 1, while the other arrays will all have
> > refcnt >1. In such cases (subject to various sanity checks on shape,
> > dtype, etc) we could elide temporaries by reusing the input array for the
> > output. The risk is that there may be some code out there that calls
> > these operations directly from C with non-temp arrays that nonetheless
> > have refcnt 1, but we should at least investigate the feasibility. E.g.
> > maybe we can do the optimization for tp_add but not PyArray_Add.
> >
>
> This seems to be a really good idea. I experimented a bit, and it
> solves the temporary problem for these types of arithmetic nicely.
> It's simple to implement: just switch to the in-place operation in the
> array_{add,sub,mul,div} handlers for the Python slots. Doing so does
> not fail the numpy, scipy, or pandas test suites, so it seems safe.
> Performance-wise, besides the simple page-zeroing-limited benchmarks
> (a+b+c), it also brings the out-of-place laplace benchmark to the
> same speed as the in-place benchmark [0]. This is very nice, as the
> in-place variant is significantly harder to read.

Sweet.
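
For anyone who wants to see why the refcnt test picks out temporaries,
here is a minimal pure-Python sketch (the Tracked class is just a toy
stand-in, nothing numpy-specific). At the Python level sys.getrefcount
reports extra references from the call frame and from its own argument,
so the absolute numbers are larger than the raw C-level refcnt that
tp_add would see; the point is the relative difference: the unnamed
intermediate in (a + b) + c shows up with exactly one reference fewer
than a named operand, which is the same signal the C handler would read
as Py_REFCNT() == 1.

  import sys

  class Tracked:
      """Toy stand-in for ndarray; only reports refcounts in __add__."""
      def __add__(self, other):
          # Absolute numbers vary by interpreter version; compare the two
          # prints rather than reading them as C-level refcnts.
          print("left operand refcount inside __add__:",
                sys.getrefcount(self))
          return Tracked()

  a, b, c = Tracked(), Tracked(), Tracked()

  # First __add__: the left operand is the named variable 'a'.
  # Second __add__: the left operand is the unnamed temporary (a + b),
  # held only by the interpreter stack, so it prints one less.
  result = (a + b) + c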

> Does anyone see any issue we might be overlooking in this refcount == 1
> optimization for the Python API?
> I'll post a PR with the change shortly.

It occurs to me belatedly that Cython code like
  a = np.arange(10)
  b = np.arange(10)
  c = a + b
might end up calling tp_add with refcnt-1 arrays. Ditto for the same code
with cdef np.ndarray or cdef object declarations added. We should check...
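
And whichever way that turns out, the refcnt test can only be the first
gate; before reusing the operand's buffer the slot handler would also
need the shape/dtype sanity checks mentioned above. Roughly, in
Python-flavoured pseudocode (the helper name and the exact set of
conditions are illustrative, not necessarily what the PR will do, and
the real check would read Py_REFCNT() in C, since Python-level refcounts
are inflated by the calling frame):

  import numpy as np

  def can_reuse_operand(arr, c_refcnt, out_dtype, out_shape):
      # Sketch of the eligibility test the C-level handler would perform
      # before dispatching to the in-place loop.
      return (
          c_refcnt == 1                  # nothing else can see the buffer
          and type(arr) is np.ndarray    # exact ndarray, no subclasses
          and arr.flags.owndata          # owns its memory, i.e. not a view
          and arr.flags.writeable
          and arr.flags.c_contiguous
          and arr.dtype == out_dtype     # result must fit in place
          and arr.shape == out_shape     # no broadcasting to a bigger result
      )

If all of those hold, the operation can be done in place (effectively
np.add(arr, other, out=arr)) and the would-be temporary becomes the
result.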

-n