[Numpy-discussion] Index Array Performance

Wes McKinney wesmckinn at gmail.com
Mon Feb 13 19:48:12 EST 2012


On Mon, Feb 13, 2012 at 7:46 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
> On Mon, Feb 13, 2012 at 7:30 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> How would you fix it? I shouldn't speculate without profiling, but I'll be
>> naughty. Presumably the problem is that Python turns that into something
>> like
>>
>> hist[i,j] = hist[i,j] + 1
>>
>> which means there's no way for numpy to avoid creating a temporary array. So
>> maybe this could be fixed by adding a fused __inplace_add__ protocol to the
>> language (and similarly for all the other inplace operators), but that seems
>> really unlikely. Fundamentally this is just the sort of optimization
>> opportunity you miss when you don't have a compiler with a global view;
>> Fortran or C++ expression templates will win every time. Maybe PyPy will fix
>> it someday.
>>
>> Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work?
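>>
>> Something along these lines, with a boolean mask standing in for the
>> index pair (just a sketch of mask-based where= semantics, not an API
>> that exists; note that with a mask, duplicate (i, j) pairs would only
>> bump each cell once):
>>
>> mask = np.zeros(hist.shape, dtype=bool)
>> mask[i, j] = True
>> np.add(hist, 1, out=hist, where=mask)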
>>
>> - N
>
> Nope, don't buy it:
>
> In [33]: timeit arr.__iadd__(1)
> 1000 loops, best of 3: 1.13 ms per loop
>
> In [37]: timeit arr[:] += 1
> 1000 loops, best of 3: 1.13 ms per loop
>
> - Wes

Actually, apologies, I'm being silly (had too much coffee or
something). Python is doing something nefarious with hist[i,j] += 1:
it expands into a get, an add, and then a set, which is probably the
problem.
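
To spell it out, the augmented assignment desugars into three separate
steps:

tmp = hist[i, j]      # fancy-indexed get: makes a copy
tmp += 1              # increments the copy
hist[i, j] = tmp      # fancy-indexed set: scatters the copy back

Because the get copies, the add never touches hist itself, and any
duplicate (i, j) pairs collapse to a single increment during the set.
(np.add.at(hist, (i, j), 1), added later in NumPy 1.8, is the
unbuffered variant that does one read-modify-write per index pair.)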

>> On Feb 14, 2012 12:18 AM, "Wes McKinney" <wesmckinn at gmail.com> wrote:
>>>
>>> On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver
>>> <m.oliver at jacobs-university.de> wrote:
>>> > Hi,
>>> >
>>> > I have a short piece of code where the use of an index array "feels
>>> > right", but incurs a severe performance penalty: It's about an order
>>> > of magnitude slower than all other operations with arrays of that
>>> > size.
>>> >
>>> > It comes up in a piece of code which is doing a large number of "on
>>> > the fly" histograms via
>>> >
>>> >  hist[i,j] += 1
>>> >
>>> > where i is an array with the bin index to be incremented and j is
>>> > simply enumerating the histograms.  I attach a full short sample code
>>> > below which shows how it's being used in context, and corresponding
>>> > timeit output from the critical code section.
>>> >
>>> > Questions:
>>> >
>>> > - Is this a matter of principle, or due to an inefficient
>>> >  implementation?
>>> > - Is there an equivalent way of doing it which is fast?
>>> >
>>> > Regards,
>>> > Marcel
>>> >
>>> > =================================================================
>>> >
>>> > #! /usr/bin/env python
>>> > # Plot the bifurcation diagram of the logistic map
>>> >
>>> > from pylab import *
>>> >
>>> > Nx = 800
>>> > Ny = 600
>>> > I = 50000
>>> >
>>> > rmin = 2.5
>>> > rmax = 4.0
>>> > ymin = 0.0
>>> > ymax = 1.0
>>> >
>>> > rr = linspace (rmin, rmax, Nx)
>>> > x = 0.5*ones(rr.shape)
>>> > hist = zeros((Ny+1,Nx), dtype=int)
>>> > j = arange(Nx)
>>> >
>>> > dy = ymax/Ny
>>> >
>>> > def f(x):
>>> >    return rr*x*(1.0-x)
>>> >
>>> > for n in xrange(1000):
>>> >    x = f(x)
>>> >
>>> > for n in xrange(I):
>>> >    x = f(x)
>>> >    i = array(x/dy, dtype=int)
>>> >    hist[i,j] += 1
>>> >
>>> > figure()
>>> >
>>> > imshow(hist,
>>> >       cmap='binary',
>>> >       origin='lower',
>>> >       interpolation='nearest',
>>> >       extent=(rmin,rmax,ymin,ymax),
>>> >       norm=matplotlib.colors.LogNorm())
>>> >
>>> > xlabel ('$r$')
>>> > ylabel ('$x$')
>>> >
>>> > title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$')
>>> >
>>> > show()
>>> >
>>> > ====================================================================
>>> >
>>> > In [4]: timeit y=f(x)
>>> > 10000 loops, best of 3: 19.4 us per loop
>>> >
>>> > In [5]: timeit i = array(x/dy, dtype=int)
>>> > 10000 loops, best of 3: 22 us per loop
>>> >
>>> > In [6]: timeit hist[i,j] += 1
>>> > 10000 loops, best of 3: 119 us per loop
>>>
>>> This suggests to me that fancy indexing could be quite a bit faster in
>>> this case:
>>>
>>> In [40]: timeit hist[i,j] += 1
>>> 10000 loops, best of 3: 58.2 us per loop
>>> In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1)
>>> 10000 loops, best of 3: 20.6 us per loop
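>>>
>>> Caveat: put() overwrites with 1 rather than incrementing, so it's
>>> only a timing proxy here. A pure-NumPy version that actually
>>> accumulates, duplicates included, can go through bincount; a sketch
>>> (it pays for building a full-size count array on every pass):
>>>
>>> flat = np.ravel_multi_index((i, j), hist.shape)
>>> hist += np.bincount(flat, minlength=hist.size).reshape(hist.shape)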
>>>
>>> I wrote a simple Cython method
>>>
>>> from numpy cimport ndarray, int64_t
>>>
>>> def fancy_inc(ndarray[int64_t, ndim=2] values,
>>>               ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc):
>>>     # one explicit read-modify-write per index pair, no temporary copies
>>>     cdef:
>>>         Py_ssize_t i, n = len(iarr)
>>>
>>>     for i in range(n):
>>>         values[iarr[i], jarr[i]] += inc
>>>
>>> which is even faster:
>>>
>>> In [8]: timeit sbx.fancy_inc(hist, i, j, 1)
>>> 100000 loops, best of 3: 4.85 us per loop
>>>
>>> About 10% faster if bounds checking and wraparound are disabled.
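>>>
>>> That variant is just the same loop with the standard Cython
>>> directives applied (a sketch; it needs cimport cython, and the
>>> _nocheck name is made up here):
>>>
>>> @cython.boundscheck(False)  # skip per-element bounds checks
>>> @cython.wraparound(False)   # disable negative-index handling
>>> def fancy_inc_nocheck(ndarray[int64_t, ndim=2] values,
>>>                       ndarray[int64_t] iarr, ndarray[int64_t] jarr,
>>>                       int64_t inc):
>>>     cdef Py_ssize_t i, n = len(iarr)
>>>     for i in range(n):
>>>         values[iarr[i], jarr[i]] += inc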
>>>
>>> Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list?
>>>
>>> - Wes
