[Numpy-discussion] Index Array Performance

Wes McKinney wesmckinn at gmail.com
Mon Feb 13 19:50:39 EST 2012


On Mon, Feb 13, 2012 at 7:48 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
> On Mon, Feb 13, 2012 at 7:46 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>> On Mon, Feb 13, 2012 at 7:30 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>> How would you fix it? I shouldn't speculate without profiling, but I'll be
>>> naughty. Presumably the problem is that Python turns that into something
>>> like
>>>
>>> hist[i,j] = hist[i,j] + 1
>>>
>>> which means there's no way for numpy to avoid creating a temporary array. So
>>> maybe this could be fixed by adding a fused __inplace_add__ protocol to the
>>> language (and similarly for all the other inplace operators), but that seems
>>> really unlikely. Fundamentally this is just the sort of optimization
>>> opportunity you miss when you don't have a compiler with a global view;
>>> Fortran or C++ expression templates will win every time. Maybe PyPy will fix
>>> it someday.
>>>
>>> Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work?
>>>
>>> - N
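(One caveat on that suggestion: the ufunc where= argument takes a boolean
mask rather than index arrays, so the indices would need converting first.
A minimal sketch, assuming a NumPy whose ufuncs support out= and where=:

    mask = np.zeros(hist.shape, dtype=bool)
    mask[i, j] = True
    np.add(hist, 1, out=hist, where=mask)

Note that repeated (i, j) pairs collapse to a single True in the mask, so
this matches hist[i,j] += 1 only when the index pairs are unique, as they
are here since j is arange(Nx).)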
>>
>> Nope, don't buy it:
>>
>> In [33]: timeit arr.__iadd__(1)
>> 1000 loops, best of 3: 1.13 ms per loop
>>
>> In [37]: timeit arr[:] += 1
>> 1000 loops, best of 3: 1.13 ms per loop
>>
>> - Wes
>
> Actually, apologies, I'm being silly (had too much coffee or
> something). Python may be doing something nefarious with hist[i,j] += 1:
> a get, then an add, then a set, which is probably the problem.
>
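(Concretely, CPython expands the augmented assignment into three separate
operations, roughly:

    tmp = hist.__getitem__((i, j))   # fancy-index read; allocates a temporary
    tmp = tmp.__iadd__(1)            # in-place add on that temporary
    hist.__setitem__((i, j), tmp)    # fancy-index write-back

so NumPy sees three independent calls and has no opportunity to fuse them
into a single indexed increment.)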
>>> On Feb 14, 2012 12:18 AM, "Wes McKinney" <wesmckinn at gmail.com> wrote:
>>>>
>>>> On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver
>>>> <m.oliver at jacobs-university.de> wrote:
>>>> > Hi,
>>>> >
>>>> > I have a short piece of code where the use of an index array "feels
>>>> > right", but incurs a severe performance penalty: It's about an order
>>>> > of magnitude slower than all other operations with arrays of that
>>>> > size.
>>>> >
>>>> > It comes up in a piece of code which is doing a large number of "on
>>>> > the fly" histograms via
>>>> >
>>>> >  hist[i,j] += 1
>>>> >
>>>> > where i is an array with the bin index to be incremented and j is
>>>> > simply enumerating the histograms.  I attach a short, complete
>>>> > sample script below which shows how it's used in context, along
>>>> > with the corresponding timeit output from the critical section.
>>>> >
>>>> > Questions:
>>>> >
>>>> > - Is this a matter of principle, or due to an inefficient
>>>> >  implementation?
>>>> > - Is there an equivalent way of doing it which is fast?
>>>> >
>>>> > Regards,
>>>> > Marcel
>>>> >
>>>> > =================================================================
>>>> >
>>>> > #! /usr/bin/env python
>>>> > # Plot the bifurcation diagram of the logistic map
>>>> >
>>>> > from pylab import *
>>>> >
>>>> > Nx = 800
>>>> > Ny = 600
>>>> > I = 50000
>>>> >
>>>> > rmin = 2.5
>>>> > rmax = 4.0
>>>> > ymin = 0.0
>>>> > ymax = 1.0
>>>> >
>>>> > rr = linspace (rmin, rmax, Nx)
>>>> > x = 0.5*ones(rr.shape)
>>>> > hist = zeros((Ny+1,Nx), dtype=int)
>>>> > j = arange(Nx)
>>>> >
>>>> > dy = ymax/Ny
>>>> >
>>>> > def f(x):
>>>> >    return rr*x*(1.0-x)
>>>> >
>>>> > for n in xrange(1000):
>>>> >    x = f(x)
>>>> >
>>>> > for n in xrange(I):
>>>> >    x = f(x)
>>>> >    i = array(x/dy, dtype=int)
>>>> >    hist[i,j] += 1
>>>> >
>>>> > figure()
>>>> >
>>>> > imshow(hist,
>>>> >       cmap='binary',
>>>> >       origin='lower',
>>>> >       interpolation='nearest',
>>>> >       extent=(rmin,rmax,ymin,ymax),
>>>> >       norm=matplotlib.colors.LogNorm())
>>>> >
>>>> > xlabel ('$r$')
>>>> > ylabel ('$x$')
>>>> >
>>>> > title(r'Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$')
>>>> >
>>>> > show()
>>>> >
>>>> > ====================================================================
>>>> >
>>>> > In [4]: timeit y=f(x)
>>>> > 10000 loops, best of 3: 19.4 us per loop
>>>> >
>>>> > In [5]: timeit i = array(x/dy, dtype=int)
>>>> > 10000 loops, best of 3: 22 us per loop
>>>> >
>>>> > In [6]: timeit hist[i,j] += 1
>>>> > 10000 loops, best of 3: 119 us per loop
>>>>
>>>> This suggests to me that fancy indexing could be quite a bit faster in
>>>> this case:
>>>>
>>>> In [40]: timeit hist[i,j] += 1
>>>> 10000 loops, best of 3: 58.2 us per loop
>>>>
>>>> In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1)
>>>> 10000 loops, best of 3: 20.6 us per loop
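(Worth noting: put assigns rather than accumulates, so it is not a drop-in
replacement for += when an index repeats; the timing mainly isolates the
indexing overhead. A sketch of an accumulating alternative over the same
flat indices, with numpy imported as np, at the cost of a temporary:

    flat = np.ravel_multi_index((i, j), hist.shape)
    hist += np.bincount(flat, minlength=hist.size).reshape(hist.shape)

Unlike put, this increments repeated indices correctly.)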
>>>>
>>>> I wrote a simple Cython method:
>>>>
>>>> cimport numpy as cnp
>>>> from numpy cimport ndarray, int64_t
>>>>
>>>> cnp.import_array()  # initialize NumPy's C API before using buffers
>>>>
>>>> def fancy_inc(ndarray[int64_t, ndim=2] values,
>>>>               ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc):
>>>>     cdef:
>>>>         Py_ssize_t i, n = len(iarr)
>>>>
>>>>     for i in range(n):
>>>>         values[iarr[i], jarr[i]] += inc
>>>>
>>>> which is even faster:
>>>>
>>>> In [8]: timeit sbx.fancy_inc(hist, i, j, 1)
>>>> 100000 loops, best of 3: 4.85 us per loop
>>>>
>>>> About 10% faster if bounds checking and wraparound are disabled.
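(For reference, a sketch of disabling those checks with Cython compiler
directives applied as decorators:

    cimport cython

    @cython.boundscheck(False)  # skip per-access bounds checking
    @cython.wraparound(False)   # disallow negative-index wraparound
    def fancy_inc(ndarray[int64_t, ndim=2] values,
                  ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc):
        ...  # same body as above

With them off, an out-of-range or negative index becomes undefined
behavior, so they should only be dropped once the indices are trusted.)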
>>>>
>>>> Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list?
>>>>
>>>> - Wes

But:

In [40]: timeit hist[i, j]
10000 loops, best of 3: 32 us per loop

So a plain fancy-indexed read is already roughly 7-8x slower than the
simple Cython method that does the whole increment; I sincerely hope it
could be brought down to the sub-10-microsecond level with a little bit
of work.
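
(Postscript: NumPy later gained exactly this operation as the unbuffered
ufunc method ufunc.at, added in NumPy 1.8. A minimal sketch:

    import numpy as np

    # In-place, unbuffered increment; repeated (i, j) pairs each add 1,
    # matching the intent of hist[i,j] += 1 without the get/add/set
    # round trip.
    np.add.at(hist, (i, j), 1)

Its speed relative to a hand-written Cython loop should be measured
rather than assumed.)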


