[Numpy-discussion] array.sum() slower than expected along some array axes?

Sat Feb 3 21:29:45 EST 2007

Charles R Harris wrote:
>
>
> On 2/3/07, *Stephen Simmons* <mail at stevesimmons.com> wrote:
>
>     Hi,
>
>     Does anyone know why there is an order-of-magnitude difference
>     in the speed of numpy's array.sum() function depending on which
>     axis of the array is summed?
>
>     To see this, import numpy and create a big array with two rows:
>        >>> import numpy
>        >>> a = numpy.ones([2,1000000], 'f4')
>
>     Then, timing with IPython's timeit:
>        Expression                                    Time (ms)
>        sum(a)                                           20
>        a.sum()                                           9
>        a.sum(axis=1)                                     9
>        a.sum(axis=0)                                   159
>        numpy.dot(numpy.ones(a.shape[0], a.dtype), a)    15
>
>     This last version, using a dot product, computes the same result
>     as a.sum(axis=0) yet runs about ten times faster, suggesting that
>     the slowdown is due to how indexing is implemented in array.sum().
>
>
> In this case it is expected. There are inner and outer loops: in the
> slow case the inner loop, with its extra code, is called 1,000,000
> times; in the fast case, twice. On the other hand, note this:
>
> In [10]: timeit a[0,:] + a[1,:]
> 100 loops, best of 3: 19.7 ms per loop
>
>
> That version has only one loop. Caching could also play a role, but in
> this case the cost is dominated by loop overhead.
>
> Chuck
I agree that summing over the short axis (a.sum(axis=0), which reduces the
length-2 axis) is most probably slower because it makes many more passes
through the inner loop.
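
To make the inner/outer loop argument concrete, here is a pure-Python model
of a 2-D reduction. It is only an illustration: `reduce_2d` is a hypothetical
helper, not NumPy's C implementation, and it assumes, as described above, that
the inner loop always walks the reduced axis. It counts how often that inner
loop is entered; a 2 x 1000 array is used to keep it quick.

```python
def reduce_2d(rows, axis):
    """Sum a 2-D list of lists along `axis`, counting inner-loop entries.

    Assumption (for illustration only): the inner loop always walks the
    reduced axis. NumPy's real loop ordering may differ.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    calls = 0
    out = []
    if axis == 0:
        for j in range(n_cols):          # outer loop: one pass per column
            calls += 1                   # inner loop entered once per column
            total = 0.0
            for i in range(n_rows):      # inner loop covers only n_rows items
                total += rows[i][j]
            out.append(total)
    elif axis == 1:
        for i in range(n_rows):          # outer loop: one pass per row
            calls += 1                   # inner loop entered once per row
            total = 0.0
            for j in range(n_cols):      # inner loop covers n_cols items
                total += rows[i][j]
            out.append(total)
    else:
        raise ValueError("axis must be 0 or 1")
    return out, calls


data = [[1.0] * 1000 for _ in range(2)]
col_sums, calls0 = reduce_2d(data, axis=0)
row_sums, calls1 = reduce_2d(data, axis=1)
print(calls0, calls1)  # 1000 2
```

With the 2 x 1000000 array from the original post, the same model enters the
inner loop 1,000,000 times for axis=0 and twice for axis=1, so any fixed
per-call overhead is paid half a million times more often in the slow case.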

The question, though, is whether all of the inner loop's overhead is
necessary.
My counterexample using numpy.dot(), which gives the same result in 15 ms
rather than 159 ms, suggests there is considerable scope for improvement,
at least for common cases such as this one.
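
A minimal sketch for reproducing the comparison, assuming a reasonably recent
NumPy and Python 3. The names `via_sum`, `via_dot` and `via_rows` are
hypothetical; `via_rows` is the row-accumulation idea behind
`a[0,:] + a[1,:]`, generalised to any number of rows. It checks that all three
give the same result and prints per-call timings. No figures are quoted here
because they depend on the machine and NumPy version, and newer releases may
handle a.sum(axis=0) differently from the 2007 code.

```python
import timeit

import numpy

a = numpy.ones([2, 1000000], 'f4')


def via_sum(a):
    # The slow case from the original post: reduce the length-2 axis.
    return a.sum(axis=0)


def via_dot(a):
    # A vector of ones (one per row) dotted with the array sums each column.
    return numpy.dot(numpy.ones(a.shape[0], a.dtype), a)


def via_rows(a):
    # Accumulate whole rows, so each addition is one long vectorized pass
    # over the 1000000-element axis instead of many 2-element passes.
    out = a[0].copy()
    for row in a[1:]:
        out += row
    return out


for f in (via_sum, via_dot, via_rows):
    # Best of 3 repeats, 10 calls each, reported as milliseconds per call.
    ms = min(timeit.repeat(lambda: f(a), number=10, repeat=3)) / 10 * 1000
    print(f"{f.__name__:10s} {ms:8.2f} ms")
```

If `sum` in the IPython session is Python's built-in rather than numpy.sum,
then `sum(a)` iterates over the two rows and adds them, which is essentially
what `via_rows` does.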