[Numpy-discussion] Optimized sum of squares

josef.pktd at gmail.com
Sun Oct 18 13:37:55 EDT 2009


On Sun, Oct 18, 2009 at 12:06 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
> On Sun, Oct 18, 2009 at 8:09 AM, Gael Varoquaux
> <gael.varoquaux at normalesup.org> wrote:
>> On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote:
>>> Hi Gaël,
>>
>>> If you've got a 1D array/vector called "a", I think the normal idiom is
>>
>>> np.dot(a,a)
>>
>>> For the more general case, I think
>>> np.tensordot(a, a, axes=something_else)
>>> should do it, where you should be able to figure out something_else for
>>> your particular case.
>>
>> Ha, yes. Good point about the tensordot trick.
>>
>> Thank you
>>
>> Gaël
>
> I'm curious about this, as I use ss, which is just np.sum(a*a, axis),
> in statsmodels, and hadn't thought much about it.
>
> There is
>
> import numpy as np
> from scipy.stats import ss
>
> a = np.ones(5000)
>
> but
>
> timeit ss(a)
> 10000 loops, best of 3: 21.5 µs per loop
>
> timeit np.add.reduce(a*a)
> 100000 loops, best of 3: 15 µs per loop
>
> timeit np.dot(a,a)
> 100000 loops, best of 3: 5.38 µs per loop
>
> Does the number of loops matter in the timings, and is dot always faster
> even without a BLAS dot?

David once replied that it depends on ATLAS and the version of LAPACK/BLAS.

I have usually switched to using dot for the 1d case. tensordot looks too
complicated to me; figuring out the axes is more work than I want when I
quickly need a sum of squares.
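For reference, a minimal sketch of the 1d equivalents (the array "a" is just
an example); for a 1d array the tensordot axes argument is simply axis 0 on
both sides:

    import numpy as np

    a = np.arange(5.0)

    ss_sum = np.sum(a * a)                     # builds the temporary a*a
    ss_dot = np.dot(a, a)                      # BLAS inner product, no temporary
    ss_tdot = np.tensordot(a, a, axes=(0, 0))  # contract axis 0 with axis 0

    assert ss_dot == ss_sum
    assert ss_tdot == ss_sum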

I never timed tensordot for 2d arrays, especially along axis=0 for a
C-ordered array. If it is faster, it could be useful for rewriting stats.ss.
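Just to illustrate the 2d case I mean (a sketch, not timed; note that the
dot-based version computes the full cross-product matrix and then keeps only
its diagonal, so it does much more arithmetic than the elementwise version):

    import numpy as np

    a = np.arange(12.0).reshape(4, 3)  # C-ordered 2d array

    ss0 = np.sum(a * a, axis=0)        # what stats.ss does along axis=0
    ss0_dot = np.diag(np.dot(a.T, a))  # same values: diagonal of the Gram matrix

    assert np.allclose(ss0, ss0_dot)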

I didn't remember np.add.reduce being much faster than np.sum; the
difference might just be the extra call overhead of going through another
function.
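A quick way to check this outside of IPython (a sketch using the timeit
module; the absolute numbers will of course depend on the machine and the
BLAS in use):

    import timeit

    setup = "import numpy as np; a = np.ones(5000)"

    for stmt in ("np.sum(a*a)", "np.add.reduce(a*a)", "np.dot(a, a)"):
        t = min(timeit.repeat(stmt, setup=setup, number=10000, repeat=3))
        print("%-20s %.2f us per loop" % (stmt, t / 10000 * 1e6))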

Josef


>
> Skipper


