[SciPy-Dev] Slow moment function in scipy.stats
stefan
stefan.peterson at rubico.com
Thu Feb 19 06:55:26 EST 2015
Hello list,
First time poster here. Anyway, some time ago I noticed that the scipy
skewness function was a major bottleneck in an algorithm of mine. Back
then, I typed up my own replacement and thought no more about it. Today,
for some unknown reason, I decided to dig a little deeper in this and
found the major culprit to be the way moments are computed, specifically
the use of np.power.
Testing an alternative approach (missing some safeties, obviously) in
IPython:
In [1]: import numpy as np
In [2]: import scipy.stats as spstat
In [3]: def moment(x, mom=1, axis=0):
...: if mom == 1:
...: return np.float64(0.0)
...: else:
...: x_zero_mean = x - np.expand_dims(np.mean(x, axis), axis)
...: x_zero_mean_2 = x_zero_mean**2
...: s = x_zero_mean_2.copy()
...: for k in range(1, mom // 2):
...: s *= x_zero_mean_2
...: if mom % 2:
...: s *= x_zero_mean
...: return np.mean(s, axis)
...:
In [4]: a = np.random.randn(25,1e5)
In [5]: for ax in range(2):
...: for k in range(1, 8):
...: %timeit spstat.moment(a, k, ax)
...:
10000 loops, best of 3: 41.9 µs per loop
10 loops, best of 3: 44 ms per loop
1 loops, best of 3: 233 ms per loop
1 loops, best of 3: 229 ms per loop
1 loops, best of 3: 232 ms per loop
1 loops, best of 3: 232 ms per loop
1 loops, best of 3: 236 ms per loop
100000 loops, best of 3: 4.14 µs per loop
10 loops, best of 3: 43.3 ms per loop
1 loops, best of 3: 227 ms per loop
1 loops, best of 3: 225 ms per loop
1 loops, best of 3: 227 ms per loop
1 loops, best of 3: 232 ms per loop
1 loops, best of 3: 232 ms per loop
In [6]: for ax in range(2):
for k in range(1, 8):
%timeit moment(a, k, ax)
...:
1000000 loops, best of 3: 458 ns per loop
10 loops, best of 3: 21.7 ms per loop
10 loops, best of 3: 26 ms per loop
10 loops, best of 3: 25.9 ms per loop
10 loops, best of 3: 30.4 ms per loop
10 loops, best of 3: 30.3 ms per loop
10 loops, best of 3: 33.1 ms per loop
1000000 loops, best of 3: 463 ns per loop
10 loops, best of 3: 21.2 ms per loop
10 loops, best of 3: 25.5 ms per loop
10 loops, best of 3: 25.5 ms per loop
10 loops, best of 3: 30.1 ms per loop
10 loops, best of 3: 30.1 ms per loop
10 loops, best of 3: 33.2 ms per loop
In [7]: for ax in range(2):
for k in range(1, 8):
print(np.sum((spstat.moment(a, k, ax) - moment(a, k, ax))**2))
...:
0.0
0.0
6.87146461986e-28
1.49841810127e-26
6.84527222501e-26
1.26529038747e-24
1.35165136907e-23
0.0
0.0
1.34532977463e-33
3.94430452611e-30
1.5467173476e-31
1.95637504495e-28
2.48413854543e-29
So there are some rounding errors, but they're hardly alarming. Is there
another reason not to do it this way?
Best Regards,
Stefan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20150219/7f654509/attachment.html>
More information about the SciPy-Dev
mailing list