[SciPy-dev] PEP: Improving the basic statistical functions in Scipy

Bruce Southey bsouthey at gmail.com
Fri Feb 27 12:42:20 EST 2009


josef.pktd at gmail.com wrote:
[snip]
> What I would like to do, but didn't have the time yet, is to run the
> tests for stats.stats on stats.mstats. This way, even if we would have
> some duplicate functions, we would have some cross check that they are
> consistent, and it would be a reminder for bug fixing also the other
> version.
>   
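A minimal sketch of that cross-check, assuming the shared tests can be written as (input, axis, expected) cases and run against any callable. The names `check_consistent`, `gmean_plain` and `gmean_masked` are hypothetical stand-ins written with NumPy only so the sketch runs without SciPy. With SciPy installed, one would pass `scipy.stats.gmean` and `scipy.stats.mstats.gmean` instead.

```python
import numpy as np

# Hypothetical stand-ins for the two implementations under test.
def gmean_plain(a, axis=None):
    # Geometric mean via the log/exp identity on an ordinary ndarray.
    return np.exp(np.log(np.asarray(a, dtype=float)).mean(axis=axis))

def gmean_masked(a, axis=None):
    # Same formula on a masked array; mean() ignores masked entries.
    return np.exp(np.ma.log(np.ma.asarray(a, dtype=float)).mean(axis=axis))

# Shared test cases: (input, axis, expected geometric mean).
CASES = [
    ([1.0, 2.0, 4.0], None, 2.0),
    ([[1.0, 4.0], [9.0, 1.0]], 0, [3.0, 2.0]),
    ([2.0, 2.0, 2.0, 2.0], None, 2.0),
]

def check_consistent(funcs, cases=CASES, rtol=1e-12):
    """Run every case against every function; return a list of failures."""
    failures = []
    for data, axis, expected in cases:
        for f in funcs:
            got = np.asarray(f(data, axis=axis))
            if not np.allclose(got, expected, rtol=rtol):
                failures.append((f.__name__, data, axis, got))
    return failures

print(check_consistent([gmean_plain, gmean_masked]))  # [] when both agree
```

Because every function sees the same cases, a fix made in one implementation but not the other shows up as a failure for the one left behind.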
Okay, I do not know how to get timeit to work well with numpy/scipy, so
this is not quite the comparison I would like. But I managed somehow to
(unfairly) compare the geometric mean function (gmean) using this code:
import timeit

# Shared setup (run once, not timed): a 1 x 10 array of gamma variates.
setup = ('import numpy, scipy.stats.stats, scipy.stats.mstats; '
         'X = numpy.random.gamma(shape=2, scale=1, size=(1, 10)); xs = None')

# Time 1000 calls of each version of the geometric mean.
stand_t = timeit.Timer('scipy.stats.stats.gmean(X, axis=xs)', setup).timeit(1000)
masked_t = timeit.Timer('scipy.stats.mstats.gmean(X, axis=xs)', setup).timeit(1000)
numpy_t = timeit.Timer('numpy.exp(numpy.log(X).mean())', setup).timeit(1000)

print('difference (masked - plain): %.3f s' % (masked_t - stand_t))
print('ratio (masked / plain): %.2f' % (masked_t / stand_t))

I use Linux and Python 2.5, but my system is very busy, so these timings
may not be a fair benchmark.
numpy.__version__  '1.3.0.dev6338'
scipy.__version__ '0.8.0.dev5597'

There is a cost to using _chk_asarray in this case, which decreases as
the array size increases. (I am not sure that _chk_asarray is really
needed anyhow.)
Using masked arrays has a huge cost for small arrays, but the cost
decreases as the array size increases.
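To make the _chk_asarray overhead concrete, here is a sketch of a helper that, like _chk_asarray is assumed to do, converts the input to an array and flattens it when `axis` is None. `chk_asarray_sketch`, `gmean_checked`, `gmean_direct` and `overhead_ratio` are hypothetical names, and this is not SciPy's source. The helper adds a roughly fixed cost per call, so its share of the total runtime should shrink as the array grows.

```python
import timeit
import numpy as np

def chk_asarray_sketch(a, axis):
    # Hypothetical re-creation (assumed behavior, not SciPy's source):
    # with axis=None, flatten and reduce over axis 0; otherwise keep axis.
    if axis is None:
        return np.ravel(a), 0
    return np.asarray(a), axis

def gmean_checked(a, axis=None):
    # Geometric mean that first normalizes its input via the helper.
    arr, ax = chk_asarray_sketch(a, axis)
    return np.exp(np.log(arr).mean(axis=ax))

def gmean_direct(a):
    # Same computation with no argument checking (axis=None only).
    return np.exp(np.log(a).mean())

def overhead_ratio(n, number=1000):
    """Time both versions on a 1 x n array; return checked/direct ratio."""
    x = np.random.gamma(shape=2, scale=1, size=(1, n))
    t_checked = timeit.timeit(lambda: gmean_checked(x), number=number)
    t_direct = timeit.timeit(lambda: gmean_direct(x), number=number)
    return t_checked / t_direct

for n in (10, 10000):
    print(n, round(overhead_ratio(n), 2))
```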

For a 1-by-10 array, the masked version took 0.13 seconds longer than
the non-masked version over 1000 calls, a masked to non-masked time
ratio of 7.94.
For a 1-by-10000 array, the masked version took 0.07 seconds longer
over 1000 calls, a masked to non-masked time ratio of 2.14.
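The difference and ratio reported above can be computed for several sizes in one loop. This sketch substitutes the bare NumPy expression on an ordinary array versus a `numpy.ma` masked array for SciPy's two gmean functions, so it runs without SciPy. `compare_masked` is a hypothetical name, and the absolute numbers will depend on the machine and its load.

```python
import timeit
import numpy as np

def compare_masked(n, number=1000):
    """Return (masked - plain seconds, masked / plain ratio) for a 1 x n array."""
    x = np.random.gamma(shape=2, scale=1, size=(1, n))
    xm = np.ma.masked_array(x)  # same data, no entries masked
    # Identical formula on both; only the array type differs.
    t_plain = timeit.timeit(lambda: np.exp(np.log(x).mean()), number=number)
    t_masked = timeit.timeit(lambda: np.exp(np.log(xm).mean()), number=number)
    return t_masked - t_plain, t_masked / t_plain

for n in (10, 10000):
    diff, ratio = compare_masked(n)
    print('1 x %d: difference %.3f s, ratio %.2f' % (n, diff, ratio))
```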

However, from a brief look at some of these functions, I think that
numpy/scipy would naturally handle the array type: I know that
numpy.exp(numpy.log(X).mean()) works whether X is an ordinary array or
a masked array. If so, there is no reason for separate functions unless
we need to address masks.
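A quick check of that claim with NumPy alone: the same expression gives the geometric mean of an ordinary array and, for a masked array, the geometric mean of only the unmasked entries, because the masked array keeps its mask through numpy.log and its mean() skips masked values. The data values are made up for illustration, and `gmean_expr` is a hypothetical name.

```python
import numpy as np

def gmean_expr(X):
    # The one-line geometric mean from the message, unchanged.
    return np.exp(np.log(X).mean())

plain = np.array([1.0, 2.0, 4.0, 8.0])
masked = np.ma.array([1.0, 2.0, 4.0, 8.0], mask=[False, False, False, True])

print(gmean_expr(plain))   # geometric mean of all four values, 2**1.5
print(gmean_expr(masked))  # 8.0 is masked, so this is the gmean of 1, 2, 4
```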


Bruce





