[SciPy-dev] Standard deviations

Tue Nov 29 15:50:00 EST 2005

Ed Schofield wrote:

>Hi all,
>
>I have three questions related to standard deviations and variances in 
>scipy.
>
>First, can someone explain the behaviour of array.std() without any 
>arguments?
>
> >>> a = arange(30).reshape(3,10)
> >>> a
>array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
>       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
>       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
> >>> a.std()
>array([ 2.99856287,  2.85723522,  2.74647109,  2.67007684,  2.63104804,
>        2.63104804,  2.67007684,  2.74647109,  2.85723522,  2.99856287])
>
>I don't understand what these numbers represent.  The correct standard 
>deviations of the column vectors are given by:
>
> >>> a.std(0)
>array([ 10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.])
>
>and the standard deviations of the row vectors are:
>
> >>> a.std(1)
>array([ 3.02765035,  3.02765035,  3.02765035])
>
>I would have expected a.std() to give the same output as
> >>> a.ravel().std()
>8.8034084308295046
>
>which is what a.mean() does.
>  
>

This is a bug.  Thanks for finding it.  I'll look into it.
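
For reference, here is a minimal sketch of the behaviour Ed expected. It assumes present-day NumPy rather than the 2005 scipy core under discussion. The ddof=1 argument is also an assumption, chosen because the figures quoted above match the sample (n-1) standard deviation.

```python
import numpy as np

a = np.arange(30).reshape(3, 10)

# ddof=1 (an assumption) selects the sample standard deviation,
# which matches the numbers quoted in the message above.
flat = a.std(ddof=1)            # no axis: reduce over the whole array
cols = a.std(axis=0, ddof=1)    # one value per column (10 values)
rows = a.std(axis=1, ddof=1)    # one value per row (3 values)

print(flat)
print(cols)
print(rows)
```

With no axis argument the result is a single scalar equal to a.ravel().std(ddof=1), which is consistent with what a.mean() does.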

>
>
>Second, I'd like to point out that some of the functions in Lib/stats/ 
>have a different convention to scipy core about whether operations are 
>performed row-wise or column-wise, and to ask whether anyone would 
>object to my changing the stats functions to operate column-wise.  At 
>the moment we get this:
>
> >>> average(a)
>array([ 10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.])
>
>which is column-wise, but
>
> >>> std(a)
>array([ 3.02765035,  3.02765035,  3.02765035])
>
>which is row-wise.  I presume the default behaviour of std() and friends 
>is just a historical relic.  If so we'd be wise to get this straight 
>well before a 1.0 release.
>  
>
Good catch.  It would be nice to have things as consistent as possible.  
Feel free to make consistency changes --- especially in stats.py, which 
is still messy.
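
One way to picture the consistent convention is a pair of thin wrappers that both default to axis 0. This is only a sketch under assumptions: it uses present-day NumPy, and the names col_mean and col_std are hypothetical, not functions from stats.py.

```python
import numpy as np

def col_mean(x, axis=0):
    # Hypothetical helper: mean along the first axis by default.
    return np.mean(x, axis=axis)

def col_std(x, axis=0):
    # Hypothetical helper: sample standard deviation (ddof=1, an
    # assumption) along the same default axis as col_mean.
    return np.std(x, axis=axis, ddof=1)

a = np.arange(30).reshape(3, 10)
print(col_mean(a))  # one value per column
print(col_std(a))   # also one value per column
```

Because both helpers share the same default axis, calling them on the same array always yields results of the same shape.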

>Third, I'd like to request that we add an array.var() method to scipy 
>core to compute an array's sample variance.
>
>At the moment it seems that there is no way to compute the sample 
>variance of an array of numbers without installing the full scipy.  
>Users needing to do this will either have to roll their own function in 
>Python, like this:
>
>def var(A):
>    # Sample variance along the first axis (divides by m-1, not m)
>    m = len(A)
>    return average((A - average(A))**2) * (m/(m-1.))
>
>or square the output of std().  Both are less efficient than a native 
>array.var() would be, requiring extra memory copying and, in the second 
>case, squaring the result of a square root operation, which also 
>introduces numerical imprecision.
>
>The extra code required is minimal.  There's an example patch below, 
>which works fine except that it inherits the weirdness of std().
>  
>
I'm O.K. with this.  Anybody else see a problem?
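
As a rough illustration of the request, the sample variance can be computed directly from the squared deviations, without taking a square root and squaring it again. The sketch below assumes present-day NumPy, and sample_var is a hypothetical name, not the proposed array.var() method itself.

```python
import numpy as np

def sample_var(x, axis=0):
    # Hypothetical helper: sample variance (divides by n-1) along `axis`.
    x = np.asarray(x, dtype=float)
    n = x.shape[axis]
    # keepdims=True keeps the reduced axis so the subtraction broadcasts.
    dev = x - x.mean(axis=axis, keepdims=True)
    return (dev ** 2).sum(axis=axis) / (n - 1)

a = np.arange(30).reshape(3, 10)
direct = sample_var(a)                  # computed from squared deviations
via_std = a.std(axis=0, ddof=1) ** 2    # squares the result of a square root
print(direct)
print(via_std)
```

The two results agree up to floating-point rounding; the direct form simply skips the square root and the subsequent squaring.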

-Travis