[SciPy-dev] Standard deviations
Travis Oliphant
oliphant at ee.byu.edu
Tue Nov 29 15:50:00 EST 2005
Ed Schofield wrote:
>Hi all,
>
>I have three questions related to standard deviations and variances in
>scipy.
>
>First, can someone explain the behaviour of array.std() without any
>arguments?
>
> >>> a = arange(30).reshape(3,10)
> >>> a
>array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
> [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
> [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
> >>> a.std()
>array([ 2.99856287, 2.85723522, 2.74647109, 2.67007684, 2.63104804,
> 2.63104804, 2.67007684, 2.74647109, 2.85723522, 2.99856287])
>
>I don't understand what these numbers represent. The correct standard
>deviations of the column vectors are given by:
>
> >>> a.std(0)
>array([ 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.])
>
>and the standard deviations of the row vectors are:
>
> >>> a.std(1)
>array([ 3.02765035, 3.02765035, 3.02765035])
>
>I would have expected a.std() to give the same output as
> >>> a.ravel().std()
>8.8034084308295046
>
>which is what a.mean() does.
>
>
This is a bug. Thanks for finding it. I'll look into it.
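
A minimal sketch of the behaviour Ed expected, assuming present-day NumPy (imported as np) rather than the 2005 scipy core discussed here. The ddof=1 argument is an assumption made to reproduce the sample (n - 1) normalisation behind the figures quoted above; it is not part of the original session.

```python
import numpy as np

a = np.arange(30).reshape(3, 10)

# With no axis argument, present-day NumPy reduces over the flattened array.
# ddof=1 selects the sample (n - 1) normalisation used by the quoted figures.
flat_std = a.std(ddof=1)
col_std = a.std(axis=0, ddof=1)   # one value per column
row_std = a.std(axis=1, ddof=1)   # one value per row

print(flat_std, a.ravel().std(ddof=1))
print(col_std)
print(row_std)
```

Under these assumptions the no-argument call agrees with a.ravel().std(ddof=1), which is the consistency with a.mean() that the question asks for.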
>
>
>Second, I'd like to point out that some of the functions in Lib/stats/
>follow a different convention from scipy core about whether operations
>are performed row-wise or column-wise, and to ask whether anyone would
>object to my changing the stats functions to operate column-wise. At
>the moment we get this:
>
> >>> average(a)
>array([ 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.])
>
>which is column-wise, but
>
> >>> std(a)
>array([ 3.02765035, 3.02765035, 3.02765035])
>
>which is row-wise. I presume the default behaviour of std() and friends
>is just a historical relic. If so, we'd be wise to get this straight
>well before a 1.0 release.
>
>
Good catch. It would be nice to have things as consistent as possible.
Feel free to make consistency changes --- especially in stats.py which
is still messy.
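
One way to sidestep the row-wise versus column-wise question while the defaults differ is to name the axis in every call. The sketch below assumes present-day NumPy (imported as np), where the axis and ddof keywords exist; the 2005 functions discussed in this thread may have had different signatures.

```python
import numpy as np

a = np.arange(30).reshape(3, 10)

# Naming the axis makes the direction of each reduction explicit,
# so the result does not depend on a function's default convention.
col_mean = np.average(a, axis=0)      # column-wise, as average(a) did above
col_std = np.std(a, axis=0, ddof=1)   # column-wise, matching the mean
row_std = np.std(a, axis=1, ddof=1)   # row-wise, as std(a) did above

print(col_mean)
print(col_std)
print(row_std)
```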
>Third, I'd like to request that we add an array.var() method to scipy
>core to compute an array's sample variance.
>
>At the moment it seems that there is no way to compute the sample
>variance of an array of numbers without installing the full scipy.
>Users needing to do this will either have to roll their own function in
>Python, like this:
>
>def var(A):
>    # Sample variance of a 1-D array: rescale the population variance by m/(m-1).
>    m = len(A)
>    return average((A - average(A))**2) * (m/(m-1.))
>
>or square the output of std(). Both are less efficient than a native
>array.var() would be, requiring extra memory copying and, in the second
>case, squaring the result of a square root operation, which also
>introduces numerical imprecision.
>
>The extra code required is minimal. There's an example patch below,
>which works fine except that it inherits the weirdness of std().
>
>
I'm O.K. with this. Anybody else see a problem?
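
For comparison, here is a sketch of the three routes to a sample variance mentioned above, assuming present-day NumPy (imported as np). The helper name sample_var is hypothetical, and np.var with ddof=1 stands in for the proposed array.var(); it is not the patch from this thread.

```python
import numpy as np

def sample_var(x):
    """Hypothetical helper: unbiased sample variance of a 1-D array."""
    x = np.asarray(x, dtype=float)
    m = x.size
    # Population variance rescaled by m / (m - 1).
    return np.mean((x - x.mean()) ** 2) * (m / (m - 1.0))

x = np.arange(30)

v_manual = sample_var(x)             # roll-your-own function
v_squared = np.std(x, ddof=1) ** 2   # squaring the standard deviation
v_native = np.var(x, ddof=1)         # native variance, no square root involved

print(v_manual, v_squared, v_native)
```

All three agree up to rounding for this input; the native call avoids taking a square root and then squaring it again.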
-Travis