Compact Python library for math statistics

Tue Apr 6 14:38:06 EDT 2004

Gerrit <gerrit at nl.linux.org> wrote in message news:<mailman.267.1080904644.20120.python-list at python.org>...
> wrote:
> > I'm looking for a Python library for math statistics. This must be a
> > clear set of general statistics functions like 'average', 'variance',
> > 'covariance' etc.
> 
> The next version of Python will have a 'statistics' module. It is
> probably usable in Python 2.3 as well. You can find it in CVS:
> 
> http://cvs.sourceforge.net/viewcvs.py/*checkout*/python/python/nondist/sandbox/statistics/statistics.py
> 
> I'm not sure whether it's usable in current CVS, though. You may have to
> tweak it a little.

<SNIP>

It works for me, at least the mean function. A statistics module will
be nice to have, although it is easy to write your own.

Here is a minor suggestion. The functions 'mean' and 'variance' are
separate, and the latter function requires a mean to be calculated. To
save CPU time, it would be nice to have a single function that returns
both the mean and variance, or a function to compute the variance with
a known mean.
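
A minimal sketch of what I have in mind, not taken from the sandbox module. The names mean_variance and variance_known_mean are made up for illustration, and I am assuming that the 'sample' flag selects the n-1 divisor:

```python
def variance_known_mean(data, m, sample=True):
    """Variance of data given an already computed mean m.

    Hypothetical helper: 'sample=True' is assumed to mean the
    n-1 (sample) divisor, 'sample=False' the n (population) divisor.
    """
    data = list(data)
    n = len(data)
    divisor = n - 1 if sample else n
    if divisor < 1:
        raise ValueError("not enough data points")
    # Sum of squared deviations from the supplied mean.
    ss = sum((x - m) ** 2 for x in data)
    return ss / float(divisor)


def mean_variance(data, sample=True):
    """Return (mean, variance), computing the mean only once."""
    data = list(data)
    if not data:
        raise ValueError("empty data")
    m = sum(data) / float(len(data))
    return m, variance_known_mean(data, m, sample)
```

For x = [1.0, 2.0, 3.0, 4.0], mean_variance(x) gives a mean of 2.5 and a sample variance of 5/3, with the data summed once for the mean and once for the squared deviations.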

Ideally there would be a function such as 

def stats(x,ss)

where ss contains a list of statistics to be computed and the function
returns a list of the same size. If you called it with

y = stats(x,["mean","variance"])

the function would compute the mean and variance efficiently.
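
Here is a rough sketch of how such a function could share intermediate results. It is only an illustration under my own assumptions: the statistic names it accepts, the "stddev" entry, and the use of the sample (n-1) variance are my choices, not anything defined by the sandbox module.

```python
def stats(x, ss):
    """Return the statistics named in ss, in the same order.

    Intermediate results are cached, so asking for both "mean"
    and "variance" computes the mean only once.
    """
    x = list(x)
    n = len(x)
    if n == 0:
        raise ValueError("empty data")
    cache = {}

    def mean():
        if "mean" not in cache:
            cache["mean"] = sum(x) / float(n)
        return cache["mean"]

    def variance():
        # Sample variance (n-1 divisor); an assumption of this sketch.
        if "variance" not in cache:
            if n < 2:
                raise ValueError("variance needs at least two points")
            m = mean()
            cache["variance"] = sum((v - m) ** 2 for v in x) / float(n - 1)
        return cache["variance"]

    def stddev():
        # Hypothetical extra entry showing reuse of the cached variance.
        return variance() ** 0.5

    table = {"mean": mean, "variance": variance, "stddev": stddev}
    result = []
    for name in ss:
        if name not in table:
            raise ValueError("unknown statistic: %r" % (name,))
        result.append(table[name]())
    return result
```

With this sketch, stats([1.0, 2.0, 3.0, 4.0], ["mean", "variance"]) returns a two-element list holding 2.5 and 5/3.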

Other comments:
(1) In computing the median, there is a line of code

    return (select(data, n//2) + select(data, n//2-1)) / 2

I think finding the 500th and 501st elements separately in a
1000-element list is inefficient. Isn't there a way to get two
consecutive order statistics in about the same time needed to get a
single one?
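
One possible approach, sketched below, is to run a partition-based selection once for the upper middle element. Afterwards everything to its left is no larger than it, so the lower middle element is just the maximum of that left part, which costs one extra linear scan. This is a stand-alone sketch: middle_pair is a made-up name, and it does not use the module's own select function, whose internals I have not checked.

```python
import random


def middle_pair(data):
    """Return (lower, upper) middle elements of an even-length sequence."""
    a = list(data)  # work on a copy; the caller's list is not reordered
    n = len(a)
    if n < 2 or n % 2:
        raise ValueError("need an even number of elements (at least 2)")
    k = n // 2
    lo, hi = 0, n - 1
    # Hoare-style quickselect: narrow [lo, hi] until a[k] is in place.
    while lo < hi:
        pivot = a[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        if k <= j:
            hi = j
        elif k >= i:
            lo = i
        else:
            break  # a[k] equals the pivot and is already in position
    upper = a[k]
    # All of a[:k] is <= a[k], so the (k-1)th order statistic is its maximum.
    lower = max(a[:k])
    return lower, upper
```

The median of an even-length list would then be the average of the two returned values, for example (2.0 + 3.0) / 2 for [1.0, 2.0, 3.0, 4.0].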

(2) The following code crashes when median(x) is computed. Why?

from statistics import mean,median
x = [1.0,2.0,3.0,4.0]
print mean(x)
print median(x)

(3) The standard deviation is computed as 

    return variance(data, sample) ** 0.5

I think math.sqrt should be used instead, since it may be
implemented more efficiently than general exponentiation.
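
Whether sqrt is really faster is something to measure rather than assume. The sketch below shows the sqrt-based version and a small timing helper built on the standard timeit module. The names std_from_variance and compare_timings are made up for illustration, and the timings will depend on the interpreter and machine.

```python
import math
import timeit


def std_from_variance(var):
    """Standard deviation from a variance, using math.sqrt.

    math.sqrt raises ValueError for a negative argument.
    """
    return math.sqrt(var)


def compare_timings(value=2.0, number=100000):
    """Return (seconds for v ** 0.5, seconds for sqrt(v))."""
    t_pow = timeit.timeit("v ** 0.5",
                          setup="v = %r" % (value,),
                          number=number)
    t_sqrt = timeit.timeit("sqrt(v)",
                           setup="from math import sqrt; v = %r" % (value,),
                           number=number)
    return t_pow, t_sqrt
```

Calling compare_timings() a few times gives a rough idea of the difference on a given setup. The cost of taking one square root is likely small next to the pass over the data that computes the variance itself.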