Compact Python library for math statistics
Raymond Hettinger
python at rcn.com
Fri Apr 9 04:24:58 EDT 2004
> A statistics module will
> be nice to have, although it is easy to write your own.
>
> Here is a minor suggestion. The functions 'mean' and 'variance' are
> separate, and the latter function requires a mean to be calculated. To
> save CPU time, it would be nice to have a single function that returns
> both the mean and variance, or a function to compute the variance with
> a known mean.
Like you said, that is easy enough to write on your own. This
lightweight module is not meant to replace the heavyweights that already
exist outside of the core distribution.
The goals are to have a simple set of functions for daily use and for
these data reduction functions to work as well as possible with
generator expressions (one pass over the data wherever possible).
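For anyone who does want the combined function, here is a minimal sketch. It is not part of the proposed module, and `mean_variance` is a hypothetical name. It uses Welford's online update, which yields both the mean and the variance after a single pass, so it also works on a generator expression that can only be consumed once:

```python
def mean_variance(data, sample=True):
    """Return (mean, variance) after a single pass over any iterable.

    Welford's online update: `data` may be a one-shot generator.
    sample=True divides by n - 1; sample=False divides by n.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # second factor uses the updated mean
    denom = n - 1 if sample else n
    if denom <= 0:
        raise ValueError("not enough data points for the requested variance")
    return mean, m2 / denom


# Data 1, 4, 9, 16: mean 7.5, sample variance 43.0 (up to float rounding).
print(mean_variance(x * x for x in range(1, 5)))
```

Updating the mean incrementally also avoids the cancellation that the sum-of-squares shortcut suffers when the mean is large relative to the spread of the data.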
> (1) In computing the median, there is a line of code
>
> return (select(data, n//2) + select(data, n//2-1)) / 2
>
> I think finding the 500th and 501st elements separately out of a 1000
> element array is inefficient. Isn't there a way to get consecutive
> ordered elements in about the same time needed to get a single
> element?
select() uses an O(n) algorithm, so the penalty for calling it twice is
not that large. Making it accommodate selecting a range would greatly
complicate and slow down the code. If you need the low, high, and
percentile values, it may be better to just sort the data.
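The select() implementation itself isn't shown in this thread. As a hypothetical sketch with the same select(data, k) call shape as the quoted line, a randomized quickselect runs in expected O(n) time, so the even-length median costs two linear passes. The alternative, a single sort, costs O(n log n) once and then gives the low, high, median, or any other order statistic by indexing (`summary_by_sorting` is likewise a hypothetical helper):

```python
import random


def select(data, k):
    """Return the k-th smallest element (0-based) of `data`.

    Randomized quickselect: expected O(n) time, O(n^2) in the worst case.
    """
    items = list(data)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_pivots = len(items) - len(lows) - len(highs)  # copies of pivot
        if k < len(lows):
            items = lows
        elif k < len(lows) + n_pivots:
            return pivot
        else:
            k -= len(lows) + n_pivots
            items = highs


def median(data):
    """Median using one selection (odd n) or two selections (even n)."""
    data = list(data)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2:
        return select(data, n // 2)
    return (select(data, n // 2) + select(data, n // 2 - 1)) / 2


def summary_by_sorting(data):
    """Low, high, and median from one O(n log n) sort."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("summary of empty data")
    mid = n // 2
    med = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[0], ordered[-1], med


print(median([1.0, 2.0, 3.0, 4.0]))          # 2.5
print(summary_by_sorting([5, 1, 4, 2, 3]))   # (1, 5, 3)
```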
> (2) The following code crashes when median(x) is computed. Why?
>
> from statistics import mean,median
> x = [1.0,2.0,3.0,4.0]
> print mean(x)
> print median(x)
Hmm, it works for me. What does your traceback look like?
> (3) The standard deviation is computed as
>
> return variance(data, sample) ** 0.5
>
> I think the sqrt function should be used instead -- this may be
> implemented more efficiently than general exponentiation.
The timings show otherwise:
C:\pydev>python timeit.py -r9 -n100000 -s "import math; sqrt=math.sqrt" "sqrt(7.0)"
100000 loops, best of 9: 1.7 usec per loop
C:\pydev>python timeit.py -r9 -n100000 -s "7.0 ** 0.5"
100000 loops, best of 9: 0.237 usec per loop
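To rerun the comparison on your own machine, the timeit module can also be driven from Python. This is a sketch, not the original benchmark, and `best_time` is a hypothetical helper. It binds the operand to a variable in the setup code, because CPython folds a literal expression such as 7.0 ** 0.5 into a constant at compile time, and it passes each expression as the timed statement (on the command line, -s marks setup code, which is not timed). Timings depend on the machine and Python version:

```python
import math
import timeit

# Setup runs once per repeat and is not timed. Binding x here keeps the
# timed statement from being constant-folded by the compiler.
SETUP = "from math import sqrt; x = 7.0"


def best_time(stmt, number=100_000, repeat=9):
    """Best-of-`repeat` seconds per loop, like `timeit -r9 -n100000`."""
    times = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(times) / number


if __name__ == "__main__":
    for stmt in ("sqrt(x)", "x ** 0.5"):
        usec = best_time(stmt) * 1e6
        print(f"{stmt:10s} best of 9: {usec:.3f} usec per loop")
    # Both spellings agree to within floating-point rounding.
    print(math.isclose(math.sqrt(7.0), 7.0 ** 0.5))
```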
Raymond Hettinger