[Python-ideas] Pre-PEP: adding a statistics module to Python

Wed Aug 7 06:25:12 CEST 2013

Steven D'Aprano writes:

 > >   >       Consequently, the above naive mean fails this
 > >   >       "torture test" with an error of 100%:
 > >   >
 > >   >           assert mean([1e30, 1, 3, -1e30]) == 1
 > >
 > > 100%?  This is a relative error of sqrt(2)*1e-30.
 > 
 > I don't understand your calculation here. Where are you getting the
 > values 2 and 1e-30 from?

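From the standard deviation of the example data.  The naive mean is
off by 1, and the standard deviation is about 1e30/sqrt(2), so the
error measured in standard deviations is sqrt(2)*1e-30.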

Your calculation of relative error is statistically irrelevant, unless
you can assert 30 decimal places of accuracy in the measurements 1e30
and -1e30.  If you just have data and no theory about where it came
from, the relevant unit is the standard deviation.
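
For concreteness, here is a minimal sketch of the two ways of scoring
the error.  The names naive_mean and exact_mean are mine, purely for
illustration, and I am assuming the population standard deviation
(dividing by n), which is what gives the sqrt(2) figure; the sample
formula (dividing by n-1) changes the constant slightly but not the
order of magnitude.

```python
import math

data = [1e30, 1, 3, -1e30]

def naive_mean(xs):
    # Plain left-to-right float summation: 1 and 3 are absorbed by 1e30.
    total = 0.0
    for x in xs:
        total += x
    return total / len(xs)

def exact_mean(xs):
    # math.fsum tracks the lost low-order bits, so the sum comes out as 4.0.
    return math.fsum(xs) / len(xs)

naive = naive_mean(data)
exact = exact_mean(data)

# Population standard deviation (assumption: divide by n, not n - 1).
sd = math.sqrt(math.fsum((x - exact) ** 2 for x in data) / len(data))

abs_error = abs(naive - exact)
rel_to_mean = abs_error / abs(exact)   # error as a fraction of the true mean
rel_to_sd = abs_error / sd             # error in units of standard deviation

print(naive, exact, rel_to_mean, rel_to_sd)
```

Scored against the true mean, the naive result is off by 100%; scored
against the spread of the data, it is off by about 1.4e-30 standard
deviations.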

 > > I also wonder about the utility of a "statistics" package that has no
 > > functionality for presenting and operating on the most fundamental
 > > "statistic" of all: the (empirical) distribution.
 > 
 > It's early days, and it is better to start the module small and
 > grow it than to try to fit everything and the kitchen sink in from Day
 > One.

OK.

 > I'm happy to discuss this further with you off-list.

Me too, although my implementation is far from ready for prime
time, and the curriculum committee just nuked that whole course, so I
have no interest in fixing it independent of this discussion.  But
I'll see what resources I can scrape up if the implementation is of
interest.

Other interested parties, feel free to contact me for addition to the
CC list.

Steve