PEP 450 Adding a statistics module to Python

Sun Aug 11 11:44:25 EDT 2013

On 11/08/13 15:02, Roy Smith wrote:
> In article <mailman.479.1376221844.1251.python-list at python.org>,
>   Skip Montanaro <skip at pobox.com> wrote:
>
>>> See the Rationale of PEP 450 for more reasons why “install NumPy” is not
>>> a feasible solution for many use cases, and why having ‘statistics’ as a
>>> pure-Python, standard-library package is desirable.
>>
>> I read that before posting but am not sure I agree. I don't see the
>> screaming need for this package.  Why can't it continue to live on
>> PyPI, where, once again, it is available as "pip install ..."?
>
> My previous comments on this topic were along the lines of "installing
> numpy is a non-starter if all you need are simple mean/std-dev".  You
> do, however, make a good point here.  Running "pip install statistics"
> is a much lower barrier to entry than getting numpy going, especially if
> statistics is pure python and thus has no dependencies on compiler tool
> chains which may be missing.
>
> Still, I see two classes of function in PEP-450.  Class 1 is the really
> basic stuff:
>
> * mean
> * std-dev
>
> Class 2 are the more complicated things like:
>
> * linear regression
> * median
> * mode
> * functions for calculating the probability of random variables
>    from the normal, t, chi-squared, and F distributions
> * inference on the mean
> * anything that differentiates between population and sample
>
> I could see leaving class 2 stuff in an optional pure-python module to
> be installed by pip, but for (as the PEP phrases it), the simplest and
> most obvious statistical functions (into which I lump mean and std-dev),
> having them in the standard library would be a big win.
>

I would probably move other descriptive statistics (median, mode, 
correlation, ...) into Class 1.
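
As a rough illustration of what such a Class 1 could cover, here is a
pure-Python sketch using only the standard library. The function names
and the `sample` flag are my own choices for this example, not the API
proposed in PEP 450; the flag is only there to show the population vs
sample distinction mentioned above.

```python
import math
from collections import Counter


def mean(data):
    """Arithmetic mean of a non-empty sequence."""
    if not data:
        raise ValueError("mean requires at least one data point")
    return math.fsum(data) / len(data)


def median(data):
    """Middle value; mean of the two middle values for even-length data."""
    if not data:
        raise ValueError("median requires at least one data point")
    ordered = sorted(data)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(data):
    """Most common value (ties are resolved by Counter's ordering)."""
    if not data:
        raise ValueError("mode requires at least one data point")
    return Counter(data).most_common(1)[0][0]


def variance(data, sample=True):
    """Sample variance (divide by n - 1) or population variance (divide by n)."""
    n = len(data)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    m = mean(data)
    ss = math.fsum((x - m) ** 2 for x in data)
    return ss / (n - 1 if sample else n)


def stdev(data, sample=True):
    """Square root of the variance, with the same sample/population switch."""
    return math.sqrt(variance(data, sample))


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = math.fsum((x - mx) ** 2 for x in xs)
    syy = math.fsum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

For example, stdev([2, 4, 4, 4, 5, 5, 7, 9], sample=False) gives 2.0,
while the default sample version divides by n - 1 and comes out
slightly larger.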

I roll my own statistical tests as I need them - simply to avoid having 
a dependency on R. But I generally do end up with a dependency on scipy 
because I need scipy.stats.distributions. So I guess a distinct library 
for probability distributions would be handy - but maybe it should not 
be in the standard library.
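
To give an idea of the scale involved: the normal distribution is the
easy case, since the standard library's math.erf is enough for the CDF.
The sketch below is only an illustration under that assumption, and the
names normal_pdf, normal_cdf and two_sided_p are hypothetical. The t,
chi-squared and F distributions need incomplete gamma and beta
functions, which the math module does not provide, and that is where a
dependency like scipy tends to creep in.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), computed via the error function."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return 2.0 * (1.0 - normal_cdf(abs(z)))
```

With these, two_sided_p(1.96) comes out at roughly 0.05, as expected
for a z-test.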

Once we move on to statistical modelling (e.g. linear regression) I 
think the case for inclusion in the standard library becomes weaker 
still.

Cheers.

Duncan