PEP 450 Adding a statistics module to Python
duncan smith
buzzard at invalid.invalid
Sun Aug 11 11:44:25 EDT 2013
On 11/08/13 15:02, Roy Smith wrote:
> In article <mailman.479.1376221844.1251.python-list at python.org>,
> Skip Montanaro <skip at pobox.com> wrote:
>
>>> See the Rationale of PEP 450 for more reasons why "install NumPy" is not
>>> a feasible solution for many use cases, and why having "statistics" as a
>>> pure-Python, standard-library package is desirable.
>>
>> I read that before posting but am not sure I agree. I don't see the
>> screaming need for this package. Why can't it continue to live on
>> PyPI, where, once again, it is available as "pip install ..."?
>
> My previous comments on this topic were along the lines of "installing
> numpy is a non-starter if all you need are simple mean/std-dev". You
> do, however, make a good point here. Running "pip install statistics"
> is a much lower barrier to entry than getting numpy going, especially if
> statistics is pure python and thus has no dependencies on compiler tool
> chains which may be missing.
>
> Still, I see two classes of function in PEP-450. Class 1 is the really
> basic stuff:
>
> * mean
> * std-dev
>
> Class 2 are the more complicated things like:
>
> * linear regression
> * median
> * mode
> * functions for calculating the probability of random variables
> from the normal, t, chi-squared, and F distributions
> * inference on the mean
> * anything that differentiates between population and sample
>
> I could see leaving class 2 stuff in an optional pure-python module to
> be installed by pip, but for (as the PEP phrases it), the simplest and
> most obvious statistical functions (into which I lump mean and std-dev),
> having them in the standard library would be a big win.
>
I would probably move other descriptive statistics (median, mode,
correlation, ...) into Class 1.

I roll my own statistical tests as I need them - simply to avoid having
a dependency on R. But I generally do end up with a dependency on scipy
because I need scipy.stats.distributions. So I guess a distinct library
for probability distributions would be handy - but maybe it should not
be in the standard library.

Once we move on to statistical modelling (e.g. linear regression) I
think the case for inclusion in the standard library becomes weaker
still.

Cheers.
Duncan
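
For readers following the thread: the population/sample distinction that the quoted list puts in Class 2 comes down to a single divisor. The sketch below is only an illustration in pure Python. The function names (mean, pstdev, stdev) and the sample data are assumptions chosen for this example, not code taken from the PEP or from any post in the thread.

```python
import math


def mean(data):
    """Arithmetic mean of a non-empty sequence of numbers."""
    if not data:
        raise ValueError("mean requires at least one data point")
    return sum(data) / len(data)


def _sum_sq_dev(data):
    """Sum of squared deviations from the mean (helper, hypothetical name)."""
    m = mean(data)
    return sum((x - m) ** 2 for x in data)


def pstdev(data):
    """Population standard deviation: divide by n."""
    return math.sqrt(_sum_sq_dev(data) / len(data))


def stdev(data):
    """Sample standard deviation: divide by n - 1 (Bessel's correction)."""
    if len(data) < 2:
        raise ValueError("stdev requires at least two data points")
    return math.sqrt(_sum_sq_dev(data) / (len(data) - 1))


# Illustrative data only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data), pstdev(data), stdev(data))
```

The naive two-pass formula is fine for small, well-behaved inputs like this; it makes no attempt at the numerical care a library implementation would need.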