PEP 450 Adding a statistics module to Python

Fri Aug 16 15:41:01 EDT 2013

On 16 August 2013 20:00,  <chris.barker at noaa.gov> wrote:
>  > > One other point -- for performance reason, is would be nice to have some compiled code in there -- this adds incentive to put it in the stdlib -- external packages that need compiling is what makes numpy unacceptable to some folks.
>>
>> It might be good to have a C accelerator one day but actually I think
>> the pure-Python-ness of it is a strong reason to have it since it
>> provides accurate statistics functions to all Python implementations
>> (unlike numpy) at no additional cost.
>
> Well, I'd rather not have a package that is great for education and  toy problems, but not-so-good for the real ones...

Again it depends what you mean by "real". From the other lists where
we meet I'd guess that your problems are in the "needs a nuclear
reactor" camp. I doubt that the stdlib will ever be sufficiently
mathematically/computationally oriented to fully service either of our
needs (and I don't mean that as a criticism). I persuaded the IT guys
at my work that we needed the whole Enthought Python Distribution on
all machines just because I didn't want to have to argue about
individual packages.

However in my real work, where I compute means and variances etc. I
very often do work with very small datasets and I know a lot of others
who work almost exclusively with them (think e.g. clinical data where
N is often less than 100).

> I guess my point is this:
>
> This is a way to make the standard python distribution better for some common computational tasks. But rather than think of it as "we need some stats functions in the python stdlib", perhaps we should be thinking: "out of the box python should be better for computation" -- in which case, I'd start with a decent array object.

I think that, whether or not the statistics module gains a C
accelerator, if a fast numerical array type comes along then I'd
expect that the statistics module would use its methods as a fast
path. And if it provides a speed boost without compromising
boundedness or accuracy I'm sure that the array type would be used
internally where appropriate (just as numpy converts collections to
arrays before computation).

Oscar