[Python-ideas] Pre-PEP: adding a statistics module to Python

Joshua Landau joshua at landau.ws
Mon Aug 5 00:51:36 CEST 2013


On 08/04/2013 05:51 AM, Eli Bendersky wrote:

> On Sun, Aug 4, 2013 at 12:07 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
>
> Anyhow, "minimal" is a dangerous slope. With such a module in the
> stdlib, I'm 100% sure we'll get a constant stream of - please add just
> this function (from SciPy) - it's so useful to the "average person" -
> requests. This is unavoidable. And it will be difficult to judge at
> that point why certain functionality belongs or does not belong here.
> So over time we'll end up with a partial Greenspun, by containing an
> ad hoc, slow implementation of half of Numpy/SciPy.
>

I disagree. Has numpy led to an unreasonable number of additions to the
math module? Why would it be any different for a statistics module?

On 4 August 2013 16:20, Ethan Furman <ethan at stoneleaf.us> wrote:

>>> I thought the whole point of name spaces was to be able to have the same
>>> name mean different things in different contexts.  Surely no one expects
>>> to be able to use `webbrowser.open` or `gzip.open` anywhere `open` can be
>>> used.
>>>
>>
>> This is not a fair comparison. As a pop quiz, try to imagine the
>> difference between 'open' and 'gzip.open' - do you immediately come up
>> with the differences in their functionalities? Now, how about 'sum'
>> and 'statistics.sum'?
>>
>
> It's an absolutely fair comparison.  Different modules, same name.  Their
> functionalities?  No, I don't immediately come up with the differences,
> unless "gzip.open must have something to do with gzip files" counts.


I'd say it does count.


> Coincidentally, that's the same difference I immediately come up with for
> sum and statistics.sum -- "statistics.sum must have something to do with
> statistics"; and I would never think about it again unless I had a problem
> with statistics.


That's not really true -- statistics.sum would be better named
statistics.precise_sum. It's not only useful when doing statistics.

I tend to think of it this way: the name should make it clear when you
should read the documentation. gzip.open is obvious; you should read it
when you work with gzip files. statistics.sum is not, because *all* sums are
statistical sums. An accurate name à la "precise_sum" would make it obvious
that you should read the docs whenever you're doing sums that need
precision. The docs should say up front that it guards against the loss of
precision caused by summing values of very different orders of magnitude,
and against other floating point mischief (including sum([0.1]*10)).
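
To make the precision point concrete, here is a rough sketch of the two
failure modes mentioned above. It uses math.fsum as a stand-in for the
proposed statistics.sum (whose exact behaviour is not specified in this
thread), and naive_sum is a hypothetical helper that just adds left to
right, since the builtin sum's handling of floats is not the same in every
Python version.

```python
import math


def naive_sum(values):
    """Hypothetical helper: plain left-to-right float addition."""
    total = 0.0
    for value in values:
        total += value
    return total


# Case 1: repeated addition of a value with no exact binary representation.
tenths = [0.1] * 10
naive_tenths = naive_sum(tenths)      # rounding error accumulates
precise_tenths = math.fsum(tenths)    # tracks the lost low-order bits

# Case 2: values of wildly different orders of magnitude.
mixed = [1e100, 1.0, -1e100]
naive_mixed = naive_sum(mixed)        # the 1.0 is absorbed by 1e100
precise_mixed = math.fsum(mixed)      # the 1.0 survives

print(naive_tenths == 1.0, precise_tenths == 1.0)
print(naive_mixed, precise_mixed)
```

In both cases the naive loop loses information that math.fsum keeps, which
is the kind of thing a name like "precise_sum" would signal.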


On a third point, would it make sense for this to be math.statistics? It'd
increase discoverability for exactly the target audience and it seems to
make sense to me. (We could, as a bonus, then easily deprecate math.fsum in
favour of math.statistics.sum).