[Python-ideas] Pre-PEP: adding a statistics module to Python

Stephen J. Turnbull stephen at xemacs.org
Tue Aug 6 11:02:06 CEST 2013


Oscar Benjamin writes:

 > >> It's also not common AFAIK in other statistical packages
 > >> (at least not under the name mode).
 > >
 > > Press et al claim it is poorly known, but much better than the
 > > binning method. It saddens me that twenty years on, it's still
 > > poorly known.

In what sense is it "better" than the binning method?  If you're
working with tax data or subsidy data, your bins will be given to you
(the brackets).  Similarly for geographical data (political
boundaries), and so on.  I've almost never found choice of bins to be
a problem (but my use cases are such that either the bins are given or
they don't much matter because there's enough data to approximate a
density graphically).

Does it properly identify multiple modes (preferably including lower
peaks), or does it involve a single-peakedness assumption?

 > My preference really is just that modes() returns a list of all
 > modes and the user should decide what to do with however many
 > values they get back.

+1

I might be useful to have helper functions or methods to make common
selections.

 > The other thing is about this idea that if all values are equally
 > common then their is "no mode". I want to say that every value is a
 > mode rathern than none.

+1

One of the things that I teach my students is that the mode (and
median) always exist, but in some distributions they're not very
informative.  I'd be disappointed if that teaching were falsified in
Python's stdlib.

I also hate edge cases like this:

 > Otherwise you get strange differences between
 > e.g.: [2,2,3,3,4,4] and [1,2,2,3,3,4,4]. I've checked on the interweb
 > though and it seems that most people disagree with me on this point so
 > never mind!

In fact, in my biased sample (math-averse, math-differently-abled MBA
students), invariably students start out by saying there is no mode
unless it's unique, are convinced of the existence of multimodalness
by examples involving physical dimensions of men and women when
aggregated as "human beings", and most look at examples like Oscar's
and are convinced that "there is no unique mode" and "every value is
modal" are the best ways to speak of these edge cases.


More information about the Python-ideas mailing list