[Python-ideas] Pre-PEP: adding a statistics module to Python

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Aug 6 14:58:02 CEST 2013


On 6 August 2013 10:02, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Oscar Benjamin writes:
>
>  > >> It's also not common AFAIK in other statistical packages
>  > >> (at least not under the name mode).
>  > >
>  > > Press et al claim it is poorly known, but much better than the
>  > > binning method. It saddens me that twenty years on, it's still
>  > > poorly known.
>
> In what sense is it "better" than the binning method?

The book that Steven is referencing is written (primarily) for the
benefit of scientists. I think the expectation is that if you're
trying to estimate the mode of a continuously distributed quantity,
it is because, say, you have experimental data from a skewed
distribution. I'm not sure, though, as I've just borrowed a 1999 edition
(in C) from a colleague's desk and this particular method/algorithm
isn't included (it doesn't give any method to compute the mode).

> If you're
> working with tax data or subsidy data, your bins will be given to you
> (the brackets).

That's a good point. It would be useful if a mode function could use
the appropriate bins where they are predetermined. Of course you can
bin the data yourself and call modes(). Scipy/Matlab etc. provide the
bin-counting functionality separately under hist or histogram rather
than mode.
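
With predetermined brackets, the do-it-yourself binning step can be done
with the standard bisect module before counting. This is a minimal sketch,
not a proposed API: modes() and bin_indices() are hypothetical helpers, and
the bracket edges and incomes are made-up illustration data.

```python
from bisect import bisect_right
from collections import Counter

def modes(data):
    """Hypothetical helper: return every most-common value, in first-seen order."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]

def bin_indices(data, edges):
    """Hypothetical helper: map each value to the index of its bin.

    edges must be sorted. Index 0 is below edges[0], index i is the
    half-open bracket [edges[i-1], edges[i]), and index len(edges) is
    everything >= edges[-1].
    """
    return [bisect_right(edges, x) for x in data]

# Made-up income brackets and data, purely for illustration.
edges = [10_000, 40_000, 90_000]
incomes = [5_000, 12_000, 35_000, 38_000, 55_000, 120_000]
print(modes(bin_indices(incomes, edges)))  # [1], i.e. the [10_000, 40_000) bracket
```

The result is a bracket index rather than a value, which is usually what
you want when the bins are given to you.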

> Similarly for geographical data (political
> boundaries), and so on.

It's definitely your job to bin those!

> I've almost never found choice of bins to be
> a problem (but my use cases are such that either the bins are given or
> they don't much matter because there's enough data to approximate a
> density graphically).
>
> Does it properly identify multiple modes (preferably including lower
> peaks), or does it involve a single-peakedness assumption?

It doesn't assume single-peakedness. There are a couple of strategies
for identifying possible additional modes after finding the first (see
mode.extract).

>  > My preference really is just that modes() returns a list of all
>  > modes and the user should decide what to do with however many
>  > values they get back.
>
> +1
>
> It might be useful to have helper functions or methods to make common
> selections.

Perhaps modes() could return all modes, and mode() could return the
mode if there's exactly one or otherwise raise an error.
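
A minimal sketch of that split, assuming "mode" means every value tied for
the highest count. The function names follow the discussion, but the
bodies and the locally defined StatisticsError are illustrative
assumptions, not an existing implementation.

```python
from collections import Counter

class StatisticsError(ValueError):
    """Defined locally for this sketch."""

def modes(data):
    """Return all modes (every value tied for the highest count), in first-seen order."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]

def mode(data):
    """Return the unique mode, or raise StatisticsError if there are zero or several."""
    found = modes(data)
    if len(found) != 1:
        raise StatisticsError(f"no unique mode: found {len(found)} modes")
    return found[0]

print(modes([1, 1, 2, 2, 3]))  # [1, 2]
print(mode([1, 2, 2, 3]))      # 2
```

Under this definition, data in which every value occurs once has every
value as a mode, so mode() raises and the caller decides what to do with
the list from modes().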


Oscar

