[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

Mon Feb 25 20:58:06 EST 2019

Raymond Hettinger <raymond.hettinger at gmail.com> added the comment:

>  If others agree that it is sufficiently easy, we can assign 
> the task to Cheryl.

It's only easy if we clearly specify what we want to occur.  Deciding what the right behavior should be is not a beginner skill.

Proposed spec:
'''
Modify the API statistics.mode to handle multimodal cases so that the first mode encountered is the one returned.  If the input is empty, raise a StatisticsError.

TestCases:
    mode([])   --> StatisticsError
    mode('aabbbcc') --> 'b'
    mode(iter('aabbbcc')) --> 'b'
    mode('eeffddddggaaaa') --> 'd'

Implementation:
    * Discard the internal _counts function.
    * Instead use Counter(data).most_common(1)[0][0]
      because that makes only a single pass over the data

Documentation:
    * Update statistics.rst and include a versionchanged directive

    * In the Whatsnew compatibility section, note this is a behavior change.
      Code that used to raise StatisticsError will now return a useful value.
      Note that the rationale for the change is that the current mode()
      behavior would unexpectedly fail when given multimodal data.

When: 
    * We want this for 3.8 so it can't wait very long
'''
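
For anyone picking this up, here is a rough sketch of what the spec describes. It is not the final patch. The name mode_sketch and the error message text are placeholders of mine, and it assumes Python 3.7+, where Counter.most_common() returns elements with equal counts in the order first encountered.

```python
from collections import Counter
from statistics import StatisticsError


def mode_sketch(data):
    """Return the most common value; on ties, the first one encountered.

    Hypothetical stand-in for the proposed statistics.mode() behavior.
    """
    # iter() makes sequences and one-shot iterators take the same path;
    # Counter consumes the input in a single pass.
    pairs = Counter(iter(data)).most_common(1)
    try:
        return pairs[0][0]
    except IndexError:
        # most_common(1) returns an empty list for empty input.
        raise StatisticsError('no mode for empty data') from None
```

With this sketch, 'aabbbcc' gives 'b' whether passed as a string or as an iterator, 'eeffddddggaaaa' gives 'd' because 'd' reaches the tied top count before 'a' is first seen, and an empty input raises StatisticsError.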

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35892>
_______________________________________