[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets
Raymond Hettinger
report at bugs.python.org
Mon Feb 25 20:58:06 EST 2019
Raymond Hettinger <raymond.hettinger at gmail.com> added the comment:
> If others agree that it is sufficiently easy, we can assign
> the task to Cheryl.
It's only easy if we clearly specify what we want to occur. Deciding what the right behavior should be is not a beginner skill.
Proposed spec:
'''
Modify the API statistics.mode to handle multimodal cases so that the first mode encountered is the one returned. If the input is empty, raise a StatisticsError.
TestCases:
mode([]) --> StatisticsError
mode('aabbbcc') --> 'b'
mode(iter('aabbbcc')) --> 'b'
mode('eeffddddggaaaa') --> 'd'
Implementation:
* Discard the internal _counts function.
* Instead use Counter(data).most_common(1)[0][0]
because that makes only a single pass over the data
Documentation:
* Update statistics.rst and include a versionchanged directive
* In the Whatsnew compatibility section, note this is a behavior change.
Code that used to raise StatisticsError will now return a useful value.
Note that the rationale for the change is that the current mode()
behavior would unexpectedly fail when given multimodal data.
When:
* We want this for 3.8 so it can't wait very long
'''
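For illustration, here is a minimal sketch of how the spec above might look in code. The name mode_sketch and the error message text are placeholders rather than the actual patch, and the tie-breaking relies on Counter preserving first-seen order for equal counts (Python 3.7+). The explicit empty check is there because most_common(1) returns an empty list for empty input, so indexing it directly would raise IndexError instead of StatisticsError.

```python
from collections import Counter
from statistics import StatisticsError


def mode_sketch(data):
    """Return the most common value; on ties, the first one encountered.

    Hypothetical stand-in for the proposed statistics.mode() behavior.
    """
    # Counter consumes the input in a single pass, so one-shot
    # iterators work as well as sequences.
    pairs = Counter(iter(data)).most_common(1)
    if not pairs:
        # Empty input: most_common(1) returned [], so raise the
        # error the spec asks for rather than an IndexError.
        raise StatisticsError('no mode for empty data')
    return pairs[0][0]
```

With this sketch, mode_sketch('aabbbcc') gives 'b', and mode_sketch('eeffddddggaaaa') gives 'd' because 'd' reaches the top count before 'a' is seen.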
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35892>
_______________________________________