[issue25478] Consider adding a normalize() method to collections.Counter()

Raymond Hettinger report at bugs.python.org
Sun Oct 25 22:24:40 EDT 2015


New submission from Raymond Hettinger:

Allen Downey suggested this at PyCon in Montreal and said it would be useful in his Bayesian statistics courses.  Separately, Peter Norvig created a normalize() function in his probability tutorial (In[45] at http://nbviewer.ipython.org/url/norvig.com/ipython/Probability.ipynb).

I'm creating this tracker item to record thoughts about the idea.  Right now, it isn't clear whether Counter is the right place to support this operation, how it should be designed, whether it should operate in-place or create a new counter, whether it should round so the result sums to exactly 1.0, and whether it should use math.fsum() for float inputs.

Should it support other target totals besides 1.0?

  >>> Counter(red=11, green=5, blue=4).normalize(100) # percentage
  Counter(red=55, green=25, blue=20)
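
One possible shape for this, sketched as a plain function rather than a method (the default total, the choice to return a new counter, and the use of math.fsum() for float inputs are just illustrations of the open questions above, not a settled design):

  import math
  from collections import Counter

  def normalize(counter, total=1.0):
      # Scale every count by the same factor so the values sum to *total*.
      # math.fsum() limits accumulated rounding error for float inputs.
      current = math.fsum(counter.values())
      factor = total / current
      return Counter({key: value * factor for key, value in counter.items()})

  >>> normalize(Counter(red=11, green=5, blue=4), total=100)
  Counter({'red': 55.0, 'green': 25.0, 'blue': 20.0})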

Also, would it make sense to support something like this?

  sampled_gender_dist = Counter(male=405, female=421)
  world_gender_dist = Counter(male=0.51, female=0.49)
  cs = world_gender_dist.chi_squared(observed=sampled_gender_dist)
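
There is no chi_squared() method today; for concreteness, the statistic it would presumably compute is the usual goodness-of-fit sum, roughly like this hand-rolled sketch (the name and argument order are invented for illustration):

  import math

  def chi_squared(expected_dist, observed):
      # Scale the expected weights up to the observed sample size, then
      # sum (observed - expected)**2 / expected over the categories.
      n = sum(observed.values())
      total = math.fsum(expected_dist.values())
      return math.fsum(
          (observed[key] - n * weight / total) ** 2 / (n * weight / total)
          for key, weight in expected_dist.items())

  cs = chi_squared(world_gender_dist, observed=sampled_gender_dist)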

Would it be better to just have a general multiply-by-scalar operation for scaling?

  c = Counter(observations)
  c.scale_by(1.0 / sum(c.values()))

Perhaps use an operator?

  c /= sum(c.values())
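
Both the scale_by() idea and the operator idea reduce to the same elementwise scaling.  A minimal sketch as a hypothetical subclass (ScalableCounter is an invented name; nothing like it exists in collections today):

  from collections import Counter

  class ScalableCounter(Counter):
      # Hypothetical subclass illustrating both proposals above.
      def scale_by(self, factor):
          for key in self:
              self[key] *= factor

      def __itruediv__(self, scalar):
          # Makes  c /= sum(c.values())  work in place.
          for key in self:
              self[key] /= scalar
          return self

  c = ScalableCounter(red=11, green=5, blue=4)
  c /= sum(c.values())   # now red=0.55, green=0.25, blue=0.2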

----------
assignee: rhettinger
components: Library (Lib)
messages: 253452
nosy: rhettinger
priority: low
severity: normal
status: open
title: Consider adding a normalize() method to collections.Counter()
type: enhancement
versions: Python 3.6

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue25478>
_______________________________________

