[Python-Dev] Incorrect length of collections.Counter objects / Multiplicity function

Gustavo Narea me at gustavonarea.net
Wed May 19 00:00:20 CEST 2010


Hello, everyone.

I've checked the new collections.Counter class and I think I've found a bug:

> >>> from collections import Counter
> >>> c1 = Counter([1, 2, 1, 3, 2])
> >>> c2 = Counter([1, 1, 2, 2, 3])
> >>> c3 = Counter([1, 1, 2, 3])
> >>> c1 == c2 and c3 not in (c1, c2)
> True
> >>> # Perfect, so far. But... There's always a "but":
> ...
> >>> len(c1)
> 3

The length of a Counter is the number of unique elements. But the length must 
be the cardinality, and the cardinality of a multiset is the total number of 
elements (including duplicates) [1] [2]. The source code mentions that the 
recipe on ActiveState [3] was one of the references, but that recipe gets this 
right.
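
To make the difference concrete, here is a minimal sketch using only the 
existing Counter API. The variable names "distinct" and "cardinality" are 
mine, chosen for illustration:

```python
from collections import Counter

c1 = Counter([1, 2, 1, 3, 2])

# What len() reports today: the number of distinct elements.
distinct = len(c1)

# What I would expect from a multiset: the total number of elements,
# duplicates included, i.e. the sum of all multiplicities.
cardinality = sum(c1.values())

print(distinct, cardinality)  # prints "3 5"
```

So the cardinality can be recovered, but only by summing the counts by hand.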

Also, why is it indexed? Subscripting a multiset calls to mind the position 
of its elements, but sets have no notion of position. I think this is 
inconsistent with the built-in set. I would have implemented the multiplicity 
function as a method instead of through subscripting:
    c1.get_multiplicity(element)
    # instead of
    c1[element]
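
A rough sketch of what I have in mind follows. The "Multiset" class and its 
get_multiplicity() method are hypothetical and do not exist in the standard 
library; this version simply wraps a Counter and is only meant to illustrate 
the interface, not to be a finished implementation:

```python
from collections import Counter


class Multiset:
    """Hypothetical multiset wrapper; not part of the standard library."""

    def __init__(self, iterable=()):
        # A Counter is used internally to store the multiplicities.
        self._counts = Counter(iterable)

    def get_multiplicity(self, element):
        # Elements that are absent have multiplicity 0.
        return self._counts[element]

    def __contains__(self, element):
        return self._counts[element] > 0

    def __len__(self):
        # Cardinality: total number of elements, duplicates included.
        return sum(self._counts.values())


m = Multiset([1, 2, 1, 3, 2])
print(m.get_multiplicity(1))  # prints "2"
print(m.get_multiplicity(4))  # prints "0"
print(len(m))                 # prints "5"
```

Because the wrapper defines no __getitem__, m[1] raises a TypeError, which 
keeps the interface closer to the built-in set.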

Is this the intended behavior? If so, I'd like to propose a proper multiset 
implementation for the standard library (preferably called "Multiset"; should 
I create a PEP?). If not, I can write a patch to fix it, although I'm afraid 
it'd be a backwards-incompatible change.

Cheers,

[1] http://en.wikipedia.org/wiki/Multiset#Overview
[2] http://preview.tinyurl.com/smalltalk-bag
[3] http://code.activestate.com/recipes/259174/
-- 
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech  ~  About me: =Gustavo/about |

