Extending collections.Counter with top_n() to return elements by rank

Bora bora at boramalper.org
Sun Nov 1 04:56:46 EST 2020


collections.Counter has a most_common([n]) method, which returns the n
most common elements of the counter. In case of a tie, however, the
caller has no say in which elements are returned: in practice, the
order of insertion breaks the tie. For example:

    >>> Counter(["a","a","b","a","b","c","c","d"]).most_common(2)
    [('a', 3), ('b', 2)]

    >>> Counter(["a","a","c","a","b","b","c","d"]).most_common(2)
    [('a', 3), ('c', 2)]
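For comparison, a tie can already be broken explicitly by sorting the
items with a custom key. The sketch below assumes the elements are
mutually comparable (strings here) and breaks ties alphabetically;
most_common_sorted() is a hypothetical helper, not part of collections.

```python
from collections import Counter


def most_common_sorted(counter, n):
    """Like most_common(n), but ties are broken by the element itself."""
    # Sort key: descending count first, then ascending element value.
    items = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return items[:max(n, 0)]  # negative n gives [], like most_common()


print(most_common_sorted(Counter(["a", "a", "c", "a", "b", "b", "c", "d"]), 2))
# [('a', 3), ('b', 2)]
```

Unlike most_common(2), this returns the same result for both example
inputs above, because insertion order no longer matters.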

In some cases (which I believe are not rare), you would rather break
the tie yourself, or get the top elements by *rank*. Using our example:

    Rank	Elements
       0	{"a"}
       1	{"b", "c"}
       2	{"d"}
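A minimal sketch of how such ranks could be computed from an existing
Counter, assuming "dense" ranking: tied elements share a rank, and the
next distinct count gets the next rank with no gaps. dense_ranks() is a
hypothetical name used only for illustration.

```python
from collections import Counter


def dense_ranks(counter):
    """Map each element to its 0-based dense rank (ties share a rank)."""
    # Distinct counts, highest first; a count's position is its rank.
    distinct = sorted(set(counter.values()), reverse=True)
    rank_of_count = {count: rank for rank, count in enumerate(distinct)}
    return {elem: rank_of_count[count] for elem, count in counter.items()}


print(dense_ranks(Counter(["a", "a", "b", "a", "b", "c", "c", "d"])))
# {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```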

I propose a new method top_n(n) that returns the elements in ranks 0
through n, i.e. the first n + 1 ranks. For example:

    >>> Counter(["a","a","b","a","b","c","c","d"]).top_n(0)
    [('a', 3)]

    >>> Counter(["a","a","b","a","b","c","c","d"]).top_n(1)
    [('a', 3), ('b', 2), ('c', 2)]

    >>> Counter(["a","a","b","a","b","c","c","d"]).top_n(2)
    [('a', 3), ('b', 2), ('c', 2), ('d', 1)]

    >>> Counter(["a","a","b","a","b","c","c","d"]).top_n(99)
    [('a', 3), ('b', 2), ('c', 2), ('d', 1)]

    >>> Counter(["a","a","b","a","b","c","c","d"]).top_n(-1)
    []
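One way the proposed method could be sketched, assuming 0-based dense
ranks and a subclass (RankedCounter, a hypothetical name), since Counter
itself has no top_n(). It groups the output of most_common(), which
lists equal counts next to each other, and stops once the rank exceeds
n; within a rank, elements keep most_common()'s insertion order.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


class RankedCounter(Counter):
    """Counter subclass with a hypothetical top_n() method."""

    def top_n(self, n):
        """Return (element, count) pairs whose 0-based rank is <= n."""
        if n < 0:
            return []  # same as most_common() with a negative argument
        result = []
        # most_common() sorts by descending count, so equal counts are
        # adjacent and groupby() yields one group per rank.
        groups = groupby(self.most_common(), key=itemgetter(1))
        for rank, (_count, group) in enumerate(groups):
            if rank > n:
                break
            result.extend(group)
        return result


c = RankedCounter(["a", "a", "b", "a", "b", "c", "c", "d"])
print(c.top_n(1))  # [('a', 3), ('b', 2), ('c', 2)]
```

This reproduces all five examples above, including top_n(99) returning
every element and top_n(-1) returning an empty list.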

Some points to discuss:

 * What should the return type be? A list of tuples like most_common(),
   or a List[Tuple[int, List[T]]] that conveys the rank information
   too? Each tuple would be a rank, whose first element is the count
   and whose second element is the list of elements, e.g. [(3, ['a']),
   (2, ['b', 'c']), (1, ['d'])].
 * Should ranks start at 0 or 1?
 * Should negative numbers raise an exception, or return an empty list
   like most_common() does?
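For the first point, the alternative List[Tuple[int, List[T]]] return
type could look like the sketch below. top_n_grouped() is a
hypothetical function; it assumes 0-based ranks and returns an empty
list for negative n, mirroring most_common(), though raising ValueError
would be an equally reasonable choice.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter
from typing import Hashable, List, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)


def top_n_grouped(counter: "Counter[T]", n: int) -> List[Tuple[int, List[T]]]:
    """Return [(count, elements), ...] for 0-based ranks 0 through n."""
    if n < 0:
        return []  # alternative: raise ValueError("n must be >= 0")
    grouped = []
    groups = groupby(counter.most_common(), key=itemgetter(1))
    for rank, (count, group) in enumerate(groups):
        if rank > n:
            break
        grouped.append((count, [elem for elem, _ in group]))
    return grouped


c = Counter(["a", "a", "b", "a", "b", "c", "c", "d"])
print(top_n_grouped(c, 99))  # [(3, ['a']), (2, ['b', 'c']), (1, ['d'])]
```

The flat list of tuples can be recovered from this form, but not the
other way round without re-grouping, which is one argument for it.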

I would love to hear your opinion on this, and if there is interest, I
am happy to implement it too.

Regards,

Bora M. Alper
https://boramalper.org/




More information about the Python-list mailing list