counting occurrences

Fri Aug 3 18:35:37 EDT 2001

On 3 Aug 2001 13:47:34 -0700, Grant Griffin <not.this at seebelow.org> wrote:
>This then usually leads to a need to "sort keys by value"--something like this:
>
>   itemlist = [(value, key) for (key, value) in counter.items()]
>   itemlist.sort()
>   itemlist.reverse()
...
>For me, this sort of thing comes up so often that it qualifies as an "idiom".

So put it in a function.
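
A minimal sketch of what that function might look like. The name items_sorted_by_value is hypothetical (it is not an existing dict method), and the sample data is made up for illustration:

```python
def items_sorted_by_value(counter):
    """Return (value, key) pairs from a dict, largest value first."""
    # Hypothetical helper, not part of the dict API.
    pairs = [(value, key) for (key, value) in counter.items()]
    pairs.sort()
    pairs.reverse()
    return pairs

# Build a small counter dict by hand, then sort it by count.
counts = {}
for word in "a b a c b a".split():
    counts[word] = counts.get(word, 0) + 1

print(items_sorted_by_value(counts))
```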

>However, the sorting thing is a pain (at least at first--then you get so you
>like to impress people by having finally found a use for list
>comprehensions<wink>), so I mentioned to a friend that it would sure be nice
>if Python dictionaries had an "items_sorted_by_value" method.  But, among
>other things, my friend (who has been using Python for a *long* time) said
>that he has never wanted such a thing, because he doesn't see counting
>occurrences "as a dictionary operation".
>
>So how do you think he does it?

Here's how I would probably do it:

def occurrences(pred, seq):
    # (t -> Boolean) -> [t] -> [(Int, t)]
    d = {}
    for elt in seq:
        if pred(elt):
            d[elt] = d.get(elt, 0) + 1
    a = [ (v, k) for (k, v) in d.items() ]
    a.sort()
    a.reverse()
    return a

Use it like so:

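from __future__ import nested_scopes  # no-op on Python 2.2 and later
def count_word(w, fp):
    # Returns (count, line) pairs for each distinct line of fp containing w.
    punctuation = ',."\';:-/()[]{}'
    # str.maketrans is the Python 3 spelling; Python 2 used string.maketrans.
    kill_punctuation = str.maketrans(punctuation, ' ' * len(punctuation))
    def word_in_line(s):
        # Look for the word w (not the line s) among the line's words.
        return w in s.translate(kill_punctuation).split()
    return occurrences(word_in_line, fp.readlines())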

Note that generators can make generating the 'seq' argument easier.
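
For instance, a generator can hand occurrences() one word at a time, so the input never has to be built up as a list first. This is only a rough sketch: words() is a made-up helper, io.StringIO stands in for a real file, and occurrences() is repeated from above so the snippet runs on its own.

```python
import io

def occurrences(pred, seq):
    # Same counting function as above, repeated so this snippet runs alone.
    d = {}
    for elt in seq:
        if pred(elt):
            d[elt] = d.get(elt, 0) + 1
    a = [(v, k) for (k, v) in d.items()]
    a.sort()
    a.reverse()
    return a

def words(fp):
    # Hypothetical generator: yields whitespace-separated words lazily.
    for line in fp:
        for word in line.split():
            yield word

# Stand-in for an open text file.
fp = io.StringIO("spam eggs spam\nham spam eggs\n")

# Accept every word; the predicate could just as well filter.
result = occurrences(lambda w: True, words(fp))
print(result)
```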

This is probably a more functional approach.  An object-oriented approach
would be to make a "counter" object which wraps your dict and presents an
'item(e)' method to count a new item, 'occurrences(e)' to show how many times
'e' has been item()ed, 'histogram()' to give a list of frequencies, etc.
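
A rough sketch of such an object, assuming the three methods item(), occurrences() and histogram() described above. The class name OccurrenceCounter is invented for illustration. Current Pythons also ship collections.Counter, whose most_common() covers much the same ground.

```python
class OccurrenceCounter:
    # Hypothetical class name; wraps a plain dict of element -> count.
    def __init__(self):
        self._counts = {}

    def item(self, e):
        # Record one more occurrence of e.
        self._counts[e] = self._counts.get(e, 0) + 1

    def occurrences(self, e):
        # How many times e has been item()ed (0 if never).
        return self._counts.get(e, 0)

    def histogram(self):
        # List of (count, element) pairs, highest count first.
        pairs = [(v, k) for (k, v) in self._counts.items()]
        pairs.sort()
        pairs.reverse()
        return pairs

c = OccurrenceCounter()
for e in "abca":
    c.item(e)
print(c.occurrences("a"))
print(c.histogram())
```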