counting occurrences
Quinn Dunkan
quinn at retch.ugcs.caltech.edu
Fri Aug 3 18:35:37 EDT 2001
On 3 Aug 2001 13:47:34 -0700, Grant Griffin <not.this at seebelow.org> wrote:
>This then usually leads to a need to "sort keys by value"--something like this:
>
> itemlist = [(value, key) for (key, value) in counter.items()]
> itemlist.sort()
> itemlist.reverse()
...
>For me, this sort of thing comes up so often that it qualifies as an "idiom".
So put it in a function.
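
A minimal sketch of what that function might look like, written for current Python 3 rather than the 2.1 of this thread. The name items_sorted_by_value is just borrowed from the wished-for method in the quoted text; it is not an existing dict method.

```python
def items_sorted_by_value(counter):
    """Return (value, key) pairs from a dict, largest value first."""
    itemlist = [(value, key) for (key, value) in counter.items()]
    # equivalent to the sort() followed by reverse() in the quoted idiom
    itemlist.sort(reverse=True)
    return itemlist

counts = {"spam": 3, "eggs": 1, "ham": 2}
print(items_sorted_by_value(counts))
# prints [(3, 'spam'), (2, 'ham'), (1, 'eggs')]
```

Keys with equal counts are ordered by the keys themselves, so the keys need to be mutually comparable.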
>However, the sorting thing is a pain (at least at first--then you get so you
>like to impress people by having finally found a use for list
>comprehensions<wink>), so I mentioned to a friend that it would sure be nice
>if Python dictionaries had an "items_sorted_by_value" method. But, among
>other things, my friend (who has been using Python for a *long* time) said
>that he has never wanted such a thing, because he doesn't see counting
>occurrences "as a dictionary operation".
>
>So how do you think he does it?
Here's how I would probably do it:
    def occurrences(pred, seq):
        # (t -> Boolean) -> [t] -> [(Int, t)]
        d = {}
        for elt in seq:
            if pred(elt):
                d[elt] = d.get(elt, 0) + 1
        a = [ (v, k) for (k, v) in d.items() ]
        a.sort()
        a.reverse()
        return a
Use it like so:
    from __future__ import nested_scopes
    import string

    def count_word(w, fp):
        punctuation = ',."\';:-/()[]{}'
        kill_punctuation = string.maketrans(punctuation, ' ' * len(punctuation))
        def word_in_line(s):
            # true if w is one of the words on line s, ignoring punctuation
            return w in s.translate(kill_punctuation).split()
        return occurrences(word_in_line, fp.readlines())
Note that generators can make generating the 'seq' argument easier.
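
As a rough sketch of the generator idea (Python 3 syntax, so str.maketrans instead of string.maketrans): the words() helper below is a made-up name, and occurrences() is repeated, slightly condensed, only so the example runs on its own.

```python
import io

def occurrences(pred, seq):
    # same logic as occurrences() earlier in this post, condensed
    d = {}
    for elt in seq:
        if pred(elt):
            d[elt] = d.get(elt, 0) + 1
    return sorted(((v, k) for (k, v) in d.items()), reverse=True)

def words(fp, punctuation=',."\';:-/()[]{}'):
    # hypothetical helper: lazily yield each word, with punctuation
    # replaced by spaces before splitting
    table = str.maketrans(punctuation, ' ' * len(punctuation))
    for line in fp:
        for word in line.translate(table).split():
            yield word

# io.StringIO stands in for an open file here
fp = io.StringIO("spam, eggs and spam.\nspam (again)\n")
print(occurrences(lambda w: len(w) > 3, words(fp)))
# prints [(3, 'spam'), (1, 'eggs'), (1, 'again')]
```

The generator hands occurrences() one word at a time, so the whole file never has to be read into a list first.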
The above is probably the more functional approach. An object-oriented approach
would be to make a "counter" object which wraps your dict and presents
'item(e)' to count a new item, 'occurrences(e)' to show how many times 'e' has
been item()ed, 'histogram()' to give a list of frequencies, etc.
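
One way such an object might look, as a sketch in Python 3. The class name OccurrenceCounter is invented for illustration, and having the frequency list come back as (count, item) pairs, most frequent first, is an assumption rather than anything specified above.

```python
class OccurrenceCounter:
    """Wraps a dict mapping each item to the number of times it was seen."""

    def __init__(self):
        self._counts = {}

    def item(self, e):
        # record one more occurrence of e
        self._counts[e] = self._counts.get(e, 0) + 1

    def occurrences(self, e):
        # how many times e has been passed to item(); 0 if never
        return self._counts.get(e, 0)

    def histogram(self):
        # (count, item) pairs, most frequent first
        return sorted(((v, k) for (k, v) in self._counts.items()), reverse=True)

c = OccurrenceCounter()
for word in "spam eggs spam ham spam eggs".split():
    c.item(word)
print(c.occurrences("spam"))   # prints 3
print(c.histogram())           # prints [(3, 'spam'), (2, 'eggs'), (1, 'ham')]
```

Later Python versions ship collections.Counter in the standard library, which covers much the same ground (its most_common() method returns the sorted pairs).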