counting occurrences

Tim Peters tim.one at home.com
Sat Aug 4 01:18:38 EDT 2001


[Grant Griffin]
> I have long used dictionaries to counting occurrences of
> something (usually strings).  It goes something like this:
>
>     counter = {}
>     for s in strings:
>         counter[s] = counter.get(s, 0) + 1
> ...
> This then usually leads to a need to "sort keys by
> value"--something like this:
>
>    itemlist = [(value, key) for (key, value) in counter.items()]
>    itemlist.sort()
>    itemlist.reverse()
> ...
> Then, "itemlist" contains sorted (value, key) pairs--usually to
> be printed or something.
>
> For me, this sort of thing comes up so often that it qualifies as
> an "idiom".

Sure.

> However, the sorting thing is a pain ...
> so I mentioned to a friend that it would sure be nice if Python
> dictionaries had an "items_sorted_by_value" method.

Won't happen -- you can easily write a function for it yourself in Python,
and it's not going to run significantly faster if it's coded by us in C
instead.  Dicts have no internal ordering, so the best efficient thing
anyone can do here is materialize a list in full, sort it, and pull it apart
again.

Note that in Python 2.2, you can *subclass* from dicts, like here in
grantdict.py:

class Grant(dictionary):
    def items_by_value_backwards(self):
        items = [(v, k) for k, v in self.items()]
        items.sort()
        items.reverse()
        return [(k, v) for v, k in items]

g = Grant()
f = open("grantdict.py")
for line in f:
    for ch in line:
        g[ch] = g.get(ch, 0) + 1

for ch, count in g.items_by_value_backwards()[:10]:
    print "%6d %r" % (count, ch)

It counts the number of letters in itself, then prints the 10 most frequent:

    93 ' '
    20 'e'
    18 't'
    18 'r'
    18 'i'
    16 'n'
    15 's'
    15 '\n'
    12 'c'
    12 ')'

> But, among other things, my friend (who has been using Python for a
> *long* time) said that he has never wanted such a thing, because he
> doesn't see counting occurrences "as a dictionary operation".

Well, no sane person wants it built in <wink>, but of course dicts are
natural for this.

> So how do you think he does it?

Slowly and inappropriately.

> tim:-help-me-out-here<wink>-ly y'rs,

more-worried-about-your-friend-ly y'rs  - tim





More information about the Python-list mailing list