Learning Python via a little word frequency program

Wed Jan 9 06:43:37 EST 2008

Andrew Savige wrote:

> I'm learning Python by reading David Beazley's "Python Essential
> Reference" book and writing a few toy programs. To get a feel for hashes
> and sorting, I set myself this little problem today (not homework, BTW):
> 
>   Given a string containing a space-separated list of names:
> 
>     names = "freddy fred bill jock kevin andrew kevin kevin jock"
> 
>   produce a frequency table of names, sorted descending by frequency.
>   then ascending by name. For the above data, the output should be:
> 
>     kevin     : 3
>     jock      : 2
>     andrew    : 1
>     bill      : 1
>     fred      : 1
>     freddy    : 1
> 
> Here's my first attempt:
> 
> names = "freddy fred bill jock kevin andrew kevin kevin jock" freq = {}
> for name in names.split():
>     freq[name] = 1 + freq.get(name, 0)
> deco = zip([-x for x in freq.values()], freq.keys()) deco.sort() for v,
> k in deco:
>     print "%-10s: %d" % (k, -v)
> 
> I'm interested to learn how more experienced Python folks would solve
> this little problem. Though I've read about the DSU Python sorting
> idiom, I'm not sure I've strictly applied it above ... and the -x hack
> above to achieve a descending sort feels a bit odd to me, though I
> couldn't think of a better way to do it.

You can specify a reverse sort with

deco.sort(reverse=True)

Newer versions of Python have the whole idiom built in:

>>> items = freq.items()
>>> from operator import itemgetter
>>> items.sort(key=itemgetter(1), reverse=True) 
>>> for item in items:
...     print "%-10s: %d" % item
...
kevin     : 3
jock      : 2
bill      : 1
andrew    : 1
fred      : 1
freddy    : 1

You can pass an arbitrary function as key. itemgetter(1) is equivalent to

def key(item): return item[1]

> I also have a few specific questions. Instead of:
> 
> for name in names.split():
>     freq[name] = 1 + freq.get(name, 0)
> 
> I might try:
> 
> for name in names.split():
>     try:
>         freq[name] += 1
>     except KeyError:
>         freq[name] = 1
> 
> Which is preferred?

I have no strong opinion about that. Generally speaking try...except is
faster when you have many hits, i. e. the except suite is rarely invoked.
Starting with Python 2.5 you can alternatively use

from collections import defaultdict
freq = defaultdict(int)
for name in names.split():
    freq[name] += 1

> Ditto for:
> 
> deco = zip([-x for x in freq.values()], freq.keys())
> 
> versus:
> 
> deco = zip(map(operator.neg, freq.values()), freq.keys())

I think the list comprehension is slightly more readable.

> Finally, I might replace:
> 
> for v, k in deco:
>     print "%-10s: %d" % (k, -v)
> 
> with:
> 
> print "\n".join("%-10s: %d" % (k, -v) for v, k in deco)

Again, I find the explicit for loop more readable, but sometimes use the
genexp, too.

Peter