Sorting dictionary by value
Jim Dennis
jimd at vega.starshine.org
Wed Mar 27 22:18:25 EST 2002
In article <slrna9m5sc.ai9.arturs at aph.waw.pdi.net>, Artur Skura wrote:
>Duncan Booth wrote:
>> Artur Skura <arturs at iidea.pl> wrote in
>> news:slrna9lqj1.9n1.arturs at aph.waw.pdi.net:
>>> Is there an idiom in Python as to sorting dictionary by value,
>>> not keys? I came up with some solutions which are so inefficient
>>> that I'm sure there must be a simple way.
>> How do you know they are inefficient? Have you profiled your application
>> and found this to be a bottleneck?
>No, and it seems the problem is not with sorting.
>I wanted to write a compact word counting script (well, in shell
>it can be done in 5 lines or so), just for fun.
Just about a week ago I posted a word frequency counting
script which counted "words," filtered out some common
contractions and "non-words" and tracked "known words" (as
per entries from /usr/share/dict/words) and then generated
a listing by highest frequency first. A couple of days later I
also posted a modified version that shoves its results into a
PostgreSQL database table (that change only took four lines).
I could mail it to you if you like, but I'd be surprised if
it's not still floating around.
(BTW: for performance, it handles almost 1800 man pages,
averaging 7KB each, in less than 2 minutes on my mid-range
(dual 650MHz Pentium) desktop box.)
The whole thing is only about 85 lines long and the core
function is less than ten. As so many people have suggested
in this thread, it simply uses a dictionary (awk calls them
associative arrays, perl calls them "hashes").
The core loop is something like:
    freq = {}
    for line in file:
        for word in line.split():
            if word in freq: freq[word] += 1
            else: freq[word] = 1
(assuming Python 2.2 for file iteration and dictionary membership
support using "in". I *really* like those 2.2 features! They make
my pseudo-code so executable!)
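
To tie this back to the original question, here is a minimal sketch of
how the counts could be listed highest frequency first. It is not taken
from the posted script: the names count_words and by_frequency are made
up for illustration, and it assumes a Python newer than 2.2, since it
relies on the built-in sorted() with a key function.

```python
def count_words(lines):
    """Count whitespace-separated words across an iterable of lines."""
    freq = {}
    for line in lines:
        for word in line.split():
            # Same counting idea as the core loop, written with dict.get()
            freq[word] = freq.get(word, 0) + 1
    return freq


def by_frequency(freq):
    """Return (word, count) pairs, highest count first.

    Ties are broken alphabetically so the order is predictable.
    (Hypothetical helper, not part of the original script.)
    """
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample input standing in for the lines of a real file
sample = ["the cat saw the dog", "the dog ran"]
for word, count in by_frequency(count_words(sample)):
    print(count, word)
```

On a Python as old as 2.2 the usual alternative would be to build a list
of (count, word) tuples, call its sort() method, and then reverse() it.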