Sorting dictionary by value

John Machin sjmachin at lexicon.net
Sat Mar 23 06:39:59 EST 2002


Paul Rubin <phr-n2002a at nightsong.com> wrote in message news:<7xd6xwntgx.fsf at ruckus.brouhaha.com>...
> Artur Skura <arturs at iidea.pl> writes:
> > No, and it seems the problem is not with sorting.
> > I wanted to write a compact word counting script (well, in shell
> > it can be done in 5 lines or so), just for fun.
> >...
> > for i in a:
> >     if i not in known:
> 
> This is horrendously slow because for every input word, you're comparing
> it against all the previously seen words.

Horrendously slow can be better than not at all.

> 
> > it seems it's slow not because of sorting...
> 
> Correct.  I didn't examine your code carefully enough to be sure, but
> what I think you want is something like this:
> 
> counts = {}
> a = string.split(open(sys.argv[1],'r').read())
> for w in a:
>   if counts.has_key(w):
>      counts[w] += 1
>   else:
>      counts[w] = 1
> 
> words = counts.keys()
> words.sort()
> words.reverse()
> 
> for w in words:
>    print words[w], w

Sorry, but this doesn't work. If you were to actually *run* your code,
you would get this result [after adding

import string, sys

at the start of the script]:

    print words[w], w
TypeError: sequence index must be integer

At this stage "words" will be bound to something like ['zot', 'foo',
'bar'] and you are trying to access words['zot'], but "words" is a list,
not a dictionary -- unlike in some other languages of which you may
have heard, like awk, these are different concepts in Python.
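
A minimal sketch of the distinction, written in present-day Python 3
syntax rather than the Python of this thread. The sample data is the
same made-up list and dictionary used in this message; the exact
wording of the error depends on the Python version, so it is not
shown here.

```python
# A dictionary is looked up by key; a list is indexed by integer position.
counts = {'foo': 1, 'bar': 666, 'zot': 42}
words = ['zot', 'foo', 'bar']

first_word = words[0]        # fine: integer index into a list
zot_count = counts['zot']    # fine: key lookup in a dictionary

try:
    words['zot']             # a string is not a valid list index
    list_lookup_failed = False
except TypeError:
    list_lookup_failed = True
```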

The problem starts back further with "counts.keys()". You need to use
the .items() method to get both the keys and the values out of the
dictionary, in a list of 2-tuples, like [('foo', 1), ('bar', 666),
('zot', 42)]. Then you need to fiddle with this list to bring the
count to the front of each tuple so that it can be the primary sort
key. For more information, google('Schwartzian transform Martelli')
either web-wide or in comp.lang.python. If, as is usual, the result is
to be presented in descending order of frequency, some care is
required: sorting ascending and then reversing also reverses the
alphabetical order of words that share a frequency.
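
To make the tie-ordering point concrete, here is a small sketch in
Python 3 syntax. The counts are invented for illustration, and the
variable names are not from the original script. It compares
sort-then-reverse against sorting on a negated count.

```python
# Made-up counts: 'foo' and 'bar' tie at 2.
counts = {'foo': 2, 'bar': 2, 'zot': 5}

# Approach 1: count first, sort ascending, then reverse the whole list.
# The reverse also flips the alphabetical order within each tie.
pairs = [(count, word) for word, count in counts.items()]
pairs.sort()
pairs.reverse()
reversed_order = [word for count, word in pairs]

# Approach 2: negate the count and sort ascending once.
# Higher counts come first, and ties stay in alphabetical order.
negated = [(-count, word) for word, count in counts.items()]
negated.sort()
negated_order = [word for neg_count, word in negated]

print(reversed_order)   # ['zot', 'foo', 'bar']
print(negated_order)    # ['zot', 'bar', 'foo']
```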

Try replacing the last part of the script with:

words = [(-count, word) for word, count in counts.items()]
words.sort()
for count, word in words:
   print word, -count
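
For anyone reading this on a later Python, the same ordering can
probably be had without building the intermediate list by hand. The
sketch below assumes Python 3 and uses the built-in sorted() with a
key function, which did not exist when this was posted. The function
name word_frequencies and the sample text are hypothetical, chosen
only for illustration.

```python
def word_frequencies(text):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    counts = {}
    for word in text.split():
        # dict.get supplies 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    # Sort key: negated count (descending frequency), then the word itself
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

result = word_frequencies("zot foo bar foo zot zot")
for word, count in result:
    print(word, count)
```

The key function plays the same role as the (-count, word) tuples
above: the sort compares the negated count first and falls back to
the word only when counts are equal.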

HTH,
John


