Issue in printing top 20 dictionary items by dictionary value

Peter Otten __peter__ at web.de
Sat Oct 4 06:20:07 EDT 2014


Shiva wrote:

> Hi All,
> 
> I have written a function that
> -reads a file
> -splits the words and stores it in a dictionary as word(key) and the total
> count of word in file (value).
> 
> I want to print the words with top 20 occurrences in the file in reverse
> order - but can't figure it out. Here is my function:
> 
> def print_top(filename):
> 
>     #Open a file
>     path = '/home/BCA/Documents/LearnPython/ic/'
>     fname = path + filename
>     print ('filename: ',fname)
>     filetext = open(fname)
> 
>     #Read the file
>     textstorage={}
> 
>     #print(type(textstorage))
>     readall = filetext.read().lower()
>     eachword = set(readall.split())
> 
>     #store split words as keys in dictionary
>     for w in eachword:
>         textstorage[w] = readall.count(w)

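Using count() here is very inefficient because the whole text is scanned 
again for every distinct word. It also matches substrings, so a short word 
like "a" is counted inside every longer word that contains it. A better 
approach is to loop over the words once and increment the dict value: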

for w in readall.split():
    textstorage[w] = textstorage.get(w, 0) + 1
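
To make the difference concrete, here is a small sketch on a made-up 
sample string (the text is an assumption for illustration, not your file):

```python
# Sample text is invented for illustration.
readall = "the cat saw the other cat there"

# One pass over the words, incrementing each word's count.
textstorage = {}
for w in readall.split():
    textstorage[w] = textstorage.get(w, 0) + 1

print(textstorage["the"])    # 2, whole words only
print(readall.count("the"))  # 4, also matches inside "other" and "there"
```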

> 
>     #print top 20 items in dictionary by descending order of val
>     # This bit is what I can't figure out.
> 
>     for dkey in (textstorage.keys()):
>             print(dkey,sorted(textstorage[dkey]))??

Apart from the fact that sorted(textstorage[dkey]) is handed a single count 
(an int, which is not iterable and raises a TypeError), the sorting effort 
comes too late at that point: you need to sort the dict keys by the 
corresponding dict values.

It is possible to write a get_value() function such that

sorted(textstorage, key=get_value, reverse=True)

gives the keys in the right order, but perhaps it is simpler to convert 
textstorage into a list of (count, word) pairs first, something like

pairs = [(42, "blue"), (17, "red"), (77, "black"), ...]
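
Both ideas might look roughly like this. The sample counts stand in for 
the real textstorage dict, and the name words_by_count is only 
illustrative:

```python
# Hypothetical sample counts, standing in for the real textstorage dict.
textstorage = {"blue": 42, "red": 17, "black": 77}

def get_value(word):
    # Look up the count for a word; sorted() calls this once per key.
    return textstorage[word]

# Dict keys ordered by their counts, highest first.
words_by_count = sorted(textstorage, key=get_value, reverse=True)
print(words_by_count)  # ['black', 'blue', 'red']

# The alternative: build (count, word) pairs from the dict items.
pairs = [(count, word) for word, count in textstorage.items()]
```

Passing key=textstorage.get would do the same job as get_value() without 
a separate function.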

When you sort that list

most_common_words = sorted(pairs, reverse=True)

you automatically get (count, word) pairs in the right order and can print 
the first 20 with

for count, word in most_common_words[:20]:
    print(word, count)
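
Put together, the steps could be sketched as a small function. The name 
top_words, its parameters, and the sample text are assumptions made for 
this example, and it takes the text directly instead of opening a file:

```python
def top_words(text, n=20):
    """Return the n most frequent words in text as (count, word) pairs."""
    counts = {}
    for w in text.lower().split():
        counts[w] = counts.get(w, 0) + 1
    # Put the count first so that sorting the tuples orders by count.
    pairs = [(count, word) for word, count in counts.items()]
    return sorted(pairs, reverse=True)[:n]

sample = "red blue red black blue red"
for count, word in top_words(sample, n=2):
    print(word, count)
```

Because tuples compare element by element, words with equal counts end up 
in reverse alphabetical order with this approach.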

PS: Once you have it all working have a look at collections.Counter...
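
For comparison, a minimal sketch with collections.Counter on an invented 
sample string; most_common(n) returns (word, count) pairs, most frequent 
first:

```python
from collections import Counter

# Sample text is invented for illustration.
sample = "red blue red black blue red"

# Counter does the counting; most_common() does the sorting and slicing.
counts = Counter(sample.lower().split())
for word, count in counts.most_common(2):
    print(word, count)
```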




