[Tutor] Simple counter to determine frequencies of words in a document

Josep M. Fontana josep.m.fontana at gmail.com
Sat Nov 20 09:28:28 CET 2010


Hi,

I'm trying to do something that should be very simple. I want to
generate a list of the words that appear in a document, ordered by
frequency. The list generated by the script should look something
like this:

the: 3
book: 2
was: 2
read: 1
by: 1
[...]

This would be obtained from a document that contained, for example,
the following text:

"The book was read by an unknown person before the librarian found
that the book was missing."

The code I started writing to achieve this result can be seen below.
You will see that first I'm trying to create a dictionary that
contains the word as the key with the frequency as its value. Later on
I will transform the dictionary into a text file with the desired
formatting.
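
For reference, the dictionary described above could be built roughly as
follows. This is only a minimal sketch under assumptions the post does
not state: words are split on whitespace, lowercased, and stripped of
surrounding punctuation (so that "The" and "the" count as one word), and
the name word_frequencies is hypothetical.

```python
import string


def word_frequencies(text):
    """Return a dict mapping each word to the number of times it occurs."""
    counts = {}
    for raw in text.split():
        # Assumption: ignore case and surrounding punctuation,
        # so that "The" and "the" count as the same word.
        word = raw.strip(string.punctuation).lower()
        if word:
            counts[word] = counts.get(word, 0) + 1
    return counts


text = ("The book was read by an unknown person before the librarian "
        "found that the book was missing.")
freqs = word_frequencies(text)

# Print the most frequent words first, one "word: count" pair per line.
for word, count in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
    print("%s: %d" % (word, count))
```

Counting in a single pass with counts.get() avoids calling list.count()
once per word, which rescans the whole list each time.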

The problem is that, in the first test I ran, the output file that
should contain the dictionary is empty. I'm afraid this is the
consequence of a very basic misunderstanding of how Python works. I've
tried to piece this together from other scripts, but obviously the
program is not doing what it is supposed to do. I believe the function
itself works, so the problem is probably in how I call it, that is, in
how I try to write what the function returns (a dictionary) to the
output file. The relevant part of the code is so short that I'm sure
most people on the list will spot the problem in seconds, but I've
spent quite a lot of time changing things around and I cannot get it
to work as desired. Can anybody tell me what's wrong so that I can say
"duh" to myself once again?

---------------------------
def countWords(a_list):
    words = {}
    for i in range(len(a_list)):
        item = a_list[i]
        count = a_list.count(item)
        words[item] = count
    return sorted(words.items(), key=lambda item: item[1], reverse=True)


with open('output.txt', 'a') as token_freqs:
    with open('input.txt', 'r') as out_tokens:
        token_list = countWords(out_tokens.read())
        token_freqs.write(token_list)
----------------------

Thanks in advance.

Josep M.
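
One possible reading of why the output file stays empty, offered as a
sketch and not a definitive diagnosis: out_tokens.read() returns a
single string, so countWords() iterates over characters and not words,
and token_freqs.write() accepts only a string, so passing it the sorted
list raises a TypeError before anything is written. The version below
splits the text into words, counts them with collections.Counter, and
writes one formatted line per word. The helper names count_words and
write_frequencies, the lowercasing, and the punctuation stripping are
assumptions, not part of the original script. The demo uses throwaway
files in place of input.txt and output.txt.

```python
import os
import string
import tempfile
from collections import Counter


def count_words(text):
    """Return (word, count) pairs, most frequent first."""
    # Assumption: words are whitespace-separated; case and surrounding
    # punctuation are ignored.
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return Counter(w for w in words if w).most_common()


def write_frequencies(in_path, out_path):
    """Read in_path and write one 'word: count' line per word to out_path."""
    with open(in_path, "r") as in_file:
        pairs = count_words(in_file.read())
    # 'w' overwrites earlier results; the original 'a' mode would append.
    with open(out_path, "w") as out_file:
        for word, count in pairs:
            # write() needs a string, so each pair is formatted first.
            out_file.write("%s: %d\n" % (word, count))


# Demo with throwaway files standing in for input.txt and output.txt.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "input.txt")
out_path = os.path.join(workdir, "output.txt")
with open(in_path, "w") as f:
    f.write("The book was read by an unknown person before the librarian "
            "found that the book was missing.")

write_frequencies(in_path, out_path)
with open(out_path) as f:
    result = f.read()
print(result)
```

Opening the input and output files in separate with blocks keeps the
reading and writing steps independent, so an error while counting does
not leave a half-written output file behind.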

