Word count from file help.

Dave K dk123456789 at REMOVEhotmail.com
Thu Feb 12 19:16:32 EST 2004


On Thu, 12 Feb 2004 01:04:20 GMT in comp.lang.python, "jester.dev"
<jester.dev at comcast.net> wrote:

>Hello,
>
>        I'm learning Python from Python Bible, and having some
>problems with this code below. When I run it, I get nothing. It
>should open the file poem.txt (which exists in the current
>directory) and count number of times any given word appears
>in the text.

When I run it (after re-formatting - you can see below how it appears
in my newsreader), and after fixing the two errors that Python reports,
it prints the results just as you describe. Try this:

1) Add the line 'CurrentWord = ""' just before the line
         'for CharacterIndex in range(0,len(Text)):'
2) Change the very last line to 'print Word, WordCount[Word]'
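
For reference, the function with both changes applied might look roughly like the sketch below. It is a guess at the intended indentation, not the book's listing. It assumes a current Python 3 interpreter, so string.ascii_letters and print() stand in for string.letters and the print statement, and it counts a short stand-in string instead of reading poem.txt.

```python
import string

def CountWords(Text):
    "Count how many times each word appears in Text"
    WordCount = {}
    # Change 1: give CurrentWord a starting value before the loop.
    CurrentWord = ""
    # A trailing period guarantees the last word is counted.
    Text = Text + "."
    # Assumption: Python 3, where string.letters is string.ascii_letters.
    PiecesOfWords = string.ascii_letters + "'-"
    for CharacterIndex in range(0, len(Text)):
        CurrentCharacter = Text[CharacterIndex]
        if PiecesOfWords.find(CurrentCharacter) != -1:
            # Still inside a word: append this character.
            CurrentWord = CurrentWord + CurrentCharacter
        elif CurrentWord != "":
            # A word just ended: lowercase it and bump its count.
            CurrentWord = CurrentWord.lower()
            WordCount[CurrentWord] = WordCount.get(CurrentWord, 0) + 1
            CurrentWord = ""
    # Return once, after the whole text has been scanned.
    return WordCount

if __name__ == "__main__":
    # Stand-in text (hypothetical); the original script reads poem.txt.
    SampleText = "The cat saw the other cat."
    WordCount = CountWords(SampleText)
    for Word in sorted(WordCount):
        # Change 2: a comma between Word and WordCount[Word], not a dot.
        print(Word, WordCount[Word])
```

Note that the return statement and the __main__ block both sit outside the loop. If either ends up indented under the else branch, the function returns too early or the main block never runs.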

If that doesn't work for you then I suspect that the indenting in your
program is wrong (rather than just being mangled by posting it), but
I'm just guessing. It would be helpful if you posted the actual error
message (Traceback) that the Python interpreter prints; that makes it
much easier to find the problem.
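
As an illustration of why the traceback helps, here is a small sketch under stated assumptions: a current Python 3 interpreter and a hypothetical cut-down function, BuildWord, which is not part of the posted program but uses a variable before giving it a value in the same way. The traceback names both the exception type and the variable involved.

```python
import traceback

def BuildWord(Text):
    # Hypothetical cut-down function: CurrentWord is used on the
    # right-hand side before it has ever been assigned a value.
    for CurrentCharacter in Text:
        CurrentWord = CurrentWord + CurrentCharacter
    return CurrentWord

try:
    BuildWord("poem")
    Report = ""
except Exception:
    # format_exc() returns the traceback as a string, in the same
    # layout the interpreter uses when it prints one.
    Report = traceback.format_exc()

print(Report)
```

The last line of the report gives the exception type (UnboundLocalError here) and mentions CurrentWord. The exact wording of the message varies between Python versions.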

Dave

>
>#!/usr/bin/python
>
>    
># WordCount.py - Counts the words in a given text file (poem.txt)
>
>import string
>
>def CountWords(Text):
>        "Count how many times each word appears in Text"
>        # A string (above) after a def statement is a -
>        # "docstring" - a comment intended for documentation.
>        WordCount={}
>        # We will build up (and return) a dictionary whose keys
>        # are the words, and whose values are the corresponding
>        # number of occurrences.
>        
>        CountWords=""
>        # To make the job cleaner, add a period at the end of the
>        # text; that way, we are guaranteed to be finished with
>        # the current word when we run out of letters:
>        Text=Text+"."
>        
>        # We assume that ' and - don't break words, but any other
>        # nonalphabetic character does. This assumption isn't
>        # entirely accurate, but it's close enough for us.
>        # string.letters is a string of all alphabetic characters.
>        PiecesOfWords=string.letters+"'-"
>        
>        # Iterate over each character in the text. The function
>        # len () returns the length of a sequence.
>        for CharacterIndex in range(0,len(Text)):
>                CurrentCharacter=Text[CharacterIndex]
>                
>                # The find() method of a string finds the starting
>                # index of the first occurrence of a substring within
>                # a string, or returns -1 if it doesn't find the substring.
>                # The next line of code tests to see whether CurrentCharacter
>                # is part of a word:
>                if(PiecesOfWords.find(CurrentCharacter)!=-1):
>                        # Append this letter to the current word.
>                        CurrentWord=CurrentWord+CurrentCharacter
>                else:
>                        # This character is not a letter.
>                                if(CurrentWord!=""):
>                                        # We just finished a word.
>                                        # Convert to lowercase, so "The" and
>"the"
>                                        # fall in the same bucket...
>                                       
>CurrentWord=string.lower(CurrentWord)
>                                        
>                                        # Now increment this word's count.
>                                       
>CurrentCount=WordCount.get(CurrentWord,0)
>                                       
>WordCount[CurrentWord]=CurrentCount+1
>                                        
>                                        # Start a new word.
>                                        CurrentWord=""
>                                        return(WordCount)
>                                if (__name__=="__main__"):
>                                        # Read the text from the file
>poem.txt.
>                                        TextFile=open("poem.txt","r")
>                                        Text=TextFile.read()
>                                        TextFile.close()
>                                        
>                                        # Count the words in the text.
>                                        WordCount=CountWords(Text)
>                                        # Alphabetize the word list, and
>print them all out.
>                                        SortedWords=WordCount.keys()
>                                        SortedWords.sort()
>                                        for Word in SortedWords:
>                                                print Word.WordCount[Word]



