Word count from file help.

jester.dev jester.dev at comcast.net
Wed Feb 11 20:04:20 EST 2004


Hello,

        I'm learning Python from the Python Bible, and I'm having some
problems with the code below. When I run it, I get no output. It
should open the file poem.txt (which exists in the current
directory) and count the number of times each word appears
in the text.

#!/usr/bin/python

    
# WordCount.py - Counts the words in a given text file (poem.txt)

import string

def CountWords(Text):
        "Count how many times each word appears in Text"
        # A string (above) after a def statement is a -
        # "docstring" - a comment intended for documentation.
        WordCount={}
        # We will build up (and return) a dictionary whose keys
        # are the words, and whose values are the corresponding
        # number of occurrences.
        
        CountWords=""
        # To make the job cleaner, add a period at the end of the
        # text; that way, we are guaranteed to be finished with
        # the current word when we run out of letters:
        Text=Text+"."
        
        # We assume that ' and - don't break words, but any other
        # nonalphabetic character does. This assumption isn't
        # entirely accurate, but it's close enough for us.
        # string.letters is a string of all alphabetic characters.
        PiecesOfWords=string.letters+"'-"
        
        # Iterate over each character in the text. The function
        # len() returns the length of a sequence.
        for CharacterIndex in range(0,len(Text)):
                CurrentCharacter=Text[CharacterIndex]
                
                # The find() method of a string finds the starting
                # index of the first occurrence of a substring within
                # a string, or returns -1 if it doesn't find the substring.
                # The next line of code tests to see whether CurrentCharacter
                # is part of a word:
                if(PiecesOfWords.find(CurrentCharacter)!=-1):
                        # Append this letter to the current word.
                        CurrentWord=CurrentWord+CurrentCharacter
                else:
                        # This character is not a letter.
                                if(CurrentWord!=""):
                                        # We just finished a word.
                                        # Convert to lowercase, so "The" and "the"
                                        # fall in the same bucket...
                                        CurrentWord=string.lower(CurrentWord)
                                        
                                        # Now increment this word's count.
                                        CurrentCount=WordCount.get(CurrentWord,0)
                                        WordCount[CurrentWord]=CurrentCount+1
                                        
                                        # Start a new word.
                                        CurrentWord=""
                                        return(WordCount)
                                if (__name__=="__main__"):
                                        # Read the text from the file poem.txt.
                                        TextFile=open("poem.txt","r")
                                        Text=TextFile.read()
                                        TextFile.close()
                                        
                                        # Count the words in the text.
                                        WordCount=CountWords(Text)
                                        # Alphabetize the word list, and
                                        # print them all out.
                                        SortedWords=WordCount.keys()
                                        SortedWords.sort()
                                        for Word in SortedWords:
                                                print Word.WordCount[Word]
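
For comparison, here is a minimal sketch of how the listing above might look with a few likely problems addressed. It is not the book's code, and it rests on assumptions. As posted (the indentation may have been altered in the mail), the `if (__name__=="__main__")` block sits inside CountWords, so nothing at module level ever calls the function, which would explain the empty output. In addition, `CountWords=""` appears where `CurrentWord=""` seems intended, `return(WordCount)` is placed inside the loop, and `print Word.WordCount[Word]` has a dot where a comma is probably meant. The sketch also assumes Python 3, so it uses string.ascii_letters, sorted() and print() in place of string.letters, keys().sort() and the print statement; the "not found" message is an addition for illustration.

```python
#!/usr/bin/python
# WordCount.py - counts the words in a text file (poem.txt).
# Sketch only: assumes Python 3 and the fixes described above.

import string


def CountWords(Text):
    "Count how many times each word appears in Text"
    WordCount = {}
    # Start with an empty word (the posted code assigned CountWords here).
    CurrentWord = ""
    # A trailing period guarantees the last word is flushed.
    Text = Text + "."
    # ASCII letters plus ' and - are treated as parts of a word.
    PiecesOfWords = string.ascii_letters + "'-"

    for CurrentCharacter in Text:
        if CurrentCharacter in PiecesOfWords:
            # Append this letter to the current word.
            CurrentWord = CurrentWord + CurrentCharacter
        elif CurrentWord != "":
            # We just finished a word: lowercase it and count it.
            CurrentWord = CurrentWord.lower()
            WordCount[CurrentWord] = WordCount.get(CurrentWord, 0) + 1
            # Start a new word.
            CurrentWord = ""

    # Return only after the whole text has been scanned.
    return WordCount


# This block is at module level, outside the function.
if __name__ == "__main__":
    try:
        with open("poem.txt", "r") as TextFile:
            Text = TextFile.read()
    except FileNotFoundError:
        # Illustrative addition: report a missing file instead of a traceback.
        print("poem.txt not found in the current directory")
    else:
        WordCount = CountWords(Text)
        # Alphabetize the words and print each with its count.
        for Word in sorted(WordCount):
            print(Word, WordCount[Word])
```

With this layout, calling CountWords("The cat and the hat") should give a dictionary in which "the" maps to 2 and each of "cat", "and" and "hat" maps to 1.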


