[Tutor] Counting words

Sean 'Shaleh' Perry shalehperry@attbi.com
Thu, 04 Apr 2002 11:42:14 -0800 (PST)


> 
> import re, string
> 
> reg = re.compile("[\W]")
> file = open("someText","r")
> 
> text = file.read()
> 
> 
> occurences = {}
> 
> for word in reg.split(text):
>     
>     occurences[word] = occurences.get(word,0)+1
>     print occurences
>     
> for word in occurences.keys():
>     print "word:",word,", occurences:",occurences[word]
> 
> _____________________________________________________
> 
> First question:  Can someone explain what's happening in the first for-loop?
>                      I don't understand occurences.get(word,0)+1 .
>                      I know it's counting there, but how?
> 
> 

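What is happening is that the 'get' method of the dictionary is being called.  It
looks up the key given by word and returns the count stored there, or 0 if the
key is not in the dictionary yet.  Adding 1 and assigning the result back to
occurences[word] updates the count for that word.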

This is equivalent to:

if occurences.has_key(word):
  occurences[word] = occurences[word] + 1
else:
  occurences[word] = 1
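
To see the counting at work in isolation, here is a small sketch.  It is written
in Python 3 syntax (print is a function there), and the list named words is
made-up sample data rather than anything from your file:

```python
# Made-up sample input; in the real script these come from reg.split(text).
words = ["some", "text", "some", "more", "text", "some"]

occurences = {}
for word in words:
    # get() returns the stored count, or the default 0 for an unseen word.
    occurences[word] = occurences.get(word, 0) + 1

print(occurences)
```

Each pass through the loop reads the old count (or 0), adds 1, and stores the
new value under the same key.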

> 
> Second question:  
> "Some" and "some" should be recognized as one word, the same is with "BORING"
> and "boring". I  thought of string.lowercase as a possible solution, but as 
> it doesn't  work , I might be wrong. Any idea what to do?
> 

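string.lowercase is a constant holding the lowercase letters, not a function.
To fold case, convert each word to lowercase before counting it:

word = string.lower(word)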
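
A rough sketch of how that fits into the counting loop, using the string method
word.lower() and Python 3 syntax.  The sample list and the name counts are only
for illustration:

```python
# Illustrative input mixing upper and lower case.
sample = ["Some", "some", "BORING", "boring"]

counts = {}
for word in sample:
    # Fold case first so "Some" and "some" share one dictionary key.
    key = word.lower()
    counts[key] = counts.get(key, 0) + 1

print(counts)
```

The important part is that the lowercased form is used as the dictionary key,
not the word as it appeared in the text.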

> Third question:
> 
> Last line of output:
> Is  "\n" recognized as a word?? (My text  currently consists of three lines)
> 

>>> s = 'A typical line\n'
>>> import re
>>> reg = re.compile("[\W]")
>>> reg.split(s)
['A', 'typical', 'line', '']

Notice the empty last entry.  The "\n" itself is not counted as a word; the
split leaves an empty string behind it.  Have your loop skip empty words.
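
Putting the three answers together, one possible version looks like the sketch
below.  It uses Python 3 syntax, the function name count_words is my own
invention, and it takes the text as a string instead of reading your file:

```python
import re


def count_words(text):
    """Return a dict mapping each lowercased word in text to its count."""
    counts = {}
    for word in re.split(r"\W", text):
        # Splitting on non-word characters yields '' between adjacent
        # separators and after a trailing newline; skip those.
        if not word:
            continue
        word = word.lower()
        counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words("Some text, some BORING text\n"))
```

With the empty strings skipped, the spurious '' entry no longer shows up in the
output.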