[Tutor] Counting words

Nicole Seitz nicole.seitz@urz.uni-hd.de
Thu, 4 Apr 2002 20:28:10 +0200


Hi there!


I would like to count words in a file, i.e. I want to know how often each word 
occurs.

In "GoTo Python" I found some help, so I could write:

_____________________________________________________

import re, string

reg = re.compile(r"\W")    # matches any single non-word character
file = open("someText","r")

text = file.read()
file.close()

occurences = {}

for word in reg.split(text):
    occurences[word] = occurences.get(word,0)+1
    print occurences

for word in occurences.keys():
    print "word:",word,", occurences:",occurences[word]

_____________________________________________________

First question:  Can someone explain what's happening in the first for-loop?
                     I don't understand occurences.get(word,0)+1 .
                     I know it's counting there, but how?
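
For reference, the counting idiom can be tried in isolation. The sketch below
is only an illustration: the word list and the names `words`, `counts` and
`previous` are made up, and the single-argument print(...) form is used so it
runs under both older and newer Python versions. It splits the one-line
update into two steps to show what get(word, 0) contributes.

```python
# Hypothetical sample words, not taken from the original file.
words = ["this", "is", "this"]

counts = {}
for word in words:
    # counts.get(word, 0) returns the count stored for word so far,
    # or the default 0 if word is not yet a key in the dictionary.
    previous = counts.get(word, 0)
    # Store the count again, increased by one for this occurrence.
    counts[word] = previous + 1

print(counts)
```

Without the default, counts[word] would raise a KeyError the first time a
word is seen, which is presumably why get() is used here.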



(one possible output)

word: of , occurences: 1
word: Some , occurences: 1
word: are , occurences: 1
word: texts , occurences: 1
word: BORING , occurences: 1
word: some , occurences: 1
word: is , occurences: 2
word: boring , occurences: 1
word: This , occurences: 2
word: kind , occurences: 1
word: text , occurences: 1
word:  , occurences: 3

Second question:  
"Some" and "some" should be recognized as one word, and the same goes for
"BORING" and "boring". I thought string.lowercase might be a solution, but it
doesn't work, so I might be wrong. Any idea what to do?
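
One likely reason: string.lowercase is a constant holding the lowercase
letters, not a function that converts text. A minimal sketch of one possible
approach, assuming it is acceptable to fold the whole text to lower case
before splitting, uses the lower() string method. The sample text and the
name `counts` are hypothetical.

```python
import re

# Hypothetical sample text, not the actual contents of "someText".
text = "Some texts are BORING. This text is boring"

counts = {}
# lower() returns a copy of the text with all cased letters lowercased,
# so "Some"/"some" and "BORING"/"boring" end up under the same key.
for word in re.split(r"\W", text.lower()):
    counts[word] = counts.get(word, 0) + 1

print(counts["boring"])
```

Lowercasing each word inside the loop would give the same counts; doing it
once on the whole text is just shorter.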

Third question:

Last line of output:
Is "\n" recognized as a word? (My text currently consists of three lines.)
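
The blank entry is probably the empty string rather than "\n": split()
returns an empty string wherever two separators are adjacent (for example a
"." followed by a newline) and where the text ends with a separator. The
sketch below illustrates this with a made-up two-line text, since the
original file is not shown; `pieces`, `empties` and `words` are hypothetical
names.

```python
import re

# Hypothetical two-line sample; the original file is not shown in the post.
text = "This is text.\nThis is boring."

pieces = re.split(r"\W", text)
# "." followed by "\n" are two adjacent separators, and the final "."
# ends the text, so split() returns an empty string in both places.
empties = pieces.count("")

# One way to skip the empty pieces: match runs of word characters instead
# of splitting on the non-word characters.
words = re.findall(r"\w+", text)

print(empties)
print(words)
```

Filtering out empty strings inside the counting loop would work as well.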


Thanx in advance.

Nicole