[Tutor] Iteration issues

Mats Wichmann mats at wichmann.us
Thu May 10 12:43:05 EDT 2018


On 05/09/2018 05:27 PM, Roger Lea Scherer wrote:
> Hello, again.
> 
> I want to count words in a text file. If a word repeats I want to increase
> the count by 1; if the word is new to the dictionary, I want to add the
> word to the dictionary. Everything works like I would like and expect,
> except for it only adds the last word of each line to the dictionary. What
> am I missing?

So let's add some more comments...

> import string
> 
> file_name = 'oxford.txt'
> wordset = {}
> with open(file_name, 'r') as f:
>     for line in f:
>         sentence = line.strip()
>         sentence = sentence.strip(string.punctuation)
>         print(sentence)
>         sentence = sentence.lower()
>         word_list = sentence.strip()
>         word_list = word_list.split(' ')
> 
>         for i in range(len(word_list)):
>             word_list[i] = word_list[i].strip(string.punctuation)

This step does still do some work: the strip a few lines above only
removes punctuation from the two ends of the whole line, so a comma or
period attached to a word in the middle of the line survives until each
word is stripped. However, you usually don't want to iterate over a list
by index this way. (If you do need an index, look up how the enumerate()
function works.) I'd just make a new list, which is cheap and easy,
rather than trying to replace each list member individually.  Like this,
using a "list comprehension":

          words = [word.strip(string.punctuation) for word in word_list]
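
To make the difference concrete, here is a small sketch comparing the
comprehension with an enumerate() loop. The sample line is made up for
illustration and is not taken from oxford.txt. Note that stripping the
whole line only touches its two ends, so the words in the middle keep
their punctuation until each one is stripped individually.

```python
import string

# Made-up sample line, not taken from oxford.txt
line = "Hello, world! Hello again."

# Stripping the whole line only removes punctuation at its two ends
sentence = line.strip().strip(string.punctuation).lower()
word_list = sentence.split()

# Build a new list with each word stripped individually
words = [word.strip(string.punctuation) for word in word_list]

# Index-based alternative using enumerate(), modifying a copy in place
in_place = list(word_list)
for i, word in enumerate(in_place):
    in_place[i] = word.strip(string.punctuation)

print(word_list)  # punctuation still attached to the middle words
print(words)      # punctuation removed from every word
```

Both approaches give the same result here; the comprehension is simply
shorter and leaves the original list untouched.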

>         print(word_list)
> 
>         if word_list[i] in wordset:
>             wordset[word_list[i]] += 1
>         else:
>             wordset[word_list[i]] = 1
>         print(wordset)

The logic of this chunk is fine: increment the occurrence counter if the
key is already present in the dict, add it if it isn't. The catch is the
indentation: the if/else sits outside the "for i in range(...)" loop, so
it runs only once per line, after the loop has finished and i is left at
the last index. That is why only the last word of each line gets counted.
It needs to be inside a loop over the words. You could also do this with
a try/except block (i.e., don't "ask permission", just "try and fix it
up"). So using the newly created "words" from above, like this:

          for word in words:
              try:
                  wordset[word] += 1
              except KeyError:
                  wordset[word] = 1
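
Putting the pieces together, a minimal sketch of the whole thing might
look like the following. The function name count_words and the sample
lines are assumptions made for illustration; in the real script you
would pass the open file object instead of a list. It also skips tokens
that turn out to be empty after stripping, which is an extra step not
in the original code.

```python
import string
from collections import Counter


def count_words(lines):
    """Count words across an iterable of text lines (hypothetical helper)."""
    wordset = {}
    for line in lines:
        # split() with no argument also discards runs of whitespace
        word_list = line.lower().split()
        words = [word.strip(string.punctuation) for word in word_list]
        # The counting loop sits inside the per-line loop,
        # so every word on the line is seen, not just the last one
        for word in words:
            if not word:
                continue  # skip tokens that were only punctuation
            try:
                wordset[word] += 1
            except KeyError:
                wordset[word] = 1
    return wordset


# Made-up sample lines standing in for the contents of a file
sample = ["The cat sat.\n", "The dog, the cat!\n"]
counts = count_words(sample)
print(counts)

# collections.Counter from the standard library gives the same totals
flat = [w.strip(string.punctuation) for ln in sample for w in ln.lower().split()]
counter_counts = dict(Counter(w for w in flat if w))
```

With a real file this would be called as count_words(f) inside the
"with open(...)" block, since a file object yields its lines when
iterated. collections.Counter is worth a look once the manual version
makes sense, as it does the same bookkeeping for you.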


Does this help?

