[Tutor] FW: wierd replace problem

Joel Goldstick joel.goldstick at gmail.com
Tue Sep 14 17:28:10 CEST 2010


On Tue, Sep 14, 2010 at 10:29 AM, Roelof Wobben <rwobben at hotmail.com> wrote:

Here is my solution.  I didn't bother to make every word lower case, though I
think that would improve the result.

Critique and improvements are welcome.


Some explanation:

line 5 -- I read the complete text into full_text, replacing each -- with a
space as I go
line 7 -- I split the full text string into words
lines 8 - 14 -- word by word, I strip characters that don't belong in words
from the front and back of each 'word'
lines 11 - 14 -- this is EAFP -- try to add one to the bin for that word;
if there is no such bin, make it and give it 1
lines 16 - 21 -- since dicts aren't kept in sorted order, sort the keys, then
loop through the keys to print out each key (word) and its count
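
To try the strip and EAFP steps on their own, here is a small sketch on a
made-up sentence.  The sample text and the names sample, raw and counts are
mine for illustration, not part of the script, and the print call is written
for Python 3:

```python
# Sample text is invented for illustration; it is not from the book file.
sample = "Alice! said the cat -- 'Alice?' (the cat's grin)"

counts = {}
for raw in sample.replace('--', ' ').split():
    # strip() only trims the listed characters from both ends,
    # so the apostrophe inside "cat's" is kept.
    word = raw.strip(".,!?'`\"- ();:")
    try:
        counts[word] += 1      # EAFP: assume the bin already exists
    except KeyError:
        counts[word] = 1       # no such bin yet, so create it

# Print the words in sorted order with their counts.
for word in sorted(counts):
    print(word, counts[word])
```

Note that "Alice!" and "'Alice?'" land in the same bin once the punctuation
around them is stripped.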


  1 #! /usr/bin/env python
  2
  3 word_count = {}
  4 file = open ('alice_in_wonderland.txt', 'r')
  5 full_text = file.read().replace('--',' ')
  6
  7 full_text_words = full_text.split()
  8 for words in full_text_words:
  9     stripped_words = words.strip(".,!?'`\"- ();:")
 10     ##print stripped_words
 11     try:
 12         word_count[stripped_words] += 1
 13     except KeyError:
 14         word_count[stripped_words] = 1
 15
 16 ordered_keys = word_count.keys()
 17 ordered_keys.sort()
 18 ##print ordered_keys
 19 print "All the words and their frequency in 'alice in wonderland'"
 20 for k in ordered_keys:
 21     print k, word_count[k]
 22
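As for the lower-casing I skipped, a rough sketch of how it might look is
below.  This is one possible variant, not a tested replacement for the script:
it assumes Python 3, swaps the try/except for collections.Counter, and also
skips tokens that were nothing but punctuation.  The names STRIP_CHARS,
count_words, count_file and demo are made up for this example.

```python
from collections import Counter

# Same characters the script strips from the ends of each word.
STRIP_CHARS = ".,!?'`\"- ();:"


def count_words(text):
    """Return a Counter of lower-cased, punctuation-stripped words."""
    counts = Counter()
    for raw in text.replace('--', ' ').split():
        word = raw.strip(STRIP_CHARS).lower()
        if word:               # skip tokens that were only punctuation
            counts[word] += 1
    return counts


def count_file(path):
    """Count words in a text file; 'with' closes the file when done."""
    with open(path, 'r') as book:
        return count_words(book.read())


# Small check on an invented line; for the real run, call
# count_file('alice_in_wonderland.txt') instead.
demo = count_words("The Rabbit -- the White Rabbit! -- was late.")
for word in sorted(demo):
    print(word, demo[word])
```

With lower-casing, "The" and "the" are counted together, which is the
improvement I had in mind.
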
-- 
Joel Goldstick