[Tutor] FW: wierd replace problem
Joel Goldstick
joel.goldstick at gmail.com
Tue Sep 14 17:28:10 CEST 2010
On Tue, Sep 14, 2010 at 10:29 AM, Roelof Wobben <rwobben at hotmail.com> wrote:
I offer my solution. I didn't bother to make every word lower case, though I
think that would improve the result.
Please offer critiques and improvements.
Some explanation:
line 5 -- I read the complete text into full_text, replacing each '--' with
a space
line 7 -- I split the full text string into words
lines 8 - 14 -- word by word, I strip the characters that aren't part of
words from the front and back of each 'word'
lines 11 - 14 -- this is EAFP (easier to ask forgiveness than permission):
try to add one to the bin for that word; if there is no such bin, make it
and give it 1
lines 16 - 21 -- since dicts aren't ordered, sort the keys, then loop
through them to print each key (word) and its count
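
As a small illustration of the strip-and-count steps described above, here is a sketch run on a made-up sample string rather than the Alice text. It uses Python 3 syntax, while my script is Python 2; the name count_words and the sample sentence are only there for illustration.

```python
PUNCTUATION = ".,!?'`\"- ();:"  # same characters the script strips


def count_words(text):
    """Count words EAFP-style: try to increment, create the bin on KeyError."""
    counts = {}
    for raw in text.replace('--', ' ').split():
        word = raw.strip(PUNCTUATION)
        try:
            counts[word] += 1
        except KeyError:
            counts[word] = 1
    return counts


# Made-up sample, not taken from the book.
sample = "Alice--said Alice--was (very) tired; very tired!"
counts = count_words(sample)
for word in sorted(counts):
    print(word, counts[word])
```

The try/except only pays the cost of the exception the first time a word is seen; after that the bin exists and the increment succeeds directly.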
 1 #! /usr/bin/env python
 2
 3 word_count = {}
 4 file = open ('alice_in_wonderland.txt', 'r')
 5 full_text = file.read().replace('--',' ')
 6
 7 full_text_words = full_text.split()
 8 for words in full_text_words:
 9     stripped_words = words.strip(".,!?'`\"- ();:")
10     ##print stripped_words
11     try:
12         word_count[stripped_words] += 1
13     except KeyError:
14         word_count[stripped_words] = 1
15
16 ordered_keys = word_count.keys()
17 ordered_keys.sort()
18 ##print ordered_keys
19 print "All the words and their frequency in 'alice in wonderland'"
20 for k in ordered_keys:
21     print k, word_count[k]
22
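
One possible improvement along the lines I mentioned: lower-case each word so that 'The' and 'the' land in the same bin, and skip tokens that are empty once the punctuation is stripped. This is only a sketch in Python 3, using collections.Counter from the standard library in place of the try/except; the names word_frequencies and file_frequencies and the sample sentence are made up for the example, and I have not run it against the full text.

```python
from collections import Counter

PUNCTUATION = ".,!?'`\"- ();:"  # same characters as the original script


def word_frequencies(text):
    """Return a Counter of lower-cased words, ignoring empty tokens."""
    words = (raw.strip(PUNCTUATION).lower()
             for raw in text.replace('--', ' ').split())
    # A lone '-' strips down to '', so filter those out before counting.
    return Counter(word for word in words if word)


def file_frequencies(path):
    """Read a whole text file and count its words."""
    with open(path) as text_file:  # the with-block closes the file
        return word_frequencies(text_file.read())


# Made-up sample, not taken from the book.
sample = "The cat -- the CAT! - sat."
frequencies = word_frequencies(sample)
for word in sorted(frequencies):
    print(word, frequencies[word])
```

To run it on the book, call file_frequencies('alice_in_wonderland.txt') with the file in the current directory.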
--
Joel Goldstick