When Python outruns C++ ...

Jp Calderone exarkun at intarweb.us
Tue Apr 1 02:16:11 EST 2003


On Tue, Apr 01, 2003 at 01:48:26AM +0000, Julian Tibble wrote:
> [snip]
> 
> 
> Julian
> 
> p.s. maybe my python version is not very good, feel free to make
> it go faster:
> 
> #
> # Count the number of occurences of each word in a file
> # -- Julian Tibble
> #
> 
> import re
> 
> # "Word -> Number of occurences" mapping
> wordcount = {}
> 
> # Regular expression to match a word 
> re_word = re.compile('[a-z]+')
> 
> # Read in whole file and split into words
> try:
> 	while 1:
> 		line = raw_input().lower()
> 		line = re_word.findall(line)
> 		for word in line:
> 			if wordcount.has_key(word):
> 				wordcount[word] += 1
> 			else:
> 				wordcount[word] = 1
> except EOFError:
> 	pass
> 
> 
> # Order by descending number of occurences
> sortable = zip(wordcount.values(), wordcount.keys())
> sortable.sort()
> sortable.reverse()
> 
> for n,w in sortable:
> 	print w,n
> 

  It looks like this could easily be improved with the application of a
slightly smarter searching algorithm, such as Rabin-Karp search or
Boyer-Moore find.

  Both of these use a technique of doing a little extra work up front to
make each pass through the loop much less expensive.  Google for details.

  Jp

-- 
"There is no reason for any individual to have a computer in their
home."
                -- Ken Olson, President of DEC, World Future Society
                   Convention, 1977
-- 
 up 12 days, 2:00, 6 users, load average: 0.00, 0.00, 0.00





More information about the Python-list mailing list