help make it faster please

Lonnie Princehouse finite.automaton at gmail.com
Thu Nov 10 13:40:18 EST 2005


You're making a new countDict for each line read from the file... is
that what you meant to do?  Or are you trying to count word occurrences
across the whole file?
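
If whole-file totals are what you want, accumulate into a single dict
instead of starting over on every line.  A rough, untested sketch
(assuming the path is in filename; split() is just a stand-in for
whatever tokenization you settle on):

totals = {}          # one dict shared across all lines
f = open(filename)
for line in f:
  for word in line.split():
    totals[word] = totals.get(word, 0) + 1
f.close()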

--

In general, any time string manipulation is going slowly, ask yourself,
"Can I use the re module for this?"

# disclaimer: untested code.  probably contains typos

import re
word_finder = re.compile('[a-z0-9_]+', re.I)

def count_words(string, word_finder=word_finder):  # avoid global lookups
  countDict = {}
  for match in word_finder.finditer(string):
    word = match.group(0)
    countDict[word] = countDict.get(word,0) + 1
  return countDict

f = open(filename)
for i, line in enumerate(f):
  countDict = count_words(line)
  print "Line %s" % i
  for word in sorted(countDict.keys()):
    print "  %s %s" % (word, countDict[word])

f.close()
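
A couple of notes on the choices above: the word_finder=word_finder
default argument is a micro-optimization that binds the compiled pattern
to a local name, and local lookups are cheaper than global lookups inside
a tight loop.  finditer() is used instead of findall() so matches are
produced lazily rather than collected into a list first.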



