help make it faster please
Fredrik Lundh
fredrik at pythonware.com
Sun Nov 13 14:21:24 EST 2005
Ron Adam wrote:
> The \w does make a small difference, but not as much as I expected.
that's probably because your benchmark has a lot of dubious overhead:
> word_finder = re.compile('[\w@]+', re.I)
no need to force case-insensitive search here; \w looks for both lower-
and uppercase characters.
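a quick sketch (sample text is made up) showing that \w already matches both cases without re.I:

```python
import re

# \w covers [a-zA-Z0-9_] (plus the @ added by the character class),
# so no re.I flag and no lower() call are needed to catch both cases
words = re.findall(r'[\w@]+', 'Hello WORLD foo@Bar')
assert words == ['Hello', 'WORLD', 'foo@Bar']
```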
> for match in word_finder.finditer(string.lower()):
since you're using a case-insensitive RE, that lower() call is not necessary.
> word = match.group(0)
and findall() is of course faster than finditer() + m.group().
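the two spellings return the same strings, so swapping them is safe; a small sketch (sample text is made up):

```python
import re

pattern = re.compile(r'[\w@]+')
text = 'one two two three'

# finditer() builds a match object per hit, and m.group(0) is a
# Python-level call on each one
via_finditer = [m.group(0) for m in pattern.finditer(text)]

# findall() collects the matched strings directly in one pass
via_findall = pattern.findall(text)

assert via_finditer == via_findall
```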
>     t = time.clock()
>     for line in lines.splitlines():
>         countDict = foo(line)
>     tt = time.clock()-t
and if you want performance, why are you creating a new dictionary for
each line in the sample?
here's a more optimized RE word finder:
import re

word_finder_2 = re.compile(r'[\w@]+').findall

def count_words_2(string, word_finder=word_finder_2):
    # binding findall as a default argument avoids a global
    # lookup on every call
    countDict = {}
    for word in word_finder(string):
        countDict[word] = countDict.get(word, 0) + 1
    return countDict
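called once on a small made-up sample, it behaves like this (restating the definition above so the snippet stands alone):

```python
import re

word_finder_2 = re.compile(r'[\w@]+').findall

def count_words_2(string, word_finder=word_finder_2):
    countDict = {}
    for word in word_finder(string):
        countDict[word] = countDict.get(word, 0) + 1
    return countDict

counts = count_words_2('the quick the lazy the')
assert counts == {'the': 3, 'quick': 1, 'lazy': 1}
```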
with your original test on a slow machine, I get
count_words: 0.29868684 (best of 3)
count_words_2: 0.17244873 (best of 3)
if I call the function once, on the entire sample string, I get
count_words: 0.23096036 (best of 3)
count_words_2: 0.11690620 (best of 3)
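(editor's note: time.clock() was removed in Python 3.8; a present-day re-run of this comparison could use timeit instead. the count_words below is a reconstruction of the quoted original, not Ron's exact code, and the sample text is made up:)

```python
import re
import timeit

word_finder = re.compile(r'[\w@]+', re.I).finditer
word_finder_2 = re.compile(r'[\w@]+').findall

def count_words(string):
    # reconstruction of the original: re.I, lower(), finditer + group()
    countDict = {}
    for match in word_finder(string.lower()):
        word = match.group(0)
        countDict[word] = countDict.get(word, 0) + 1
    return countDict

def count_words_2(string):
    countDict = {}
    for word in word_finder_2(string):
        countDict[word] = countDict.get(word, 0) + 1
    return countDict

sample = 'the quick brown fox jumps over the lazy dog ' * 200

for fn in (count_words, count_words_2):
    best = min(timeit.repeat(lambda: fn(sample), number=100, repeat=3))
    print(fn.__name__, round(best, 4))
```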
</F>