performance: initializing dictionary elements
Skip Montanaro
skip at pobox.com
Tue May 21 11:54:20 EDT 2002
George> I found Skip Montanaro's page of performance tips and the
George> subsection "Initializing Dictionary Elements" presented an
George> optimization using exceptions instead of conditional looping. I
George> tried this out for my operation (which was the same one
George> incidentally, a full-text index) but found that performance
George> actually suffered. I'm using Python 2.1.1. Does anyone have any
George> idea why this would happen?
The most likely reason is that Python is not static and I haven't done a
very good job keeping up with changes. I wrote that missive several years
ago. (I have a response to it in my Python mailbox dated 06 Nov 1996.) I
should probably have timestamped the various sections, but didn't think of
it at the time.
I wrote a small script to test these two approaches (appended). It uses two
sources for words. One used all the "words" in the tex source for the
library reference manual (234k words, 33k unique). The other uses
/usr/share/dict/words (45k words, all unique). I also added another
approach to initialization that uses {}.get(). Dict's hadn't yet grown that
method when I wrote the original document.
In both cases all three versions perform about the same (within about 20% of
each other). Note that using python's -O flag has an effect on the results
because it removes fewer SET_LINENO instructions from the getinit() function
than from the other two...
--
Skip Montanaro (skip at pobox.com - http://www.mojam.com/)
"Excellant Written and Communications Skills required" - seen on chi.jobs
import glob, time
def ifinit(words):
wdict = {}
has_key = wdict.has_key
for word in words:
if not has_key(word): wdict[word] = 0
wdict[word] = wdict[word] + 1
return wdict
def excinit(words):
wdict = {}
for word in words:
try:
wdict[word] = wdict[word] + 1
except KeyError:
wdict[word] = 1
return wdict
def getinit(words):
wdict = {}
get = wdict.get
for word in words:
wdict[word] = get(word, 0) + 1
return wdict
def countwords(words):
print "corpus is", len(words), "words"
t = time.time()
for i in range(5):
wdict = ifinit(words)
print "using if: %.2f seconds to process" % (time.time()-t),
print len(wdict), "unique words"
t = time.time()
for i in range(5):
wdict = ifinit(words)
print "using try/except: %.2f seconds to process" % (time.time()-t),
print len(wdict), "unique words"
t = time.time()
for i in range(5):
wdict = getinit(words)
print "using get(): %.2f seconds to process" % (time.time()-t),
print len(wdict), "unique words"
print
words = []
for file in glob.glob("/home/skip/src/python/head/dist/src/Doc/lib/*.tex"):
words.extend(open(file).read().split())
countwords(words)
countwords(open("/usr/share/dict/words").read().split())
More information about the Python-list
mailing list