performance: initializing dictionary elements

Tue May 21 11:54:20 EDT 2002

    George> I found Skip Montanaro's page of performance tips and the
    George> subsection "Initializing Dictionary Elements" presented an
    George> optimization using exceptions instead of conditional looping. I
    George> tried this out for my operation (which was the same one
    George> incidentally, a full-text index) but found that performance
    George> actually suffered. I'm using Python 2.1.1. Does anyone have any
    George> idea why this would happen?

The most likely reason is that Python is not static and I haven't done a
very good job keeping up with changes.  I wrote that missive several years
ago.  (I have a response to it in my Python mailbox dated 06 Nov 1996.)  I
should probably have timestamped the various sections, but didn't think of
it at the time.

I wrote a small script to test these two approaches (appended).  It uses two
sources for words.  One used all the "words" in the tex source for the
library reference manual (234k words, 33k unique).  The other uses
/usr/share/dict/words (45k words, all unique).  I also added another
approach to initialization that uses {}.get().  Dict's hadn't yet grown that
method when I wrote the original document.

In both cases all three versions perform about the same (within about 20% of
each other).  Note that using python's -O flag has an effect on the results
because it removes fewer SET_LINENO instructions from the getinit() function
than from the other two...

-- 
Skip Montanaro (skip at pobox.com - http://www.mojam.com/)
"Excellant Written and Communications Skills required" - seen on chi.jobs

import glob, time

def ifinit(words):
    wdict = {}
    has_key = wdict.has_key
    for word in words:
        if not has_key(word): wdict[word] = 0
        wdict[word] = wdict[word] + 1
    return wdict

def excinit(words):
    wdict = {}
    for word in words:
        try:
            wdict[word] = wdict[word] + 1
        except KeyError:
            wdict[word] = 1
    return wdict

def getinit(words):
    wdict = {}
    get = wdict.get
    for word in words:
        wdict[word] = get(word, 0) + 1
    return wdict

def countwords(words):
    print "corpus is", len(words), "words"

    t = time.time()
    for i in range(5):
        wdict = ifinit(words)
    print "using if: %.2f seconds to process" % (time.time()-t),
    print len(wdict), "unique words"

    t = time.time()
    for i in range(5):
        wdict = ifinit(words)
    print "using try/except: %.2f seconds to process" % (time.time()-t),
    print len(wdict), "unique words"

    t = time.time()
    for i in range(5):
        wdict = getinit(words)
    print "using get(): %.2f seconds to process" % (time.time()-t),
    print len(wdict), "unique words"

    print

words = []
for file in glob.glob("/home/skip/src/python/head/dist/src/Doc/lib/*.tex"):
    words.extend(open(file).read().split())
countwords(words)

countwords(open("/usr/share/dict/words").read().split())