Two questions about efficiency

Thu May 27 01:31:52 EDT 2004

Steve M wrote:

[two implementations of dictionary with default value]

> I am interested in his claim that the second version is less efficient
> unless the requested key is "almost never" in the dictionary. I would
> have thought that the overhead to do a key lookup is quite a bit less
> than the overhead required to create an exception object, so that the
> first version would be on average more efficient only if the requested
> key is in the dictionary, say, more than half the time.
> 
> Does the "key in dict" test internally raise an exception on failure,
> thus incurring at least the same overhead as the first version? Or is
> the overhead to raise an exception much less than I have naively
> supposed?

Thinking/believing is definitely the wrong approach here. Python ships with
the nice little script timeit.py that let's you verify your assumptions.

<dicttime.py>
class DictBase(dict):
    def __init__(self, default, *args, **kw):
        dict.__init__(self, *args, **kw)
        self.default = default

class DictTry(DictBase):
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.default

class DictIn(DictBase):
    def __getitem__(self, key):
        if key in self:
            return dict.__getitem__(self, key)
        else:
            return self.default

class DictGet(DictBase):
    def __getitem__(self, key):
        return self.get(key, self.default)

items = "hit alpha beta gamma delta epsilon zeta eta theta iota
kappa".split()
items = zip(items, map(str.upper, items))
dictTry = DictTry(None, items)
dictIn = DictIn(None, items)
dictGet = DictGet(None, items)
</dicttime.py>

$ timeit.py -s"from dicttime import dictTry as d" "d['hit']"
100000 loops, best of 3: 2.22 usec per loop
$ timeit.py -s"from dicttime import dictTry as d" "d['miss']"
100000 loops, best of 3: 11.2 usec per loop
$ timeit.py -s"from dicttime import dictIn as d" "d['hit']"
100000 loops, best of 3: 2.39 usec per loop
$ timeit.py -s"from dicttime import dictIn as d" "d['miss']"
1000000 loops, best of 3: 1.49 usec per loop
$ timeit.py -s"from dicttime import dictGet as d" "d['hit']"
100000 loops, best of 3: 2.11 usec per loop
$ timeit.py -s"from dicttime import dictGet as d" "d['miss']"
100000 loops, best of 3: 2.03 usec per loop

Two results stand out: misses are cheap for DictIn and expensive for
DictTry.

Peter