Two questions about efficiency

Steve M humean at fea.st
Wed May 26 16:39:30 EDT 2004


1. Near the beginning of the document "Unifying types and classes in
Python 2.2" by GvR, which can be found at
http://www.python.org/2.2.2/descrintro.html, the author writes the
following about subclassing a built in type:
---
Here's an example of a simple dict subclass, which provides a "default
value" that is returned when a missing key is requested:

    class defaultdict(dict):

        def __init__(self, default=None):
            dict.__init__(self)
            self.default = default

        def __getitem__(self, key):
            try:
                return dict.__getitem__(self, key)
            except KeyError:
                return self.default

<snip>

The __getitem__() method could also be written as follows, using the
new "key in dict" test introduced in Python 2.2:

        def __getitem__(self, key):
            if key in self:
                return dict.__getitem__(self, key)
            else:
                return self.default

I believe that this version is less efficient, because it does the key
lookup twice. The exception would be when we expect that the requested
key is almost never in the dictionary: then setting up the try/except
statement is more expensive than the failing "key in self" test.
---

I am interested in his claim that the second version is less efficient
unless the requested key is "almost never" in the dictionary. I would
have thought that the overhead to do a key lookup is quite a bit less
than the overhead required to create an exception object, so that the
first version would be on average more efficient only if the requested
key is in the dictionary, say, more than half the time.

Does the "key in dict" test internally raise an exception on failure,
thus incurring at least the same overhead as the first version? Or is
the overhead to raise an exception much less than I have naively
supposed?

I have a similar idiom buried in a nested loop, and I'd like to select
the approach that is likely to be most efficient. My best guess is
that the key lookup will fail half the time.

2. Is there any reason to prefer one of the following over the other?
Im interested in both a consensus on style and readability, and
considerations of efficiency at runtime.

for x in L:
  if f(x):
    pass

for x in L:
  if f(x):
    continue



More information about the Python-list mailing list