Two questions about efficiency

Wed May 26 23:59:13 EDT 2004

humean at fea.st (Steve M) wrote in message news:<edcd85fb.0405261239.6381cf32 at posting.google.com>...
> 1. Near the beginning of the document "Unifying types and classes in
> Python 2.2" by GvR, which can be found at
> http://www.python.org/2.2.2/descrintro.html, the author writes the
> following about subclassing a built in type:
> ---
> Here's an example of a simple dict subclass, which provides a "default
> value" that is returned when a missing key is requested:
> 
>     class defaultdict(dict):
> 
>         def __init__(self, default=None):
>             dict.__init__(self)
>             self.default = default
> 
>         def __getitem__(self, key):
>             try:
>                 return dict.__getitem__(self, key)
>             except KeyError:
>                 return self.default
> 
> <snip>
> 
> The __getitem__() method could also be written as follows, using the
> new "key in dict" test introduced in Python 2.2:
> 
>         def __getitem__(self, key):
>             if key in self:
>                 return dict.__getitem__(self, key)
>             else:
>                 return self.default
> 
> I believe that this version is less efficient, because it does the key
> lookup twice. The exception would be when we expect that the requested
> key is almost never in the dictionary: then setting up the try/except
> statement is more expensive than the failing "key in self" test.
> ---
>[...] 
> I have a similar idiom buried in a nested loop, and I'd like to select
> the approach that is likely to be most efficient. My best guess is
> that the key lookup will fail half the time.

Since Guido wrote Python, I would tend to believe him when he says the
first method is more efficient than the second. ;)  However, even if
you nested a similar construct several levels deep inside a loop, I'd
be willing to bet that if that particular loop were a bottleneck in
your program, that optimizing that particular section of code would
probably not make the difference between "too slow" and "fast enough."

Since both options are equally readable, if I thought about it, I
would select option 2 while coding, by default.  I wouldn't make a big
deal out of changing code that uses option 1, though.