Defaultdict and speed

Fri Nov 3 23:57:21 EST 2006

bearophileHUGS at lycos.com wrote:
> This post sums some things I have written in another Python newsgroup.
> More than 40% of the times I use defaultdict like this, to count
> things:
>
> >>> from collections import defaultdict as DD
> >>> s = "abracadabra"
> >>> d = DD(int)
> >>> for c in s: d[c] += 1
> ...
> >>> d
> defaultdict(<type 'int'>, {'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1})
>
> But I have seen that if keys are quite sparse, and int() becomes called
> too much often, then code like this is faster:
>
> >>> d = {}
> >>> for c in s:
> ...   if c in d: d[c] += 1
> ...   else: d[c] = 1
> ...
> >>> d
> {'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1}
>
> So to improve the speed for such special but common situation, the
> defaultdict can manage the case with default_factory=int in a different
> and faster way.

Benchmarks?  I doubt it is worth complicating defaultdict's code (and
slowing down other uses of the class) for this improvement...
especially when the faster alternative is so easy to code.  If that
performance difference matters, you would likely find more fruitful
gains in coding it in c, using PyDict_SET.

-Mike