Defaultdict and speed

Sat Nov 4 03:43:57 EST 2006

Klaas wrote:
> Benchmarks?

There is one (fixed in a succesive post) in the original thread I was
referring to:
http://groups.google.com/group/it.comp.lang.python/browse_thread/thread/aff60c644969f9b/
If you want I can give more of them (and a bit less silly, with strings
too, etc).

def ddict(n):
    t = clock()
    d = defaultdict(int)
    for i in xrange(n):
        d[i] += 1
    print round(clock()-t, 2)

def ndict(n):
    t = clock()
    d = {}
    for i in xrange(n):
        if i in d:
            d[i] += 1
        else:
            d[i] = 1
    print round(clock()-t, 2)

ddict(300000)
ndict(300000)

> (and slowing down other uses of the class)

All it has to do is to cheek if the default_factory is an int, it's
just an "if" done only once, so I don't think it slows down the other
cases significantly.

> especially when the faster alternative is so easy to code.

The faster alternative is easy to create, but the best faster
alternative can't be coded, because if you code it in Python you need
two hash accesses, while the defaultdict can require only one of them:

if n in d:
    d[n] += 1
else:
    d[n] = 1

>If that performance difference matters,

With Python it's usually difficult to tell if some performance
difference matters. Probably in some programs it may matter, but in
most other programs it doesn't matter. This is probably true for all
the performance tweaks I may invent in the future too.

> you would likely find more fruitful
> gains in coding it in c, using PyDict_SET

I've just started creating a C lib for related purposes, I'd like to
show it to you all on c.l.p, but first I have to find a place to put it
on :-) (It's not easy to find a suitable place, it's a python + c +
pyd, and it's mostly an exercise).

Bye,
bearophile