dict.get and str.xsplit

bearophileHUGS at lycos.com bearophileHUGS at lycos.com
Tue Feb 26 09:02:12 EST 2008


When I use dictionaries I find the dict.get() method very handy, and
I'd like to use it often, but I think it's quite slow. This is a small
benchmark:

from random import randrange
from timeit import default_timer as clock

def main(N, M):
    keys = list(set(randrange(2*N) for n in xrange(N)))
    n2 = len(keys) / 2
    keys1, keys2 = keys[:n2], keys[n2:]
    adict = dict.fromkeys(keys1)

    t = clock()
    for _ in xrange(M):
        for k in keys1:
            r = adict.get(k, None)
        for k in keys2:
            r = adict.get(k, None)
    print round(clock() - t, 2), "s"

    t = clock()
    for _ in xrange(M):
        for k in keys1:
            if k in adict:
                r = adict[k]
            else:
                r = None
        for k in keys2:
            if k in adict:
                r = adict[k]
            else:
                r = None
    print round(clock() - t, 2), "s"

main(40000, 30)


On my old PC the output is:
2.36 s
1.59 s
(Your timings may be 4-5 times smaller, so you can tweak N and M to
have significant results).
And note that the if/then version is faster even if it calls the hash
function two times, while get() is supposed to do it once (in this
case the hashing function is very simple and fast. If the keys are
long strings the results of a similar benchmark may be different,
because the computing of the hashing function may become the slowest
part of that code).

This is a real difference, that has real impact on the programs I
write, so I often use the if/else approach, despite the dict.get()
method being semantically fitter and shorter.
So can the dict.get() method be speed up? And if not, why?

------------------

Another method I'd like to see, is the str.xsplit() I was talking
about time ago too. It's like str.split() but it's lazy. It avoids to
build a potentially huge list. In real D programs I have seen such
string function may speed up heavy-string-processing code (the kind of
code you may want to use Python/Perl/Ruby for) up to 2-3 times. Seeing
how such kind of processing is common in Python I think this isn't an
optimization to ignore.
(I am now getting better in writing C code, so in few months I may
even become able to submit a xsplit patch myself. I think it's not too
much complex to write, it can borrow lot of code from the str.split
method).

Bye, and thank you for your kind attention,
bearophile



More information about the Python-list mailing list