optimizing large dictionaries
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Thu Jan 15 18:43:20 EST 2009
On Thu, 15 Jan 2009 14:49:29 -0800, bearophileHUGS wrote:
> Matimus, your suggestions are all good.
>
> Try-except is slower than:
> if x in adict: ... else: ...
Not according to my tests.
>>> def tryexcept(D, key):
...     try:
...         return D[key]
...     except KeyError:
...         return None
...
>>> def ifinelse(D, key):
...     if key in D:
...         return D[key]
...     else:
...         return None
...
>>> from timeit import Timer
>>> setup = "from __main__ import tryexcept, ifinelse; D = {2: 'c'}"
>>>
>>> t1 = Timer('tryexcept(D, 2)', setup)
>>> t2 = Timer('ifinelse(D, 2)', setup)
>>>
>>> min(t1.repeat()) # try...except
0.61699414253234863
>>> min(t2.repeat()) # if...else
0.71856999397277832
Perhaps what you meant to say is that try...except is slower *if the key
is not found*. And in that case, there's a truly massive difference:
>>> t3 = Timer('tryexcept(D, 5)', setup)
>>> t4 = Timer('ifinelse(D, 5)', setup)
>>>
>>> min(t3.repeat()) # try...except
5.3846139907836914
>>> min(t4.repeat()) # if...else
0.5983281135559082
Which solution is faster on average will depend on the ratio of keys
found versus keys not found. On my PC, based on the above results,
if...else becomes faster when just 2% of keys are not found.
But if you can arrange matters so that nearly all keys are found, then
try...except can be faster.
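As a side note, that 2% break-even figure can be checked with a little arithmetic: if p is the fraction of lookups whose key is missing, each strategy's expected cost is a weighted average of its hit and miss times, and the crossover is where the two averages are equal. A minimal sketch, plugging in the (rounded) timings measured above:

```python
# Expected cost per lookup: cost(p) = (1 - p)*hit + p*miss,
# where p is the fraction of lookups whose key is missing.
# Timings (seconds) rounded from the timeit runs above.
te_hit, te_miss = 0.617, 5.385   # try...except
ii_hit, ii_miss = 0.719, 0.598   # if...else

# Solve (1 - p)*te_hit + p*te_miss == (1 - p)*ii_hit + p*ii_miss for p.
p = (ii_hit - te_hit) / ((te_miss - te_hit) - (ii_miss - ii_hit))

print("break-even miss rate: %.1f%%" % (p * 100))  # roughly 2%
```

The exact crossover will of course shift with your hardware and Python version; the point is only that a big miss penalty on one side moves the break-even point close to zero.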
There's another solution nobody has mentioned (as far as I can see): the
get method. In my test I call get() from inside a function, rather than
directly, so that the timing results include the function call overhead
for all three strategies.
>>> def get(D, key):
...     return D.get(key)
...
>>>
>>> setup = "from __main__ import get; D = {2: 'c'}"
>>> t5 = Timer('get(D, 2)', setup)
>>> t6 = Timer('get(D, 5)', setup)
>>>
>>> min(t5.repeat()) # found key
0.88362312316894531
>>> min(t6.repeat()) # missing key
0.87782382965087891
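One caveat with get() worth mentioning (an aside, not part of the timings above): returning None is ambiguous if None is itself a legitimate value in the dict. Passing an explicit default, such as a private sentinel object, keeps hits and misses distinguishable:

```python
D = {2: 'c', 3: None}  # None is a legitimately stored value here

# With plain get(), a stored None and a missing key look the same:
print(D.get(3))  # None (the stored value)
print(D.get(5))  # None (key missing)

# A unique sentinel as the default removes the ambiguity:
_MISSING = object()

print(D.get(3, _MISSING) is _MISSING)  # False: key 3 is present
print(D.get(5, _MISSING) is _MISSING)  # True: key 5 is absent
```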
--
Steven