optimizing large dictionaries
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Thu Jan 15 18:43:20 EST 2009
On Thu, 15 Jan 2009 14:49:29 -0800, bearophileHUGS wrote:
> Matimus, your suggestions are all good.
>
> Try-except is slower than:
> if x in adict: ... else: ...
Not according to my tests.
>>> def tryexcept(D, key):
...     try:
...         return D[key]
...     except KeyError:
...         return None
...
>>> def ifinelse(D, key):
...     if key in D:
...         return D[key]
...     else:
...         return None
...
>>> from timeit import Timer
>>> setup = "from __main__ import tryexcept, ifinelse; D = {2: 'c'}"
>>>
>>> t1 = Timer('tryexcept(D, 2)', setup)
>>> t2 = Timer('ifinelse(D, 2)', setup)
>>>
>>> min(t1.repeat()) # try...except
0.61699414253234863
>>> min(t2.repeat()) # if...else
0.71856999397277832
Perhaps what you meant to say is that try...except is slower *if the key
is not found*. And in that case, there's a truly massive difference:
>>> t3 = Timer('tryexcept(D, 5)', setup)
>>> t4 = Timer('ifinelse(D, 5)', setup)
>>>
>>> min(t3.repeat()) # try...except
5.3846139907836914
>>> min(t4.repeat()) # if...else
0.5983281135559082
Which solution is faster on average will depend on the ratio of keys
found versus keys not found. On my PC, based on the above results,
if...else becomes faster when just 2% of keys are not found.
But if you can arrange matters so that nearly all keys are found, then
try...except can be faster.
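As a side note, that 2% break-even figure can be checked with a little arithmetic: if p is the fraction of lookups whose key is missing, each strategy's expected cost is a weighted average of its hit and miss times, and the crossover is where the two averages are equal. A minimal sketch, plugging in the (rounded) timings measured above:

```python
# Expected cost per lookup: cost(p) = (1 - p)*hit + p*miss,
# where p is the fraction of lookups whose key is missing.
# Timings (seconds) rounded from the timeit runs above.
te_hit, te_miss = 0.617, 5.385   # try...except
ii_hit, ii_miss = 0.719, 0.598   # if...else

# Solve (1 - p)*te_hit + p*te_miss == (1 - p)*ii_hit + p*ii_miss for p.
p = (ii_hit - te_hit) / ((te_miss - te_hit) - (ii_miss - ii_hit))

print("break-even miss rate: %.1f%%" % (p * 100))  # roughly 2%
```

The exact crossover will of course shift with your hardware and Python version; the point is only that a big miss penalty on one side moves the break-even point close to zero.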
There's another solution nobody has mentioned (as far as I can see): the
get method. In my test I call get() from inside a function, rather than
directly, so that the timing results include the function call overhead
for all three strategies.
>>> def get(D, key):
...     return D.get(key)
...
>>>
>>> setup = "from __main__ import get; D = {2: 'c'}"
>>> t5 = Timer('get(D, 2)', setup)
>>> t6 = Timer('get(D, 5)', setup)
>>>
>>> min(t5.repeat()) # found key
0.88362312316894531
>>> min(t6.repeat()) # missing key
0.87782382965087891
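One caveat with get() worth mentioning (an aside, not part of the timings above): returning None is ambiguous if None is itself a legitimate value in the dict. Passing an explicit default, such as a private sentinel object, keeps hits and misses distinguishable:

```python
D = {2: 'c', 3: None}  # None is a legitimately stored value here

# With plain get(), a stored None and a missing key look the same:
print(D.get(3))  # None (the stored value)
print(D.get(5))  # None (key missing)

# A unique sentinel as the default removes the ambiguity:
_MISSING = object()

print(D.get(3, _MISSING) is _MISSING)  # False: key 3 is present
print(D.get(5, _MISSING) is _MISSING)  # True: key 5 is absent
```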
--
Steven