optimizing large dictionaries

Christian Heimes lists at cheimes.de
Thu Jan 15 19:08:57 EST 2009


Per Freem schrieb:
> 1] is Try-Except really slower? my dict actually has two layers, so
> my_dict[aKey][bKeys]. the aKeys are very small (less than 100) where
> as the bKeys are the ones that are in the millions.  so in that case,
> doing a Try-Except on aKey should be very efficient, since often it
> will not fail, where as if I do: "if aKey in my_dict", that statement
> will get executed for each aKey. can someone definitely say whether
> Try-Except is faster or not? My benchmarks aren't conclusive and i
> hear it both ways from several people (though majority thinks
> TryExcept is faster).

A defaultdict is faster than a try/except if the key is missing. A
defaultdict is the canonical way to solve your problem, too.

> 3] more importantly, is there likely to be a big improvement for
> splitting up one big dictionary into several smaller ones? if so, is
> there a straight forward elegant way to implement this? the way i am
> thinking is to just fix a number of dicts and populate them with
> elements. then during retrieval, try the first dict, if that fails,
> try the second, if not the third, etc... but i can imagine how that's
> more likely to lead to bugs / debugging give the way my code is setup
> so i am wondering whether it is really worth it.
> if it can lead to a factor of 2 difference, i will definitely
> implement it -- does anyone have experience with this?

Nested dicts more likely slower than one large dict. You have to keep in
mind that Python's dict have been optimized to utilize the CPU cache as
much as possible. Any layer you put on top of a single dict can increase
the number of cache misses and increase the number of function calls.
This may lead to a general slow down.

I assume that you won't notice a slow down. However my instincts tell me
that you'll hardly get a speedup, either. You have to roll your own
benchmarks to study which algorithm is faster in reality.

Christian




More information about the Python-list mailing list