optimizing large dictionaries

Thu Jan 15 18:31:51 EST 2009

thanks to everyone for the excellent suggestions. a few follow up q's:

1] is Try-Except really slower? my dict actually has two layers, so
my_dict[aKey][bKeys]. the aKeys are very small (less than 100) where
as the bKeys are the ones that are in the millions.  so in that case,
doing a Try-Except on aKey should be very efficient, since often it
will not fail, where as if I do: "if aKey in my_dict", that statement
will get executed for each aKey. can someone definitely say whether
Try-Except is faster or not? My benchmarks aren't conclusive and i
hear it both ways from several people (though majority thinks
TryExcept is faster).

2] is there an easy way to have nested defaultdicts? ie i want to say
that my_dict = defaultdict(defaultdict(int)) -- to reflect the fact
that my_dict is a dictionary, whose values are dictionary that map to
ints. but that syntax is not valid.

3] more importantly, is there likely to be a big improvement for
splitting up one big dictionary into several smaller ones? if so, is
there a straight forward elegant way to implement this? the way i am
thinking is to just fix a number of dicts and populate them with
elements. then during retrieval, try the first dict, if that fails,
try the second, if not the third, etc... but i can imagine how that's
more likely to lead to bugs / debugging give the way my code is setup
so i am wondering whether it is really worth it.
if it can lead to a factor of 2 difference, i will definitely
implement it -- does anyone have experience with this?

On Jan 15, 5:58 pm, Steven D'Aprano <st... at REMOVE-THIS-
cybersource.com.au> wrote:
> On Thu, 15 Jan 2009 23:22:48 +0100, Christian Heimes wrote:
> >> is there anything that can be done to speed up this simply code? right
> >> now it is taking well over 15 minutes to process, on a 3 Ghz machine
> >> with lots of RAM (though this is all taking CPU power, not RAM at this
> >> point.)
>
> > class MyClass(object):
> >     # a new style class with slots saves some memory
> >     __slots__ = ("field1", "field2", "field2")
>
> I was curious whether using slots would speed up attribute access.
>
> >>> class Parrot(object):
>
> ...     def __init__(self, a, b, c):
> ...             self.a = a
> ...             self.b = b
> ...             self.c = c
> ...>>> class SlottedParrot(object):
>
> ...     __slots__ = 'a', 'b', 'c'
> ...     def __init__(self, a, b, c):
> ...             self.a = a
> ...             self.b = b
> ...             self.c = c
> ...
>
> >>> p = Parrot(23, "something", [1, 2, 3])
> >>> sp = SlottedParrot(23, "something", [1, 2, 3])
>
> >>> from timeit import Timer
> >>> setup = "from __main__ import p, sp"
> >>> t1 = Timer('p.a, p.b, p.c', setup)
> >>> t2 = Timer('sp.a, sp.b, sp.c', setup)
> >>> min(t1.repeat())
> 0.83308887481689453
> >>> min(t2.repeat())
>
> 0.62758088111877441
>
> That's not a bad improvement. I knew that __slots__ was designed to
> reduce memory consumption, but I didn't realise they were faster as well.
>
> --
> Steven