Most efficient way to build very large dictionaries

python at bdurham.com
Wed Dec 24 02:43:35 EST 2008


I'm working with some very large dictionaries (about 25M items
per dictionary) that get filled with data parsed from text-based
log files. I'm using Python dictionaries to track the frequency
of specific combinations of events, e.g., I build a key and then
add various timing info from the current record to that key's
current value. If the key doesn't exist, I create it and
initialize it to its first value. Most keys accumulate anywhere
from 2 to 10,000 value increments.
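My per-record update looks roughly like this (the key fields and
timing value are simplified stand-ins for what I actually parse):

counts = {}

def add_record(host, event, elapsed):
    # host/event/elapsed are placeholders for fields parsed
    # from one log line; the tuple is the combination I track.
    key = (host, event)
    # Missing keys start at 0; keys that already exist (e.g.,
    # from a pre-built dictionary) skip straight to the add.
    counts[key] = counts.get(key, 0) + elapsed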
Over time, I've noticed that 90%+ of the keys in my dictionaries
stay constant across daily runs of my program. Can I take
advantage of this knowledge to optimize dictionary performance
by pre-building my dictionary from a given list of keys whose
values are all set to 0? Specifically, is there a way to
bulk-load a dictionary with keys and initial values that is
faster than adding entries one at a time?
Here are two high-level strategies I've been thinking about for
improving the performance of my dictionaries.
1. Create a list of keys from the current dictionary, pickle
this list of keys, and then on my next program run, load the
pickled list of keys and convert the list to a dictionary with
all key values set to 0 (see the first sketch below).
2. After I've performed my analysis on the current dictionary,
reset all of its values to 0 (is there a fast way to do this?),
pickle it, and then on my next program run, load this pickled
dictionary of keys and zero values (see the second sketch below).
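For strategy 1, here's roughly what I'm picturing; the file name
is just a placeholder. The idea is that dict.fromkeys builds the
whole dictionary in one call instead of 25M individual inserts:

import pickle

def save_key_list(counts, path="keys.pkl"):
    # Persist only the keys from the current run.
    with open(path, "wb") as f:
        pickle.dump(list(counts), f, pickle.HIGHEST_PROTOCOL)

def load_prebuilt(path="keys.pkl"):
    # Rebuild the dictionary with every value preset to 0.
    with open(path, "rb") as f:
        keys = pickle.load(f)
    return dict.fromkeys(keys, 0)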
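And for strategy 2, the fastest zeroing approach I've come up
with is to rebuild with dict.fromkeys rather than assigning 0
key by key, then pickle the whole zeroed dictionary (again, the
path is a placeholder):

import pickle

def save_zeroed(counts, path="zeroed.pkl"):
    # fromkeys iterates the existing dict's keys, replacing a
    # Python-level assignment loop with one C-level call.
    zeroed = dict.fromkeys(counts, 0)
    with open(path, "wb") as f:
        pickle.dump(zeroed, f, pickle.HIGHEST_PROTOCOL)

def load_zeroed(path="zeroed.pkl"):
    with open(path, "rb") as f:
        return pickle.load(f)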
I would appreciate your thoughts on whether there are advantages
to working with a pre-built dictionary and, if so, on the best
ways to create a pre-loaded dictionary.
Thank you,
Malcolm

