my computer is allergic to pickles

Bob Fnord bob at example.com
Fri Mar 4 20:56:05 EST 2011


I'm using Python to do some log file analysis, and I need to store
on disk a very large dict with tuples of strings as keys and
lists of strings and numbers as values.

I started by using cPickle to save the instance of the class that
contains this dict. The pickling process began writing the file,
but it ate so much memory that my computer (4 GB RAM) crashed so
badly that I had to press the reset button. I've never seen
out-of-memory errors do this before. Is this normal?
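
For concreteness, the pickling step described above looks roughly
like the sketch below. The class name LogIndex, its add() method,
the sample records and the file name are all made up for
illustration; the real dict is far larger.

```python
try:
    import cPickle as pickle  # Python 2 name of the C pickle module
except ImportError:
    import pickle  # Python 3: the C implementation is used automatically

import os
import tempfile


class LogIndex(object):
    """Hypothetical stand-in for the class that wraps the dict."""

    def __init__(self):
        # keys: tuples of strings; values: lists of strings and numbers
        self.data = {}

    def add(self, key, *fields):
        # Append the fields to the list stored under this key.
        self.data.setdefault(key, []).extend(fields)


index = LogIndex()
index.add(("host1", "GET"), "/index.html", 200, 0.12)
index.add(("host2", "POST"), "/login", 302)

path = os.path.join(tempfile.gettempdir(), "log_index.pickle")

with open(path, "wb") as f:
    # The whole instance, including its dict, is pickled in one call.
    pickle.dump(index, f, pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:
    restored = pickle.load(f)
```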

(I know from the output that got written before the crash that my
program had finished building the dict and started the
pickle. When I tried running the other program that reads the
pickle and analyzes the data in it, it gave an error because the
file was incomplete. So I know where in my code the crash
happened.)

From searching the web, I get the impression that pickle uses a
lot of memory because it checks for recursion and other things
that could break other serialization methods. So I've switched to
using marshal to save the dict itself (the only persistent thing
in the class, which just has convenience methods for adding data
to the dict and searching it for the second stage of analysis).
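
The marshal version described above might look roughly like this
sketch. The sample records and the file name are invented for
illustration. Two caveats apply: marshal only handles core
built-in types (which this structure is limited to), and its file
format is not guaranteed to stay compatible between Python
versions, so the sketch assumes the same interpreter version
writes and reads the file.

```python
import marshal
import os
import tempfile

# Hypothetical sample of the structure: tuple-of-strings keys,
# lists of strings and numbers as values.
data = {
    ("host1", "GET"): ["/index.html", 200, 0.12],
    ("host2", "POST"): ["/login", 302],
}

path = os.path.join(tempfile.gettempdir(), "log_index.marshal")

with open(path, "wb") as f:
    # Only the plain dict is written, not the wrapping class instance.
    marshal.dump(data, f)

with open(path, "rb") as f:
    restored = marshal.load(f)
```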

I found some references to h5 tables for getting around the
pickling memory problem, but I got the impression they only work
with fixed columns, not a somewhat complex data structure like
mine.

Any comments, suggestions?



