my computer is allergic to pickles

MRAB python at mrabarnett.plus.com
Fri Mar 4 21:14:04 EST 2011


On 05/03/2011 01:56, Bob Fnord wrote:
> I'm using python to do some log file analysis and I need to store
> on disk a very large dict with tuples of strings as keys and
> lists of strings and numbers as values.
>
> I started by using cPickle to save the instance of the class that
> contained this dict, but the pickling process started to write
> the file but ate so much memory that my computer (4 GB RAM)
> crashed so badly that I had to press the reset button. I've never
> seen out-of-memory errors do this before. Is this normal?
>
> (I know from the output that got written before the crash that my
> program had finished building the dict and started the
> pickle. When I tried running the other program that reads the
> pickle and analyzes the data in it, it gave an error because the
> file was incomplete. So I know where in my code the crash
> happened.)
>
> From searching the web, I get the impression that pickle uses a
> lot of memory because it checks for recursion and other things
> that could break other serialization methods. So I've switched to
> using marshal to save the dict itself (the only persistent thing
> in the class, which just has convenience methods for adding data
> to the dict and searching it for the second stage of analysis).
>
> I found some references to h5 tables for getting around the
> pickling memory problem, but I got the impression they only work
> with fixed columns, not a somewhat complex data structure like
> mine.
>
> Any comments, suggestions?
>
Would a database work?
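
As a rough sketch of what that might look like (not a definitive
design): the standard-library sqlite3 module can keep the dict on
disk, one row per key. The table name "entries", the helper names
(open_store, put, get, items) and the choice of JSON text to encode
the tuple keys and list values are all assumptions made for
illustration, not anything taken from your code. Written for
Python 3.

```python
import json
import sqlite3


def open_store(path):
    """Open (or create) a SQLite file holding one key/value table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries "
        "(key TEXT PRIMARY KEY, value TEXT)"
    )
    return conn


def put(conn, key, value):
    """Store one entry; key is a tuple of strings, value a list."""
    # Hypothetical encoding: both sides are serialized as JSON text.
    conn.execute(
        "INSERT OR REPLACE INTO entries (key, value) VALUES (?, ?)",
        (json.dumps(list(key)), json.dumps(value)),
    )


def get(conn, key):
    """Fetch the list stored under key, or raise KeyError."""
    row = conn.execute(
        "SELECT value FROM entries WHERE key = ?",
        (json.dumps(list(key)),),
    ).fetchone()
    if row is None:
        raise KeyError(key)
    return json.loads(row[0])


def items(conn):
    """Yield (key, value) pairs one row at a time."""
    for k, v in conn.execute("SELECT key, value FROM entries"):
        yield tuple(json.loads(k)), json.loads(v)
```

Calling conn.commit() after each batch of put() calls saves that
batch, so the structure never has to be serialized in one go, and
the second-stage program can look up single keys with get() or
walk the rows with items() instead of loading everything first.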


