Pickle MemoryError - any ideas?

Carl Banks pavlovevidence at gmail.com
Tue Jul 20 20:06:50 EDT 2010


On Jul 20, 3:01 pm, Peter <peter.milli... at gmail.com> wrote:
> I have created a class that contains a list of files (contents,
> binary) - so it uses a LOT of memory.
>
> When I first pickle.dump the list it creates a 1.9GByte file on the
> disk. I can load the contents back again, but when I attempt to dump
> it again (with or without additions), I get the following:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "c:\Python26\Lib\pickle.py", line 1362, in dump
>     Pickler(file, protocol).dump(obj)
>   File "c:\Python26\Lib\pickle.py", line 224, in dump
>     self.save(obj)
>   File "c:\Python26\Lib\pickle.py", line 286, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "c:\Python26\Lib\pickle.py", line 600, in save_list
>     self._batch_appends(iter(obj))
>   File "c:\Python26\Lib\pickle.py", line 615, in _batch_appends
>     save(x)
>   File "c:\Python26\Lib\pickle.py", line 286, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "c:\Python26\Lib\pickle.py", line 488, in save_string
>     self.write(STRING + repr(obj) + '\n')
> MemoryError

(Aside) Wow, pickle concatenates strings like this?


> I get this error either attempting to dump the entire list or dumping
> it in "segments" i.e. the list is 2229 elements long, so from the
> command line I attempted using pickle to dump individual parts of the
> list into files i.e. every 500 elements were saved to their own
> file - but I still get the same error.
>
> I used the following sequence when attempting to dump the list in
> segments - X and Y were 500 element indexes apart, the sequence fails
> on [1000:1500]:
>
> f = open('archive-1', 'wb', 2)
> pickle.dump(mylist[X:Y], f)
> f.close()

First thing to do is try the cPickle module instead of pickle.
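
A minimal sketch of that swap, reusing mylist, X, Y, and the file name
from your snippet (the binary protocol is my own addition, not part of
your original code, but it sidesteps the repr()-based text encoding
visible in the save_string frame of the traceback):

import cPickle

# Same dump as in your snippet, but through the C implementation.
# Protocol 2 writes a compact binary format instead of text.
f = open('archive-1', 'wb')
try:
    cPickle.dump(mylist[X:Y], f, 2)   # or the whole list, if it fits
finally:
    f.close()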


> I am assuming that available memory has been exhausted, so I tried
> "waiting" between dumps in the hopes that garbage collection might
> free some memory - but that doesn't help at all.

Waiting won't trigger a garbage collection.  First of all, it's not
really garbage collection but cycle collection: objects that aren't
part of reference cycles are collected immediately when their last
reference goes away, at least in CPython.  And given that your items
are all binary data, I doubt there are many reference cycles in your
data.

Anyway, cycle collection is triggered when object creation/deletion
counts meet certain criteria (which won't happen if you are waiting),
but you could call gc.collect() to force a cycle collection.
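
For completeness, forcing a collection between dumps looks like this
(just a sketch; as argued above, it is unlikely to free much when the
data is cycle-free binary strings):

import gc

n = gc.collect()   # force a full cycle collection right now
print "gc.collect() found %d unreachable objects" % n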


> In summary:
>
> 1. The list gets originally created from various sources
> 2. the list can be dumped successfully
> 3. the program restarts and successfully loads the list
> 4. the list can not be (re) dumped without getting a MemoryError
>
> This seems like a bug in pickle?

No


> Any ideas (other than the obvious - don't save all of these files
> contents of these files into a list! Although that is the only "answer" I can see at
> the moment :-)).

You should at least consider whether one of the dbm-style databases
(dbm, gdbm, or dbhash) meets your needs.
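
A rough sketch of that approach, assuming you can key each file's
contents by its name (anydbm picks whichever dbm flavour is installed;
the "files" variable here is a made-up stand-in for your
(filename, contents) pairs):

import anydbm

# One key per file instead of one giant pickled list; only a single
# value has to be held in memory at any time.
db = anydbm.open('archive.db', 'c')
try:
    for name, contents in files:   # hypothetical (filename, bytes) pairs
        db[name] = contents
finally:
    db.close()

# Later, pull back just the entry you need:
db = anydbm.open('archive.db', 'r')
data = db['some-file-name']
db.close()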


Carl Banks
