getting MemoryError with dicts; suspect memory fragmentation

Dan Stromberg drsalists at gmail.com
Fri Jun 4 19:41:51 EDT 2010


On Thu, Jun 3, 2010 at 3:43 PM, Emin.shopper Martinian.shopper
<emin.shopper at gmail.com> wrote:

> Dear Experts,
>
> I am getting a MemoryError when creating a dict in a long running
> process and suspect this is due to memory fragmentation. Any
> suggestions would be welcome. Full details of the problem are below.
>
> I have a long-running process which eventually dies with a
> MemoryError exception


Here's a good article on tracing Python memory leaks:
http://www.lshift.net/blog/2008/11/14/tracing-python-memory-leaks
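For a rough first pass at finding what is piling up, one common approach is to snapshot how many live objects of each type the garbage collector is tracking, then compare snapshots taken some time apart. The sketch below shows that idea; it is an assumption of one reasonable technique, not necessarily the one the linked article uses, and `count_by_type` and `growth` are hypothetical helper names.

```python
import gc
from collections import Counter


def count_by_type():
    """Return a Counter mapping type name -> number of gc-tracked objects.

    Note: gc only tracks container objects. Ints, strs, and dicts holding
    only such atomic values are usually untracked, so they won't show up.
    """
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=10):
    """Return the (type name, increase) pairs that grew most between snapshots."""
    diff = after - before  # Counter subtraction keeps only positive counts
    return diff.most_common(top)


if __name__ == "__main__":
    class Record:  # user-defined instances are always gc-tracked
        pass

    before = count_by_type()
    kept = [Record() for _ in range(1000)]  # simulate a leak of 1000 objects
    after = count_by_type()
    for name, extra in growth(before, after):
        print(name, extra)
```

Taking snapshots at a few points in the long-running process and watching which type names keep growing usually narrows down where references are being retained.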

Just curious: what if you use a gdbm key-value store or a treap instead of a
builtin {} dictionary? gdbm is usually built in, and my treap module is
available here, as pure Python or as partial CPython + partial Cython:
http://stromberg.dnsalias.org/~dstromberg/treap/

The gdbm module requires your keys and values to be strings (unless you
wrap it), but it should largely take dictionary-related virtual memory (VM)
out of the picture, because it stores most of your data on disk. One way to
wrap it is my (UCI's) cachedb module, which performs implicit to- and
from-string conversions using functions you pass to its constructor, and
lets you control how many values are cached in VM:
http://stromberg.dnsalias.org/~strombrg/cachedb.html
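Here is a minimal sketch of the wrapping idea: a dict-like class that converts keys and values to and from bytes with functions you pass to its constructor, so the data itself lives in a dbm file on disk. It is not cachedb's API and omits cachedb's in-memory value cache; `ConvertingDB` is a hypothetical name. It assumes Python 3, where gdbm lives at `dbm.gnu` and `dbm.open` picks it (or another available backend) automatically. The default conversions assume `str` keys and picklable values; the standard library's `shelve` module offers a similar pickle-only wrapper.

```python
import dbm
import pickle
from collections.abc import MutableMapping


class ConvertingDB(MutableMapping):
    """Dict-like wrapper around a dbm file (hypothetical; not cachedb's API).

    Keys and values are converted to/from bytes by caller-supplied functions,
    so the bulk of the data lives on disk rather than in the process's VM.
    """

    def __init__(self, path,
                 key_to_bytes=lambda k: k.encode("utf-8"),
                 bytes_to_key=lambda b: b.decode("utf-8"),
                 value_to_bytes=pickle.dumps,
                 bytes_to_value=pickle.loads):
        self._db = dbm.open(path, "c")  # "c": create the file if missing
        self._k2b, self._b2k = key_to_bytes, bytes_to_key
        self._v2b, self._b2v = value_to_bytes, bytes_to_value

    def __getitem__(self, key):
        # dbm raises KeyError for a missing key, as a dict would
        return self._b2v(self._db[self._k2b(key)])

    def __setitem__(self, key, value):
        self._db[self._k2b(key)] = self._v2b(value)

    def __delitem__(self, key):
        del self._db[self._k2b(key)]

    def __iter__(self):
        # keys() returns a list of bytes; convert each back to a key
        return (self._b2k(k) for k in self._db.keys())

    def __len__(self):
        return len(self._db)

    def close(self):
        self._db.close()
```

Usage looks like an ordinary dict: `db = ConvertingDB("/tmp/mycache"); db["k"] = [1, 2]; print(db["k"]); db.close()`. Each lookup pays for a disk read plus a conversion, which is the trade-off that a configurable in-memory cache like cachedb's is meant to tune.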

The treap module should be able to store arbitrary objects in VM out of the
box. It behaves much like a dictionary but is implemented quite differently.
Its chief difference is a little extra memory overhead (normally; maybe not
this time, given the significantly different implementation details) in
exchange for always keeping the "dictionary's" objects sorted in key order.
It stores them as a (mostly!) self-balancing tree with a heap invariant; it
sacrifices fastidious balancing to get better average performance. Because
it stores data as a tree of discrete objects, it shouldn't need contiguous
memory, unless CPython's underlying memory management does for reasons
beyond anything dictionary-specific.
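To make the treap description concrete, here is a minimal illustrative treap supporting only insert, lookup, and sorted iteration. It is not the treap module linked above and does not mirror its API; `Treap` and its helpers are hypothetical names. Each node gets a random priority, and rotations maintain a max-heap invariant on priorities (no child outranks its parent) while keys stay in binary-search-tree order, so the tree is balanced in expectation rather than strictly.

```python
import random


class _Node:
    __slots__ = ("key", "value", "priority", "left", "right")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.priority = random.random()  # random priority => expected O(log n) depth
        self.left = self.right = None


def _rotate_right(node):
    child = node.left
    node.left, child.right = child.right, node
    return child


def _rotate_left(node):
    child = node.right
    node.right, child.left = child.left, node
    return child


def _insert(node, key, value):
    if node is None:
        return _Node(key, value)
    if key == node.key:
        node.value = value  # existing key: just overwrite the value
    elif key < node.key:
        node.left = _insert(node.left, key, value)
        if node.left.priority > node.priority:  # restore the heap invariant
            node = _rotate_right(node)
    else:
        node.right = _insert(node.right, key, value)
        if node.right.priority > node.priority:
            node = _rotate_left(node)
    return node


class Treap:
    """Sorted mapping stored as a tree of separate node objects."""

    def __init__(self):
        self._root = None
        self._len = 0

    def __setitem__(self, key, value):
        if key not in self:
            self._len += 1
        self._root = _insert(self._root, key, value)

    def __getitem__(self, key):
        node = self._root
        while node is not None:
            if key == node.key:
                return node.value
            node = node.left if key < node.key else node.right
        raise KeyError(key)

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True

    def __len__(self):
        return self._len

    def __iter__(self):
        # Iterative in-order traversal yields keys in sorted order.
        stack, node = [], self._root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.key
            node = node.right
```

Because every entry is its own small node object, the structure grows through many small allocations, whereas a CPython dict occasionally reallocates its whole hash table as one larger contiguous block when it fills up.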


More information about the Python-list mailing list