Reclaiming (lots of) memory

Thomas Rast foo.bar at freesurf.ch.invalid
Sat Oct 2 15:27:05 EDT 2004


Hello everyone

My scenario is somewhat involved, so if you're in a hurry, here's an
abstract: I have a daemon that needs about 80MB of RAM to build its
internal data structures, but can pack them to 20MB for the rest of
its lifetime.  The other 60MB are never returned to the operating
system.  I've thought of building and packing the structures in a
fork()ed process, then piping them over to the main part; is there an
easier way to get my RAM back?

Explanation:

The program is part of a toy IRC bot which reads roughly 100'000 lines
from logfiles and builds a huge dict of dicts which is then used to
build sentences much like word-based dissociated press would.  All
entries are ints representing words or, as values of the inner
dictionaries, frequencies.
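
To make the layout concrete, here is a minimal sketch of such a dict of dicts, written for Python 3. The names build_chain and word_ids are hypothetical, and the key length of three words is an assumption taken from the 3-tuple keys in the test case below; the real bot may differ.

```python
def build_chain(lines, order=3):
    """Map each tuple of `order` word ids to {next word id: frequency}."""
    word_ids = {}   # word -> int id (hypothetical interning table)
    chain = {}      # (id, id, id) -> {next id: frequency}
    for line in lines:
        # Replace every word by a small int, assigning new ids on first sight.
        ids = [word_ids.setdefault(w, len(word_ids)) for w in line.split()]
        for i in range(len(ids) - order):
            key = tuple(ids[i:i + order])
            followers = chain.setdefault(key, {})
            nxt = ids[i + order]
            followers[nxt] = followers.get(nxt, 0) + 1
    return word_ids, chain
```

Every key and value ends up as a separate Python object, which is why a few hundred thousand entries add up to tens of megabytes.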

The final dictionary has about 325'000 entries, and the python process
uses around 80MB of RAM.  See the test case at the end of my post:
fill() creates a similar data structure.  If you run it as a script
and compare the 'ps' outputs, you'll see that 'del d' frees only a
small part of the memory to the OS; on my Linux system, about 10%.

Using the standard 'struct' module, I can pack the keys and values of
the outer dictionary into strings, which brings memory usage down to
about 20MB.  Unfortunately, this has to be done as a second step (as
opposed to always keeping the dictionary in that format), otherwise it
would slow down things too much.  Writing everything to disk (using
e.g. 'bsddb3') suffers from the same problem; I really have to do the
initial work in RAM to get acceptable speed.
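
The packing step could look roughly like the sketch below (Python 3). The function names are hypothetical, and the layout (little-endian unsigned 32-bit ints, inner dicts flattened to id/count pairs) is an assumption for illustration, not necessarily the format the bot uses.

```python
import struct

def pack_inner(inner):
    """Pack {word id: count} into one bytes object of (id, count) pairs."""
    flat = []
    for word_id, count in sorted(inner.items()):
        flat.extend((word_id, count))
    return struct.pack("<%dI" % len(flat), *flat)

def unpack_inner(blob):
    """Inverse of pack_inner: rebuild the {word id: count} dict."""
    flat = struct.unpack("<%dI" % (len(blob) // 4), blob)
    return dict(zip(flat[0::2], flat[1::2]))

def pack_map(mapping):
    """Pack both the 3-tuple keys and the inner dicts of the outer dict."""
    return {struct.pack("<3I", *key): pack_inner(inner)
            for key, inner in mapping.items()}
```

A lookup then packs the key the same way and calls unpack_inner() on the stored value, which is the extra work that makes this format too slow for the build phase.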

So, do I really have to do this in two separate processes?  Would it
help if I implemented the data storage part as a C extension module?
Or am I worrying too much about a problem that is better left to the
OS's memory management (i.e. swapping)?
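
For reference, the two-process variant from the abstract might be sketched as follows (Python 3, Unix only). build_in_child is a hypothetical helper, and using pickle as the wire format is an assumption; any serialization of the packed strings would do.

```python
import os
import pickle

def build_in_child(build):
    """Run build() in a fork()ed child and return its result in the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: do the memory-hungry work, send the packed result, then exit.
        status = 1
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(build(), out, pickle.HIGHEST_PROTOCOL)
            status = 0
        finally:
            os._exit(status)    # never fall through into the parent's code
    # Parent: read until EOF first, so the child cannot block on a full pipe.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as src:
        data = src.read()
    _, status = os.waitpid(pid, 0)
    if status != 0:
        raise RuntimeError("builder child failed")
    return pickle.loads(data)
```

All memory the child allocated while building goes back to the OS when it exits; the parent only ever holds the packed result (plus, briefly, its pickled form).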

Of course I'd also appreciate ideas for more efficient data layouts
;-)

Thomas

---8<---
#!/usr/bin/python

import random
import os

def fill(map):
    # Build a dict of dicts: (int, int, int) -> {int: frequency}.
    random.seed(0)
    for i in xrange(300000):
        k1 = (random.randrange(100),
              random.randrange(100),
              random.randrange(100))
        k2 = random.randrange(100)
        # Fetch the inner dict for k1, creating it on first use.
        d = map.setdefault(k1, dict())
        d[k2] = d.get(k2, 0) + 1

if __name__ == "__main__":
    os.system('ps v')
    d = dict()
    fill(d)
    os.system('ps v')
    del d
    os.system('ps v')
--->8---

-- 
If you want to reply by mail, substitute my first and last name for
'foo' and 'bar', respectively, and remove '.invalid'.


