Python Memory Usage

greg.novak at gmail.com
Tue Jun 19 23:48:02 EDT 2007


I am using Python to process particle data from a physics simulation.
There are about 15 MB of data associated with each simulation, but
there are many simulations.  I read the data from each simulation into
Numpy arrays and do a simple calculation on them that involves a few
eigenvalues of small matrices and quite a number of temporary
arrays.  I had assumed that generating lots of temporary arrays
would make my program run slowly, but I didn't think that it would
cause the program to consume all of the computer's memory, because I'm
only dealing with 10-20 MB at a time.
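For concreteness, the per-simulation function is shaped roughly like
the following.  The names and the 3x3 inertia-tensor example are made
up for illustration; they are not my actual code, but the flavor is
the same: several elementwise temporaries per call plus eigenvalues
of a small matrix.

import numpy

def process_snapshot(positions, velocities):
    # Illustration only: build a few temporary arrays and take
    # eigenvalues of a small (3x3) matrix.
    centered = positions - positions.mean(axis=0)        # (N, 3) temporary
    speeds = numpy.sqrt((velocities ** 2).sum(axis=1))   # another temporary
    inertia = numpy.dot(centered.T, centered)            # small 3x3 matrix
    return numpy.linalg.eigvals(inertia), speeds.mean()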

So, I have a function that reliably increases the virtual memory usage
by ~40 MB each time it's run.  I'm measuring memory usage by looking
at the VmSize and VmRSS lines in the /proc/[pid]/status file on an
Ubuntu (edgy) system.  This seems strange because I only have 15 MB of
data.
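For the record, the measurement itself is just a little helper along
these lines (VmSize and VmRSS are the field names as they appear in
/proc on Linux):

import os

def vm_usage():
    # Return (VmSize, VmRSS) in kB for this process; Linux-specific.
    sizes = {}
    for line in open('/proc/%d/status' % os.getpid()):
        if line.startswith('VmSize:') or line.startswith('VmRSS:'):
            name, value = line.split(':')
            sizes[name] = int(value.split()[0])   # value looks like "  40960 kB"
    return sizes.get('VmSize'), sizes.get('VmRSS')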

I started looking at the difference between what gc.get_objects()
returns before and after my function.  I expected to see zillions of
temporary Numpy arrays that I was somehow unintentionally maintaining
references to.  However, I found that only 27 additional objects were
in the list that comes from get_objects(), and all of them look
small: a few strings, a few small tuples, a few small dicts, and a
frame object.
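The before/after comparison is nothing fancy, roughly the following
(do_calculation and simulation_data stand in for my actual function
and data, which aren't shown):

import gc

def new_objects(func, *args):
    # Anything alive after the call that wasn't alive before it is a
    # candidate for an unintentionally retained reference.
    gc.collect()
    before = set(id(o) for o in gc.get_objects())
    func(*args)
    gc.collect()
    return [o for o in gc.get_objects() if id(o) not in before]

leftovers = new_objects(do_calculation, simulation_data)
print len(leftovers)   # I see only 27, and they all look small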

I also found a tool called heapy (http://guppy-pe.sourceforge.net/)
which seems to be able to give useful information about memory usage
in Python.  This seemed to confirm what I found from manual
inspection: only a few new objects are allocated by my function, and
they're small.
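The heapy incantation, as best I recall it, is just the following;
setrelheap() makes later reports relative to that point, and
do_calculation is again a stand-in for my function:

from guppy import hpy

h = hpy()
h.setrelheap()               # count only objects allocated after this point
do_calculation(simulation_data)
print h.heap()               # per-type summary of the new objects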

I found Evan Jones' article about the Python 2.4 memory allocator
never freeing memory in certain circumstances:
http://evanjones.ca/python-memory.html.  This sounds a lot like
what's happening to me.  However, his patch was applied in Python
2.5, which is the version I'm running.  Nevertheless, it
looks an awful lot like Python doesn't think it's holding on to the
memory, but doesn't give it back to the operating system, either.  Nor
does Python reuse the memory, since each successive call to my
function consumes an additional 40 MB.  This continues until finally
the VM usage is gigabytes and I get a MemoryError.

I'm using Python 2.5 on an Ubuntu edgy box, and numpy 1.0.3.  I'm also
using a few routines from scipy 0.5.2, but for this part of the code
it's just the eigenvalue routines.

It seems that the standard advice when someone has a bit of Python
code that progressively consumes all memory is to fork a process.  I
guess that's not the worst thing in the world, but it certainly is
annoying.  Given that others seem to have had this problem, is there a
slick package to do this?  I envision:
value = call_in_separate_process(my_func, my_args)
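In case it helps to be concrete, here is the sort of os.fork-based
sketch I have in mind.  It's untested, assumes the return value can
be pickled, and skips error handling:

import os, cPickle

def call_in_separate_process(func, *args, **kwargs):
    # Run func in a forked child so all memory it touches goes back to
    # the OS when the child exits; ship the result back through a pipe.
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child
        os.close(read_fd)
        result = func(*args, **kwargs)
        os.write(write_fd, cPickle.dumps(result, cPickle.HIGHEST_PROTOCOL))
        os.close(write_fd)
        os._exit(0)
    else:                             # parent
        os.close(write_fd)
        chunks = []
        while True:
            chunk = os.read(read_fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        os.close(read_fd)
        os.waitpid(pid, 0)
        return cPickle.loads(''.join(chunks))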

Suggestions about how to proceed are welcome.  Ideally I'd like to
know why this is going on and fix it.  Short of that, workarounds
that are more clever than the "separate process" one are also
welcome.

Thanks,
Greg



