[Numpy-discussion] python memory use

Muhammad Alkarouri malkarouri at yahoo.co.uk
Sun May 4 09:39:04 EDT 2008


--- Robin <robince at gmail.com> wrote:
[...]
> While investigating this I found this script:
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/511474
> which does wonders for my code. I was wondering if this function
> should be included in Numpy as it seems to provide an important
> feature, or perhaps an entry on the wiki (in Cookbook section?)

I am the author of the mentioned recipe, and I wrote it for a situation much
like yours. I would add, however, that ideally there shouldn't be such a
problem, but in practice there is, and I have no clue why.

As Christian said, Python does release memory. As I understand it, there was a
problem before Python 2.5, but the memory manager was patched (see
http://evanjones.ca/python-memory-part3.html), and for that reason I personally
no longer use Python < 2.5. The new manager helped, but I still ran into the
problem, so I wrote the recipe.
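
For reference, the core idea of the recipe is to run the memory-hungry work in
a child process, so that everything it allocated is returned to the OS when
the process exits, whatever the parent's allocator would have kept around. The
recipe itself uses its own mechanism; the following is just a minimal sketch
of the same idea using the multiprocessing module (standard library from
Python 2.6; the processing package on earlier versions) and assuming a
Unix-style fork:

    import multiprocessing

    def run_in_subprocess(func, *args, **kwargs):
        """Run func(*args, **kwargs) in a child process and return its result.

        When the child exits, all memory it allocated goes back to the OS.
        """
        queue = multiprocessing.Queue()

        def worker(q):
            q.put(func(*args, **kwargs))

        proc = multiprocessing.Process(target=worker, args=(queue,))
        proc.start()
        result = queue.get()   # read before join() to avoid a pipe deadlock
        proc.join()
        return result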

--- Andrew wrote:
[...]
> It's hard to say without knowing what your code does. A first guess is
> that you're allocating lots of memory without allowing it to be freed.
> Specifically, you may have references to objects which you no longer
> need, and you should eliminate those references and allow them to be
> garbage collected. In some cases, circular references can be hard for
> python to detect, so you might want to play around with the gc module
> and judicious use of the del statement. Note also that IPython keeps
> references to past results by default (the history).

Sound advice, especially the part about IPython, which is often overlooked. I
have played a lot with the gc module, calling gc.collect / enable / disable
and tuning the thresholds. In practice it helps a little, but not much. In my
experience, numpy code that works only with arrays of numbers is more likely
to suffer from leftover references or views to arrays you no longer need than
from circular references.
I haven't looked at the internals of gc, obmalloc or any other Python code.
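
To illustrate the view problem: slicing a numpy array returns a view that
keeps the whole parent buffer alive, so a tiny slice can pin a very large
allocation. A small sketch (the array sizes are arbitrary):

    import gc
    import numpy as np

    big = np.zeros((10000, 10000))  # ~800 MB of float64
    view = big[:10]                 # a view: references all of 'big'
    small = big[:10].copy()         # an independent ~800 KB copy

    del big, view                   # only now can the 800 MB be freed
    gc.collect()                    # collects cycles, if any; plain del
                                    # suffices for acyclic references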

What usually happens to me is that the machine starts swapping to virtual
memory, which slows the whole computation down enormously. I wonder whether
your algorithm, which needs enough memory to cause a MemoryError, can be
modified to avoid that; I have found that to be possible in some situations.
As an example, for PCA you might find, depending on your matrix size, that
using the transpose or a different algorithm is more suitable -- I ended up
using http://folk.uio.no/henninri/pca_module.

While I am of course partial to the fate of the cookbook recipe, I also feel
that it doesn't directly belong in numpy -- it should be useful to Pythonistas
in general. Maybe it belongs in numpy, somewhere in Python proper, or in one
of the parallel processing libraries. I agree that a wiki page would be more
beneficial, though I am not sure what else should go on it.

Regards,

Muhammad Alkarouri




