[Chicago] Out of Memory: Killed Process: on CentOS

Martin Maney maney at two14.net
Tue Apr 28 01:13:43 CEST 2009


On Mon, Apr 27, 2009 at 12:02:35PM -0500, Brian Ray wrote:
>>> What are the best ways to manage this scenario?  Specifically, what
>>> should I try?

My first thought was to ask if there's actually a business need to
process such huge piles of data in one request, since that is surely
what's getting you into trouble; if not, then the simple solution to
these headaches might be to stop banging your head against that wall. 
The headaches are exacerbated by mod_python, since that means that
every apache worker process may eventually become bloated by having the
embedded Python interpreter using gobs of memory, which, according to
everything I've read and all my experience, never goes away.  It also
crossed my mind that, given what you imply about the number of
concurrent requests of this type you need to handle, it might help to
run a separate process (fastcgi, ... wsgi), as that might leave fewer
processes with bloated memory spaces hanging about.  But since I
didn't stop reading even then...
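A rough sketch of that separate-process idea, using the stdlib's
wsgiref purely for illustration (in real life you'd park it behind
apache via fastcgi or mod_wsgi in daemon mode; the handler body here
is made up):

    # Standalone WSGI process: one dedicated interpreter, separate
    # from the apache workers, that you can kill or recycle at will.
    from wsgiref.simple_server import make_server

    def app(environ, start_response):
        # Read the request body off the wsgi.input stream.
        length = int(environ.get('CONTENT_LENGTH') or 0)
        body = environ['wsgi.input'].read(length)
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [('received %d bytes\n' % len(body)).encode('ascii')]

    if __name__ == '__main__':
        make_server('localhost', 8000, app).serve_forever()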

> I am ok with running differently; although, generally Apache is working
> well for us in situations where we have sites that get a lot of traffic.
> I would want to use two things at once, perhaps, or simply run a separate
> process of python for each request.

Brilliant, you've answered your own question here!

>>> Is there some other way to clear memory when running large requests?

Right, you want to run these beasts as plain old CGI.  It's a perfect
fit - relatively infrequent, huge requests - so the "horrid" overhead
of starting up a process to handle each one will be barely visible, and
having the process die at the end will free up the sometimes huge VM
demands.  You could still get in trouble if there are enough concurrent
huge requests, but... wait, let me tie off that other loose end.
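Before I do, here's roughly what I mean - a bare-bones CGI sketch
(hypothetical handler, the actual work elided):

    #!/usr/bin/env python
    # Plain old CGI: a fresh process per request, so every byte of
    # that huge VM footprint goes back to the OS when we exit.
    import os
    import sys

    def main():
        length = int(os.environ.get('CONTENT_LENGTH') or 0)
        # The request body arrives on stdin.
        body = sys.stdin.read(length)
        sys.stdout.write("Content-Type: text/plain\r\n\r\n")
        sys.stdout.write("received %d bytes\n" % len(body))
        # ... parse and process here; nothing lingers afterwards.

    if __name__ == '__main__':
        main()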

> What methods work best for finding leaks?  Of course the app consumes
> more and more memory while it is parsing.  Are there any tools out there
> you would recommend to check for leaks?

Killing the process cures memory leaks.  Assuming there are actually
any leaks, as opposed to just a huge peak demand that then sticks
around because the Python VM doesn't free memory back to the OS.
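If you do want to hunt for a genuine leak first, the stdlib gc module
makes a cheap probe - run the suspect code in a loop and watch whether
the live-object count keeps climbing (a sketch; the work() callable is
yours to supply):

    import gc

    def check_for_growth(work, iterations=5):
        """work: a zero-argument callable exercising the suspect path."""
        gc.collect()
        baseline = len(gc.get_objects())
        for _ in range(iterations):
            work()
            gc.collect()
            print("live objects above baseline: %d"
                  % (len(gc.get_objects()) - baseline))
        # Uncollectable cycles end up parked in gc.garbage.
        print("uncollectable: %d" % len(gc.garbage))

If the count levels off, you're looking at peak demand, not a leak.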

> Do you have any good SAX style API you would recommend?

That's a great idea too, and I think it also would encourage you to run
these jobs as CGI.  Unless I misremember - quite possible - the CGI job
gets the body of the request on its stdin, and I think it's actually
wired to the socket, so if you can move to a more incremental
parse-and-process, those huge requests won't clutter up memory (whether
being buffered, ouch, or just from having passed through on their way
to and from disk).
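Something like this sketch, say - xml.sax chewing on stdin a chunk at
a time (the <record> element name is invented):

    import sys
    import xml.sax

    class RecordHandler(xml.sax.ContentHandler):
        # Handle one record at a time; each becomes garbage as soon
        # as we're done with it, so peak memory stays roughly flat.
        def __init__(self):
            xml.sax.ContentHandler.__init__(self)
            self.count = 0

        def startElement(self, name, attrs):
            if name == 'record':
                self.count += 1
                # process this one record here, then forget it

    def main():
        sys.stdout.write("Content-Type: text/plain\r\n\r\n")
        handler = RecordHandler()
        xml.sax.parse(sys.stdin, handler)  # reads the stream incrementally
        sys.stdout.write("processed %d records\n" % handler.count)

    if __name__ == '__main__':
        main()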

Use CGI because it has less overhead!  /me ROFLs contrarianly...

-- 
The true danger is when liberty is nibbled away,
for expedients, and by parts.  -- Edmund Burke


