The OOM-Killer vs. Python

Mon Mar 25 04:58:15 EST 2002

In article <Xns91DBED451517cliechtigmxnet at 62.2.16.82>, Chris Liechti wrote:

> gerson.kurz at t-online.de (Gerson Kurz) wrote in news:3c9e2b5f.9993062
> @news.t-online.de:
>> I have a python-based SMTP server (see
>> http://p-nand-q.com/shicks.htm)
>> running on our server, and in general it has worked flawless (since
>> about nov 2001). However, in the recent week that dreaded linux OOM
>> killer twice killed the python process. [The machine has 768mb ram 
>> call me oldfashioned but that SHOULD be enough for both Linux & 
>> Python to get along, really. OK, its running KDE, and has only 256mb 
>> for swap, but still...]

> maybe you have some other leaking app and the OOM killer just picks 
> the largest. if you want a reliable server i wouldn't use it as 
> workstation too :-)

>> Anyway, even though I believe that this is more of a fault of the
>> Linux Kernel VM quality than the script (the system has been 
>> running fine for months and now two kills in one week - that smells 
>> fishy 

 About two years ago (March of 2000) there was a huge flamewar
 on the Linux kernel mailing list (LKML) about a proposed set of
 patches to allow users (admins) to disable "overcommit."

 When your program uses malloc (implicitly done by your Python
 processes, of course) then the Linux kernel will return success
 even if there isn't physically enough memory+swap to guarantee
 that the whole block of memory can be supplied.  This is called
 "overcommit" (and is common among UNIX and other general purpose
 operating systems).  
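
 To make the idea concrete, here is a minimal Python sketch that compares
 what the kernel has promised against RAM+swap.  It assumes a kernel whose
 /proc/meminfo reports MemTotal, SwapTotal and Committed_AS in kB (current
 kernels do; I have not checked 2.4.9).  The function names parse_meminfo
 and commit_headroom_kb are made up for this illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Lines look like "MemTotal:  786432 kB"; only the leading number is kept.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def commit_headroom_kb(info):
    """RAM + swap minus memory already promised to processes.

    A negative result means the kernel has handed out more than it
    could back if every process touched all of its memory.
    """
    return info["MemTotal"] + info["SwapTotal"] - info["Committed_AS"]
```

 On a Linux box you would feed it open("/proc/meminfo").read().  A negative
 headroom is not an error by itself; it only shows that overcommit is in
 effect.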

 Searching Google's Linux pages on "overcommit" or "disable overcommit"
 or even "disable overcommit" and "patch" will quickly bring up various
 subsets of that discussion.

 Unfortunately I don't know the current status of Linux sysctl to control
 the "overcommit" vm features. I'm cross posting this to c.o.l.d.s
 (comp.os.linux.dev.system) in hopes of getting a clue.
 I see a /proc/sys/vm/overcommit_memory entry on my 2.4.9 kernel, which
 might be the relevant sysctl node.  You might be able to
 echo -1 into that node (virtual file) to disable overcommit.  That
 should cause malloc() calls to fail when there isn't enough RAM+swap to
 satisfy a request.
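
 Before writing anything into that node it is worth reading it.  The sketch
 below does that from Python.  The meanings in MODES are the ones documented
 for current kernels (0, 1 and 2, with 2 being the strict no-overcommit
 mode); whether a 2.4.9 kernel interprets the values the same way is an
 assumption, so anything else is reported as unknown.  The helper names are
 hypothetical.

```python
OVERCOMMIT_PATH = "/proc/sys/vm/overcommit_memory"

# Meanings as documented for current kernels; older kernels may differ.
MODES = {
    0: "heuristic overcommit (the usual default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit beyond the commit limit",
}


def parse_overcommit_mode(text):
    """Convert the contents of the sysctl node (e.g. "0\\n") to an int."""
    return int(text.strip())


def describe_mode(mode):
    """Return a human-readable description of an overcommit mode."""
    return MODES.get(mode, "unknown value %d; check your kernel's docs" % mode)


def read_overcommit_mode(path=OVERCOMMIT_PATH):
    """Read the current mode from the sysctl node (Linux only)."""
    with open(path) as f:
        return parse_overcommit_mode(f.read())
```

 Calling describe_mode(read_overcommit_mode()) on a Linux machine prints the
 policy in force.  Changing the value needs root, and the change does not
 survive a reboot unless you also put it in your sysctl configuration.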

 Note that this might not actually solve your problem.  It should
 prevent the OOM killer from becoming active (since you won't truly
 be "out-of-memory") but might cause programs to abend (abnormally
 terminate themselves) when their malloc() calls fail (returning NULL
 with errno set to ENOMEM).  If you have swap active it also might push
 the system into VM thrashing, and the whole system might seem much
 worse because many of the (formerly innocuous) malloc()s will be
 going unsatisfied.
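
 In a Python program a failed malloc() normally shows up as a MemoryError
 exception rather than a crash, so a server at least gets the chance to
 refuse one request instead of dying.  A minimal sketch, assuming CPython;
 try_allocate and HUGE are names invented for this example:

```python
import sys

# A request this large cannot be satisfied on any machine, so it is a
# convenient way to exercise the failure path without using real memory.
HUGE = sys.maxsize


def try_allocate(nbytes):
    """Return a zero-filled buffer of nbytes, or None if memory is refused."""
    try:
        # bytearray(n) zero-fills the buffer, so the memory is actually
        # written to, not just reserved.
        return bytearray(nbytes)
    except MemoryError:
        return None
```

 A server loop could call try_allocate() for a large message buffer and
 reject the message when it gets None back.  With overcommit enabled this
 only catches requests the kernel refuses outright; it does not protect
 the process from the OOM killer later on.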

 Personally I think the various libraries, compilers and applications
 which are blindly allocating far more memory than they actually use are
 really the heart of the whole overcommit problem.  That might not be 
 Python (or it might be indirectly due to Python's use of some libraries
 on your system; glibc or libm bloat is one possibility).
 Unfortunately memory is a shared resource, so it only takes one bad
 appl. (so to speak) to ruin the whole basket for all of us.