[Mailman-Users] Memory usage

Thu Dec 6 03:27:13 CET 2007

On 12/5/07, Grigory Batalov wrote:

>   The problem is that some qrunners quickly eat memory. Most of them
>   use 20-37Mb after 13 hours of running. But today several qrunners
>   6 times took above 200Mb! Fortunately now I have Monit that checks
>   memory usage, and kills such runners.

I'm reasonably sure there aren't any memory leaks in the Mailman or 
Python code, but unless someone who is an expert in locating memory 
leaks in the code can step forward and give us a complete 
stem-to-stern audit and give us hard confirmation one way or the 
other, we're not likely to get any further down this road.

If you can look at the code and tell us that you're definitely 
finding memory leaks, then I'm sure that the core developers will 
look very closely at that.  Otherwise, I know that this is one of 
the things they're always on the lookout for, and they eliminate 
leaks as soon as they find them.

If you are, or you can get, a Linux performance tuning expert to look 
closely at your system and tell you exactly what is going on, we'd 
love to find out what they have to say.  But we're not Linux 
performance tuning experts ourselves, and it's hard for us to guess 
why you're seeing such strange behaviour, especially since I don't 
recall hearing any similar reports from anyone else in a very long 
time.

The last time we had such reports, it was because someone didn't 
understand how Unix-like OSes work, and how aggressively they try to 
cache everything in memory.  That is why I wrote the FAQ entry that 
you did not find useful.

I am not a Linux performance tuning expert, but I have a fair amount 
of experience in doing general purpose Unix performance tuning, and I 
have a certain amount of lower-level kernel knowledge of how the 
various components within most Unix-like OSes interact with each 
other.

My problem is that I don't fully understand how much of this 
knowledge carries over to a Linux environment.

>   I wrote previous letter after server failure when 2 greedy qrunners
>   took 249 and 235 Mb. In that moment even crond couldn't fork and
>   mail delivery was aborted.

I don't know what's going on.  I didn't see it happen.  From what I 
have seen of what your tools are reporting, there's definitely some 
very strange stuff going on, but I can't tell whether the tool is 
broken and reporting meaningless numbers, or whether something else 
is going on.
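One way to cross-check the per-process numbers independently of any 
single monitoring tool is to read the kernel's own accounting 
directly.  The sketch below is a hypothetical helper, not part of 
Mailman.  It assumes a Linux /proc filesystem and that the queue 
runners show "qrunner" in their command line, as they do in a typical 
Mailman 2.1 install started by mailmanctl.  It reports the resident 
set size (VmRSS) of each matching process:

```python
import os


def vmrss_kb(status_text):
    """Return VmRSS (in kB) from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:	  24680 kB"
            return int(line.split()[1])
    return None  # kernel threads have no VmRSS line


def qrunner_rss(proc="/proc"):
    """Map pid -> (cmdline, VmRSS kB) for processes whose cmdline mentions 'qrunner'.

    Assumption: the queue runners were started as '.../bin/qrunner --runner=...'.
    """
    result = {}
    if not os.path.isdir(proc):
        return result  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                raw = f.read().replace(b"\0", b" ")
            cmdline = raw.decode("utf-8", "replace").strip()
            if "qrunner" not in cmdline:
                continue
            with open(os.path.join(proc, entry, "status")) as f:
                rss = vmrss_kb(f.read())
        except OSError:
            continue  # the process exited or is not readable by us
        result[int(entry)] = (cmdline, rss)
    return result


if __name__ == "__main__":
    for pid, (cmd, rss) in sorted(qrunner_rss().items()):
        print(pid, rss, "kB", cmd)
```

If these figures disagree badly with what Monit or top report, that 
would point at the reporting layer rather than at Mailman itself.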

Certainly, your tools should not be saying that there is literally 
zero memory that is active, and literally zero memory that is 
inactive, with over a gigabyte of RAM being marked as free.  That is 
the exact opposite of what we would expect to see, given how much 
memory you report the queue runners are using.
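To see whether the kernel itself reports that pattern, as opposed to 
one tool mangling it, you could look at the raw counters in 
/proc/meminfo, which is where tools such as "vmstat -a" and "free" 
get their figures on Linux.  This is a minimal sketch assuming a Linux 
guest; the helper names and the 1 GB threshold are purely 
illustrative:

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def looks_implausible(fields):
    """Flag the odd pattern described above: Active and Inactive both zero
    while more than 1 GB is free (hypothetical threshold, values in kB)."""
    return (fields.get("Active") == 0
            and fields.get("Inactive") == 0
            and fields.get("MemFree", 0) > 1024 * 1024)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    for key in ("MemTotal", "MemFree", "Active", "Inactive", "Cached"):
        print(key, info.get(key), "kB")
    print("implausible:", looks_implausible(info))
```

If the raw counters already show zero Active and Inactive memory, the 
odd numbers are coming from the kernel (or the virtualization layer 
underneath it), not from the tool that displays them.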

>   After that I have increased memory limit to 2Gb and started Monit
>   daemon to prevent such failure.

That may help, but until you figure out why netstat is reporting such 
totally and completely bogus numbers, I really doubt you're going to 
get very far.

I suspect, but I have no evidence to back up this claim, that the 
problem may be related to the fact that you're running under a 
virtualization system.

I would suggest trying to run Mailman and the MTA directly underneath 
the primary OS on the machine (frequently called "domain zero" or 
"dom0" in virtualization parlance), and see if that at least helps 
the tools produce information that makes more sense.

Running under dom0 may not solve the actual underlying problem of the 
Mailman queue runners sucking up so much RAM, but at the very least 
it would help reduce the complexity of the system we're trying to 
help you debug.

-- 
Brad Knowles <brad at shub-internet.org>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>