[Mailman-Developers] Serious I/O contention issue

Mike Bradley mike at logomanager.co.uk
Wed Oct 15 08:15:35 EDT 2003


> -----Original Message-----
> From: Brad Knowles [mailto:brad.knowles at skynet.be] 
> Sent: 14 October 2003 23:15
> 
> 	Just checking, but have you seen the following FAQ entries?  See:
> 
> 		<http://www.python.org/cgi-bin/faqw-mm.py?req=all#4.11>
> 		<http://www.python.org/cgi-bin/faqw-mm.py?req=all#4.12>
> 		<http://www.python.org/cgi-bin/faqw-mm.py?req=all#6.4>
> 		<http://www.python.org/cgi-bin/faqw-mm.py?req=all#6.6>
> 
> 	I figure you probably have already seen them, but I wanted to be
> sure.

Thanks Brad, yep, I've gone through these.  The problem doesn't *appear*
to be with Postfix - as I mentioned, there are delays for every
operation, and that includes the web interface and commands that don't
involve Postfix.  Doing an strace on the Qrunner showed that most of the
time was being spent reading and writing the whole list from a file on
disk, so I think the bottleneck is there rather than with the mail
server.

One thing I did notice was that disable_dns_lookups=yes is recommended
for performance reasons - surely this would stop Postfix from working
altogether, as DNS lookups are needed to send mail?  (I tried switching
this option on and mail delivery did indeed stop working!)

> 	What about the machine you're doing this on?  The filesystem? 
> How is postfix configured?
> 
> 	Clearly, there may well be lots of opportunity here to tune your
> filesystem for maximum performance.

This is an area that I have shied away from in the past, as our server
is managed, so my knowledge of how to tune the filesystem is very
sketchy!  I would say that the server is quite busy with lots of
database accesses, and there have been no noticeable filesystem
performance problems in the past, no matter how much load I have put on.
It also had no problem dealing with virus scanning and bouncing 10
incoming Slapper viruses per second last month while running the rest of
the stuff I have.

> 	Depending on what your "SMTP_MAX_RCPTS" value is set to, I would
> imagine that this should be loaded and saved each time a message is
> passed from your mailman qrunner to postfix.  The higher
> SMTP_MAX_RCPTS, the less often this process should occur.  Of course,
> others have found that SMTP_MAX_RCPTS should typically be set
> somewhere between 2 and 10 (usually ~5) for best overall performance
> (see the FAQ entries above).

The problem is that with a list the size of ours this is causing the
system to read and write to the disk almost continuously when more than
a few operations of any sort are queued (whether they involve sending
mails or not).  I mentioned that the qrunners appear to be designed to
cache instances of the mlist in memory, but then reload/save to disk
every operation regardless.  When I commented out the mlist.Load() in
the OutgoingRunner inner loop, that particular component zipped along
with no performance problems (though my lack of knowledge of Python
means that I can't tell if the mlist is shared and marshalled between
the qrunners, and thus I don't know if it is valid to skip this reload).
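
To illustrate the kind of middle ground I have in mind between "always
reload" and "never reload", here is a minimal sketch that re-reads the
list file only when its modification time or size has changed since the
last load.  It is not Mailman code: CachedList, load_if_changed() and
write_list() are hypothetical names made up for the example, and it
assumes that every writer replaces the file atomically and that the
filesystem timestamps are fine-grained enough to notice a rewrite.

```python
import os
import pickle
import tempfile


class CachedList:
    """Hypothetical stand-in for a list object; NOT Mailman's MailList."""

    def __init__(self, path):
        self.path = path
        self.data = None
        self._stamp = None  # (mtime_ns, size) seen at the last real load
        self.loads = 0      # number of times the file was actually read

    def load_if_changed(self):
        """Re-read the file only if it changed on disk; return True if read."""
        st = os.stat(self.path)
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp == self._stamp:
            return False  # cached copy is still current, skip the disk read
        with open(self.path, "rb") as fp:
            self.data = pickle.load(fp)
        self._stamp = stamp
        self.loads += 1
        return True


def write_list(path, data):
    """Write to a temporary file, then rename it over the real one."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as fp:
        pickle.dump(data, fp)
    os.replace(tmp, path)  # atomic rename on the same filesystem


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "list.pck")
    write_list(path, {"members": ["a@example.com"]})
    mlist = CachedList(path)
    for _ in range(1000):
        mlist.load_if_changed()  # only the first call touches the file body
    print("real loads:", mlist.loads)
```

Each call still costs a stat(), but that is far cheaper than reading the
whole list.  If another process can change the file without the
timestamp or size moving, this check would miss it, so it is only as
safe as that assumption.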

If this was indeed a valid thing to do, then it might also be OK to
delay Save() operations so that they didn't occur so often (i.e. every N
operations, when idle, or on the final release of the mlist).  As I
said, it APPEARS that the caching of the lists in Runner.py is intended
to work this way, but I don't know enough about it to say for sure!
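
The "every N operations, or on release" idea could be sketched roughly
as follows.  Again this is only an illustration under my own
assumptions: DeferredSaver, mark_changed() and flush() are hypothetical
names, and save_func stands in for whatever actually writes the list to
disk.

```python
class DeferredSaver:
    """Hypothetical helper: batch saves instead of saving on every change."""

    def __init__(self, save_func, every=50):
        self._save = save_func  # callable that writes the list to disk
        self._every = every     # save after this many unsaved changes
        self._dirty = 0         # changes made since the last save

    def mark_changed(self):
        """Record one change; save only once enough have accumulated."""
        self._dirty += 1
        if self._dirty >= self._every:
            self.flush()

    def flush(self):
        """Save now if anything is unsaved (call when idle or on release)."""
        if self._dirty:
            self._save()
            self._dirty = 0


if __name__ == "__main__":
    saves = []
    saver = DeferredSaver(lambda: saves.append("saved"), every=50)
    for _ in range(120):
        saver.mark_changed()
    saver.flush()  # final release: write out the remaining changes
    print("disk writes:", len(saves))
```

The obvious trade-off is that a crash could lose up to N-1 changes, and
any other process reading the file in the meantime would see stale
data, which is exactly the part I can't judge.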

Mike
