[Mailman-Developers] big list

Graham TerMarsch mailman@howlingfrog.com
Fri, 8 Mar 2002 17:50:55 -0800


On Friday 08 March 2002 15:07, you wrote:
> No: I said "it's going out fast", but I was confused: messages from
> other lists were going out fast. Those messages were already very slow,
> one every 20 seconds... must be the time needed to lock + read the
> list.db + unlock.

I expect this is the problem.  I manage a list with ~180k addresses in 
Mailman for a customer, and after solving MTA problems we found that the 
next biggest bottleneck was the Mailman data store.  It generally took 
8-10 seconds on a dual Athlon 1600 MHz machine to process a single bounce 
message, just about all of which was chewed up in locking and IO (ok, 
probably just IO).

The actual size of the "config.db" file was close to 12MB when we started 
having major problems with it, and once it was slurped in, QRunner 
generally ate up 50-70MB of RAM while processing the queue.  With the CPU 
pegged at 95+% by QRunner, I'd fully expect that most of our CPU time was 
spent just spinning on slurping the list in and spitting it back out after 
each and every update to it.
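To make that cost concrete, here is a minimal Python sketch of a store that is read in full and written back in full on every change, which is roughly the pattern described above. It is not Mailman's code: the file name config.pck, the member-dict layout and the helpers load_list, save_list and record_bounce are all hypothetical, pickle stands in for whatever serializer the real store uses, and locking is left out.

```python
import os
import pickle
import tempfile

# Hypothetical whole-file list store (NOT Mailman's real code): every
# change reads the entire roster and writes the entire roster back.


def load_list(path):
    """Read the whole roster into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)


def save_list(path, members):
    """Write the whole roster back; cost scales with list size, not change size."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(members, f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp, path)  # atomic rename on POSIX


def record_bounce(path, address):
    """One bounce = one full read plus one full write (locking omitted)."""
    members = load_list(path)
    members[address]["bounces"] += 1
    save_list(path, members)
    return os.path.getsize(path)  # bytes rewritten for a one-field change


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "config.pck")
        # ~180k members, the size of the list described above
        roster = {f"user{i}@example.com": {"bounces": 0} for i in range(180_000)}
        save_list(path, roster)
        written = record_bounce(path, "user42@example.com")
        print(f"one bounce rewrote {written:,} bytes")
```

Bumping a single counter still rewrites the whole file, so every bounce in the queue pays for the full list size in both CPU (serializing) and IO.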

For our particular case, we found that by splitting the large list into a 
series of smaller lists (e.g. one for each letter of the alphabet), we were 
able to _substantially_ reduce the overhead and got throughput back up to 
5-6 msgs/sec.  I wouldn't necessarily say that what we've done is the 
"be-all and end-all" answer to this, though; it just pushed the problem 
back to a later date for us, giving us enough time to look at other things 
before it rears its ugly head again as some lists grow faster than others 
("s" is always a big list).
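The post doesn't say exactly how the split was scripted, so the following is only an illustration of a split by first letter. It assumes the sub-list is chosen from the first character of the address, with a hypothetical "other" bucket for addresses that don't start with a letter; shard_by_initial and largest_shard are made-up names.

```python
from collections import defaultdict


def shard_by_initial(addresses):
    """Group addresses into sub-lists keyed by their first letter.

    Addresses not starting with a letter go into an "other" bucket.
    (Hypothetical scheme for illustration.)
    """
    shards = defaultdict(list)
    for addr in addresses:
        first = addr[:1].lower()
        key = first if first.isalpha() else "other"
        shards[key].append(addr)
    return dict(shards)


def largest_shard(shards):
    """Return (key, size) of the biggest sub-list: the first to outgrow the store."""
    key = max(shards, key=lambda k: len(shards[k]))
    return key, len(shards[key])


if __name__ == "__main__":
    roster = ["sam@example.com", "Sue@example.org", "bob@example.net", "9lives@example.com"]
    shards = shard_by_initial(roster)
    print({k: len(v) for k, v in shards.items()})  # {'s': 2, 'b': 1, 'other': 1}
    print(largest_shard(shards))                     # ('s', 2)
```

Each sub-list's store now holds only its own slice of the roster, but nothing balances the buckets, which is why letters like "s" keep growing faster than the rest.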

For reference, we're running this on Mailman-2.0.8, using Postfix-1.1.3 
as the MTA.  We used to run it on Sendmail-8.12, which worked quite well 
and handled the load without any problems, but tended to generate MUCH more 
IO than Postfix does.  Since I haven't yet gotten my dream IO system for 
this machine, the switch to Postfix was necessary to get msgs flying out 
the door faster.  There are still other bottlenecks in the system, but 
after our updates Mailman was no longer one of the major ones.

I'll also note that our install has a number of the Bounce handlers 
removed, to make the processing chain shorter.  We ran with debug logging 
(many megs of it) recording which Bounce handler processed a given msg, 
and disabled all of those that were at the bottom of the list.  I think all 
we've got left in there right now are 'Postfix', 'DSN', and 'Catchall', 
which together catch more than 90% of the bounces we get sent back through 
the system.  It isn't perfect at catching and processing all of the 
bounces, but shortening the bounce-processing pipeline made a noticeable 
difference in performance.  This, however, I expect is related not to the 
performance of Mailman itself, but to that of Python in general, in the 
context of the regexes and the MIME libraries that Mailman uses.
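The trimmed bounce chain can be sketched as an ordered pipeline in which each detector either recognises the message and returns the failed addresses or passes it on, with a counter standing in for the debug logs. The detectors below are deliberately crude regex stand-ins, not Mailman's actual Postfix, DSN, or Catchall bouncer modules, and process_bounce, handlers_covering and its fraction parameter are hypothetical names.

```python
import re
from collections import Counter

# Crude, hypothetical stand-ins for bounce detectors. Each takes the raw
# bounce text and returns a list of failed addresses, or None to pass.


def detect_postfix(text):
    """Postfix-style body lines: '<user@host>: host ... said: ...'."""
    found = re.findall(r"^<([^>\s]+@[^>\s]+)>:", text, re.M)
    return found or None


def detect_dsn(text):
    """RFC 3464 delivery-status lines: 'Final-Recipient: rfc822; user@host'."""
    found = re.findall(r"^Final-Recipient:\s*rfc822;\s*(\S+)", text, re.M | re.I)
    return found or None


def detect_catchall(text):
    """Last resort: the first address-like token after the word 'failed'."""
    m = re.search(r"failed.*?([\w.+-]+@[\w-]+(?:\.[\w-]+)+)", text, re.I | re.S)
    return [m.group(1)] if m else None


# Ordered pipeline: the first detector that recognises the message wins, so
# a shorter list means fewer regex passes before a match (or a give-up).
PIPELINE = [("Postfix", detect_postfix), ("DSN", detect_dsn), ("Catchall", detect_catchall)]
hits = Counter()  # which detector handled each bounce


def process_bounce(text, pipeline=PIPELINE):
    for name, detect in pipeline:
        addrs = detect(text)
        if addrs:
            hits[name] += 1
            return name, addrs
    hits["unrecognised"] += 1
    return None, []


def handlers_covering(hit_counts, fraction=0.9):
    """Fewest most-used handlers that together cover `fraction` of all hits."""
    total = sum(hit_counts.values())
    kept, covered = [], 0
    for name, n in hit_counts.most_common():
        if covered >= fraction * total:
            break
        kept.append(name)
        covered += n
    return kept
```

Fed per-handler counts gathered from real traffic, handlers_covering() suggests which handlers to keep; anything it leaves out is a candidate for removal, at the price of missing the rarer bounce formats.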

-- 
Graham TerMarsch
Howling Frog Internet Development, Inc.   http://www.howlingfrog.com