[Mailman-Developers] big list
Graham TerMarsch
mailman@howlingfrog.com
Fri, 8 Mar 2002 17:50:55 -0800
On Friday 08 March 2002 15:07, you wrote:
> No: I said "it's going out fast", but I was confused: messages from
> other lists were going out fast. Those messages were already very slow,
> one every 20 seconds... must be the time needed to lock + read the
> list.db + unlock.
I'd expect that this is likely the problem. I manage a list for a
customer who has ~180k addresses in Mailman, and after solving MTA
problems we found that the next biggest bottleneck was the Mailman data
store. It generally took 8-10 seconds on a Dual-Athlon-1600MHz machine to
process a single bounce message, just about all of which was chewed up in
doing locking and IO (ok, probably just IO).
The actual size of the "config.db" file was close to 12MB when we started
having major problems with it, never mind that when slurped in QRunner
generally ate up 50-70MB of RAM while processing the queue. With the CPU
pegged at 95+% usage from QRunner, I'd totally expect that most of our CPU
time was just spent spinning while slurping in and spitting out the list
after each and every update was made to the list.
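To illustrate why rewriting the whole store on every update hurts, here is
a minimal Python sketch. It assumes the entire membership store is
serialised after each change; `pickle` and the dict layout are stand-ins
chosen for illustration, not Mailman's actual on-disk format, and both
function names are hypothetical.

```python
import pickle


def bytes_written_per_update(members, updates):
    """Serialise the WHOLE store after every single update (assumed behaviour).

    Returns the total number of bytes that would be written out.
    """
    total = 0
    for addr in updates:
        old = members.get(addr, {"bounces": 0})
        members[addr] = {"bounces": old["bounces"] + 1}
        # Every update pays for dumping all members, not just the changed one.
        total += len(pickle.dumps(members))
    return total


def bytes_written_batched(members, updates):
    """Apply all updates in memory first, then serialise the store once."""
    for addr in updates:
        old = members.get(addr, {"bounces": 0})
        members[addr] = {"bounces": old["bounces"] + 1}
    return len(pickle.dumps(members))
```

With a store of N members and K bounce updates, the first approach writes
roughly K times the full store, so the cost per bounce grows with the size
of the list rather than with the size of the change.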
For our particular case, we found that by splitting out the larger list
into a series of smaller lists (e.g. one for each letter of the alphabet),
we were able to _substantially_ reduce the overhead involved and got
processing back to 5-6 msgs/sec again. I wouldn't necessarily say
that what we've done is the "be all and end all" answer to this, though;
it just pushed the problem back to a later date for us, giving us enough
time to look at other things before it rears its ugly head again as some
lists grow faster than others ("s" is always a big list).
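The split by first letter could be sketched roughly as follows. This is
only an illustration under stated assumptions: the function name, the
"biglist" prefix, and the "other" bucket for addresses that do not start
with an ASCII letter are all hypothetical, and actually creating the
sub-lists in Mailman is not shown.

```python
from collections import defaultdict


def split_by_first_letter(addresses, list_prefix="biglist"):
    """Group addresses into sub-lists keyed by their first character.

    Addresses that do not start with an ASCII letter go to an "other" list.
    """
    buckets = defaultdict(list)
    for addr in addresses:
        addr = addr.strip()
        if not addr:
            continue  # skip blank lines in the input
        first = addr[0].lower()
        key = first if first.isascii() and first.isalpha() else "other"
        buckets["%s-%s" % (list_prefix, key)].append(addr)
    return dict(buckets)


def bucket_sizes(buckets):
    """Return (list name, member count) pairs, largest list first."""
    return sorted(((name, len(m)) for name, m in buckets.items()),
                  key=lambda pair: pair[1], reverse=True)
```

Checking `bucket_sizes` from time to time shows which sub-lists are growing
fastest and may need splitting again.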
For point of reference, we're running this on Mailman-2.0.8, using
Postfix-1.1.3 as the MTA. We used to have it running on Sendmail-8.12,
which worked quite well and handled the load without any problems, but had
a tendency to create MUCH more IO than Postfix does. Since I haven't
yet gotten my dream IO system for this machine, the shift to Postfix was
necessary to get msgs flying out the door faster. There are still other
bottlenecks in the system, but after our updates Mailman wasn't one of the
major ones on the list any more.
I'll also note that our install has a number of the Bounce handlers
removed, to make the processing chain shorter. We ran many megs of debug
logs spitting out information about which Bounce handler processed a given
msg, and disabled all of those that were at the bottom of the list. I
think all we've got left in there right now is 'Postfix', 'DSN', and
'Catchall', which together catch more than 90% of the bounces we get sent back
through the system. It isn't perfect for catching and processing all of the
bounces, but the reduction in pipeline length for the bounce processing
made a noticeable difference in performance. This, however, I expect is
related to the performance not of Mailman itself, but of Python in
general, in the context of the regexes and the MIME libraries that Mailman
is using.
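The trimming approach described above (log which handler recognises each
bounce, then drop the ones that rarely fire) could be sketched like this.
The detectors here are hypothetical stand-ins that match on substrings;
they are not Mailman's real bounce modules, and the function names are
made up for illustration.

```python
from collections import Counter


def first_match(message, detectors):
    """Return the name of the first detector that recognises the message."""
    for name, detect in detectors:
        if detect(message):
            return name
    return None


def rank_detectors(messages, detectors):
    """Count how often each detector is the one that handles a message."""
    hits = Counter()
    for msg in messages:
        hits[first_match(msg, detectors) or "unmatched"] += 1
    return hits.most_common()


def keep_top(ranking, count):
    """Pick the names of the `count` most-used detectors, ignoring misses."""
    names = [name for name, _ in ranking if name != "unmatched"]
    return names[:count]


# Hypothetical detectors: each is a (name, predicate) pair.
SAMPLE_DETECTORS = [
    ("dsn", lambda m: "delivery-status" in m),
    ("postfix", lambda m: "postfix" in m.lower()),
    ("rare-format", lambda m: "X-Rare-Bounce" in m),
]
```

Because the chain stops at the first match, detectors that almost never
fire still cost time on every message that reaches them, which is why
removing them shortens the average path through the pipeline.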
--
Graham TerMarsch
Howling Frog Internet Development, Inc. http://www.howlingfrog.com