[Mailman-Users] Mailman 2.1.6 slowness...?

Sat Jul 16 17:16:42 CEST 2005

Short version -- all good tips; many thanks.  We're investigating them  
all, but as you suggested, we're changing one thing at a time.   
Checking them thoroughly takes a little while, so I probably won't be  
able to report anything intelligent back on our results for a week or  
so (gotta wait for peak usage, for one thing).

A few more detailed remarks below.

On Jul 16, 2005, at 9:51 AM, Brad Knowles wrote:

>>  Ah -- thanks for clarifying.  This is not really what is implied in the
>>  FAQ
>>  (http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.012.htp)
>>  -- I read it to imply that mailman is grouping messages to the MTA by
>>  domain in order to boost minimum number of deliveries to a given target
>>  domain.
>
> 	No.  He's talking about the network bandwidth that will be used
> by the MTA, once it has accepted all the messages and recipients from
> Mailman.  The MTA is forced to send no more than X recipients to a
> given target site, because no more than X recipients exist at that
> site.

Gotcha -- I understand that.

But the inference that I made was that if mailman did not group domains  
together in the 1000 member list shown in the example, then his numbers  
are not necessarily accurate.  Specifically, mails to @example.com may  
not be conveniently lumped into 1 or 2 transfers to the MTA -- in a  
worst case, they may be spread out across 1000/SMTP_MAX_RCPTS transfers  
to the MTA.  In this case, the MTA has no way of knowing that they are,  
in fact, the same message, and therefore may have to initiate  
1000/SMTP_MAX_RCPTS connections to the same remote MTA at example.com.   
Or, even if the MTA can tell that they are the same message, due to  
race conditions and/or CPU load, the MTA may be eagerly delivering  
messages to the remote MTA, and therefore still have to initiate  
1000/SMTP_MAX_RCPTS connections to the example.com MTA (e.g., if one  
message is fully sent to example.com's MTA before the next [identical]  
one arrives at the local MTA from mailman).

Hence, I assumed that he was implicitly saying that mailman was  
grouping domains when it transferred to the MTA (i.e., packed as many  
@example.com's into a single MTA transfer as possible [until exhausted]  
-- repeating for all like domains in the recipient list), and therefore  
could guarantee that there would only be 1 or 2 transfers to the remote  
MTA (based on the numbers in his example).
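
To make the worst case concrete, here is a minimal sketch in Python.  It
is not Mailman's actual code; the helper names, the made-up addresses,
and the figure of 10 example.com subscribers on a 1000-member list are
all my own assumptions for illustration.  It just counts how many
transfers to the MTA contain at least one @example.com recipient, first
with the list in arbitrary order and then sorted by domain.

```python
from itertools import islice


def chunk(recipients, max_rcpts):
    """Yield consecutive groups of at most max_rcpts recipients."""
    it = iter(recipients)
    while True:
        group = list(islice(it, max_rcpts))
        if not group:
            return
        yield group


def domain_of(addr):
    """Return the lowercased domain part of an address."""
    return addr.rsplit("@", 1)[1].lower()


def transfers_touching(recipients, max_rcpts, domain):
    """Count the groups that contain at least one recipient at domain."""
    return sum(
        1
        for group in chunk(recipients, max_rcpts)
        if any(domain_of(addr) == domain for addr in group)
    )


# Hypothetical 1000-member list: every 100th member is at example.com,
# so its 10 subscribers are spread evenly through the list.
recipients = []
for i in range(1000):
    if i % 100 == 0:
        recipients.append("user%d@example.com" % i)
    else:
        recipients.append("user%d@site%d.example.net" % (i, i))

SMTP_MAX_RCPTS = 5  # the value we are currently running with

# Arbitrary order: each example.com member lands in a different transfer.
scattered = transfers_touching(recipients, SMTP_MAX_RCPTS, "example.com")

# Sorted by domain: the example.com members are packed together.
grouped = transfers_touching(
    sorted(recipients, key=domain_of), SMTP_MAX_RCPTS, "example.com"
)

print(scattered, grouped)  # prints "10 2"
```

With these made-up numbers, the unsorted list hands example.com
recipients to the MTA in 10 separate transfers, while the domain-sorted
list needs only 2.  Whether the MTA then merges them is a separate
question.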

However, it's quite possible that my logic is incorrect here...  :-)

>> We have pumped our SMTP_MAX_RCPTS down to 5 (we had never changed it
>> [snipped]
>
> 	One thing I would encourage you to do is to change just one thing
> at a time, and see what the effects are.  With regards to reducing
> SMTP_MAX_RCPTS, I would encourage you to reduce the value by roughly
> half at each stage.  So, go from 500 to 250, 250 to 125, 125 to 62,
> 62 to 32, etc....  This way, you should get a better idea of what the
> real threshold is.

Will do.
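
For my own notes, the halving schedule can be generated with a few
lines of Python.  This is only a sketch: the function name and the
floor of 1 are my assumptions, and integer halving gives 31 rather than
32 after 62, which is close enough to "roughly half".

```python
def halving_schedule(start, floor=1):
    """Return start, start//2, start//4, ... down to (not below) floor."""
    values = [start]
    while values[-1] // 2 >= floor:
        values.append(values[-1] // 2)
    return values


# Candidate SMTP_MAX_RCPTS values to try, one at a time.
print(halving_schedule(500))
# prints [500, 250, 125, 62, 31, 15, 7, 3, 1]
```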

>>  Right -- sorry, I didn't mean to imply otherwise.  We were not surprised
>>  by this, either.  I was trying to say that we've seen this behavior for
>>  a long time and didn't have any performance issues with it.
>
> 	This sort of thing happens all the time with all sorts of
> systems.  People will notice that their tires seem a little low, and
> there is some smoke coming out of the tailpipe, but they won't do
> anything about it until the car blows up or the tires come off the
> rims, etc....  That's when they take the car to the mechanic.
>
> 	With computers, people may notice that queues get really long,
> but they'll think that this is perfectly normal and acceptable, until
> something bad happens.  That's when they go looking for help.
> They've been seeing all the signs that something bad was likely to
> happen soon, but they didn't recognize them for what they were.

Indeed.  Delivery on our big lists had *always* been [relatively] slow;  
as you said, we always assumed that that was the way it was supposed to  
work.  But delivery for our small lists had always been fairly quick --  
when it changed to be fairly slow, that was an indication that  
something was wrong.

>>  The thought occurs to me that perhaps it wasn't our sendmail guys who
>>  changed something, but perhaps the guys in the anti-spam/virus-checking
>>  crew changed something (I believe they also check outgoing mails for
>>  some sundry list of things that they believe indicates spam/viruses).
>>  Hmm.  Need to go ping them, too...
>
> 	Yeah, gotta talk to them, too.  The recommended practice for
> mailing lists is to check messages on input, but don't try to check
> them on output -- after all, the messages were already demonstrated
> to be clean on input.
>
> 	You may or may not be able to do this at your site, but you
> should at least check with them.

Yes, that's exactly what I was thinking.  I'm not sure what our  
anti-spam/anti-virus stuff is doing, but I won't be able to talk to the  
guys who run that stuff until Monday.

> 	The problem could also be DNS or reverse DNS.  Those kinds of
> things can really slow down MTAs, as they check their incoming
> connections.  If a DNS server is flaking out, the MTAs could be
> taking much longer than they used to in order to do all the same
> sorts of checks that they've always been doing.

We checked into that, and we seem to have a pretty reliable DNS  
connection (and it's cached locally).

I don't think we're a victim of tarpit kinds of remote MTAs, but even  
if we are, lowering the SMTP_MAX_RCPTS should help with that, right?   
That is, if a recipient has a slow MTA, then *essentially* only the  
other (SMTP_MAX_RCPTS-1) recipients in the same transfer will be  
penalized (because the remaining transfers will be delivered more or  
less in parallel).  Is that right?
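
Here is the back-of-the-envelope model I have in mind, as a Python
sketch.  It rests entirely on my assumption that one slow recipient
holds up only the transfer it was handed over in; the function name and
the recipient labels are made up, and a real MTA may well behave
differently.

```python
def penalized_bystanders(recipients, max_rcpts, slow):
    """Count non-slow recipients that share a transfer with a slow one.

    Assumes (hypothetically) that a slow recipient delays exactly the
    group of max_rcpts recipients it was handed to the MTA with.
    """
    penalized = 0
    for start in range(0, len(recipients), max_rcpts):
        group = recipients[start:start + max_rcpts]
        if any(r in slow for r in group):
            penalized += sum(1 for r in group if r not in slow)
    return penalized


members = ["r%d" % i for i in range(1000)]
slow = {"r7"}  # one recipient behind a tarpit-style MTA

print(penalized_bystanders(members, 500, slow))  # prints 499
print(penalized_bystanders(members, 5, slow))    # prints 4
```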

Thanks again!

-- 
{+} Jeff Squyres
{+} jsquyres at osl.iu.edu
{+} Post Doctoral Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/