[Mailman-Users] Mailman 2.1.6 slowness...?
Jeff Squyres
jsquyres at osl.iu.edu
Sat Jul 16 17:16:42 CEST 2005
Short version -- all good tips; many thanks. We're investigating them
all, but as you suggested, we're changing one thing at a time.
Checking them thoroughly takes a little while, so I probably won't be
able to report anything intelligent back on our results for a week or
so (gotta wait for peak usage, for one thing).
A few more detailed remarks below.
On Jul 16, 2005, at 9:51 AM, Brad Knowles wrote:
>> Ah -- thanks for clarifying. This is not really what is implied in the FAQ
>> (http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.012.htp)
>> -- I read it to imply that mailman is grouping messages to the MTA by
>> domain in order to boost minimum number of deliveries to a given target
>> domain.
>
> No. He's talking about the network bandwidth that will be used
> by the MTA, once it has accepted all the messages and recipients from
> Mailman. The MTA is forced to send no more than X recipients to a
> given target site, because no more than X recipients exist at that
> site.
Gotcha -- I understand that.
But the inference that I made was that if mailman did not group domains
together in the 1000 member list shown in the example, then his numbers
are not necessarily accurate. Specifically, mails to @example.com may
not be conveniently lumped into 1 or 2 transfers to the MTA -- in a
worst case, they may be spread out across 1000/SMTP_MAX_RCPTS transfers
to the MTA. In this case, the MTA has no way of knowing that they are,
in fact, the same message, and therefore may have to initiate
1000/SMTP_MAX_RCPTS connections to the same remote MTA at example.com.
Or, even if the MTA can tell that they are the same message, due to
race conditions and/or CPU load, the MTA may be eagerly delivering
messages to the remote MTA, and therefore still have to initiate
1000/SMTP_MAX_RCPTS connections to the example.com MTA (e.g., if one
message is fully sent to example.com's MTA before the next [identical]
one arrives at the local MTA from mailman).
Hence, I assumed that he was implicitly saying that mailman was
grouping domains when it transferred to the MTA (i.e., packed as many
@example.com's into a single MTA transfer as possible [until exhausted]
-- repeating for all like domains in the recipient list), and therefore
could guarantee that there would only be 1 or 2 transfers to the remote
MTA (based on the numbers in his example).
However, it's quite possible that my logic is incorrect here... :-)
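
To make the worst case above concrete, here is a rough sketch in Python. It is not Mailman's actual chunking code; the member list, the chunk size of 100, and the helper names (chunk, domain_of, transfers_touching) are all made up for illustration. It only counts how many Mailman-to-MTA transfers contain at least one @example.com recipient, with and without grouping by domain first.

```python
def chunk(recipients, max_rcpts):
    """Split recipients into consecutive chunks of at most max_rcpts."""
    return [recipients[i:i + max_rcpts]
            for i in range(0, len(recipients), max_rcpts)]


def domain_of(addr):
    """Return the lowercased domain part of an address."""
    return addr.rsplit("@", 1)[1].lower()


def transfers_touching(chunks, domain):
    """Count the chunks that contain at least one recipient at domain."""
    return sum(1 for c in chunks
               if any(domain_of(a) == domain for a in c))


# Hypothetical 1000-member list: 10 members at example.com, spread
# evenly through the list; everyone else is at a unique made-up domain.
members = [f"user{i}@site{i}.invalid" for i in range(1000)]
for i in range(0, 1000, 100):
    members[i] = f"user{i}@example.com"

max_rcpts = 100  # stand-in for SMTP_MAX_RCPTS; value chosen for the example

ungrouped = chunk(members, max_rcpts)
grouped = chunk(sorted(members, key=domain_of), max_rcpts)

print(transfers_touching(ungrouped, "example.com"))  # 10
print(transfers_touching(grouped, "example.com"))    # 1
```

In this made-up case the ungrouped list spreads example.com across all 10 transfers, while sorting by domain first packs it into one. Whether the MTA then merges those transfers on its side is a separate question that the sketch does not model.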
>> We have pumped our SMTP_MAX_RCPTS down to 5 (we had never changed it
>> [snipped]
>
> One thing I would encourage you to do is to change just one thing
> at a time, and see what the effects are. With regards to reducing
> SMTP_MAX_RCPTS, I would encourage you to reduce the value by roughly
> half at each stage. So, go from 500 to 250, 250 to 125, 125 to 62,
> 62 to 32, etc.... This way, you should get a better idea of what the
> real threshold is.
Will do.
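
For reference, the halving steps can be listed with a few lines of Python. This is only a convenience sketch: halving_schedule and its floor parameter are hypothetical names, and integer division gives 31 where the suggestion above says "roughly" 32. Each value would be tried, one at a time, as SMTP_MAX_RCPTS in mm_cfg.py.

```python
def halving_schedule(start, floor=1):
    """Return start, start//2, start//4, ... stopping before going below floor."""
    values = [start]
    while values[-1] // 2 >= floor:
        values.append(values[-1] // 2)
    return values


# Starting from 500 and stopping before dropping below 5.
print(halving_schedule(500, floor=5))  # [500, 250, 125, 62, 31, 15, 7]
```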
>> Right -- sorry, I didn't mean to imply otherwise. We were not surprised
>> by this, either. I was trying to say that we've seen this behavior for
>> a long time and didn't have any performance issues with it.
>
> This sort of thing happens all the time with all sorts of
> systems. People will notice that their tires seem a little low, and
> there is some smoke coming out of the tailpipe, but they won't do
> anything about it until the car blows up or the tires come off the
> rims, etc.... That's when they take the car to the mechanic.
>
> With computers, people may notice that queues get really long,
> but they'll think that this is perfectly normal and acceptable, until
> something bad happens. That's when they go looking for help.
> They've been seeing all the signs that something bad was likely to
> happen soon, but they didn't recognize them for what they were.
Indeed. Delivery on our big lists had *always* been [relatively] slow;
as you said, we always assumed that that was the way it was supposed to
work. But delivery for our small lists had always been fairly quick --
when it changed to be fairly slow, that was an indication that
something was wrong.
>> The thought occurs to me that perhaps it wasn't our sendmail guys who
>> changed something, but perhaps the guys in the anti-spam/virus-checking
>> crew changed something (I believe they also check outgoing mails for
>> some sundry list of things that they believe indicates spam/viruses).
>> Hmm. Need to go ping them, too...
>
> Yeah, gotta talk to them, too. The recommended practice for
> mailing lists is to check messages on input, but don't try to check
> them on output -- after all, the messages were already demonstrated
> to be clean on input.
>
> You may or may not be able to do this at your site, but you
> should at least check with them.
Yes, that's exactly what I was thinking. I'm not sure what our
anti-spam/anti-virus stuff is doing, but I won't be able to talk to the
guys who run that stuff until Monday.
> The problem could also be DNS or reverse DNS. Those kinds of
> things can really slow down MTAs, as they check their incoming
> connections. If a DNS server is flaking out, the MTAs could be
> taking much longer than they used to in order to do all the same
> sorts of checks that they've always been doing.
We checked into that, and seem to have a pretty reliable DNS connection
(and it's cached locally).
I don't think we're a victim of tarpit kinds of remote MTAs, but even
if we are, lowering the SMTP_MAX_RCPTS should help with that, right?
That is, if a recipient has a slow MTA, then *essentially* only the
(SMTP_MAX_RCPTS-1) other recipients in the same transfer will be
penalized (because the remaining transfers will be occurring
more-or-less in parallel). Is that right?
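
The arithmetic behind that question, as a small Python sketch. It only restates the assumption in the question (recipients are handed over in consecutive chunks of SMTP_MAX_RCPTS, and one slow recipient holds up only its own chunk); it is not a statement of how Mailman or any particular MTA actually behaves. The function name and the example numbers are hypothetical.

```python
def penalized_by_slow_recipient(total, max_rcpts, slow_index):
    """Count the other recipients sharing a chunk with the slow one.

    Assumes consecutive chunks of max_rcpts and that a slow recipient
    delays only the chunk it belongs to.
    """
    start = (slow_index // max_rcpts) * max_rcpts
    end = min(start + max_rcpts, total)
    return (end - start) - 1


# 1000 recipients, slow recipient at position 42 in the list.
print(penalized_by_slow_recipient(1000, 500, 42))  # 499
print(penalized_by_slow_recipient(1000, 5, 42))    # 4
```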
Thanks again!
--
{+} Jeff Squyres
{+} jsquyres at osl.iu.edu
{+} Post Doctoral Research Associate, Open Systems Lab, Indiana University
{+} http://www.osl.iu.edu/