[Mailman-Users] Config dump? - WAS Re: speed up mailman

Sun Feb 28 22:04:24 EST 2016

Ruben Safir writes:

 > When I read this, it just seems to me that you guys don't know how
 > the software works.

This is user-supported software.  Not everybody is all that expert,
but people contribute when and what they can.  And mild thread-
hijacking like asking if there's a way to get certain information
related to the thread isn't unusual.

 > I posted logs to postfix and what they are asking is frankly
 > impossible to acquire.

Then perhaps you have to live with the delays.  They (like us) ask for
information that is known to have been useful in diagnosing problems
in the past.  Without information, it's impossible to diagnose.

By the way, it is not unheard of that restarting the mail system
clears up backlogs just like that.  (Don't bet your house on it, but
it does happen.)  A typical reason is that in normal operation,
"exponential backoff" is used so that delays from retries increase
with each retry until you've tried for several days, but on system
startup there's provision for an attempt to "flush" the queue right
away, not worrying about when the next try for a message is scheduled.
If that succeeds, the backlog disappears.
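
To make the backoff idea concrete, here is a minimal Python sketch of
a doubling retry schedule.  The function name and all of the numbers
(first delay, cap, give-up time) are illustrative assumptions, not
values read from your Postfix configuration, and real MTAs compute
their schedules somewhat differently.

```python
def retry_delays(first_delay=300, max_delay=4000, give_up_after=5 * 24 * 3600):
    """Return the waits (in seconds) between successive delivery attempts.

    Each wait doubles until it reaches max_delay; retries stop once the
    total time spent waiting would exceed give_up_after.
    All default values are illustrative assumptions.
    """
    delays = []
    elapsed = 0
    delay = first_delay
    while elapsed + delay <= give_up_after:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * 2, max_delay)
    return delays


delays = retry_delays()
print(delays[:6])                      # the first few waits
print(len(delays), "attempts before giving up")
```

A queue flush simply ignores such a schedule and tries every queued
message once, immediately.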

Of course if there's a persistent problem, the backlog will reappear.

 > I can't get answers to basic questions such as

Short answer: you didn't ask.  In the posts I've seen, you just
announced that it wasn't working as you expected.

 > When the email comes to mailman, where does it go.

First it is delivered to the mailman program itself over a pipe,
according to the alias in the Postfix configuration.  If it's a post,
it is then saved in a .pck file in a queue directory (typically
/usr/var/mailman/queue/incoming/).  The incoming runner checks the
directory for changes, picks up the new .pck when it appears, and
decides what to do with it: it checks for spam, adds the footer and
other mailing list decoration such as List-* headers, and finally puts
it in one or more further queues (e.g., archive, outgoing).  Those
runners do the same dance.  Typically the whole process occurs in
under a second.
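
The "pick up, process, pass on" dance can be sketched in a few lines
of Python.  This illustrates the pattern only and is not Mailman's
actual runner code: process_one, add_list_headers, the use of a plain
dict as the message, the file name and the List-Id value are all
hypothetical.  The point is the ordering: the input file is removed
only after the output file is safely in place.

```python
import os
import pickle
import tempfile


def process_one(in_dir, out_dir, handler):
    """Move one .pck entry from in_dir to out_dir, applying handler.

    Returns the file name processed, or None if the queue is empty.
    If handler raises, the input file is left in place for a retry.
    """
    entries = sorted(f for f in os.listdir(in_dir) if f.endswith(".pck"))
    if not entries:
        return None
    name = entries[0]                       # first entry in name order
    src = os.path.join(in_dir, name)
    with open(src, "rb") as fh:
        msg = pickle.load(fh)
    result = handler(msg)                   # may raise; src is kept then
    tmp = os.path.join(out_dir, name + ".tmp")
    with open(tmp, "wb") as fh:
        pickle.dump(result, fh)
    os.rename(tmp, os.path.join(out_dir, name))   # output becomes visible
    os.remove(src)                          # only now drop the input
    return name


def add_list_headers(msg):
    """Hypothetical handler: add a List-Id header to a dict 'message'."""
    msg = dict(msg)
    msg["List-Id"] = "<example.lists.example.org>"
    return msg


in_dir = tempfile.mkdtemp()
out_dir = tempfile.mkdtemp()
with open(os.path.join(in_dir, "0001.pck"), "wb") as fh:
    pickle.dump({"Subject": "hello"}, fh)
processed = process_one(in_dir, out_dir, add_list_headers)
print(processed, os.listdir(in_dir), os.listdir(out_dir))
```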

 > How does the MTA know to pick up the mail.

The Mailman outgoing runner connects to the well-known port for mail,
where the MTA is listening.  I think it's possible to configure
Mailman to use the "sendmail" program via stdin, but in modern systems
(specifically Postfix) it's much more efficient to use a socket, since
the sendmail program itself often just drops the message in a queue to
be processed by a daemon serving the outgoing queue.
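
For illustration, handing a message to an MTA that is listening on a
socket looks roughly like this with Python's standard smtplib.  This
is a sketch, not Mailman's delivery code: hand_off and its connect
parameter are hypothetical names, the addresses are placeholders, and
localhost port 25 is an assumption about where the MTA listens.

```python
import smtplib
from email.message import EmailMessage


def hand_off(msg, sender, recipients, host="localhost", port=25,
             connect=smtplib.SMTP):
    """Submit msg to the MTA over SMTP.

    Returns a dict of recipients the MTA refused (empty if all were
    accepted).  smtplib raises SMTPRecipientsRefused if every
    recipient is refused.  'connect' exists only so the connection
    class can be swapped out in tests.
    """
    with connect(host, port) as smtp:
        return smtp.send_message(msg, from_addr=sender, to_addrs=recipients)


# Build a small placeholder message; nothing is sent at this point.
msg = EmailMessage()
msg["From"] = "list-bounces@example.org"
msg["To"] = "list@example.org"
msg["Subject"] = "test post"
msg.set_content("Hello, list.")

# Against a live MTA one would call, for example:
#   refused = hand_off(msg, "list-bounces@example.org",
#                      ["subscriber@example.net"])
```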

 > It seems to process mail in sweeps, rather than in real time when
 > mail arrives.

What do you mean by "real time"?  If you look at what Postfix does,
it's just like Mailman: it is composed of several programs, each with
a specific responsibility.  Each program receives a message as a file
in a queue directory, processes it, places the output in a file in
another queue directory, and removes the input queue file only if
that succeeded.  The final step is to connect to another mail server
over the network, but again the input queue file is not removed until
the remote server says "250 OK", which means that the message has
been saved as a file on the remote system.  This can be done quickly
(my system typically processes a post end-to-end in under a second,
though with at most 20 subscribers), but often not in what
communication engineers mean by "real time".

In fact, when you see stuff like this:

 > 2016-02-28T10:27:00.724625-05:00 www postfix/smtpd[21374]: NOQUEUE: reject: RCPT from www.mrbrklyn.com[96.57.23.82]: 450
 > +4.1.2 <dyfet at gnutelephony.org>: Recipient address rejected: Domain not found; from=<hangout-bounces at nylxs.com>
 > +to=<dyfet at gnutelephony.org> proto=ESMTP helo=<www.mrbrklyn.com>

what you're seeing is that you have quite a few possibly invalid
addresses that you're trying to send to.  In fact, the first ten were
rejected without a single success -- I don't see how any mailing list
manager could deal efficiently with such a high rate of failure,
especially if it frequently involves DNS failures as this one does,
and several of the other log entries report the same outcome.  Note
that the failure is considered temporary.  IIRC, that means that DNS
lookup failed with no result, not that the relevant nameserver (the
.org TLD nameserver in this case) said there was no such domain.  That
probably means multiple timeouts on DNS lookups, each of which might
take 30 seconds.  (I tried "host gnutelephony.org" myself, and got a
DNS timeout after 30 seconds.)  If you have two nameservers
configured, it seems likely that we have just found one common reason
it takes your Mailman 60+ seconds to process one queue entry.
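
If you want to check whether slow lookups are the culprit, something
like the following would time name resolution for a few subscriber
domains.  It is only a rough probe: time_lookup is a hypothetical
helper, and it uses the system resolver's ordinary address lookup
rather than the MX lookup an MTA performs, so treat the timings as
indicative at best.

```python
import socket
import time


def time_lookup(domain, resolve=socket.getaddrinfo):
    """Return (outcome, seconds) for one name lookup.

    'resolve' defaults to the system resolver; it is a parameter only
    so that the function can be exercised without network access.
    """
    start = time.monotonic()
    try:
        resolve(domain, 25)
        outcome = "resolved"
    except socket.gaierror as exc:
        outcome = "failed: %s" % exc
    return outcome, time.monotonic() - start


# Example use (performs real DNS queries, so it may take a while):
#   for domain in ["gnutelephony.org"]:
#       outcome, seconds = time_lookup(domain)
#       print("%-30s %-40s %.1f s" % (domain, outcome, seconds))
```

A domain that takes tens of seconds to fail here is a good candidate
for the kind of delay described above.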

Temporary failure also means that the queue file will remain, as the
system believes that retrying may succeed.  (Any mailing list manager
that does not preserve the message in this way is losing mail.)  I
don't know precisely what you mean by "sweeps", but the fact that
there are quite a few temporarily failed queue files hanging around
would account for apparently unrelated posts being processed at the
same time.

 > That is seperate greps of TO and FROM'S to the maillist.  Obviously
 > it is very difficult to know what is happening by looking at the
 > logs.  Majordomo sends email straight to the MTA though aliases and
 > a pipe.

I certainly hope it wouldn't do that in the cases reported in the log
in your post, since the overwhelming majority are temporary failures.
It needs to be prepared to save the message to retry later.  I suppose
it's possible that Majordomo would have dealt with whatever the
situation actually is more efficiently, but I suspect that it would
have problems of some kind, and you just happened to change to Mailman
at a time when the list's environment became unstable.

I suspect that your problem involves your DNS, since so many of the
posts are rejected with "Domain not found".  Or it could be that you
just have an overwhelming majority of addresses with invalid domains,
or even that Postfix is misconfigured to report temporary failure
(450) when it should be reporting permanent failure (nonexistent
domain, 550).
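
To tell these cases apart, it may help to tally the rejections in the
Postfix log by reply code and reason.  The sketch below assumes
reject lines shaped like the one quoted above (shown here unwrapped);
the regular expression and the tally function are my own guesses at a
workable parser, not anything Postfix provides.

```python
import re
from collections import Counter

# Matches smtpd lines such as "... reject: RCPT from host[ip]: 450 4.1.2
# <addr>: reason; from=<...> ..."; the format is assumed from the sample.
REJECT = re.compile(
    r"reject: RCPT from \S+: (?P<code>\d{3}) (?P<dsn>\d\.\d+\.\d+) "
    r"<(?P<rcpt>[^>]*)>: (?P<reason>[^;]+);"
)

SAMPLE = (
    "2016-02-28T10:27:00.724625-05:00 www postfix/smtpd[21374]: NOQUEUE: "
    "reject: RCPT from www.mrbrklyn.com[96.57.23.82]: 450 4.1.2 "
    "<dyfet at gnutelephony.org>: Recipient address rejected: "
    "Domain not found; from=<hangout-bounces at nylxs.com> "
    "to=<dyfet at gnutelephony.org> proto=ESMTP helo=<www.mrbrklyn.com>"
)


def tally(lines):
    """Count rejections by (temporary/permanent, reason)."""
    counts = Counter()
    for line in lines:
        m = REJECT.search(line)
        if m is None:
            continue
        # 4xx replies are temporary failures, 5xx are permanent.
        kind = "temporary" if m.group("code").startswith("4") else "permanent"
        counts[(kind, m.group("reason").strip())] += 1
    return counts


counts = tally([SAMPLE])
for (kind, reason), n in counts.most_common():
    print(n, kind, reason)
```

If nearly everything lands in the "temporary" bucket with "Domain not
found", that points at DNS or at the reject code configuration rather
than at Mailman.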

By the way, when did you switch to Mailman?  Did you start
experiencing this problem immediately when you did?