[Mailman-Users] Posts from listowner address issue

Mark Sapiro mark at msapiro.net
Sun Dec 21 00:13:30 CET 2008


J.A. Terranson wrote:
>
>On Sat, 20 Dec 2008, Mark Sapiro wrote:
>
>> mailmanctl stop should have stopped the last instance started, but yes,
>> it isn't going to stop everything in this situation.
>
>Would a killall type of functionality be contraindicated in mailmanctl 
>stop?


It would have to know which processes to kill. The masters (mailmanctl
processes) know which runners they started, but when you signal a
master with bin/mailmanctl whatever, the specific master that gets
signaled is the one whose PID is in the data/master-qrunner.pid file
in *this one's* var_prefix. It could detect, in a possibly OS-dependent
way, that there are other masters running, but it can't know that they
aren't from other disjoint Mailman instances on the same server, so it
doesn't know whether they should be sent SIGTERM or not.
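A minimal Python sketch of the signaling step described above, to show why only one master is ever reached. It is not Mailman's mailmanctl code: the function names are hypothetical, and it assumes the PID file holds the master's PID as a decimal integer on its first line.

```python
import os
import signal

def read_master_pid(var_prefix):
    """Return the PID recorded in <var_prefix>/data/master-qrunner.pid.

    Assumption: the file holds the PID as a decimal integer on its first line.
    """
    path = os.path.join(var_prefix, "data", "master-qrunner.pid")
    with open(path) as fp:
        return int(fp.readline().strip())

def signal_master(var_prefix, sig=signal.SIGTERM, kill=os.kill):
    """Signal only the master recorded for this installation.

    Masters started from other var_prefix trees (other Mailman instances on
    the same host) are never looked at, which is why a blanket 'killall'
    style stop would be unsafe. 'kill' is injectable so the logic can be
    exercised without sending real signals.
    """
    pid = read_master_pid(var_prefix)
    kill(pid, sig)
    return pid
```

If a second master has overwritten the file, read_master_pid() can only ever return that second PID; the first master is unreachable through this path.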

When a "duplicate" mailmanctl is started in a way that overrides the
locks (or the script removes the locks first - I've seen scripts like
that), it overwrites the data/master-qrunner.pid file and the first
PID is lost.

I suppose data/master-qrunner.pid could be converted to a stack of
PIDs, but that's just a way of recovering from a situation that
shouldn't occur in the first place.
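To make the difference concrete, here is a hypothetical Python sketch contrasting overwrite semantics (each start replaces the file, losing the earlier PID) with the "stack of PIDs" idea. None of these functions exist in Mailman; they are illustrative only, and as noted above such a stack would merely paper over duplicate masters rather than prevent them.

```python
import os

def write_pid_overwrite(path, pid):
    # Overwrite semantics: a second (duplicate) master replaces the file
    # contents, so the first master's PID can no longer be recovered.
    with open(path, "w") as fp:
        fp.write("%d\n" % pid)

def push_pid(path, pid):
    # Hypothetical stack: append, so every started master is remembered.
    with open(path, "a") as fp:
        fp.write("%d\n" % pid)

def pop_pid(path):
    # Hypothetical stack: remove and return the most recently pushed PID,
    # or None if the file is missing or empty.
    try:
        with open(path) as fp:
            pids = [int(line) for line in fp if line.strip()]
    except FileNotFoundError:
        return None
    if not pids:
        return None
    last = pids.pop()
    with open(path, "w") as fp:
        fp.writelines("%d\n" % p for p in pids)
    return last
```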

Actually, I think there is an issue in that mailmanctl -s start is only
supposed to ignore the lock if it was created by a process that is no
longer running, and I'm not sure that code is correct. I have to look
at it some more.
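For reference, the rule "ignore the lock only if its creator is no longer running" is commonly implemented on POSIX systems by sending signal 0 to the recorded PID. The sketch below shows that generic technique; it is not Mailman's LockFile code, and the function names are hypothetical. One caveat: a recycled PID can make a stale lock look live.

```python
import os

def pid_is_running(pid):
    """Return True if a process with this PID appears to exist (POSIX only)."""
    if pid <= 0:
        return False             # 0 and negatives address process groups; never probe them
    try:
        os.kill(pid, 0)          # signal 0 performs only the existence/permission check
    except ProcessLookupError:
        return False             # no such process: the lock is stale
    except PermissionError:
        return True              # process exists but is owned by another user
    return True

def may_override_lock(lock_pid):
    """A lock may be ignored only if the process that created it is gone."""
    return not pid_is_running(lock_pid)
```

With this rule, a -s start against a lock held by a live master would still refuse to run, which is exactly the check that would prevent a duplicate master.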


>> >The files referenced were nowhere to be found, so picking them apart is a 
>> >non-starter.  Looks like a race condition: Does mailman not check to see 
>> >if it's already running?
>> 
>> It does unless it is forced not to. The issue is that the check is via
>> lock files and init scripts tend to force override of the checks on
>> the theory that any lock files are residue from a prior boot.
>
>We do not use -s on the init script.


How did 5 sets of mailmanctl and qrunners get started? Have you figured
out how that happened?


>> Are you saying that fixing the multiple qrunner/Mailman instance issue
>> solved the missing mail problem? I'd be very surprised if that were
>> the case.
>
>Yes.  It appears to have completely resolved it.


Well, as I said, I'm surprised. I'm glad it is resolved, but I'm at a
loss to explain why having multiple runners serving the same queue
entries would cause non-delivery of a post to a subset of the list
members. Apparently it either did, or some other issue was fixed by
stopping everything and restarting.


>> Also, you might do
>> 
>> bin/list_members --regular --nomail=enabled ccm-l | grep -i missing_adr
>
>Returns a null


Perhaps you misunderstood. 'missing_adr' was supposed to be replaced by
the address (yours) that wasn't being delivered. If you did use that
address in the command above, the empty result means the address is not
a regular member with delivery enabled, so it shouldn't be receiving posts.


>> just to be sure.
>> 
>> Then check Mailman's smtp log for an entry like
>> 
>> Dec 20 08:39:58 2008 (30746) <message-id> smtp to ccm-l for nnn recips,
>> completed in t.ttt seconds
>
><system brought up from maintenance>
>
>Dec 20 05:23:34 2008 (1368) <mailman.0.1229772212.558.ccm-l at ccm-l.org> 
>smtp to ccm-l for 1 recips, completed in 0.611 seconds


This is a Mailman generated notification of some kind.


>Dec 20 06:35:59 2008 (1368) <49550272967245C1A49355A8953D0437 at PANDESK> 
>smtp to med-jokes for 157 recips, completed in 23.833 seconds
>
><snip>
>
><somewhere in here is where I hand killed all the processes and restarted 
>mailman>
>
>smtp to med-events for 103 recips, completed in 14.170 seconds
>Dec 20 12:11:05 2008 (5688) <c39.49b55b86.367d8d7a at aol.com> smtp to 
>med-jokes for 157 recips, completed in 11.838 seconds
>Dec 20 12:11:28 2008 (5688) 
><68fd2c7c0812200550q32808396vf875b6ec66f9c02 at mail.gmail.com> smtp to 
>med-jokes for 156 recips, completed in 22.845 seconds
>
>157==correct, but one is unreachable right now due to cable cut.


More likely, the <c39.49b55b86.367d8d7a at aol.com> post was sent by
Mailman to all 157 members, and the
<68fd2c7c0812200550q32808396vf875b6ec66f9c02 at mail.gmail.com> post was
a reply that had the OP in To: or Cc:, so Mailman didn't send it to that
address and sent only to the other 156.
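A simplified Python sketch of that duplicate-avoidance step, showing how 157 members can become 156 recipients. It assumes, for illustration only, that every member has the option to avoid duplicate copies enabled; in Mailman this is a per-member setting, and the function name here is hypothetical.

```python
from email.message import Message
from email.utils import getaddresses

def recipients_after_dedup(members, msg):
    """Drop members already named in To: or Cc: of the post (case-insensitive).

    Simplifying assumption: every member has duplicate avoidance enabled.
    """
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    explicit = {addr.lower() for _name, addr in getaddresses(headers) if addr}
    return [m for m in members if m.lower() not in explicit]

# Usage: a reply that Cc's the original poster skips that member.
msg = Message()
msg["To"] = "ccm-l@example.org"
msg["Cc"] = "Original Poster <op@example.com>"
members = ["op@example.com", "a@example.net", "b@example.net"]
print(recipients_after_dedup(members, msg))  # -> ['a@example.net', 'b@example.net']
```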

Mail for the unreachable address should still be handed off by Mailman
to the MTA; the problem should only be detected by the MTA when it
attempts delivery. If the MTA is actually checking whether the address
is deliverable during Mailman's SMTP transaction with it, Mailman's
performance will suffer greatly. Plus, there would be an entry for this
address in Mailman's smtp-failure log.
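As a generic illustration (not Mailman's SMTPDirect code), this is how a recipient refused during the SMTP dialogue surfaces in Python's standard smtplib: sendmail() returns a dict of refused recipients when at least one was accepted, and raises SMTPRecipientsRefused when all were refused. A refusal of that kind is what would end up in the smtp-failure log; an MTA that accepts everything and fails later never reports it here. The function name and the fake connection in the test are hypothetical.

```python
import smtplib

def send_and_split(conn, sender, recips, text):
    """Hand a message to the MTA and report which recipients it refused.

    conn is any object with smtplib.SMTP's sendmail() signature.
    """
    try:
        # Returns {address: (code, response)} for refused recipients only.
        refused = conn.sendmail(sender, recips, text)
    except smtplib.SMTPRecipientsRefused as e:
        refused = e.recipients   # every recipient was refused
    accepted = [r for r in recips if r not in refused]
    return accepted, refused
```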


>Dec 20 12:11:30 2008 (5688) 
><mailman.0.1229796653.5686.med-jokes at ccm-l.org> smtp to med-jokes for 1 
>recips, completed in 1.341 seconds
>Dec 20 12:11:31 2008 (5688) 
><mailman.0.1229796662.6005.med-jokes at ccm-l.org> smtp to med-jokes for 1 
>recips, completed in 1.532 seconds
>Dec 20 12:11:33 2008 (5688) 
><mailman.0.1229796681.6055.med-jokes at ccm-l.org> smtp to med-jokes for 1 
>recips, completed in 1.618 seconds
>Dec 20 12:12:03 2008 (5688) 
><mailman.0.1229796721.6065.med-jokes at ccm-l.org> smtp to med-jokes for 1 
>recips, completed in 0.603 seconds
>Dec 20 12:12:04 2008 (5688) 
><mailman.1.1229796721.6065.med-jokes at ccm-l.org> smtp to med-jokes for 1 
>recips, completed in 0.551 seconds


These 5 are all Mailman notices.


>< note that there are zero entries for CCM-L up to this point, despite 
>archives to the contrary, and replies which show distribution to users 
>[but not to poor old me :-(]  >


This says there was some problem with OutgoingRunner. I don't know
what it would be, but if it is operating correctly, it will write an
entry to the smtp log and the post log for every post it processes.
Presumably it was sending posts, because some people were receiving them
(it seems unlikely that the list replies would all have come in response
to off-list Ccs). But if it is sending posts and not logging them, it's
messed up somehow.
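One way to check this systematically is to count smtp log entries per list and compare with the archive. The Python sketch below assumes the entry format quoted in this thread (each entry is one physical line in the real log; the archive wrapped them and rewrote "@" as " at "). Treating Message-IDs that begin with "mailman." as Mailman's own notices is a heuristic based on the entries identified as notices above.

```python
import re
from collections import Counter

# Entry format inferred from the smtp log lines quoted in this thread.
SMTP_LINE = re.compile(
    r"^(?P<date>\w{3} +\d+ \d\d:\d\d:\d\d \d{4}) \((?P<pid>\d+)\) "
    r"<(?P<msgid>[^>]*)> smtp to (?P<list>\S+) for (?P<recips>\d+) recips, "
    r"completed in (?P<secs>[\d.]+) seconds"
)

def posts_per_list(lines):
    """Count smtp-log deliveries per list, skipping Mailman-generated notices."""
    counts = Counter()
    for line in lines:
        m = SMTP_LINE.match(line)
        if m and not m.group("msgid").startswith("mailman."):
            counts[m.group("list")] += 1
    return counts
```

Run over logs/smtp for the affected period, a list whose archive shows posts but whose count is zero is the anomaly described above.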

I suppose it could be due to a race condition between multiple runners
even though I don't understand exactly how, but why just one list?


<log entries snipped>
>
>Etc.  all seems pretty normal right now.
>
>> have to look at the MTA log to see what happened to the missing
>> recipient(s).
>
>I did that (but did not mention it, as...), but it never made it to the 
>MTA for my address.
>
>WTH?  I can think of no possible way for just one address of no special 
>significance other than that it is also listowner to archive as if delivered, 
>but never to make it to the MTA or even to be logged by mailman...


I'm equally mystified.

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
