[Mailman-Users] Bounce processing not working - Update

Mark Sapiro mark at msapiro.net
Wed Aug 12 18:41:33 CEST 2009


Lindsay Haisley wrote:

>I restarted (twice) the qrunner suite of processes from the system
>command line using the system init scripts (/etc/init.d/mailman) with
>two noticeable results.
>
>First, an egregious number of "Bounce action notifications" and "list
>unsubscribe notifications" went out on bounces for lists on which I'm
>listed as an owner, including the one that brought this problem to my
>attention.  Some notifications date back a couple of months so this is
>apparently a problem of some duration.


I would have to see the /etc/init.d/mailman script to know for sure,
but I'm guessing there is something in it that recovers old, stale
bounce-events-ppppp.pck files. These files were left behind with the
offending bounces when the 2.1.11 bug threw the exception that caused
BounceRunner to die without saving the updated list with the bouncing
member removed.
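
If you want to see whether such stale files are lying around, something
like the following would list them. This is only a sketch and not part of
Mailman: the function name, the one-day age threshold and the directory
you pass in are all assumptions (the files normally live in Mailman's
data directory, whose location depends on the installation).

```python
import glob
import os
import time


def find_stale_bounce_events(data_dir, max_age_seconds=24 * 3600, now=None):
    """Return bounce-events-*.pck files in data_dir older than max_age_seconds.

    Hypothetical helper for illustration; data_dir must be supplied by the
    caller because its location varies between installations.
    """
    if now is None:
        now = time.time()
    pattern = os.path.join(data_dir, "bounce-events-*.pck")
    stale = []
    for path in sorted(glob.glob(pattern)):
        # A file a live BounceRunner is still using is rewritten regularly,
        # so an old modification time suggests a leftover from a dead runner.
        if now - os.path.getmtime(path) > max_age_seconds:
            stale.append(path)
    return stale
```

It only reports paths; it does not touch or reprocess the files.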

Note that this bug, addressed in my earlier reply, only occurs when
bounce_you_are_disabled_warnings = 0.


>Second, many subscribers to the problem list received multiple copies of
>the most recently queued post.  Could this be because I stopped and
>restarted the qrunners several times?  Why would this cause multiple
>copies to be sent?


Yes, it could be. Stopping Mailman signals OutgoingRunner to stop,
possibly in the middle of delivering the post. If OutgoingRunner was
somehow SIGKILL'd, it would have stopped mid-delivery, and when Mailman
restarted, the backup out queue entry would have been recovered and the
post delivered to all list members, some of whom had already received
it. However, this is not what normally happens. The runner is supposed
to be SIGTERM'd and finish its current delivery. Perhaps there's
something in the init.d script that will SIGKILL it if it doesn't stop
soon enough. Or perhaps Mailman was restarted before OutgoingRunner
finished and the new OutgoingRunner 'recovered' the old runner's
backup queue entry, but that would result in everyone receiving a
duplicate unless something downstream of Mailman dropped the duplicate
message.
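
To illustrate the SIGTERM/SIGKILL difference, here is a toy sketch. It is
not Mailman's runner code; the class and method names are made up for the
example. The point is that a SIGTERM handler can just set a flag that is
checked between tasks, so the current task always completes, whereas
SIGKILL cannot be caught and stops the process wherever it happens to be.

```python
import signal


class SketchRunner:
    """Toy queue runner: SIGTERM only sets a flag; work stops between tasks."""

    def __init__(self):
        self.stop_requested = False
        self.finished = []

    def install_handler(self):
        # SIGTERM can be caught. SIGKILL cannot, so a SIGKILL'd process
        # gets no chance to finish what it was doing.
        signal.signal(signal.SIGTERM, self.on_sigterm)

    def on_sigterm(self, signum, frame):
        # Do not abort here; just ask the main loop to stop when convenient.
        self.stop_requested = True

    def run(self, tasks):
        """tasks is a list of (name, callable) pairs, processed in order."""
        for name, work in tasks:
            if self.stop_requested:
                break
            work()  # the current task is never interrupted by the flag
            self.finished.append(name)
        return self.finished
```

If the stop request arrives while a task is running, that task is still
recorded as finished and only the following ones are skipped.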


>I should also note that the bouncing subscribers were _still_ not
>unsubscribed, nor was the nomail flag set for those for whom a soft
>bounce was received.


This is the 2.1.11 bug addressed in my earlier reply.


>All qrunner processes were (and are still) running, or at least
>according to the process table.  Can these processes crash?  If so, what
>can I do to prevent this?  If I need to restart the qrunners, how do I
>avoid causing multiple copies of posts to be sent out?


Yes, qrunners can die. Just look at Mailman's qrunner and error logs.
Normally, when a qrunner dies, it is automatically restarted by
mailmanctl, up to a limit of 10 restarts.
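
The restart limit works roughly like the sketch below. This is an
illustration only, not mailmanctl's code: mailmanctl supervises child
processes, while this hypothetical supervise() just re-calls a function,
and the names are invented for the example.

```python
def supervise(run_once, max_restarts=10):
    """Call run_once until it exits cleanly or the restart budget is used up.

    Returns the number of restarts that were needed. Once max_restarts
    restarts have been spent, the next failure is re-raised instead of
    triggering another restart.
    """
    restarts = 0
    while True:
        try:
            run_once()
            return restarts  # clean exit, nothing more to do
        except Exception:
            if restarts >= max_restarts:
                raise  # budget exhausted; give up rather than loop forever
            restarts += 1
```

With the limit at 10, a runner that fails every time is started 11 times
in total (the initial start plus 10 restarts) before the supervisor gives
up.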

Duplicates are a pain, and every effort is taken to avoid or minimize
them. If a runner dies due to an uncaught exception, the message is
normally shunted and requires manual action to reprocess, and even
this normally doesn't result in duplicates.

Duplicates can occur when a runner is killed asynchronously by a system
crash, a power failure or, perhaps in your case, by your init.d script.
Normally, though, a simple "mailmanctl stop|restart" should just signal
the runners, and they shouldn't stop until finished with the current
task.

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


