[Mailman-Users] mailman 2.1.14 stops sending mail

Millsap, James James.Millsap at chicagobooth.edu
Thu Apr 11 18:07:43 CEST 2013


Unfortunately, troubleshooting is difficult: this machine is critical to our operations, so I don't have much time to investigate before I must have it up and running again. It usually takes around two days for this issue to come up. kill -TERM will stop it; there is no need to use -KILL. This is built from source, so no Red Hat packages are involved. This is what I have in the qrunner log:

Apr 10 10:01:08 2013 (17606) ArchRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17606) ArchRunner qrunner exiting.
Apr 10 10:01:08 2013 (17611) OutgoingRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17612) VirginRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17612) VirginRunner qrunner exiting.
Apr 10 10:01:08 2013 (17607) BounceRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17608) CommandRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17608) CommandRunner qrunner exiting.
Apr 10 10:01:08 2013 (17609) IncomingRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17609) IncomingRunner qrunner exiting.
Apr 10 10:01:08 2013 (17610) NewsRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17610) NewsRunner qrunner exiting.
Apr 10 10:01:08 2013 (17613) RetryRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17613) RetryRunner qrunner exiting.
Apr 10 10:01:08 2013 (17604) Master watcher caught SIGTERM.  Exiting.
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17606, sig: None, sts: 15, class: ArchRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17608, sig: None, sts: 15, class: CommandRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17609, sig: None, sts: 15, class: IncomingRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17610, sig: None, sts: 15, class: NewsRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17612, sig: None, sts: 15, class: VirginRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17613, sig: None, sts: 15, class: RetryRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17607) BounceRunner qrunner exiting.
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17607, sig: None, sts: 15, class: BounceRunner, slice: 1/1)
Apr 10 10:01:37 2013 (17611) OutgoingRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:37 2013 (17604) Master watcher caught SIGTERM.  Exiting.
Apr 10 10:01:37 2013 (17611) OutgoingRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:37 2013 (17611) OutgoingRunner qrunner exiting.
Apr 10 10:01:38 2013 (17604) Master qrunner detected subprocess exit
(pid: 17611, sig: None, sts: 15, class: OutgoingRunner, slice: 1/1)
Apr 10 10:01:58 2013 (15858) CommandRunner qrunner started.
Apr 10 10:01:59 2013 (15859) IncomingRunner qrunner started.
Apr 10 10:01:59 2013 (15856) ArchRunner qrunner started.
Apr 10 10:01:59 2013 (15857) BounceRunner qrunner started.
Apr 10 10:01:59 2013 (15862) VirginRunner qrunner started.
Apr 10 10:01:59 2013 (15860) NewsRunner qrunner started.
Apr 10 10:01:59 2013 (15863) RetryRunner qrunner started.
Apr 10 10:01:59 2013 (15861) OutgoingRunner qrunner started.

-----Original Message-----
From: Mark Sapiro [mailto:mark at msapiro.net] 
Sent: Wednesday, April 10, 2013 3:59 PM
To: Millsap, James
Cc: mailman-users at python.org
Subject: Re: [Mailman-Users] mailman 2.1.14 stops sending mail

On 4/10/2013 8:43 AM, Millsap, James wrote:
> 
> mailman  15854     1  0 10:01 ?        00:00:00 /usr/bin/python /usr/local/mailman/bin/mailmanctl -s start
> mailman  15861 15854  0 10:01 ?        00:00:06 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s
> 
> I have to kill the OutgoingRunner specifically. The only thing I see in the logs is a lack of logging. It has been running with stunning reliability on this machine for the last few years, so I am not sure what is going on. Perhaps one of Red Hat's patches killed it.


Can you kill -TERM it or do you need to kill -KILL it?

Are you sure there's nothing relevant in Mailman's qrunner log (/var/log/mailman/qrunner if a RHEL-packaged Mailman)? Is there a current .bak file in the out queue (/var/spool/mailman/out/)?
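
The .bak check can be scripted. The following is only a sketch: find_bak_files is a hypothetical helper, not part of Mailman, and the queue directory must be supplied by the caller (the path above is the RHEL package location; a source install will keep its queues elsewhere). It lists any .bak files with their age in seconds, oldest first, so an entry that has been sitting for hours stands out.

```python
import os
import time


def find_bak_files(queue_dir):
    """Return (name, age_seconds) for each .bak file in queue_dir, oldest first.

    Hypothetical helper for illustration; queue_dir is whatever directory
    holds the outgoing queue on the machine being inspected.
    """
    now = time.time()
    found = []
    for name in os.listdir(queue_dir):
        if name.endswith(".bak"):
            mtime = os.path.getmtime(os.path.join(queue_dir, name))
            found.append((name, now - mtime))
    # Largest age first, so the longest-stuck entry is at the top.
    found.sort(key=lambda item: item[1], reverse=True)
    return found


if __name__ == "__main__":
    import sys
    for name, age in find_bak_files(sys.argv[1]):
        print("%s  %.0f seconds old" % (name, age))
```

Run it as the mailman user with the out queue directory as the only argument.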

What does 'lsof' show for the process? You might be able to get something useful from 'gdb' or maybe see something like <http://stackoverflow.com/questions/132058/showing-the-stack-trace-from-a-running-python-application>.
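
One of the approaches discussed at that link is an in-process signal handler that prints the Python stack. A minimal sketch follows, with several assumptions: the function names are made up for illustration, SIGUSR1 is assumed to be unused by the process, and the handler has to be installed when the process starts, so it cannot help with a runner that is already hung (that is where gdb comes in). Nothing here is Mailman code.

```python
import signal
import sys
import traceback


def format_current_stack(frame):
    """Return the Python stack ending at `frame` as a single string."""
    return "".join(traceback.format_stack(frame))


def dump_stack(signum, frame):
    """Signal handler: write the interrupted code's stack to stderr."""
    sys.stderr.write("Signal %d received; current stack:\n" % signum)
    sys.stderr.write(format_current_stack(frame))
    sys.stderr.flush()


def install_stack_dump_handler(signum=signal.SIGUSR1):
    """Register dump_stack for `signum` (SIGUSR1 is an assumption).

    Must be called from the main thread, early in process start-up.
    """
    signal.signal(signum, dump_stack)
```

With the handler installed, 'kill -USR1 <pid>' makes the process report where it currently is without stopping it. The output goes to stderr, so it is only useful if stderr is captured somewhere.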

If I had to guess, I'd guess it gets hung waiting for an SMTP response from the MTA.
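
To illustrate that kind of hang: an SMTP client with no socket timeout will wait indefinitely if the server accepts the connection and then says nothing. The sketch below uses only the standard smtplib module; probe_smtp is a hypothetical name, the host and port are placeholders for the local MTA, and whether Mailman 2.1.14 sets a timeout on its own SMTP connections is not established in this thread.

```python
import smtplib
import socket


def probe_smtp(host, port, timeout=30.0):
    """Connect and wait for the SMTP greeting, giving up after `timeout` seconds.

    Returns (True, detail) if a 220 greeting arrived in time,
    otherwise (False, detail) describing the failure.
    """
    try:
        # The constructor connects and reads the greeting; with a timeout
        # set, a silent server produces an exception instead of a hang.
        conn = smtplib.SMTP(host, port, timeout=timeout)
    except (smtplib.SMTPException, socket.error) as exc:
        return False, "%s: %s" % (type(exc).__name__, exc)
    conn.close()
    return True, "greeting received"


if __name__ == "__main__":
    # Placeholder address; substitute the MTA that Mailman delivers to.
    print(probe_smtp("localhost", 25, timeout=10.0))
```

Running this periodically against the MTA would show whether the MTA sometimes stops answering, which would be consistent with the guess above but would not prove it.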

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

