[Mailman-Users] Mail stuck in qfiles/in

Mark Sapiro msapiro at value.net
Sat Feb 3 02:31:13 CET 2007


Allan Trick wrote:
>
>There's nothing in the error log.  But qrunner's might have a 
>clue.  I'm not sure how to read this:
>
>Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit
>(pid: 29673, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]


This says that OutgoingRunner exited with status 1 and was not killed
by a signal. That in itself is not very informative, but ...


>Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit
>(pid: 29273, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
>Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit
>(pid: 29276, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
>Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit
>(pid: 1548, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
>Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit
>(pid: 24311, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
>Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit
>(pid: 1546, sig: None, sts: 1, class: NewsRunner, slice: 1/1) [restarting]
>Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit
>(pid: 1544, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
>Jan 31 11:38:51 2007 (18640) IncomingRunner qrunner started.
>Jan 31 11:38:51 2007 (18643) BounceRunner qrunner started.
>Jan 31 11:38:51 2007 (18642) VirginRunner qrunner started.
>Jan 31 11:38:51 2007 (18641) ArchRunner qrunner started.
>Jan 31 11:38:51 2007 (18644) NewsRunner qrunner started.
>Jan 31 11:38:51 2007 (18639) OutgoingRunner qrunner started.
>Jan 31 11:38:52 2007 (18645) CommandRunner qrunner started.


At about 11:38:51, every runner except RetryRunner exited and was
restarted. Then all seemed OK for about 22 minutes.
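
To make the fields in these entries concrete, here is a minimal Python
sketch that pulls them out of one 'subprocess exit' entry. The regular
expression and the parse_exit name are my own assumptions, based only on
the lines quoted above and not on anything shipped with Mailman. It also
assumes the entry has been joined back onto a single line (the mail
wrapped it).

```python
import re

# Pattern inferred from the quoted log excerpt; an assumption, not a
# format guaranteed by Mailman.
EXIT_RE = re.compile(
    r"\((?P<master>\d+)\) Master qrunner detected subprocess exit\s*"
    r"\(pid: (?P<pid>\d+), sig: (?P<sig>\w+), sts: (?P<sts>\w+), "
    r"class: (?P<cls>\w+), slice: (?P<slice>\d+/\d+)\)"
)


def parse_exit(entry):
    """Return the fields of one exit entry as a dict, or None if no match."""
    m = EXIT_RE.search(entry)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    # The log prints the literal word "None" when no signal was involved.
    fields["sig"] = None if fields["sig"] == "None" else int(fields["sig"])
    fields["sts"] = None if fields["sts"] == "None" else int(fields["sts"])
    return fields


entry = (
    "Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit "
    "(pid: 29673, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) "
    "[restarting]"
)
result = parse_exit(entry)
print(result["cls"], result["pid"], result["sig"], result["sts"])
```

For the first entry above this gives class OutgoingRunner, pid 29673,
no signal, and exit status 1.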


>Jan 31 12:00:36 2007 (1541) Master qrunner detected subprocess exit
>(pid: 18639, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]
>Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26785, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]
>Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26786, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]


Then three OutgoingRunners died: the one that was started 22 minutes
earlier (pid 18639) and two others (pids 26785 and 26786). Perhaps 26785
died before logging its 'started' message, and 26786 was then started
and did the same thing.


>Jan 31 12:00:37 2007 (26787) OutgoingRunner qrunner started.
>Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit
>(pid: 18645, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
>Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit
>(pid: 18640, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
>Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit
>(pid: 18641, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
>Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit
>(pid: 18642, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
>Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit
>(pid: 18643, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
>Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26792, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
>Jan 31 12:00:38 2007 (26793) IncomingRunner qrunner started.
>Jan 31 12:00:38 2007 (26795) VirginRunner qrunner started.
>Jan 31 12:00:38 2007 (26794) ArchRunner qrunner started.
>Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26797, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
>Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26796, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
>Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26801, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
>Jan 31 12:00:38 2007 (26798) CommandRunner qrunner started.
>Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26803, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
>Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26804, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
>Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26805, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
>Jan 31 12:00:41 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26806, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
>Jan 31 12:00:41 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26811, sig: None, sts: 127, class: BounceRunner, slice: 1/1) [restarting]
>Jan 31 12:00:41 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26806, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
>Jan 31 12:00:41 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26811, sig: None, sts: 127, class: BounceRunner, slice: 1/1) [restarting]
>Jan 31 12:00:42 2007 (1541) Qrunner BounceRunner reached maximum 
>restart limit of 10, not restarting.
>Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26793, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
>Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26794, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
>Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26787, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]
>Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26795, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
>Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26798, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
>Jan 31 12:00:46 2007 (26843) IncomingRunner qrunner started.
>Jan 31 12:00:46 2007 (26845) OutgoingRunner qrunner started.
>Jan 31 12:00:46 2007 (26844) ArchRunner qrunner started.
>Jan 31 12:00:46 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26846, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
>Jan 31 12:00:46 2007 (26847) CommandRunner qrunner started.
>Jan 31 12:00:46 2007 (26848) VirginRunner qrunner started.
>Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26843, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
>Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit
>(pid: 18644, sig: None, sts: 1, class: NewsRunner, slice: 1/1) [restarting]
>Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26847, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
>Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26844, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
>Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26958, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
>Jan 31 12:01:08 2007 (26955) IncomingRunner qrunner started.
>Jan 31 12:01:08 2007 (26956) NewsRunner qrunner started.
>Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26957, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
>Jan 31 12:01:08 2007 (26959) ArchRunner qrunner started.
>Jan 31 12:01:08 2007 (26962) CommandRunner qrunner started.
>Jan 31 12:01:09 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26955, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
>Jan 31 12:01:09 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26969, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
>Jan 31 12:01:09 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26970, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
>Jan 31 12:01:09 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26971, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
>Jan 31 12:01:10 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26972, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
>Jan 31 12:01:10 2007 (1541) Qrunner IncomingRunner reached maximum 
>restart limit of 10, not restarting.
>Jan 31 12:01:10 2007 (1541) Master qrunner detected subprocess exit
>(pid: 26848, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
>
>......
>
>And it goes on and on like that.  Why would it not have been able to restart??


Beginning at 12:00:37, the runners are dying as fast as they can be
restarted, in some cases apparently before even logging their 'started'
message, which they do before they actually begin to process their
queues. That seems to point to some external OS condition as the cause.
It is curious that RetryRunner seems to be exempt.

Other than thinking it probably isn't an internal Mailman thing, but
rather an external OS thing, I don't have any ideas.
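
If you want to check mechanically which runners exited without ever
logging 'started', something like the following rough Python sketch
might help. The patterns and the summarize name are my assumptions,
derived only from the excerpt you posted. It expects the raw, unwrapped
log text. Note that a runner started before the beginning of the text
you feed it will also show up as never having logged 'started'.

```python
import re
from collections import Counter

# Both patterns are inferred from the quoted excerpt (an assumption).
EXIT_RE = re.compile(
    r"subprocess exit\s*\(pid: (\d+), sig: \w+, sts: \w+, class: (\w+),"
)
START_RE = re.compile(r"\((\d+)\) (\w+) qrunner started\.")


def summarize(log_text):
    """Count exits per runner class and list pids that exited without a
    'started' entry anywhere in log_text."""
    started = {int(pid) for pid, _ in START_RE.findall(log_text)}
    exits = [(int(pid), cls) for pid, cls in EXIT_RE.findall(log_text)]
    per_class = Counter(cls for _, cls in exits)
    silent = sorted({pid for pid, _ in exits if pid not in started})
    return per_class, silent


# A few entries from the excerpt, joined back onto single lines.
sample = "\n".join([
    "Jan 31 11:38:51 2007 (18639) OutgoingRunner qrunner started.",
    "Jan 31 12:00:36 2007 (1541) Master qrunner detected subprocess exit "
    "(pid: 18639, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]",
    "Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit "
    "(pid: 26785, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]",
    "Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit "
    "(pid: 26786, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]",
    "Jan 31 12:00:37 2007 (26787) OutgoingRunner qrunner started.",
])

per_class, silent = summarize(sample)
print(dict(per_class))
print(silent)
```

On those five entries it counts three OutgoingRunner exits and reports
pids 26785 and 26786 as having exited without a 'started' entry.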

-- 
Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


