[Mailman-Users] is driving me crazy

Mark Sapiro msapiro at value.net
Mon Jan 23 02:18:48 CET 2006


ArteryPlanet.Net :: Manuel Kissoyan wrote:

>I remember i saw qrunners in the server process when the mailing list were 
>down but somehow when i restart mailman looks like the whole list were down, 
>because it starting send all the lists queued mails.


This probably means only that one runner was down, and that starting
Mailman anew created multiple copies of some of the others. Hint: only
run 'bin/mailmanctl -s start' in an init file that runs on system
(re)boot. Under normal circumstances, when running bin/mailmanctl
manually, don't use '-s'.



>the qrunner logs at the same hour when the list gone down are:
>
>Jan 18 17:29:19 2006 (23105) VirginRunner qrunner started.
>Jan 18 18:15:51 2006 (10568) Master qrunner detected subprocess exit
>(pid: 23055, sig: None, sts: 1, class: VirginRunner, slice: 1/1) 
>[restarting]
>Jan 18 18:15:51 2006 (11632) VirginRunner qrunner started.
>Jan 18 18:16:03 2006 (480) Master qrunner detected subprocess exit
>(pid: 23105, sig: None, sts: 1, class: VirginRunner, slice: 1/1) 
>[restarting]
>Jan 18 18:16:03 2006 (480) Qrunner VirginRunner reached maximum restart 
>limit of 10, not restarting.



So VirginRunner isn't (wasn't on Jan 18) running. Are there any error
log entries from Jan 18 coincident with these subprocess exits?



>these are the last lines right now in the error log:

<snip>
>Jan 22 23:51:22 2006 qrunner(1449): OSError :  [Errno 2] No such file or 
>directory: 
>'/usr/local/cpanel/3rdparty/mailman/qfiles/in/1137973882.55214+4702c7f0c4fdea7d0473729ec90428cec740947e.pck'

<snip>
>Jan 22 23:51:23 2006 qrunner(20729): OSError :  [Errno 2] No such file or 
>directory: 
>'/usr/local/cpanel/3rdparty/mailman/qfiles/out/1137973882.55214+536fcf4d659766a32e9b94e92bfde66798394acb.pck'

<snip>
>Jan 22 23:51:23 2006 qrunner(20713): IOError :  [Errno 2] No such file or 
>directory: 
>'/usr/local/cpanel/3rdparty/mailman/qfiles/archive/1137973882.55214+331cf1e3f1102872474d59b9f53a1fb197f0316f.pck'

<snip>
>Jan 22 23:52:41 2006 qrunner(21835): IOError :  [Errno 2] No such file or 
>directory: 
>'/usr/local/cpanel/3rdparty/mailman/qfiles/bounces/1137973961.4684539+a97050f66bcd30b95df8d86fb97378a830687d6f.pck'


So, there are probably multiple copies of at least IncomingRunner,
OutgoingRunner, ArchRunner and BounceRunner.

Do "ps -fAw | grep 'python'" or however you spell the ps options on
your system to get all processes including command lines. There should
be exactly one each of mailmanctl, ArchRunner, BounceRunner,
CommandRunner, IncomingRunner, NewsRunner, OutgoingRunner,
VirginRunner and RetryRunner, except in the unlikely case that you are
processing your queues in slices in which case there should be one of
each runner for each unique slice.
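
If scanning the ps output by eye is tedious, the counting can be
scripted. The following is only a rough Python sketch. It assumes each
qrunner command line carries an argument of the form
--runner=NAME:SLICE:RANGE, so adjust the pattern if your ps output
looks different. The function names are hypothetical, not part of
Mailman.

```python
import re
from collections import Counter

# Assumed command-line form: ... qrunner --runner=ArchRunner:0:1 -s
RUNNER_RE = re.compile(r"--runner=(\w+):(\d+):(\d+)")


def count_runners(ps_output):
    """Count qrunner processes per (name, slice, range) in ps output text."""
    counts = Counter()
    for line in ps_output.splitlines():
        match = RUNNER_RE.search(line)
        if match:
            name, slice_no, slice_range = match.groups()
            counts[(name, int(slice_no), int(slice_range))] += 1
    return counts


def duplicated_runners(ps_output):
    """Return only the runner/slice combinations seen more than once."""
    return {key: n for key, n in count_runners(ps_output).items() if n > 1}
```

Feed it the text printed by the ps command above. Any entry returned by
duplicated_runners() is a runner that has more than one copy for the
same slice.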

If there are more, first do "bin/mailmanctl stop". Then if there are
any left, send them SIGTERM until they're all gone. Then start mailman
again.
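
For the "send them SIGTERM" step, plain kill from the shell is enough.
Purely as an illustration, the same thing in Python might look like the
sketch below. The helper name is hypothetical, and the PIDs are
whatever leftover qrunner PIDs you read off the ps output after
"bin/mailmanctl stop".

```python
import os
import signal


def send_sigterm(pids):
    """Send SIGTERM to each pid; return the pids that no longer existed."""
    already_gone = []
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            # The process exited before the signal could be delivered.
            already_gone.append(pid)
    return already_gone
```

This needs to run as root or as the user that owns the processes.
Re-check ps afterwards and repeat until no qrunners remain.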


>about...."Where are the posts going, i.e. which qfiles/* directories have 
>entries.", could you please clarify...the following are the directories in 
>/qfiles
>
>drwxrwsr-x   11 mailman  mailman      4096 May 28  2004 ./
>drwxrwsr-x   22 mailman  mailman      4096 Jul 20  2005 ../
>drwxrws---    2 mailman  mailman      4096 Jan 22 23:51 archive/
>drwxrws---    2 mailman  mailman      4096 Jan 22 23:52 bounces/
>drwxrws---    2 mailman  mailman      4096 Jan 19 01:41 commands/
>drwxrws---    2 mailman  mailman      8192 Jan 22 23:51 in/
>drwxrws---    2 mailman  mailman      4096 May 28  2004 news/
>drwxrws---    2 mailman  mailman     53248 Jan 22 23:51 out/
>drwxrws---    2 mailman  mailman      4096 Jan 22 22:37 retry/
>drwxrws---    2 mailman  mailman      8192 Dec 17 04:39 shunt/
>drwxrws---    2 mailman  mailman     36864 Jan 22 23:44 virgin/


And what is in the archive/, bounces/, etc. directories? More
importantly, when things are not working, in which of those 9
directories (queues) are the messages getting stuck?
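
One way to answer that is to count the entries in each queue directory
a few times, some minutes apart, and see which count keeps growing. A
minimal Python sketch, assuming the nine directory names from the
listing above (the function name is made up for illustration):

```python
import os

# The nine queue directories shown in the qfiles listing.
QUEUES = ("archive", "bounces", "commands", "in", "news",
          "out", "retry", "shunt", "virgin")


def queue_counts(qfiles_dir):
    """Map each queue name to the number of entries in its directory."""
    counts = {}
    for name in QUEUES:
        path = os.path.join(qfiles_dir, name)
        try:
            counts[name] = len(os.listdir(path))
        except FileNotFoundError:
            counts[name] = None  # directory does not exist
    return counts
```

The queue directories are not world-readable, so run it as root or as
the mailman user, passing the full path to your qfiles directory.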


>About "Also, what happens if you move the lists/LIST_NAME/digest.mbox file 
>aside? Does that help?"
>
>you mean delete that file? remember we already removed this list and 
>re-created it so that file was created new before it gone down.


The lists/LIST_NAME/digest.mbox file is where posts are collected for
an eventual digest. When it reaches digest_size_threshold, or when
cron/senddigests runs and digest_send_periodic is yes, it is used to
create the digest and then removed. I.e., under usual circumstances,
it is removed by Mailman every day.

The issue is that there are known cases in which a malformed or badly
encoded post has been saved to digest.mbox, and this has stopped
processing for that list. This is why I suggested moving it aside,
i.e. moving it out of the lists/LIST_NAME/ directory (not deleting
it), to see if that allows the list's processing to resume. If it
does, the problem is a 'bad' post in digest.mbox. If moving the file
aside doesn't help, then the problem is elsewhere.
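
A plain mv does the job. As a sketch of the same step in Python, with a
hypothetical helper name and a holding directory of your own choosing:

```python
import os
import shutil


def move_digest_aside(list_dir, holding_dir):
    """Move list_dir/digest.mbox into holding_dir; return new path or None."""
    src = os.path.join(list_dir, "digest.mbox")
    if not os.path.exists(src):
        return None  # nothing to move
    os.makedirs(holding_dir, exist_ok=True)
    # Prefix the list name so files from several lists don't collide.
    list_name = os.path.basename(os.path.normpath(list_dir))
    dst = os.path.join(holding_dir, list_name + "-digest.mbox")
    return shutil.move(src, dst)
```

The file is kept, not deleted, so it can be inspected later or moved
back if it turns out not to be the cause.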

-- 
Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



