[Mailman-Users] Mailman stuck : mailmanctl dead with messages in /qfiles/in

Mark Sapiro mark at msapiro.net
Tue May 1 04:14:48 CEST 2012


On 4/30/2012 5:19 PM, Jérôme wrote:

> Mark Sapiro a écrit:
>>
>> How about /var/log/mailman/qrunner ?
> 
> Each day, I have something like this :
> Apr 28 03:16:33 2012 (17099) OutgoingRunner qrunner caught SIGHUP.  Reopening
> logs. Apr 28 03:16:33 2012 (17094) ArchRunner qrunner caught SIGHUP.
> Reopening logs. Apr 28 03:16:33 2012 (17097) IncomingRunner qrunner caught
> SIGHUP.  Reopening logs. Apr 28 03:16:33 2012 (17093) Master watcher caught
> SIGHUP.  Re-opening log files. Apr 28 03:16:34 2012 (17095) BounceRunner
> qrunner caught SIGHUP.  Reopening logs. Apr 28 03:16:34 2012 (17101)
> RetryRunner qrunner caught SIGHUP.  Reopening logs. Apr 28 03:16:34 2012
> (17096) CommandRunner qrunner caught SIGHUP.  Reopening logs. Apr 28 03:16:34
> 2012 (17098) NewsRunner qrunner caught SIGHUP.  Reopening logs. Apr 28
> 03:16:34 2012 (17100) VirginRunner qrunner caught SIGHUP.  Reopening logs.


That is presumably a logrotate process.


> The day it stopped, I got this :
> Apr 29 03:16:29 2012 (17099) OutgoingRunner qrunner caught SIGHUP.  Reopening
> logs. Apr 29 03:16:29 2012 (17094) ArchRunner qrunner caught SIGHUP.
> Reopening logs. Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner caught
> SIGHUP.  Reopening logs. Apr 29 03:16:29 2012 (17093) Master watcher caught
> SIGHUP.  Re-opening log files. Apr 29 03:16:29 2012 (17097) IncomingRunner
> qrunner caught SIGTERM.  Stopping. Apr 29 03:16:29 2012 (17099)
> OutgoingRunner qrunner caught SIGTERM.  Stopping. Apr 29 03:16:29 2012
> (17097) IncomingRunner qrunner exiting. Apr 29 03:16:29 2012 (17094)
> ArchRunner qrunner caught SIGTERM.  Stopping. Apr 29 03:16:29 2012 (17099)
> OutgoingRunner qrunner exiting. Apr 29 03:16:29 2012 (17094) ArchRunner
> qrunner exiting. Apr 29 03:16:29 2012 (17096) CommandRunner qrunner caught
> SIGHUP.  Reopening logs. Apr 29 03:16:29 2012 (17101) RetryRunner qrunner
> caught SIGHUP.  Reopening logs. Apr 29 03:16:29 2012 (17095) BounceRunner
> qrunner caught SIGHUP.  Reopening logs. Apr 29 03:16:29 2012 (17098)
> NewsRunner qrunner caught SIGHUP.  Reopening logs. Apr 29 03:16:29 2012
> (17098) NewsRunner qrunner caught SIGTERM.  Stopping. Apr 29 03:16:29 2012
> (17095) BounceRunner qrunner caught SIGTERM.  Stopping. Apr 29 03:16:29 2012
> (17096) CommandRunner qrunner caught SIGTERM.  Stopping. Apr 29 03:16:29 2012
> (17101) RetryRunner qrunner caught SIGTERM.  Stopping. Apr 29 03:16:29 2012
> (17100) VirginRunner qrunner caught SIGHUP.  Reopening logs. Apr 29 03:16:29
> 2012 (17096) CommandRunner qrunner exiting. Apr 29 03:16:29 2012 (17098)
> NewsRunner qrunner exiting. Apr 29 03:16:29 2012 (17095) BounceRunner qrunner
> exiting. Apr 29 03:16:29 2012 (17100) VirginRunner qrunner caught SIGTERM.
> Stopping. Apr 29 03:16:29 2012 (17101) RetryRunner qrunner exiting.
> Apr 29 03:16:29 2012 (17100) VirginRunner qrunner exiting.


So here in addition to the normal SIGHUPs, presumably from logrotate,
you also have SIGTERMs possibly from a "bin/mailmanctl stop" although
you don't show a "Master watcher caught SIGTERM" entry.

Check your system cron log to see what was running at the time.


> Sorry for the mess, here. But I think you get the idea.
> 
> Seems to happen during a cron job.
> 
> Bug reports that could be related :
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=505638
> https://bugs.launchpad.net/mailman/+bug/265855
> 
>>> modified 
>>> /var/lib/mailman/Mailman/Handlers/SMTPDirect.py
>>> to add
>>> self.__conn.set_debuglevel(1)
>>
>> And yet you are not logging any smtp debugging in Mailman's error log.
>> There should be copious log information for every outgoing message.
> 
> There was. But it stopped. Last message for which I do have a lot of info is
> on Apr 22, one week before mailman stopped sending messages.
> 
> -rw-rw-r-- 1 list list      198 Apr 30 03:16 /var/log/mailman/error
> -rw-rw-r-- 1 list list        0 Apr 22 03:16 /var/log/mailman/error.1
> -rw-rw-r-- 1 list list        0 Apr 15 03:16 /var/log/mailman/error.2
> -rw-rw-r-- 1 list list 36541617 Apr 22 01:59 /var/log/mailman/error.3
> 
> Should there be anything relevant in there ?


No. That isn't relevant to this issue. Apparently, something reversed
the patch on April 22 which is just as well.

The curious thing is you are rotating logs weekly, but "reopening" them
daily. Either this is two different processes or your logrotate script
is a bit strange.


>>> Configuration
>>> -------------
>>>
>>> Not sure this is useful, but 
>>> /etc/mailman/mm_cfg.py contains
>>> MTA='LocalPostfix'
>>
>> The above line should cause significant problems when attempting to
>> create or remove lists. it MUST be one of
>>
>> MTA = 'Postfix'
>> MTA = 'Manual'
>> MTA = None
>>
>> 'Postfix' means generate aliases and virtual-mailman files for Postfix.
>> 'Manual' means display the necessary aliases
>> None means don't do anything with aliases when lists are created/removed.
> 
> I configured mailman 3 years ago. I don't remember everything but it comes
> from here :
> http://isp-control.net/documentation/howto/mail/setup_mailman


Which is wrong as it contains lines like

MTA=Postfix

and

MTA=localPostfix

If you put either of those lines literally in mm_cfg.py without quotes
around Postfix or localPostfix, Mailman won't run at all because every
Mailman process will encounter a fatal error on importing mm_cfg.

Apparently you figured that out as you have quoted it.

MTA='LocalPostfix'

and presumably you have named your edited module LocalPostfix.py rather
than localPostfix.py or it would be throwing errors when
creating/removing lists.

> Is it such a bad idea ?


No, because the howto you followed had you create
Mailman/MTA/localPostfix.py (or LocalPostfix.py), but how was I supposed
to know that?

See the FAQ at <http://wiki.list.org/x/OIDD>.


> I suppose it is unrelated, anyway.


Yes, it is unrelated.


> Good thing is there is a relatively recent bug opened on debian that might be
> closed if we managed to rootcause and solve this.
> 
> I just did a little bit of cleanup tonight, after I realized the server was
> almost full. At least the partition that hosts mailman queues and logs. Would
> we see something specific in case of lack of space ?


You would begin to see exceptions thrown for inability to create/write
files.

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



More information about the Mailman-Users mailing list