[Mailman-Users] Solved, sorry, was: Re: help wanted: debian woody mailman suddenly stopped with (seemingly) qrunner lock file problem
Ziegler Gábor
ziegler at alpha.tmit.bme.hu
Thu Dec 16 23:47:14 CET 2004
Hi everybody,
I am embarrassed, since RTFM was the solution: Possibly item 8.) of
http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq03.014.htp
May all the spammers die a VERY lengthy and painful death then may they
go to hell forever :-(
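
For anyone who lands here with the same traceback: it ends in multifile's section_divider doing "--" + str with str set to None. That is consistent with a queued message that declares a multipart Content-Type but carries no boundary parameter. This is my reading of the traceback, not something the FAQ entry states. Below is a minimal sketch for spotting such files, assuming a modern Python 3 on some other machine (not the python2.1 on the woody box) and assuming the .msg files are plain RFC 822 text. The function names are made up for illustration.

```python
import email
import email.policy
from pathlib import Path


def multipart_without_boundary(raw: bytes) -> bool:
    """Return True if the message claims to be multipart but has no boundary."""
    msg = email.message_from_bytes(raw, policy=email.policy.compat32)
    # get_boundary() returns None when the Content-Type header has no
    # boundary parameter.
    return msg.get_content_maintype() == "multipart" and msg.get_boundary() is None


def suspicious_queue_files(qdir):
    """Return the *.msg files under qdir that fail the check above, sorted."""
    return sorted(
        path
        for path in Path(qdir).glob("*.msg")
        if multipart_without_boundary(path.read_bytes())
    )
```

Calling suspicious_queue_files("/var/lib/mailman/qfiles") on a copy of the queue directory returns the candidate files. Whether to move them aside is a separate decision; the sketch only reads them.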
Cheers,
Gabor
Ziegler Gábor wrote:
> Dear gurus,
>
> I run a fairly low-traffic mailman on a stock debian woody server, which
> suddenly stopped working. I am clueless and looking for help. Details below.
>
> My system:
> ---------
> Debian stable (stock debian woody, regularly updated from
> security.debian.org)
> Debianized stock mailman package v.2.0.11
> Debianized stock Exim package: version 3.35 #1 built 07-May-2004 08:25:17
>
> Symptoms:
> ----------
> A few days ago the server suddenly stopped processing incoming messages,
> they just accumulate in the qfiles subdir. Admin access via web is
> working, I can add users, etc. No pending mails reported by the web
> admin gui. Mails are accepted by the MTA w/o complaints, no mail goes
> out to lists, though. Nothing. The non-mailman-related SMTP traffic
> flows as normal.
>
> The server has been running for years w/o any real problem. Running
> out-of-disk-space has happened earlier, but cleaning-up some disk-space
> has always solved problems.
>
> Below comes the summary of my investigations. I am totally clueless
> about the problem; any help is highly appreciated.
>
> I repeat: the server has worked for years, and no (intentional) config
> changes have happened. A list-admin did, however, report the server running
> out of disk space, but that has been taken care of already.
>
> Zeroth examination: disk space check:
> -------------------
> df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/hdb1 1.9G 1.8G 116M 94% /
> /dev/hdb5 3.9G 3.6G 163M 96% /home
> /dev/hdb3 1.9G 1.2G 738M 61% /var
> /dev/hdb6 3.9G 3.1G 724M 81% /usr/local
> /dev/hda1 7.6M 5.6M 1.6M 78% /boot
> /dev/hda2 4.7G 3.2G 1.2G 72% /archives-hda2
>
> Note, there is plenty of disk space in /var.
>
> First examination: SMTP works
> -----------------------------
> According to the logs exim delivers: just an example from the Exim's
> mainlog, showing a successful delivery to mailman-list "nsht":
>
> 2004-12-15 08:34:11 1CeTfn-0002e9-00 <= XXXXXX at tmit.bme.hu
> H=david.tmit.bme.hu [152.66.246.102] P=esmtp S=1865
> id=Pine.GSO.3.96.1041215083328.21437A-100000 at david.tmit.bme.hu
> 2004-12-15 08:34:12 1CeTfn-0002e9-00 => nsht <nsht at leda.tmit.bme.hu>
> D=list_director T=list_transport
> 2004-12-15 08:34:12 1CeTfn-0002e9-00 Completed
>
> Furthermore, I actively use this Exim as my everyday default SMTP MTA,
> and it works just fine.
>
> Second examination: The messages seem to reach the qfiles directory.
> ----------------------------------------------------------------------
> There are various entries like this:
> f0fb10de9b998a5a185~aa29819f1395b9.db size:115 date:Dec 15 23:03
> f0fb10de9b998a5a185~a29819f1395b9.msg size:825 date:Dec 15 23:03
> The content of a .db file:
> leda:/var/lib/mailman/qfiles# cat -vte
> f0fb10de9b998a5a1858842d62aa29819f1395b9.db
> {s^F^@^@^@tolisti^A^@^@^@s^G^@^@^@versioni^B^@^@^@s^H^@^@^@listnames^D^@^@^@nshts^H^@^@^@filebases(^@^@^@f0fb10de9b998a5a18
>
> The content of the .msg file seems to be a normal SMTP envelope and body.
>
> The biggest .msg file in this directory is 6656 bytes, therefore
> disk-free-space cannot be the issue.
>
> Third examination: perms seem to be O.K.
> ----------------------------------------
> leda:/var/lib/mailman/qfiles# check_perms
> No problems found
>
> Fourth examination: checking database of the list of the reporting
> list-admin for list "nsht"
> --------------------------------------------------------
> leda:/var/lib/mailman/qfiles# check_db nsht
> /var/lib/mailman/lists/nsht/config.db is fine
> /var/lib/mailman/lists/nsht/config.db.last is fine
>
> Note that none of the lists on the server works (there are some tens of
> lists), neither "nsht" nor any other.
>
>
> Fifth examination: checking crontab for mailman
> -----------------------------------------------
> leda:/var/lib/mailman/qfiles# cat /etc/cron.d/mailman
> 12,42 * * * * list [ -x /usr/bin/python -a -f
> /usr/lib/mailman/cron/run_queue ] && /usr/bin/python
> /usr/lib/mailman/cron/run_queue
> # */5 * * * * list [ -x /usr/bin/python -a -f
> /usr/lib/mailman/cron/gate_news ] && /usr/bin/python
> /usr/lib/mailman/cron/gate_news
> * * * * * list [ -x /usr/bin/python -a -f
> /usr/lib/mailman/cron/qrunner ] && /usr/bin/python
> /usr/lib/mailman/cron/qrunner
>
> The cron daemon is up and running. The qrunner script runs every minute. See
> the next examination.
>
> Sixth examination: checking mailman logs
> --------------------------------------------
> Everything seems normal, except that qrunner continually emits
> errors at each run to /var/lib/mailman/logs/error, such as these:
>
> Dec 16 00:06:02 2004 qrunner(18367): Traceback (most recent call last):
> Dec 16 00:06:02 2004 qrunner(18367): File
> "/usr/lib/mailman/cron/qrunner", line 283, in ?
> Dec 16 00:06:02 2004 qrunner(18367): kids = main(lock)
> Dec 16 00:06:02 2004 qrunner(18367): File
> "/usr/lib/mailman/cron/qrunner", line 253, in main
> Dec 16 00:06:02 2004 qrunner(18367): keepqueued =
> dispose_message(mlist, msg, msgdata)
> Dec 16 00:06:02 2004 qrunner(18367): File
> "/usr/lib/mailman/cron/qrunner", line 121, in dispose_message
> Dec 16 00:06:02 2004 qrunner(18367): if
> BouncerAPI.ScanMessages(mlist, mimemsg):
> Dec 16 00:06:02 2004 qrunner(18367): File
> "/usr/lib/mailman/Mailman/Bouncers/BouncerAPI.py", line 59, in ScanMessages
> Dec 16 00:06:02 2004 qrunner(18367): addrs = func(msg)
> Dec 16 00:06:02 2004 qrunner(18367): File
> "/usr/lib/mailman/Mailman/Bouncers/Postfix.py", line 39, in process
> Dec 16 00:06:02 2004 qrunner(18367): more = mfile.next()
> Dec 16 00:06:02 2004 qrunner(18367): File
> "/usr/lib/python2.1/multifile.py", line 123, in next
> Dec 16 00:06:02 2004 qrunner(18367): while self.readline(): pass
> Dec 16 00:06:02 2004 qrunner(18367): File
> "/usr/lib/python2.1/multifile.py", line 95, in readline
> Dec 16 00:06:02 2004 qrunner(18367): if marker ==
> self.section_divider(sep):
> Dec 16 00:06:02 2004 qrunner(18367): File
> "/usr/lib/python2.1/multifile.py", line 159, in section_divider
> Dec 16 00:06:02 2004 qrunner(18367): return "--" + str
> Dec 16 00:06:02 2004 qrunner(18367): TypeError : cannot add type "None"
> to string
>
> My attempts to fix the apparent lock file problem:
> --------------------------------------------------
> 1. Since the reporting list-admin claimed a temporary out-of-disk-space
> situation, I double-checked the available free space.
>
> 2. I have stopped crond and inetd. I have checked that no python process is
> lurking around, then I have checked with "lsof" that none of the
> lock files in /var/lib/mailman/locks/ is held open by anyone. All
> lock files were more than several months old(!). I have deleted all
> lock files and restarted crond and inetd. Qrunner still fails with the
> above error log.
>
> 3. As a last attempt I have sacrificed my 135 days of uptime :-( and I have
> rebooted the system, hoping that the Microsoft approach might help.
> The system rebooted just fine, but mailman (qrunner) still does not work.
>
> Now I am out of ideas.
> Any advice?
>
> Thanks:
> Gábor
>