[Mailman-Users] Messages remain in spool/mailman/in

Mark Sapiro mark at msapiro.net
Thu Apr 24 16:38:14 CEST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sebastian Hagedorn wrote:
|
| --On 23. April 2008 18:28:44 -0700 Mark Sapiro <mark at msapiro.net> wrote:
|
|>> Update: I just noticed that the number of messages in "in" is
|>> decreasing.
|>> Could it be that something just overwhelmed Mailman so that now there is
|>> a backlog that's slowly being worked through? I don't think we ever had
|>> such issues prior to today.
|>
|> There are two possibilities. One is that your IncomingRunner is sliced
|
| It's not. I didn't even know that feature until just now. Perhaps I
| should look into that.
|
|> The other is that somehow you are working through a flood of messages,
|> or some problem (maybe locks - see
|> <http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.076.htp>)
|> is making IncomingRunner run slowly.
|
| Well, there were really some locking issues, see below.
|
|> A backlogged IncomingRunner is
|> rare. It's more usual to see a backlogged OutgoingRunner due to delays
|> in SMTP.
|>
|> Check all of Mailman's logs for any clues.
|
| Let's start with mmdsr:
|
| Hourly Summary of Posts


Note that the 'post' log which is the source of these data is written
when the message is sent, not when it arrives.


| -----------------------
| <snip>
|     126  11:00-11:59
|      97  12:00-12:59
|     121  13:00-13:59
|    1355  14:00-14:59
|     760  15:00-15:59
|     617  16:00-16:59
|    1293  17:00-17:59
|     207  18:00-18:59
| <snip>
|
| So starting around 2pm there was an unusually high number of posts. On
| other days there are never more than about 150 messages per hour. It
| turns out that one list by itself hat 3,518 posts yesterday. And that
| list had locking issues:
|
| Apr 23 16:28:35 2008 (13393) jura-pruefungsamt.lock lifetime has
| expired, breaking
| Apr 23 16:28:35 2008 (13393)   File "/usr/lib/mailman/bin/qrunner", line
| 278, in ?
| Apr 23 16:28:35 2008 (13393)     main()
| Apr 23 16:28:35 2008 (13393)   File "/usr/lib/mailman/bin/qrunner", line
| 238, in main
| Apr 23 16:28:35 2008 (13393)     qrunner.run()
| Apr 23 16:28:35 2008 (13393)   File
| "/usr/lib/mailman/Mailman/Queue/Runner.py", line 71, in run
| Apr 23 16:28:35 2008 (13393)     filecnt = self._oneloop()
| Apr 23 16:28:35 2008 (13393)   File
| "/usr/lib/mailman/Mailman/Queue/Runner.py", line 113, in _oneloop
| Apr 23 16:28:35 2008 (13393)     self._onefile(msg, msgdata)
| Apr 23 16:28:35 2008 (13393)   File
| "/usr/lib/mailman/Mailman/Queue/Runner.py", line 171, in _onefile
| Apr 23 16:28:35 2008 (13393)     keepqueued = self._dispose(mlist, msg,
| msgdata)
| Apr 23 16:28:35 2008 (13393)   File
| "/usr/lib/mailman/Mailman/Queue/VirginRunner.py", line 38, in _dispose
| Apr 23 16:28:35 2008 (13393)     return IncomingRunner._dispose(self,
| mlist, msg, msgdata)
| Apr 23 16:28:35 2008 (13393)   File
| "/usr/lib/mailman/Mailman/Queue/IncomingRunner.py", line 115, in _dispose
| Apr 23 16:28:35 2008 (13393) mlist.Lock(timeout=mm_cfg.LIST_LOCK_TIMEOUT)
| Apr 23 16:28:35 2008 (13393)   File
| "/usr/lib/mailman/Mailman/MailList.py", line 161, in Lock
| Apr 23 16:28:35 2008 (13393)     self.__lock.lock(timeout)
| Apr 23 16:28:35 2008 (13393)   File
| "/usr/lib/mailman/Mailman/LockFile.py", line 306, in lock
| Apr 23 16:28:35 2008 (13393)     important=True)
| Apr 23 16:28:35 2008 (13393)   File
| "/usr/lib/mailman/Mailman/LockFile.py", line 416, in __writelog
| Apr 23 16:28:35 2008 (13393)     traceback.print_stack(file=logf)
| Apr 23 16:51:54 2008 (13389) jura-pruefungsamt.lock unexpected
| linkcount: -1
| Apr 23 16:51:54 2008 (13389)   File "/usr/lib/mailman/bin/qrunner", line
| 278, in ?
| Apr 23 16:51:54 2008 (13389)     main()
| Apr 23 16:51:54 2008 (13389)   File "/usr/lib/mailman/bin/qrunner", line
| 238, in main
| Apr 23 16:51:54 2008 (13389)     qrunner.run()
| Apr 23 16:51:54 2008 (13389)   File
| "/usr/lib/mailman/Mailman/Queue/Runner.py", line 71, in run
| Apr 23 16:51:54 2008 (13389)     filecnt = self._oneloop()
| Apr 23 16:51:54 2008 (13389)   File
| "/usr/lib/mailman/Mailman/Queue/Runner.py", line 113, in _oneloop
| Apr 23 16:51:54 2008 (13389)     self._onefile(msg, msgdata)
| Apr 23 16:51:54 2008 (13389)   File
| "/usr/lib/mailman/Mailman/Queue/Runner.py", line 171, in _onefile
| Apr 23 16:51:54 2008 (13389)     keepqueued = self._dispose(mlist, msg,
| msgdata)
| Apr 23 16:51:54 2008 (13389)   File
| "/usr/lib/mailman/Mailman/Queue/IncomingRunner.py", line 115, in _dispose
| Apr 23 16:51:54 2008 (13389) mlist.Lock(timeout=mm_cfg.LIST_LOCK_TIMEOUT)
| Apr 23 16:51:54 2008 (13389)   File
| "/usr/lib/mailman/Mailman/MailList.py", line 161, in Lock
| Apr 23 16:51:54 2008 (13389)     self.__lock.lock(timeout)
| Apr 23 16:51:54 2008 (13389)   File
| "/usr/lib/mailman/Mailman/LockFile.py", line 287, in lock
| Apr 23 16:51:54 2008 (13389)     self.__linkcount(), important=True)
| Apr 23 16:51:54 2008 (13389)   File
| "/usr/lib/mailman/Mailman/LockFile.py", line 416, in __writelog
| Apr 23 16:51:54 2008 (13389)     traceback.print_stack(file=logf)
|
| It turns out that our law school department informed all their students
| of some news. They did so by sending separate mails to each student,
| each time Cc'ing themselves on their Mailman list ...


Ouch!


| I will talk to
| them about that, but still I'd like to know how to avoid such a
| situation. Would slices have helped here?


No. Slicing would make things worse by increasing lock contention. Since
all the posts were for one list, and since the list is locked by
IncomingRunner for the entire duration of it's processing of a message,
slicing wouldn't help. It normally helps by increasing parallelism, but
only if the load is across multiple lists so the the sliced runners
aren't blocking each other.

In this case, IncomingRunner was doing the best that could be done. I
see that VirginRunner was involved in the locking, so perhaps these
messages were being held and moderator notices were being generated.
Then perhaps a moderator went to the admindb interface, which can keep a
list locked for a long time if it is processing a large list of requests.

- --
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)

iD8DBQFIEJtWVVuXXpU7hpMRAl4yAKDJ9yg2toE5zzDk8kk+e6f71hgW/QCcDnRN
AXHyImxm5q4nkvdkV3yjfdc=
=fb+L
-----END PGP SIGNATURE-----


More information about the Mailman-Users mailing list