[Mailman-Users] Problems with multi-machine slicing
Jeff Taylor
shdwdrgn at sourpuss.net
Sat May 24 16:52:00 CEST 2014
After doing some upgrades, I noticed yesterday that my multi-machine
setup is no longer properly slicing the queue between machines. I
probably missed something, but after going through all my notes on the
setup I cannot figure out what the problem is. Hopefully someone else
can spot the issue?
I have four mail servers. Three of them are supposed to slice the queue
between them, and the fourth machine is set as a backup to process any
remaining messages after 2 minutes. On the three slice machines, I have
patched mailmanctl as:
----------
def start_all_runners():
    kids = {}
>>>
    for qrname, count, machine, nummachines in mm_cfg.QRUNNERS:
        for slice in range(machine, count, nummachines):
<<<
            # queue runner name, slice, numslices, restart count
            info = (qrname, slice, count, 0)
            pid = start_runner(qrname, slice, count)
            kids[pid] = info
    return kids
----------
Each of these machines has a QRUNNERS section added to mm_cfg.py which
defines that machine's slice as count, machine, nummachines
(3,0,3 / 3,1,3 / 3,2,3 respectively)
and contains the line: QRUNNER_MESSAGE_IS_OLD_DELAY = None
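As a sanity check on those values, here is a minimal standalone sketch (not
Mailman code; slices_for is a hypothetical helper) of which slice numbers the
patched loop should start on each machine, assuming the values are unpacked in
the order count, machine, nummachines as in the patch above:

```python
def slices_for(machine, count, nummachines):
    """Slice numbers one machine starts, mirroring the patched range() call."""
    return list(range(machine, count, nummachines))

# Three machines sharing three slices: 3,0,3 / 3,1,3 / 3,2,3
for machine in range(3):
    print("machine %d -> slices %r" % (machine, slices_for(machine, 3, 3)))
```

If this matches what mailmanctl does, each machine should start exactly one
runner per queue, each with a different slice number out of 3.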
On the fourth (backup) machine, I have patched Switchboard.py as:
----------
if ext <> extension:
    continue
when, digest = filebase.split('+')
>>>
now = time.time()
age = now - float(when)
# Only process defined 'old' entries.
if not (
        hasattr(mm_cfg, 'QRUNNER_MESSAGE_IS_OLD_DELAY') and
        mm_cfg.QRUNNER_MESSAGE_IS_OLD_DELAY and
        age > mm_cfg.QRUNNER_MESSAGE_IS_OLD_DELAY):
    continue
<<<
# Throw out any files which don't match our bitrange.  BAW: test
# performance and end-cases of this algorithm.  MAS: both
# comparisons need to be <= to get complete range.
----------
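For context, my understanding of the bitrange check that follows that comment
is sketched below. This is an assumption on my part, not a copy of
Switchboard.py: I am assuming the digest half of the file name is a 160-bit
SHA-1 hex string and that the hash space is split into numslices equal,
contiguous ranges. The names SHAMAX, slice_bounds and owning_slice are
hypothetical.

```python
import hashlib

SHAMAX = (1 << 160) - 1  # largest possible SHA-1 value (assumed hash width)

def slice_bounds(slice_no, numslices):
    """Inclusive (lower, upper) hash range assumed to belong to one slice."""
    lower = ((SHAMAX + 1) * slice_no) // numslices
    upper = ((SHAMAX + 1) * (slice_no + 1)) // numslices - 1
    return lower, upper

def owning_slice(digest_hex, numslices):
    """Return the slice whose range contains the given hex digest."""
    value = int(digest_hex, 16)
    for slice_no in range(numslices):
        lower, upper = slice_bounds(slice_no, numslices)
        if lower <= value <= upper:  # both comparisons inclusive
            return slice_no
    raise ValueError("digest outside the assumed hash space")

digest = hashlib.sha1(b"example message").hexdigest()
print("slice %d of 3" % owning_slice(digest, 3))
```

Under that assumption, a runner started as slice 0 of 3 should only ever pick
up roughly a third of the queue files, which is not what I am seeing.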
On this fourth machine I have added to mm_cfg.py:
QRUNNER_MESSAGE_IS_OLD_DELAY = minutes(2)
This machine has NOT had the slices patch added to mailmanctl, so there
is no QRUNNERS section in mm_cfg.py.
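For illustration, the age test in the Switchboard patch can be sketched
standalone as follows. is_old is a hypothetical helper, the file name format
"when+digest" is taken from the split('+') call in the patch, and I am
assuming minutes(2) evaluates to 120 seconds:

```python
def is_old(filebase, delay, now):
    """True if the queue entry is older than delay seconds.

    A delay of None (or 0) means nothing is ever treated as old.
    """
    when, digest = filebase.split('+')
    age = now - float(when)
    return bool(delay) and age > delay

# Entry queued at t=1000.0; the backup machine uses a 120-second delay.
print(is_old("1000.0+abcdef", 120, 1060.0))   # 60 seconds old
print(is_old("1000.0+abcdef", 120, 1130.0))   # 130 seconds old
print(is_old("1000.0+abcdef", None, 1130.0))  # delay disabled
```

With the patch applied, the backup machine skips any entry for which this test
is false, so it should only touch messages the slice machines have left
sitting for more than 2 minutes.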
OK, so if I only have the backup machine running, mailman will deliver
my test message after 2 minutes. That part works fine. However with the
three slice machines running, the first machine (3,0,3) sends ALL of the
messages out immediately. If I shut down the first machine and leave
the other two running, no messages are sent out until after the 2-minute
period, then the backup machine sends them. In other words, the queue
is not being sliced, and only the first machine is capable of sending
out list messages.
I have referenced back to the original article on this subject:
https://mail.python.org/pipermail/mailman-users/2008-March/060753.html
but it appears I made the correct changes. Has something changed in
newer versions of mailman that now prevents this technique from working
the same way? Or was there something more to getting slicing to work
that was not mentioned in that article?