[Mailman-Users] Problem with qrunner and too much incoming mail

Marc MERLIN marc_news at valinux.com
Fri Nov 3 20:35:39 CET 2000


[I am not Ccing mailman-developers as this is not encouraged, but if someone
on both lists thinks it should be forwarded there, please feel free]


As some of you may know, sourceforge.net's mailing lists run on mailman.

With the upgrade to the 2.0 branch, deliveries were switched from being
handed directly to the MTA to being spooled and picked up by qrunner. Since
then, more mail has been getting spooled than qrunner can process per
minute.

The problem is due to qrunner being single threaded by default and having a
global lock. Because some mailing lists have subscribers in domains where
DNS is slow and unreliable, the MTA will hang on those RCPT TO commands
until DNS resolves or times out, and qrunner won't be done in time.
After that, it's all downhill: more mail queues up, qrunner falls even
further behind, and so on.

We're currently playing with MTAs to optimize this a bit, but the real fix
is on the mailing list side.

Options:
- Forget about qrunner and switch back to direct delivery and queueing only
  when direct delivery fails. Unfortunately, I'm told this is buggy, and
  mail can be lost. Is this still true?

- Remove the locking in qrunner, run more than one qrunner at once, and hope
  for the best ;-)

- Have a multithreaded qrunner that processes 10 or 20 mails at once
  (talking to 10 or 20 instances of the MTA in parallel).
  My understanding is that Python 2.0 has multithreading support and that
  Mailman has some multithreading support. Is it something that could help
  me and that we should be looking at?

- Other?
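
To make the multithreaded option concrete, here is a minimal Python sketch
of the idea. It is not Mailman's qrunner code: deliver() and run_workers()
are hypothetical names, and deliver() only sleeps to stand in for an MTA
that may hang on a slow recipient. The point is that a stuck delivery ties
up one worker rather than the whole queue run.

```python
import queue
import threading
import time


def deliver(msg):
    """Hypothetical stand-in for handing one message to the MTA.

    The sleep simulates an MTA that blocks on a slow DNS lookup.
    """
    time.sleep(msg["delay"])
    return msg["id"]


def run_workers(messages, num_workers=10, deliver_func=deliver):
    """Deliver all messages using num_workers threads.

    Returns the delivered message ids in completion order.
    """
    todo = queue.Queue()
    for msg in messages:
        todo.put(msg)

    done = []
    done_lock = threading.Lock()

    def worker():
        # Each worker pulls messages until the queue is empty.
        while True:
            try:
                msg = todo.get_nowait()
            except queue.Empty:
                return
            result = deliver_func(msg)
            with done_lock:
                done.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    # One slow recipient and nine fast ones, handled by four workers.
    msgs = [{"id": 0, "delay": 0.3}]
    msgs += [{"id": i, "delay": 0.01} for i in range(1, 10)]
    print(run_workers(msgs, num_workers=4))
```

With a single worker this degenerates to the current behaviour, where the
slow message holds up everything queued behind it.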


Thanks for your help, we have to fix this somehow or switch MLMs :-)
(or get killed by our users :-D)


Something else I'm looking at is load balancing.
One solution is to put X lists on each machine, but if you lose one machine,
you lose a portion of your lists.

Now, if I have X machines that mount /var/local/mailman, they'd be able to
service all the lists (config.db would get locked correctly), but I'd still
be stuck with only one queue runner because of the global lock.
That said, I *could* have mailman/data and mailman/qfiles be symlinks to
somewhere on the local disk, and patch qrunner to put its lock in data.
This would allow for independent queue runners, but shared list configs and
shared locks on the list configs themselves.

Would that work?
Am I insane? :-)
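
To illustrate the per-machine lock idea, here is a rough Python sketch,
assuming each machine's data directory lives on its own local disk. It does
not use Mailman's own locking code; acquire_local_lock() and
release_local_lock() are hypothetical helpers, and "qrunner.lock" is just a
placeholder file name. Exclusive file creation is only meant for a local
filesystem here, which is exactly the case once data is a local symlink.

```python
import os


def acquire_local_lock(data_dir, name="qrunner.lock"):
    """Try to take the queue runner lock inside data_dir.

    Returns the lock file path on success, or None if another
    runner on this machine already holds it.
    """
    path = os.path.join(data_dir, name)
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return path


def release_local_lock(path):
    """Release a lock previously returned by acquire_local_lock()."""
    os.unlink(path)
```

Two machines with different local data directories would each get their own
lock, while a second runner on the same machine would be refused.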

Marc
-- 
Microsoft is to operating systems & security ....
                                      .... what McDonalds is to gourmet cooking
  
Home page: http://marc.merlins.org/   |   Finger marc_f at merlins.org for PGP key



