[Mailman-Developers] Looking at performance again

Barry Warsaw barry at python.org
Wed May 14 17:21:08 EDT 2003


I've been doing some (limited) performance testing lately, and I wanted
to share some numbers and get some feedback.  I've also been having fun
re-reading some old mm-dev threads related to performance. :)

I'm specifically looking for places to improve Mailman's raw
throughput.  I understand that MTA tuning can have a huge impact on the
system, but I think that subject's been hashed out quite well in the
past.  On the table are anything from low-hanging fruit hacks to
mm3-level redesigns.  What I actually can implement all depends. :)

I've been testing the following set up:

- Postfix 2.0.9 configured with a special test transport such that all
  email @example.com gets dd'd to /dev/null

- Postfix running on the same machine as Mailman 2.1.2+ and a second
  test with Postfix (similarly configured) on a separate, very
  unloaded, but less beefy machine sitting next to me on a 100Mb
  ethernet.

- RH9 2.4.20-9 kernel, 863MHz Dell PIII, 512MB (1723 bogomips), ext3,
  a WDC IDE drive of some 2 y.o. vintage.

- Python 2.2.2 built from source
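
For anyone wanting to reproduce the rig, the /dev/null transport boils
down to two Postfix snippets along these lines (a sketch: the transport
name `devnull` and the `nobody` user are my guesses, not the actual
test config):

```
# /etc/postfix/transport -- route all mail for the test domain
example.com    devnull:

# /etc/postfix/master.cf -- a transport that dd's each message away
devnull   unix  -       n       n       -       -       pipe
  flags= user=nobody argv=/bin/dd of=/dev/null
```

Run postmap(1) on the transport file and point transport_maps at it in
main.cf to wire it up.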

My list consists of 8000 members like abcdefg@example.com where the
localpart varies randomly.  I've tried deliveries of 10KB, 50KB, 220KB,
1MB of text/plain and a 220KB multipart/related snapshot of a web page
[1].  I have VERP and personalization both turned on.  I started looking
at memory usage, but I'm not so concerned about that now.  It may be
something to address later but I think it's "reasonable".

First the (approximate) numbers.  All deliveries are to 8000 members,
each with their own personalized copy.  SMTP_MAX_RCPTS is 500 unless
otherwise specified (minimal impact seemingly).

msgsz  type          time    msg/hour
-----  ----          ----    --------
10KB   text/plain    6min    80k/hr
50KB   text/plain    9.5min  50k/hr (SMTP_MAX_RCPTS=5)
220KB  text/plain    24min   20k/hr
1MB    text/plain    105min  4500/hr
220KB  m/related     44min   10k/hr
220KB  m/related     41min   11k/hr (SMTP_MAX_RCPTS=5)
220KB  m/related     46min   10k/hr (remote MTA)
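
(The msg/hour column is just the 8000 personalized copies scaled to an
hour; for the first row:)

```python
members = 8000                       # personalized copies per delivery
minutes = 6                          # wall-clock time, 10KB text/plain run
rate = members / (minutes / 60.0)    # copies per hour
# 8000 / 0.1 = 80000, i.e. the 80k/hr in the table
```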

A few high-level bits:

- Disk I/O probably isn't much of an issue.  Once the message hits the
  out qrunner, it's only two files, and all the personalization weaving
  happens in memory just before the message goes out on the socket.
  Since using a remote MTA was actually slightly slower, I'm guessing
  that MTA overhead in the /dev/null pipe is minimal (the remote
  machine is a 500MHz, 128MB, 999 bogomips, mostly idle box).

- email.Generator.Generator (and email.Parser.Parser) are good
  candidates for optimization.  You can see with the two 220KB
  messages that the fact that one has structure and the other doesn't
  affects performance significantly.  That doesn't surprise me. ;)

- Even so, a 100x increase in message size (10KB to 1MB) costs
  roughly 20x in throughput.  Part of that may be the way the
  personalization weaving gets done.  Right now, we make a
  copy.deepcopy() of the original message object model, then poke the
  personalization parts into the headers and such, then do all the
  complex stuff in Decorate.py (footers, headers, etc.), then generate
  the flat text.  Maybe we can speed things up by converting the
  message to flat text as early as possible and just doing string
  substitution at the point of weaving.
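
  The flatten-early idea might look something like this sketch (the
  %(member)s placeholder and function names are mine, not Mailman's):

```python
import copy
from email.message import Message

def weave_deepcopy(msg, addr):
    # Today's approach: deep-copy the whole object model, personalize
    # it, then flatten it -- once per recipient.
    dup = copy.deepcopy(msg)
    del dup['To']
    dup['To'] = addr
    return dup.as_string()

def weave_flat(template, addr):
    # Proposed: flatten once up front, then do cheap string
    # substitution for each recipient.
    return template.replace('%(member)s', addr)

msg = Message()
msg['From'] = 'list@example.com'
msg['To'] = '%(member)s'
msg.set_payload('Hello %(member)s')

template = msg.as_string()    # one flatten, shared by all 8000 copies
personalized = weave_flat(template, 'abcdefg@example.com')
```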

- What kind of hit does the memberdb-in-a-pickle take?  Would things
  go faster if we stored the member data in Berkeley DB, MySQL, or
  another real database?  I'd like to do some testing with my BDB
  member code, and I'm wondering if the folks working on other member
  adapters have any performance feedback.
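
  A quick way to get a feel for the difference is to compare a
  whole-pickle load against a keyed lookup in something like the
  stdlib dbm module (a rough sketch, not the actual MemberAdapter
  API; the record layout is made up):

```python
import dbm
import os
import pickle
import tempfile

tmp = tempfile.mkdtemp()

# Pickle model: the entire member dict is read and unpickled on
# every access, no matter how few records we actually need.
members = {'abcdefg@example.com': {'delivery': 'on'}}
pkl = os.path.join(tmp, 'members.pck')
with open(pkl, 'wb') as fp:
    pickle.dump(members, fp)
with open(pkl, 'rb') as fp:
    rec = pickle.load(fp)['abcdefg@example.com']

# Keyed model: only the one record is fetched from disk.
db = dbm.open(os.path.join(tmp, 'members'), 'c')
db[b'abcdefg@example.com'] = pickle.dumps({'delivery': 'on'})
rec2 = pickle.loads(db[b'abcdefg@example.com'])
db.close()
```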

- XVERP might be interesting, but it seems useless for personalization.

- Do we win or lose with the process model, as compared to say, a
  threading model?  I've been wondering if our fears of the Python GIL
  are unfounded.  We could certainly reduce memory overhead by
  multi-threading, and we might be able to leverage something like
  Twisted, which is still in the back of my mind as a very cool way to
  get multi-protocol support into Mailman.

- Do our "NFS-safe" locks impose too much complexity and overhead to
  be worth it?  Does anybody actually /use/ Mailman over NFS?  Don't
  we sorta suspect the LockFile implementation anyway?  Would we be
  better off using kernel file locks, or thread locks if we go to an
  MT model?
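
  For comparison, a kernel file lock is only a few lines of fcntl --
  POSIX-only, and famously unreliable over NFS, which is exactly the
  trade-off in question (a sketch; the lock file path is made up):

```python
import fcntl
import os
import tempfile

lockpath = os.path.join(tempfile.mkdtemp(), 'mylist.lock')
fd = os.open(lockpath, os.O_CREAT | os.O_RDWR)

acquired = False
fcntl.lockf(fd, fcntl.LOCK_EX)    # exclusive kernel advisory lock
acquired = True
try:
    pass    # critical section: touch the list's config here
finally:
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```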

Okay, now I'm rambling.  What is the lowest hanging fruit that we might
be able to attack?  I'm up for any other ideas people have.

-Barry

[1] wget -E -H -k -p -nH -nd -Pdownload <url>
    followed by a little Python script to multipart/related it
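
    For the curious, that little script amounts to something like this
    (a sketch using the stdlib email package; the filenames and the
    tiny fake PNG are made up):

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_related(html, images):
    # Wrap a saved HTML page and its images as multipart/related,
    # the same shape as the 220KB test message.
    outer = MIMEMultipart('related')
    outer.attach(MIMEText(html, 'html'))
    for name, data in images.items():
        img = MIMEImage(data, 'png')    # subtype given explicitly
        img.add_header('Content-ID', '<%s>' % name)
        outer.attach(img)
    return outer

msg = build_related('<html><img src="cid:logo.png"></html>',
                    {'logo.png': b'\x89PNG\r\n\x1a\n' + b'\x00' * 16})
```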





