[Mailman-Developers] Massive changes (Long)

Barry A. Warsaw <bwarsaw@cnri.reston.va.us>
Wed, 10 Nov 1999 12:31:40 -0500 (EST)


Folks, I have some rather massive changes to check into the CVS tree,
which I'm going to explain below.  Grab a cup of coffee and sit back.
:)

I should note that while I've done a lot of testing of this stuff, I'm
sure I've missed things.  I do not recommend using any of this on a
production system, but I would definitely appreciate another set of
eyes and hands taking a look.  I think I have a way to selectively
migrate lists to a new Mailman installation one at a time, so I will
try some of my test lists on python.org.  If all goes well, I may make
y'all guinea pigs in the next round of experiments :)

These changes were precipitated by the observation of occasional poor
performance, and the intermittent reports of duplicate messages.  I
actually gathered enough data on python.org to implicate Mailman's
built-in bulk mailer.  So I wanted to experiment with alternative
delivery strategies.  In the process I got a new Ultra2 for python.org
and decided it was time to play with a different MTA, so I sucked down
Postfix and installed it instead of sendmail.

What I wanted was an architecture that would let us drop in different
delivery schemes easily, so that we could experiment with the best,
and also so sites could easily add whatever delivery machinery made
the most sense for them.  I strongly suspect that no one scheme that
Mailman could bundle will be right for everyone.

For a while now, I've hated the tangle of code that gets called when a
message is being delivered.  So I've essentially ripped all that out
for an architecture that I've talked about to a few people already.
The idea is that when a message comes into the system, it is handled
by a "pipeline".  In Python terms, the pipeline is simply a list of
modules, all of which have a function called `process' that takes two
arguments: a MailList instance and a Message instance.  MailLists you
know.  Messages are instances of a subclass of rfc822.Message with a
little bit of extra stuff for convenience.  Since rfc822.Message now
supports a writable interface, I was able to get rid of
Message.IncomingMessage, Message.OutgoingMessage, and
Message.NewsMessage.

Here's the entire contents of MailList.Post():

    # msg should be a Message.Message object.
    def Post(self, msg):
        self.IsListInitialized()
        # TBD: this is bogus and will later be configurable
        import Mailman.Handlers.HandlerAPI
        Mailman.Handlers.HandlerAPI.process(self, msg)
        self.Save()

You'll see that that is a little bit shorter than what it used to be
:).  `Handlers' is a new package inside Mailman which contains all the
modules for the pipeline.  HandlerAPI.process() basically runs down
the list of pipeline modules, calling each in turn.  The spec for the
<handler>.process() functions says that their return value is ignored,
and that a handler raises an exception when the message should not
propagate any further down the pipeline.
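Here's a minimal sketch of the kind of loop HandlerAPI.process() runs, written for modern Python 3 rather than 1.5.2.  It assumes each pipeline entry is any object with a process(mlist, msg) function, and that a handler stops delivery by raising an exception.  DiscardMessage, run_pipeline(), and the stub handlers are hypothetical names for illustration, not Mailman's actual code.

```python
import types


class DiscardMessage(Exception):
    """Hypothetical: raised by a handler to stop further processing."""


def run_pipeline(handlers, mlist, msg):
    """Call handler.process(mlist, msg) for each handler, in order.

    Return values are ignored.  A DiscardMessage from any handler halts
    the pipeline.  Returns True if every handler ran, False otherwise.
    """
    for handler in handlers:
        try:
            handler.process(mlist, msg)
        except DiscardMessage:
            return False
    return True


# Two stand-in handler "modules" built with types.ModuleType.
def _mark_process(mlist, msg):
    # Record that this handler saw the message.
    msg.seen = getattr(msg, "seen", []) + ["Mark"]


def _stop_process(mlist, msg):
    raise DiscardMessage("stop here")


mark = types.ModuleType("Mark")
mark.process = _mark_process
stop = types.ModuleType("Stop")
stop.process = _stop_process


class Msg:
    """Bare stand-in for a Message object."""
```

Since return values are ignored, a handler can only influence delivery by mutating the message or by raising, which keeps each handler's contract very small.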

Here's the current list of pipeline modules for normal message
delivery:

    pipeline = ['SpamDetect',
                'Approve',
                'Hold',
                'Cleanse',
                'CookHeaders',
                'ToDigest',
                'ToArchive',
                'ToUsenet',
                'CalcRecips',
                'Decorate',
                'Sendmail',
                'Acknowledge',
                'AfterDelivery',
                ]

The idea here is that each pipeline module does one small thing and
one thing only.  For example, Approve.process() tries to figure out if
the message has been pre-approved.  It will also look for an Approved:
header with a valid password.  All communication between handlers
happens via attributes on the message object, so in the case of the
Approve handler, if it finds a valid "Approved: password" header, it
sets "msg.approved = 1".  Other modules (notably SpamDetect and Hold)
will look for this attribute and short circuit if it finds a true
value.
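A sketch of how Approve and Hold might cooperate through a message attribute, assuming headers live in a plain dict and the list object carries a moderator password.  The MessageHeld exception, the `password' attribute, and the needs_review flag (a placeholder for Hold's real checks) are all hypothetical.

```python
class Message:
    """Stand-in for Mailman's Message; headers kept in a plain dict."""
    def __init__(self, headers):
        self.headers = dict(headers)


class MailList:
    """Stand-in list with a hypothetical moderator password."""
    def __init__(self, password):
        self.password = password


class MessageHeld(Exception):
    """Hypothetical: raised when a message is held for review."""


def approve_process(mlist, msg):
    # A message pre-approved before entering the pipeline needs no check.
    if getattr(msg, "approved", 0):
        return
    # Otherwise look for an Approved: header carrying the list password.
    passwd = msg.headers.get("Approved")
    if passwd is not None and passwd.strip() == mlist.password:
        msg.approved = 1


def hold_process(mlist, msg, needs_review=True):
    # Short circuit: approved messages are never held.
    if getattr(msg, "approved", 0):
        return
    if needs_review:
        raise MessageHeld("message held for moderator review")
```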

In a similar vein, ToArchive and ToDigest will look for the `isdigest'
attribute.  If found and true, then the message is a digest so it
won't be sent to the archive, nor will it be appended to the currently
building digest.
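The same attribute-guard pattern, applied to the isdigest flag.  This sketch assumes in-memory lists stand in for the archive and the pending digest; the class and function names are hypothetical.

```python
class Msg:
    """Stand-in message; isdigest is set by whatever builds digests."""
    def __init__(self, body, isdigest=0):
        self.body = body
        self.isdigest = isdigest


class MailList:
    """Stand-in list keeping archive and pending-digest text in memory."""
    def __init__(self):
        self.archive = []
        self.digest_pending = []


def to_archive(mlist, msg):
    # Each post in a digest already went through the pipeline on its
    # own, so archiving the digest would duplicate those posts.
    if getattr(msg, "isdigest", 0):
        return
    mlist.archive.append(msg.body)


def to_digest(mlist, msg):
    # Never fold a digest into the digest currently being built.
    if getattr(msg, "isdigest", 0):
        return
    mlist.digest_pending.append(msg.body)
```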

Now the neat thing is the Sendmail module.  It's not clever at all; in
fact it just popen's a sendmail (or sendmail-compatible, e.g. postfix)
process, gives the recip addrs on the command line, and pipes the
message to the proc's stdin.  The one complication is that it splits
up the recipients if necessary to keep the command line length
manageable.  What I think is neat is that you could implement an
SMTPDeliver module which uses smtplib.py, or even convert to using
Mailman's current bulk mailer (which I haven't done), or a delivery
scheme of your own choosing.  It should be quite easy to drop in just
this one component while keeping everything else the same.  The
Sendmail module requires that the msg object has an attribute called
`recips' which is a list of addresses to send the message to.  The
CalcRecips module sets this attribute.
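A sketch of a Sendmail-style handler, using Python 3's subprocess module in place of os.popen.  The length limit, the sendmail path, and the helper names are assumptions; only the shape (recipients on the command line, message text on stdin, recipients split into batches, msg.recips set by an earlier handler) comes from the description above.

```python
import subprocess

# Hypothetical cap on the combined length of recipient arguments per
# sendmail invocation; real limits depend on the OS (ARG_MAX).
MAX_ARGS_LEN = 3000


def chunk_recips(recips, limit=MAX_ARGS_LEN):
    """Split recips into batches whose joined length stays under limit.

    An address longer than limit still gets a batch of its own.
    """
    batches, current, size = [], [], 0
    for addr in recips:
        cost = len(addr) + 1  # +1 for the separating space
        if current and size + cost > limit:
            batches.append(current)
            current, size = [], 0
        current.append(addr)
        size += cost
    if current:
        batches.append(current)
    return batches


def sendmail_process(mlist, msg, sendmail="/usr/sbin/sendmail"):
    """Pipe the message text to one sendmail process per batch.

    Assumes msg.recips was set by a CalcRecips-style handler and that
    str(msg) yields the complete RFC 822 message text.
    """
    text = str(msg).encode()
    for batch in chunk_recips(msg.recips):
        # -i: a line containing only "." does not end the input.
        subprocess.run([sendmail, "-i"] + batch, input=text, check=True)
```

An SMTPDeliver replacement would only swap out sendmail_process(); the batching and the msg.recips contract would stay the same.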

A couple of things to notice.  First, proper delivery of the message
requires the correct order in the pipeline.  Put Acknowledge before
Sendmail and those who want acks of their messages would get them
before the message was actually sent.  Also, while this seems to work
well for my simple tests, I think I need to play with more aspects
under more realistic load to figure out whether this approach will
work in the long run.  Finally, there's the question of incoming
messages (say injected from Usenet, or via the -request address).  I
think a pipeline will work well here too, e.g. I envision a pipeline
of bounce detectors, but I'm not sure yet how this will integrate with
the outgoing pipeline.
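Since correct delivery depends on pipeline order, a site that edits its pipeline could sanity-check it.  This is a hypothetical helper; its three constraints encode only the dependencies mentioned in this message (Hold reads msg.approved, Sendmail reads msg.recips, Acknowledge must follow actual delivery).

```python
# Hypothetical ordering constraints as (must_come_first, must_come_later).
ORDERING = [
    ("Approve", "Hold"),          # Hold looks at msg.approved
    ("CalcRecips", "Sendmail"),   # Sendmail needs msg.recips
    ("Sendmail", "Acknowledge"),  # acks only after the message is sent
]


def check_order(pipeline, constraints=ORDERING):
    """Return the (earlier, later) pairs that pipeline violates."""
    pos = {name: i for i, name in enumerate(pipeline)}
    bad = []
    for earlier, later in constraints:
        # Only check pairs where both modules are actually present.
        if earlier in pos and later in pos and pos[earlier] > pos[later]:
            bad.append((earlier, later))
    return bad
```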

A couple of other random but important notes.  I've also significantly
revamped the whole admin request stuff.  Held messages and
subscriptions are no longer kept in the list object.  This should make
your config.db files considerably smaller and faster to load, and will
hopefully improve overall performance.  All that information is now
kept in a separate database file, which is currently implemented using
Python's anydbm module (this might change).  The good thing is that
this file is only touched when a message or subscription is held, or
when you hit the admindb web page.
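A sketch of a separate requests database, assuming Python 3's dbm module (the successor to anydbm) with pickled values.  The kind:id key scheme, the record contents, and the RequestsDB class are hypothetical; the point is that the file is opened only when a request is held, listed, or resolved, never during ordinary delivery.

```python
import dbm
import pickle


class RequestsDB:
    """Held posts and subscriptions kept outside the list's config.db."""

    def __init__(self, path):
        self.path = path

    def hold(self, kind, request_id, data):
        # Touched only when a message or subscription is held.
        with dbm.open(self.path, "c") as db:
            db[f"{kind}:{request_id}"] = pickle.dumps(data)

    def pending(self):
        # Touched when the admindb page lists outstanding requests.
        with dbm.open(self.path, "c") as db:
            return {k.decode(): pickle.loads(db[k]) for k in db.keys()}

    def discard(self, kind, request_id):
        # Remove a request once the admin has dealt with it.
        with dbm.open(self.path, "w") as db:
            del db[f"{kind}:{request_id}"]
```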

I've decided to finally make Python 1.5.2 the minimum supported
platform.  It's been out since April, is very stable, is available for
all platforms (and I think there are RPMs available), and is much much
MUCH better than any previous version of Python.  I'm tired of trying
to keep track of backwards compatibility issues.  So the next version
of Mailman will require at least Python 1.5.2.  I should note that
there will /not/ be a Python 1.5.3 -- Guido's decided that the next
release will be 1.6, but that will definitely not be out (other than
via the anonCVS tree) until next year some time.

So... I'm going to check all these changes in so that I can continue
testing.  I'll hopefully also be ripping out lots more unused code.
I'm not sure how much more time I'm going to be hacking on it, but I
need to get this stuff working for python.org, so I'll be at it at
least until then.

Apologies in advance to the I18N'ers for not getting to that stuff,
and for making so many changes under the hood.  If this stuff turns
out to work well, I'll try to spend some time getting those mods
integrated.

Thanks,
-Barry