[Mailman-Developers] Big checkins a'comin'!

Wed, 14 Feb 2001 21:57:12 -0500

Folks,

As you've probably guessed, I've been quite busy with other things[*]
lately.  But I /have/ been working on Mailman in my spare time.  I'm
now ready to check in some significant changes.  They seem to work
although they've only gone through limited testing, and I have not
even converted my own sites to using this snapshot.  Mostly I want to
checkpoint my CVS and give others a chance to play with the new stuff
and give feedback.

These are mostly architectural changes to the mail delivery subsystem,
along many of the lines that were discussed last year.  My next step
is to finish integrating the I18N stuff.  I'll probably work toward a
release after that and leave rewriting the web subsystem until the
following release.  I'm shooting for a working alpha 2.1 before the
Python conference early next month.

First, all messages internally are now mimelib.Message objects instead
of rfc822.Message objects.  In case you don't know, I've been working
on the side on a spanking new, from-scratch Python package to better
handle RFC1341 (MIME) and RFC822 style messages.  I'm pretty happy
with it, and just released version 0.2.  If you plan to run the newest
snapshot of Mailman, you will need to download and install
mimelib-0.2.  For more information see

    http://barry.wooz.org/software/pyware.html

Installation is very easy due to Greg Ward's wonderful distutils
package.  Don't worry though, Mailman 2.1 will come bundled with
mimelib.

As an aside, take a look at ToDigest.py to see how much cleaner
creating MIME and RFC1153 digests is now!  Oh yeah, did I mention our
plain text digests are now RFC1153 compliant? :)

The second major change is the splitting of the qfiles queue into
several sub-queues.  Right now I've got incoming, outgoing, news,
archiver, and `virgin' queues (the latter being for messages conceived
internally by Mailman).  Here's how messages flow in the system:

(mailman) ------> qfiles/virgin -------+
                                       |
                                       v
smtpd ----------+--> qfiles/in -+-> qfiles/out -> smtp
                |               |
cron/gate_news -+               +-> qfiles/news -> nntpd
                                |
                                +-> qfiles/arch -> (archiver)

Each qfile directory has an associated qrunner (e.g. IncomingRunner,
OutgoingRunner, NewsRunner, ArchRunner, VirginRunner) which is managed
by a master watchdog (the transformed cron/qrunner).  This is the
third major change: qrunner is now a long running process.  If it
finds that one of its subrunners has exited, it will restart it.  Thus
each subrunner needs no lock, although the master does hold a lock.
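
To illustrate the watchdog idea, here is a minimal sketch in
present-day Python.  It is not the actual cron/qrunner code: the names
start_runner and watch are hypothetical, the "subrunner" is just a
child Python process that exits right away, and the master's own lock
is left out.

```python
import subprocess
import sys
import time


def start_runner(name):
    """Hypothetical stand-in for launching a real subrunner.

    The child here is a Python process that exits immediately, so the
    watchdog always has something to restart.
    """
    return subprocess.Popen([sys.executable, "-c", "pass"])


def watch(names, start=start_runner, rounds=3, pause=0.1):
    """Poll the subrunners and restart any that have exited.

    Returns a dict mapping each runner name to its restart count.  A
    real master would loop until told to stop instead of counting rounds.
    """
    children = {name: start(name) for name in names}
    restarts = {name: 0 for name in names}
    for _ in range(rounds):
        time.sleep(pause)
        for name, child in children.items():
            if child.poll() is not None:  # the subrunner has exited
                children[name] = start(name)
                restarts[name] += 1
    # Reap whatever is still running before returning.
    for child in children.values():
        child.wait()
    return restarts
```

Calling watch(["in", "out", "news", "arch", "virgin"]) would keep one
child per queue alive for a few polling rounds.  Only the master loop
touches the table of children, which is why the subrunners themselves
need no lock in this arrangement.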

Further, it is possible (although as yet not fully tested) to create
more than one qrunner per qfile subdirectory.  It is best to create
2^N such qrunners, since the hash space will be divided up evenly
among the parallel runners.  The hashing should be random enough to
allow for improved performance on clusters or multiproc machines
without a lot of complicated coordination machinery.
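
Here is a rough sketch of how a hash space can be carved up among
parallel runners.  The assumptions are mine: that each queue entry's
file base name gets hashed with SHA-1, and that the function is called
slice_for.  The snapshot's actual scheme may differ.

```python
import hashlib

HASH_BITS = 160  # width of a SHA-1 digest in bits


def slice_for(filebase, num_runners):
    """Return the index of the runner responsible for a queue entry.

    Assumption: the entry's base file name is hashed with SHA-1 and the
    160-bit hash space is cut into num_runners contiguous slices.
    """
    digest = hashlib.sha1(filebase.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    # When num_runners is 2**N the slices come out exactly equal.
    slice_width = 2 ** HASH_BITS // num_runners
    # For other counts the integer division leaves a few values over at
    # the top; fold them into the last slice.
    return min(value // slice_width, num_runners - 1)
```

Each runner would then only pick up the entries whose slice index
matches its own, so no two runners ever compete for the same file and
no coordination between them is needed.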

This arrangement has another benefit: you could replace any of the
qrunners with your own processes to clear the queues.  The only one
that probably /has/ to be the Mailman qrunner is qfiles/in since
that's what performs the "moderate and munge" phase of message
disposal, and it is the only queue which requires a list lock (well,
ArchRunner does currently, but that's an artifact of the
implementation).  E.g. I could see a completely independent runner for
qfiles/arch to interface with external archivers, or a better, more
highly parallelized runner for the qfiles/out queue.

Notes: the NewsRunner may not work.  I can't tell if my new ISP is
just refusing postings, requires some authorization I'm not aware of,
or if the runner is actually broken.  I'll fix that later.  Also,
there is no separate BounceRunner yet, although there will be.  I know
some of you have thought about other queues, and I don't think it
would be too difficult to squeeze those in.

Each message in the queue is represented by two files: a .msg file,
which is simply the plain text of the message in RFC822+envelope
format, and a .db file, the format of which is configurable on a
per-site basis.  Three formats are currently supported: Python
marshals, `ascii' (plaintext key/value pairs), and "bsddb native"
(hash files as written by the default Python bsddb module).
It isn't difficult to add additional file formats.
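
As an illustration of the two-file layout, here is a small sketch
using the marshal format.  The function names enqueue and dequeue and
the metadata keys used with them are made up for the example, not
taken from the snapshot.

```python
import marshal
import os


def enqueue(queue_dir, filebase, msg_text, metadata):
    """Write one queue entry as a .msg/.db pair (marshal format only)."""
    msg_path = os.path.join(queue_dir, filebase + ".msg")
    db_path = os.path.join(queue_dir, filebase + ".db")
    with open(msg_path, "w", encoding="utf-8") as fp:
        fp.write(msg_text)
    # marshal handles plain dicts of strings and numbers, which is all
    # this sketch stores in the metadata.
    with open(db_path, "wb") as fp:
        marshal.dump(metadata, fp)
    return msg_path, db_path


def dequeue(queue_dir, filebase):
    """Read a queue entry back as (message text, metadata dict)."""
    with open(os.path.join(queue_dir, filebase + ".msg"),
              encoding="utf-8") as fp:
        msg_text = fp.read()
    with open(os.path.join(queue_dir, filebase + ".db"), "rb") as fp:
        metadata = marshal.load(fp)
    return msg_text, metadata
```

Supporting another .db format would mean swapping the two marshal
calls for a different reader/writer pair while leaving the .msg
handling alone.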

Hmm, other than that, there are a few more bounce detectors.  Also, I'm
ditching the crufty md5/crypt munging of passwords and opting for an
sha1 hash always.  However, list passwords are not kept in plain text,
so old hashes can't simply be converted.  For backwards compatibility,
if the sha hash of the response doesn't match the challenge, we try
crypt as a fallback.
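
The sha-then-fallback check might look roughly like the sketch below.
check_password and legacy_check are hypothetical names.  The legacy
comparison is passed in as a callable, so the sketch doesn't depend on
a platform crypt() being available.

```python
import hashlib
import hmac


def check_password(stored, response, legacy_check=None):
    """Return True if response matches the stored password hash.

    The SHA-1 hex digest is tried first.  If it doesn't match and a
    legacy_check callable was supplied (a hypothetical stand-in for the
    old crypt comparison), that is tried as a fallback.
    """
    sha_hex = hashlib.sha1(response.encode("utf-8")).hexdigest()
    # Compare as bytes, in constant time.
    if hmac.compare_digest(sha_hex.encode("ascii"), stored.encode("utf-8")):
        return True
    if legacy_check is not None:
        return bool(legacy_check(stored, response))
    return False
```

A caller with old crypt-style hashes on disk would supply a
legacy_check that re-hashes the response using the stored salt and
compares the result.  New passwords would only ever be stored as sha1
digests.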

Remember, Python 2.0 is required!

Enjoy,
-Barry

[*] Primarily the Python 2.1 alpha releases, spending a week on a
company retreat, getting ready for the Python conference in Long Beach
March 5-8, and working on "paying gig" stuff, primarily a Berkeley DB
storage implementation for the Zope Object Database.  Anybody else
coming to the Python conference?