[Mailman-Developers] Components and pluggablility

Barry A. Warsaw barry@digicool.com
Thu, 14 Dec 2000 19:05:25 -0500


I like the idea of process queues, but I don't want to take the
federation-of-processes architecture too far.  Yes, we want a
component architecture, but where I see the process boundaries is at
the message queue level.

For the delivery of messages, I see Mailman's primary job as
moderation-and-munge.  Messages come into the system from the MTA,
nntp-scraper, web-board poster, or are internally crafted.  All these
things end up in the incoming queue.  They need to be approved,
rewritten, moderated, and eventually sent on to various outbound
queues: nntp-poster, smtp-delivery, archiver, etc.  Some of these are
completely independent of the Mailman databases.  E.g. it is a mistake
that SMTPDirect is in the message pipeline in 2.0 because once a
message hits this component, its future disposition is (largely)
independent of the rest of the system.

So in my view, when Mailman decides that a message can be delivered to
a membership list, it's dropped fully formed in an outbound queue.
The file formats are the interface between Mailman and the queue runners
and should be platform (i.e. Python) independent.  That way, I can
ship a simple queue runner that takes messages from the outbound queue
and hands them off to the smtpd, but /you/ could drop in a different
runner process that uses GNQS to distribute load across an infinitely
expandable smtpd server farm.
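To make the contract concrete, here's a minimal sketch of such a
queue runner in Python.  The spool directory and the ".msg" naming
convention are assumptions for illustration only, not actual Mailman
interfaces; the whole point is that the only contract is "RFC822
text files in a directory":

```python
# Minimal sketch of the "simple queue runner" idea: pick up RFC822
# text files from an outbound spool directory and hand each one off
# to the local smtpd.  The directory layout and .msg suffix are
# assumed conventions, not real Mailman interfaces.
import os
import email
import smtplib

def dequeue_paths(queue_dir):
    """Return queued message files, oldest first (names sort by arrival)."""
    return [os.path.join(queue_dir, n)
            for n in sorted(os.listdir(queue_dir))]

def deliver(path, host="localhost", port=25):
    """Parse one RFC822 file, hand it to the smtpd, then dequeue it."""
    with open(path) as fp:
        msg = email.message_from_file(fp)
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
    os.unlink(path)            # remove only after a successful handoff

def run_once(queue_dir, **smtp_opts):
    for path in dequeue_paths(queue_dir):
        deliver(path, **smtp_opts)
```

A fancier runner (GNQS, a server farm) only has to honor the same
file-format contract; nothing in Mailman proper would have to change.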

[Side note.  Here's another reason why I'm keen on ZODB/ZEO as the
underlying persistency mechanism for internal Mailman data: I believe
we can parallelize the moderate-and-munge part of message processing.
Because the ZEO protocols serialize writes at commit time, you could
have multiple moderate-and-munge processes running on a server farm
and guarantee db consistency across them.  What I don't know is how
ZEO would perform given a write-intensive environment (and maybe
Mailman isn't as write intensive as I think it is).  But even if it
sucks, it simply means that the moderate-and-munge part won't be
efficiently parallelizable until that's fixed.]

>>>>> "JCL" == J C Lawrence <claw@kanga.nu> writes:
>>>>> "CVR" == Chuq Von Rospach <chuqui@plaidworks.com> writes:

    JCL> There are five basic transition points for a message passing
    JCL> thru a mailing list server:

    | 1) Receipt of message by local MTA

    | 1a) passthrough of message via a security wrapper from MTA to
    | list server... (I think it's important we remember that, because
    | we can't lose it, and it involves a layer of passthrough and a
    | process spawning, so it's somewhat heavyweight -- but
    | indispensable)

No problems here, because I see these as being outside the bounds of
the MLM.  The MLM has an incoming queue and it expects messages in a
particular format (very likely just RFC822 text files).  These arrive
here via whatever tortuous path is necessary: MTA->security wrapper,
nntpd->news scraper, web board cgi poster, etc.
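Any of those front ends could then inject a message with nothing more
than an atomic file drop.  Here's a sketch of what that might look
like; the queue directory and file suffixes are invented for
illustration, not real Mailman conventions:

```python
# Sketch of how any front end (security wrapper, news scraper, web
# board CGI poster) might inject a message: write the RFC822 text
# into the incoming queue directory.  IN_QUEUE and the suffixes are
# assumed names, not real Mailman interfaces.
import os
import tempfile

IN_QUEUE = "/var/spool/mailman/in"       # assumed location

def enqueue(rfc822_text, queue_dir=IN_QUEUE):
    """Atomically drop one RFC822 message into the incoming queue."""
    fd, tmp = tempfile.mkstemp(dir=queue_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as fp:
        fp.write(rfc822_text)
    final = tmp[:-len(".tmp")] + ".msg"
    os.rename(tmp, final)                # rename() is atomic on POSIX
    return final
```

Write-then-rename means the MLM never sees a half-written file, which
matters once several independent front ends feed the same queue.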

    | 2) Receipt by list server
    | 3) Approval/editing/moderation 

What I've been calling moderate-and-munge.

    | 4) Processing of message and emission of any resultant message(s)

Here's where the output queues and process boundaries come in.  Once
they're in the outbound queues, Mailman's out of the loop.

    | 5) Delivery of message to MTA for final delivery.

Again, that's the responsibility of the mta-qrunner, be it a
simple-minded Python process like today's qrunner or a batch
processing system like the one you've been investigating.

These processes are not completely independent of Mailman though,
e.g. for handling hard errors at smtp transaction time or URL
generation for summary digests.  Some of these can be handled by
re-injection into the message queues (i.e. generate a bounce message
and stick it in the bounce queue), but some may need an rpc
interface.
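The re-injection idea could be as simple as the delivery runner
writing a small report file into the bounce queue.  A sketch, with
all names (queue path, X- headers) invented for illustration:

```python
# Sketch of re-injection: on a hard SMTP error, the delivery runner
# crafts a minimal bounce report and drops it in the bounce queue
# for Mailman to pick up asynchronously.  BOUNCE_QUEUE and the X-
# headers are assumptions, not real Mailman conventions.
import os
import time

BOUNCE_QUEUE = "/var/spool/mailman/bounce"   # assumed location

def reinject_bounce(failed_rcpt, original_msgid, queue_dir=BOUNCE_QUEUE):
    """Write an RFC822-style bounce report into the bounce queue."""
    report = (
        "From: mailer-daemon@localhost\n"
        "Subject: delivery failure\n"
        "X-Failed-Recipient: %s\n"
        "X-Original-Message-ID: %s\n"
        "\n"
        "Hard SMTP error delivering to %s.\n" %
        (failed_rcpt, original_msgid, failed_rcpt))
    path = os.path.join(queue_dir, "%.6f.msg" % time.time())
    with open(path, "w") as fp:
        fp.write(report)
    return path
```

Anything that can't be expressed as a queued message (e.g. the
synchronous URL generation mentioned above) would be a candidate for
the rpc interface instead.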

    | 6) delivery of message to non-MTA recipients (the archiver, the
    | logging thing, the digester, the bounce processor....)

Each of these should be separate queues with defined process
interfaces, but again there may be synchronous information
communicated back to Mailman.  The archiver discussions we've had come
to mind here.

    CVR> and besides, they are basically independent, asynchronous
    CVR> processes that don't need to be managed by any of the core
    CVR> logic, other than handing messages into their queue and
    CVR> making sure they stay running.  same with, IMHO, storing
    CVR> messages for archives, storing messages for digests, updating
    CVR> archives, processing digests (but the processed digest is fed
    CVR> back into the core logic for delivery), and whatever else we
    CVR> decide it needs to do that isn't part of the core,
    CVR> time-sensitive code base. (in fact, there's no reason why you
    CVR> couldn't have multiple flavors of these things, feeding
    CVR> archives into an mbox, another archiver into mhonarc or
    CVR> pipermail, something that updates the search engine indexes,
    CVR> and text and MIME digesters... by turning them into their
    CVR> own logic streams with their own queues, you effectively
    CVR> have just made them all plug-in swappable, because you're
    CVR> writing to a queue, and not worrying about what happens once
    CVR> it's there. you merely need to make sure it goes in the
    CVR> right queue, in the approved format.

I agree!

-Barry