[Mailman-Developers] Re: Components and pluggablility

Barry A. Warsaw barry@digicool.com
Fri, 15 Dec 2000 01:17:58 -0500


>>>>> "JCL" == J C Lawrence <claw@kanga.nu> writes:

    JCL> Configuration of what exactly happens to a message is done
    JCL> by dropping scripts/programs in specially named directories (it
    JCL> is expected that typically only SymLinks will be dropped
    JCL> (which makes the web interface easy -- just creates and moves
    JCL> symlinks about)).

At a high level, what you're describing is a generalization of MM2's
message handler pipeline.  In that respect, I'm in total agreement.
It's a nice touch to have separate pipelines between each queue
boundary, with return codes directing the machinery as to the future
disposition of the message.

But I don't like the choice of separate scripts/programs as the basic
components of this pipeline.  Let me rotate that just a few degrees to
the left, squint my eyes, and change the scripts to Python modules,
and return codes to return values or exceptions.  Then I'm sold, and I
think you can do everything you want (including using separate scripts
if you want), and it's more efficient for the common situations.
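
To make the rotation concrete, here is a minimal sketch of a pipeline
whose components are plain Python callables.  Everything in it is
hypothetical: the handler names, the dict standing in for a message,
and the two exception classes are invented for illustration and are
not an existing Mailman API.  A normal return means "carry on", and an
exception plays the role a non-zero exit code would play for a script.

```python
# Hypothetical sketch: handlers are Python callables, not separate scripts.
class DiscardMessage(Exception):
    """Raised by a handler to drop the message entirely."""


class HoldMessage(Exception):
    """Raised by a handler to hold the message for moderation."""


def strip_signature(msg):
    # Keep only the text before the conventional "-- " signature separator.
    msg["body"] = msg["body"].split("\n-- \n")[0]
    return msg


def reject_empty(msg):
    # An empty body is not worth delivering; signal that with an exception.
    if not msg["body"].strip():
        raise DiscardMessage("empty body")
    return msg


def run_pipeline(msg, handlers):
    """Run msg through each handler in order; return (disposition, msg)."""
    try:
        for handler in handlers:
            msg = handler(msg)
    except DiscardMessage:
        return "discard", msg
    except HoldMessage:
        return "hold", msg
    return "deliver", msg
```

The disposition string returned by run_pipeline() is what the queue
machinery would act on, much as it would act on a script's exit code.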

First, we don't need to mess with symlinks to make processing order
configurable.  We simply change the order of entries in a sequence
(read: Python list).  It's a trivial matter to allow list admins to
select the names of the components they want, the order, etc. and to
keep this information on a per-list basis.  Actually, the web
interface I imagine doesn't give list admins configurability at that
fine a grain.  Instead, a site administrator can set up list "styles"
or patterns, one of which includes canned filter sets; i.e. predefined
component orderings created, managed, named, and made available by the
site administrator.
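
As a rough illustration of ordering-as-data, the sketch below keeps
site-defined styles as named lists of component names and maps each
mailing list to one style.  The style names, component names, and list
names are all made up, and a real system would persist this
per-list rather than keep it in module globals.

```python
# Hypothetical names throughout; the point is only that order is data.
STYLES = {
    "announce": ["spam_check", "moderate", "add_footer"],
    "discussion": ["spam_check", "add_footer"],
}

# Which canned style each list uses (assumed to be chosen by the site admin).
LIST_STYLE = {
    "mailman-developers": "discussion",
    "site-news": "announce",
}


def pipeline_for(listname):
    """Return a fresh copy of the ordered component names for a list."""
    return list(STYLES[LIST_STYLE[listname]])


def move_before(pipeline, name, other):
    """Return a new ordering with component `name` placed just before `other`."""
    reordered = [c for c in pipeline if c != name]
    reordered.insert(reordered.index(other), name)
    return reordered
```

Reordering is then a list operation, e.g.
move_before(pipeline_for("site-news"), "add_footer", "moderate"), with
no symlinks to create or rename.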

Second, it's more efficient because I imagine Mailman 3.0 will be
largely a long running server process, so modules need only be
imported once as the system warms up.  Even re-importing in a one-shot
architecture will be more efficient than starting and stopping scripts
all the time, because of the way Python modules cache their
bytecodes (pyc files).
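
The import-once behaviour is easy to see in a small sketch.  It assumes
components are loaded by module name through the standard importlib
module, and it uses the stdlib json module purely as a stand-in for a
real handler module; load_handler is a hypothetical helper.

```python
import importlib
import sys


def load_handler(module_name):
    """Import a handler module by name.

    The first call executes the module; later calls for the same name
    return the already-loaded module object cached in sys.modules.
    """
    return importlib.import_module(module_name)


first = load_handler("json")   # stand-in for a real handler module
second = load_handler("json")  # no re-execution, just a cache lookup
```

In a long-running server the second and later lookups cost a
dictionary access rather than a process start.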

Third, you can still do separate scripts/programs if you want or need.
Say there's something you can only do by writing a separate Java
program to interface with your corporate backend Subject: header
munger.  You should be able to easily write a pipeline module that
hides all that in the implementation.  You can even design your own
efficient backend IPC protocol to talk to whatever external resource
you need to talk to.  I contend that the overhead and complexity of
forking off scripts, waiting for their exit codes, process management,
etc. etc. just isn't necessary in the common case, where 5 or 50 lines
of Python will do the job nicely.

Fourth, yes, maybe it's a little harder to write these components in
Perl, bash, Icon or whatever.  That doesn't bother me.  I'm not going
to make it impossible, and in fact, I think that if that were to
become widely necessary, a generic process-forking module could be
written and distributed.
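
One possible shape for such a generic process-forking component is
sketched below.  The class name ExternalFilter and its text-in,
text-out contract are assumptions made for the example; the child
process here is a one-line Python command standing in for a Perl or
shell script, so that the sketch runs anywhere Python does.

```python
import subprocess
import sys


class ExternalFilter:
    """Pipeline component that pipes the message text through a command."""

    def __init__(self, argv):
        self.argv = list(argv)

    def __call__(self, text):
        # Feed the message on stdin and take the child's stdout as the result.
        proc = subprocess.run(
            self.argv, input=text, capture_output=True, text=True
        )
        # Translate a non-zero exit code into an exception for the pipeline.
        if proc.returncode != 0:
            raise RuntimeError(f"{self.argv[0]} exited with {proc.returncode}")
        return proc.stdout


# Stand-in external program: a child process that upper-cases its stdin.
shout = ExternalFilter(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"]
)
```

From the pipeline's point of view, shout is just another callable; the
fork and exit-code handling stay hidden inside the one component that
needs them.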

I don't think this is very far afield of what you're describing, and it
has performance and architectural benefits IMO.  We still formalize
the interface that pipeline modules must conform to, probably spelled
like a Python class definition, with elaborations accomplished through
subclassing.
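
A minimal sketch of what such an interface could look like follows.
The class and method names (Handler, process, AddHeader, AddListId) are
invented for the example, and a plain dict stands in for a message
object; none of this is a settled design.

```python
class Handler:
    """Base class: the contract every pipeline component follows."""

    def process(self, msg):
        """Return the (possibly modified) message; raise to stop the pipeline."""
        raise NotImplementedError


class AddHeader(Handler):
    """Set one header on the message."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def process(self, msg):
        # Work on a copy so the caller's message is left untouched.
        msg = dict(msg)
        msg[self.name] = self.value
        return msg


class AddListId(AddHeader):
    """Elaboration by subclassing: the header name is fixed to List-Id."""

    def __init__(self, list_id):
        super().__init__("List-Id", list_id)
```

A component written this way only has to override process(); the
pipeline machinery never needs to know which subclass it is calling.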

Does this work for you?  Is there something a script/program component
model gives you that the class/module approach does not?

-Barry