[Mailman-Developers] Re: Components and pluggablility

J C Lawrence claw@kanga.nu
Thu, 14 Dec 2000 23:00:02 -0800


On Fri, 15 Dec 2000 01:17:58 -0500 
Barry A Warsaw <barry@digicool.com> wrote:

>>>>>> "JCL" == J C Lawrence <claw@kanga.nu> writes:

JCL> Configuration of what exactly happens to a message is done by
JCL> dropping scripts/programs in specially named directories (it is
JCL> expected that typically only symlinks will be dropped, which
JCL> makes the web interface easy -- it just creates and moves
JCL> symlinks about).

> At a high level, what you're describing is a generalization of
> MM2's message handler pipeline.  In that respect, I'm in total
> agreement.  It's a nice touch to have separate pipelines between
> each queue boundary, with return codes directing the machinery as
> to the future disposition of the message.

<nod>

> But I don't like the choice of separate scripts/programs as the
> basic components of this pipeline.  Let me rotate that just a few
> degrees to the left, squint my eyes, and change the scripts to
> Python modules, and return codes to return values or exceptions.
> Then I'm sold, and I think you can do everything you want
> (including using separate scripts if you want), and are more
> efficient for the common situations.

Fair dinkum, given the below caveat.

> First, we don't need to mess with symlinks to make processing
> order configurable.  We simply change the order of entries in a
> sequence (read: Python list).  It's a trivial matter to allow list
> admins to select the names of the components they want, the order,
> etc. and to keep this information on a per-list basis.  

<nod>
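
Concretely, I read that as something on the order of the below; the
handler names, module layout, and exception are invented here purely
to show the shape, not anything Mailman actually has:

  # Invented names throughout -- purely illustrative.
  pipeline = ['SpamDetect', 'Approve', 'Hold', 'CookHeaders', 'ToOutgoing']

  class DiscardMessage(Exception):
      """Raised by a handler to stop the pipeline and drop the message."""

  def run_pipeline(mlist, msg, msgdata):
      for name in pipeline:
          # One module per component; the list (and hence the order) can
          # be stored per-list and edited through the admin interface.
          module = __import__('Handlers.' + name, {}, {}, [name])
          try:
              module.process(mlist, msg, msgdata)
          except DiscardMessage:
              return 0          # dropped; nothing reaches the next queue
      return 1                  # ran the whole pipeline; queue for delivery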

> Actually, the web interface I imagine doesn't give list admins
> configurability at that fine a grain.  Instead, a site
> administrator can set up list "styles" or patterns, one of which
> includes canned filter sets; i.e. predefined component orderings
> created, managed, named, and made available by the site
> administrator.

I'll discuss this later below (it comes down to a multi-level list
setup/definition deal).

> Second, it's more efficient because I imagine Mailman 3.0 will be
> largely a long running server process, so modules need only be
> imported once as the system warms up.  

I have been working specifically on the assumption that it will not
be a long running process, and that instead it will be automated by
cron starting up a helper app periodically, which will fork an
appropriate number of sub-processes to run the various queues (with
simple checks to make sure that the total number of queue-running
processes of a given type on a given host doesn't exceed some
configured value).  The base reason for this assumption is that it
makes the queue processing more analogous to traditional queue
managers, making a potential transition from Mailman's internal
(cron based) automation to a real queue manager semi-transparent.
The assumption in this was that the tool used to move a message
between queues was an external, explicitly stand-alone script.  The
supporting reason is that simple replacement of that script by
something that called the appropriate queue management tools for the
queue manager du jour would allow the removal of the Mailman
"listmom" and its replacement by the queue manager, be it LSF, QPS
MQM, GNU Queue, or something else.

This is what I mean by "light weight self-discovering processes that
behave in a queue-like manner".  The processes are small and light.
They figure out what needs to be done locally per their
host-specific configurations, and then do that in a queue-like
manner.
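
To make that concrete, the helper that cron fires up would look
something on the order of this; every name, path, and limit below is
invented for illustration:

  import os

  # Per-host limits on simultaneous runners of each queue type (made-up
  # numbers; in practice these come out of the host-specific config).
  MAX_RUNNERS = {'inbound': 4, 'outbound': 8}
  PIDDIR = '/var/run/mlm'

  def count_running(queue):
      # Naive check: one pid file per live runner; stale-file handling
      # and locking are omitted from this sketch.
      return len([f for f in os.listdir(PIDDIR) if f.startswith(queue + '.')])

  def spawn_runners(queue, runner_script):
      # Fork only enough children to bring us up to the configured limit.
      for i in range(MAX_RUNNERS[queue] - count_running(queue)):
          if os.fork() == 0:
              # Child: become a queue runner; it exits when the queue drains.
              os.execv(runner_script, [runner_script, queue])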

What's this host-specific stuff?  More later.

ObNote: There actually need to be separate and discrete tools both
for moving a given message into a specific queue (ie different tools
for inbound, pending, outbound, etc) and for injecting messages
(that didn't exist before) into each queue.  Doing it this way
allows a site to roll part of the system over to a queue manager and
allow the rest to remain default.  This could be done by a single
tool linked to different names, or by passing the queue name as an
argument, allowing an easy call-out as above to a module-wrapped
external tool.
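
Something like the following is what I have in mind for the tool
itself; the names, the environment variable, and the call-out hook
are all invented for the sake of the example:

  import os
  import sys
  import time

  def deliver_to_local_queue(queue, text):
      # Default behaviour: drop the message into ~/inbound, ~/pending, etc.
      fname = os.path.expanduser('~/%s/%d.%d' % (queue, time.time(), os.getpid()))
      open(fname, 'w').write(text)

  def main():
      # Either invoked as enqueue-inbound, enqueue-outbound, ... via
      # symlinks, or as "enqueue <queuename>" with the queue as an argument.
      prog = os.path.basename(sys.argv[0])
      if prog.startswith('enqueue-'):
          queue = prog[len('enqueue-'):]
      else:
          queue = sys.argv[1]
      # The easy call-out: if the site has pointed us at a real queue
      # manager's submission tool, hand the whole job off to it instead.
      external = os.environ.get('MLM_QUEUE_TOOL')
      if external:
          os.execv(external, [external, queue])
      deliver_to_local_queue(queue, sys.stdin.read())

  if __name__ == '__main__':
      main()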

> Even re-importing in a one-shot architecture will be more
> efficient than starting and stopping scripts all the time, because
> of the way Python modules cache their bytecodes (pyc files).

I'm sold given the comment on the next paragraph.

> Third, you can still do separate scripts/programs if you want or
> need.  Say there's something you can only do by writing a separate
> Java program to interface with your corporate backend Subject:
> header munger.  You should be able to easily write a pipeline
> module that hides all that in the implementation.  You can even
> design your own efficient backend IPC protocol to talk to whatever
> external resource you need to talk to.  I contend that the
> overhead and complexity of forking off scripts, waiting for their
> exit codes, process management, etc. etc. just isn't necessary in
> the common case, where 5 or 50 lines of Python will do the job
> nicely.

Then we should provide a template Python module that accepts the
appropriate arguments, passes them to a template external program,
and grabs its stdout and RC.  Configuring users could/would then
merely take this, rename it, customise it, and roll it in
transparently.
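
A rough cut of that template (every name here is made up, and error
handling is elided):

  import os
  import tempfile

  EXTERNAL = '/usr/local/bin/subject-munger'    # the site's external filter

  class DiscardMessage(Exception):
      """Same invented disposition exception as in the earlier sketch."""

  def process(mlist, msg, msgdata):
      # Write the message out, run the external program over it, and grab
      # both its stdout (a possibly rewritten message) and its exit code.
      tmp = tempfile.mktemp()
      open(tmp, 'w').write(str(msg))
      pipe = os.popen('%s %s' % (EXTERNAL, tmp), 'r')
      output = pipe.read()
      status = pipe.close() or 0
      os.unlink(tmp)
      if os.WEXITSTATUS(status):
          # Non-zero exit from the external program: drop the message.
          raise DiscardMessage
      msgdata['external-output'] = output

A configuring site would then just rename the module, point EXTERNAL
at its own program, and drop it into the pipeline.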

> Fourth, yes, maybe it's a little harder to write these components
> in Perl, bash, Icon or whatever.  That doesn't bother me.  I'm not
> going to make it impossible, and in fact, I think that if that
> were to become widely necessary, a generic process-forking module
> could be written and distributed.

Umm, yeah.  Shame nobody thought of that.

> I don't think this is very far afield of what you're describing, and
> it has performance and architectural benefits IMO.  We still
> formalize the interface that pipeline modules must conform to,
> probably spelled like a Python class definition, with elaborations
> accomplished through subclassing.

Bingo.
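
ie something of this general shape, though the spelling below is
mine, invented for illustration, not anything agreed yet:

  class HoldMessage(Exception):
      """Stop the pipeline and park the message for a moderator."""

  class Handler:
      """Base class every pipeline component subclasses."""

      def process(self, mlist, msg, msgdata):
          # Mutate msg/msgdata in place and return normally to continue,
          # or raise a disposition exception (Hold, Discard, ...) to stop.
          raise NotImplementedError

  class MaximumSize(Handler):
      """Example elaboration by subclassing."""

      def __init__(self, limit):
          self.limit = limit

      def process(self, mlist, msg, msgdata):
          if len(str(msg)) > self.limit:
              raise HoldMessage('message body exceeds the size limit')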

> Does this work for you?  Is there something a script/program
> component model gives you that the class/module approach does not?

Not inherently, given a method for easy call-outs as mentioned above.

Now, onto the business of the host-specific configurations: what
I've been looking at is something like the below.  The global list
configuration consists of the following directories and files:

  ~/cgi-bin/*      (MLM CGIs)
  ~/config         (global MLM config)
  ~/config.force   (global MLM config (can't change))
  ~/config.<hostname>  (config specifics for this host)
  ~/scripts/*      (all the tools and scripts that do things)
  ~/scripts/member/*        (membership scripts)
  ~/scripts/moderate/*      (moderation scripts)
  ~/scripts/pre-post/*      (scripts run before posting)
  ~/inbound/*      (messages awaiting processing by the MLM)
  ~/outbound/*     (messages to be sent by the MLM)
  ~/services/*     (the processes that actually run mailman)
  ~/templates/*    (well, templates)
  ~/groups/        (groups of list configs)
  ~/groups/default/                 (There has to be a default)
  ~/groups/default/...              (Basically a full duplicate of
                                     the root setup, mostly done as 
                                     symlinks)
  ~/groups/<groupname>/config       (deltas from ~/config)
  ...etc

Then on the list base:

  ~lists/<listname>/config     (list config as deltas from group config)
  ~lists/<listname>/group      (symlink to ~/groups/<something>)
  ~lists/<listname>/moderate/* (messages held for moderation)
  ~lists/<listname>/pending/*  (messages waiting to be processed)
  ~lists/<listname>/scripts/*  (what does all the work)

The assumption so far is that the queues are represented as
discrete files on disk, much like the current held messages in v2,
with file names mapping to the address/function of the message (ie
list name plus command/request/post/bounce/reject/something), with
filename extensions for the various meta data sets, etc (this helps
keep things human readable).  There are aspects of this I'm not
happy with (eg for distribution lists, on account of size --
consider a 1M member list).
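
As a sketch of the naming only -- the exact fields and extensions
below are illustrative, not settled:

  import os
  import time

  def queue_entry(qdir, listname, function, msg_text, meta_text):
      # One file for the message, one for its metadata, both human
      # readable; "function" is post/request/bounce/reject/etc.
      base = '%s.%s.%d.%d' % (listname, function, time.time(), os.getpid())
      open(os.path.join(qdir, base), 'w').write(msg_text)
      open(os.path.join(qdir, base + '.meta'), 'w').write(meta_text)
      return base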

The idea is that the config files are simple collections of variable
assignments much like the current Defaults.py or mm_cfg.py.
Further, they are read in the following order:

  ~/config
  ~/groups/<groupID>/config
  ~/lists/<listname>/config
  ~/groups/<groupID>/config.force
  ~/config.force

The web interface would then present the options that are locked by
a higher level config (ie in a force file) as present but
unconfigurable.
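
A sketch of the read itself, with the loader wholly invented but the
file layout as above; the point is just that later layers override
earlier ones, and that the web code can see which names a .force
file has locked:

  import os

  LAYERS = ('config',
            'groups/%(group)s/config',
            'lists/%(list)s/config',
            'groups/%(group)s/config.force',
            'config.force')

  def read_config(root, group, listname):
      cfg = {}
      locked = {}
      for layer in LAYERS:
          path = os.path.join(root, layer % {'group': group, 'list': listname})
          if not os.path.exists(path):
              continue
          values = {}
          exec(open(path).read(), values)       # files are plain assignments
          del values['__builtins__']
          cfg.update(values)
          if path.endswith('.force'):
              for name in values:
                  locked[name] = 1              # web UI shows these read-only
      return cfg, locked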

Now, the next thing: outside of populating the initial root
directory with files (such as the various configured Python modules
etc), everything else gets done from the web.  One account has
access to the root and can create and edit groups etc.  Another
account has access to the list configs, and then of course there are
moderator-only accounts.  All of this of course gets exported through
the standard authentication methods so that it can get replaced by
<whatever>.

-- 
J C Lawrence                                       claw@kanga.nu
---------(*)                          http://www.kanga.nu/~claw/
--=| A man is as sane as he is dangerous to his environment |=--