[Mailman-Developers] (no subject)

J C Lawrence claw@kanga.nu
Tue, 12 Dec 2000 00:26:58 -0800


On Mon, 11 Dec 2000 23:43:07 -0800 
Chuq Von Rospach <chuqui@plaidworks.com> wrote:

> At 11:15 PM -0800 12/11/00, J C Lawrence wrote:

>> My intent so far is just "deliver no more than N messages per
>> minute" per outbound queue runner.  It knocks the peaks off the
>> problem, and the base structure is easy to extend from there (and
>> I don't want to think about that now).

> and leaves it up to the admin to tune. That's probably fine for
> 3.0. full queue watching and self-throttling can wait.  it's nice
> to have, but we probably shouldn't try to do everything at
> once. Just to leave the hooks for later...

Precisely.
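
A minimal sketch of the "no more than N messages per minute" cap, one
instance per outbound queue runner.  The class name, the sliding
60-second window, and the injectable clock are all assumptions made for
illustration; nothing here is existing Mailman code.

```python
import time
from collections import deque


class PerMinuteThrottle:
    """Allow at most `limit` deliveries in any sliding `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the logic can be tested
        self.sent = deque()     # timestamps of recent deliveries

    def wait_time(self):
        """Seconds to wait before the next delivery is allowed (0 if none)."""
        now = self.clock()
        # Forget deliveries that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self):
        """Note that one message was just handed to the MTA."""
        self.sent.append(self.clock())
```

A runner would call wait_time() before each delivery, sleep for the
returned interval if it is non-zero, deliver, and then call record().
The admin tunes `limit`; smarter self-throttling could later replace
wait_time() without touching the runner loop.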

>> I should note that my base design is very heavy in terms of
>> process forks (which happen to be quite light weight under Linux,
>> but that's another matter).

> There are definitely places for threads, but to be honest, I see
> some tendency of people to go thread-happy. it's the "new puppy",
> so everything needs to be designed around threads... Given the
> amount of I/O we have going on, the fork overhead is going to get
> lost in the noise in most cases.

That's my hope.

>> There's a directory full of scripts/programs.
>> 
>> Run them all, in directory sort order, on this message to
>> determine if we should do XXX with it.

> and who does this? this missing core policeman process, of course
> (grin).

Nope.  The individual process which somehow got nominated for
picking up a message sitting in a list pending queue.  So, it picks
up the message, asks for its distribution list, gets it, and shoves
them both over into the outbound queue.  Later some arbitrary
outbound queue processor wins/gets control of that message, opens an
SMTP session, and shovels the message down to the list of RCPT TOs.

Nobody is responsible for more than their tiny area of the field.
There is a pseudo orchestra leader, but all he really does is fork
processes that go see if there is anything in the queues to process,
and if so, start on them.
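
The "run them all, in directory sort order" step could look roughly
like this.  It is only a sketch: the function name, feeding the message
on stdin, and treating a non-zero exit status as "stop, do not do XXX"
are assumptions, not a settled interface.

```python
import os
import subprocess


def run_checks(script_dir, message_bytes):
    """Run each executable in script_dir in sorted name order.

    The message is fed to every script on stdin.  Processing stops at
    the first script that exits non-zero.  Returns (ok, failing_name).
    """
    for name in sorted(os.listdir(script_dir)):
        path = os.path.join(script_dir, name)
        # Skip subdirectories and anything not marked executable.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue
        result = subprocess.run(
            [path], input=message_bytes, stdout=subprocess.DEVNULL
        )
        if result.returncode != 0:
            return False, name
    return True, None
```

Whichever process picks the message out of the pending queue would call
this itself; no central policeman is involved.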

> but -- I'd suggest against this approach. There are problems. to
> start, the approach is pretty darn I/O heavy. you'd be better off
> loading all of this stuff into an internal database, and making it
> a memory-resident table, not a disk-based.

Kinda tough for LDAP or SQL where the list of members is dynamic
and depends on the message itself (non-traditional lists).

But yes, it hurts.  The default case will be some sort of
local/cheap DB with a single process.  The idea is that the above
architecture is there should it be needed.

> Administratively, it has some issues as well, since you're more or
> less requiring that someone with a CLI deal with a lot of the
> configuration -- or opening you up to all sorts of web-based
> attacks. 

Semi.  The idea is that the CLI guy installs the base set of scripts
that are potentially available to a given list.  The list owner
then picks from that library for his list, and assembles and orders
them (building a symlink table on disk) via his web interface (drop
and combo boxes).
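
As a sketch of what the web interface might do behind the scenes when
the list owner saves his selection: the function name, the "NN-name"
link naming, and the directory layout are assumptions for illustration
only.

```python
import os


def build_symlink_table(library_dir, list_dir, chosen):
    """Point list_dir at the chosen library scripts, in the given order.

    Only bare names of files already installed in library_dir are
    accepted, so the web side can never introduce a new script.
    """
    # Validate everything first so a bad request changes nothing.
    for script in chosen:
        target = os.path.join(library_dir, script)
        if os.path.basename(script) != script or not os.path.isfile(target):
            raise ValueError("not in the installed library: %r" % (script,))

    os.makedirs(list_dir, exist_ok=True)
    # Drop the previous table (symlinks only, nothing else is touched).
    for name in os.listdir(list_dir):
        path = os.path.join(list_dir, name)
        if os.path.islink(path):
            os.unlink(path)

    # A numeric prefix makes directory sort order match the chosen order.
    for position, script in enumerate(chosen):
        target = os.path.abspath(os.path.join(library_dir, script))
        os.symlink(target, os.path.join(list_dir, "%02d-%s" % (position, script)))
```

The point of the validation step is that the owner only ever orders
what the CLI admin has already installed.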

> Instead, you store scripts, and the CLI admin manages that
> process, but configuration is within Mailman, and web based.

Precisely.

> i've been working on a new API for the
> moderator/autobounce/admin/anti-spam stuff. I'll post that in a
> day or so, what I have, because I think the way I'm putting it
> together is relevant to how I think the overall control system
> could be done.

I haven't really thought about bounce processing at all yet.

> You want to embed nothing (IMHO), because it reduces the
> complexity of all of the pieces and forces you to keep the
> interfaces clean and rigorous.

Yeah.

>> I don't see the different queues needing markedly different
>> designs, but needing to be able to have their process support
>> cleanly divisible.  The base structures end up markedly similar
>> after that.

> Other than, say, imagining a system where archives are on a
> different machine (or two), and the search engine on a third (or
> fourth), so you want to be able to distribute the processing
> cleanly.... And the realization that archives and digest stuff can
> be held into a low-priority queue and turned into idle-time
> processing tasks. A big plus if you've got a busy system a little
> closer to the edge than you like.

I haven't thought about system load sensitivities yet, but I don't
see any innate reason they couldn't be another variable thrown into
the, "What am I currently allowed to process" equation.
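
One way load could enter that equation, as a sketch only: the queue
names and load ceilings below are invented for illustration, and the
use of the one-minute load average is an assumption.

```python
import os

# Hypothetical ceilings: a queue is only worked while the one-minute
# load average is below its value.  Low-priority work gives up first.
MAX_LOAD = {"outbound": 8.0, "digest": 2.0, "archive": 1.0}


def allowed_queues(load=None, max_load=MAX_LOAD):
    """Return the names of queues a runner may process right now."""
    if load is None:
        load = os.getloadavg()[0]  # Unix only
    return sorted(q for q, ceiling in max_load.items() if load < ceiling)
```

With these numbers, archive and digest work naturally becomes
idle-time processing while delivery keeps going on a busy box.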

>> Process fork overhead is a problem I've not confronted yet.

> And I wouldn't worry about it much.  don't think it's going to be
> a problem, other than in the MLM->MTA interface where you might be
> doing a lot of spawning and forking to parallelize, VERP, or
> whatever. 

My idea for VERP is trivially simple:

  The member script which generates the list of RCPT TOs which are
  attached to a pending message will periodically add a second token
  (a hash value) after the email address, separated by whitespace.

  Note: instead of text a DBM would work just as well, perhaps
  better.

  The process that then picks up a message from outbound notices the
  hash token and constructs a special envelope for that address
  only, using the hash string as +suffix to the envelope return
  address.

Want VERP all the time?  Members always generates hash values.  Or
just a percentage of the time, or as a function of how long it was
since we last caught a bounce from that address, or as a function of
how much we like that domain.

The idea is that VERPed messages are built on the instant of handing
them off to an MTA.
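
A sketch of the outbound side of this, assuming the text form of the
recipient list (one "address [hash]" entry per line).  The function
names and the example bounce address are hypothetical.

```python
def parse_recipient(line):
    """Split 'address [token]' into (address, token or None)."""
    parts = line.split()
    return parts[0], (parts[1] if len(parts) > 1 else None)


def envelope_sender(bounce_address, token):
    """Append the hash token as a +suffix to the return address."""
    if token is None:
        return bounce_address
    local, _, domain = bounce_address.rpartition("@")
    return "%s+%s@%s" % (local, token, domain)


def plan_deliveries(recipient_lines, bounce_address):
    """Return a list of (envelope_sender, [recipients]) pairs.

    Recipients without a token share one envelope; each recipient
    with a token gets a special envelope of its own.
    """
    shared = []
    plans = []
    for line in recipient_lines:
        if not line.strip():
            continue
        address, token = parse_recipient(line)
        if token is None:
            shared.append(address)
        else:
            plans.append((envelope_sender(bounce_address, token), [address]))
    if shared:
        plans.insert(0, (bounce_address, shared))
    return plans
```

For example, the lines "a@example.com" and "b@example.org 9f3c" with a
bounce address of list-bounces@lists.example.com give one shared
envelope for a@example.com and a second envelope from
list-bounces+9f3c@lists.example.com for b@example.org alone.  How often
the member script emits a token is its own business; the outbound
processor only reacts to what it finds.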

> And that can be minimized and avoided with some careful design. In
> the rest of the system, don't bother. When I'm talking about
> lightweight, I was meaning code complexity and feature creep. You
> want to stuff as much into external code pieces that are brought
> in via queueing and messaging, and keep it out of the control
> piece.

Bingo.

>> BTW I'd like to have the MLM archive messages such that a member
>> can request, "SEND ME POST XXX" and have the MLM send it to him.
>> Ditto for digests.  This is in addition to any web archiving.

> and another flavor of digest, what I call the HTML-TOC. Simply a
> message full of digest info (poster, subject, maybe the first
> couple of lines), and a URL to pull it out of archives. Some folks
> want a digest to skim, some folks only want header data -- so why
> send all those bytes that won't be read?

Ahh, excellent point.  Digests really should be an OOB process
handled by their own queue.  Yup.  Absolutely.

-- 
J C Lawrence                                       claw@kanga.nu
---------(*)                        : http://www.kanga.nu/~claw/
--=| A man is as sane as he is dangerous to his environment |=--