[Mailman-Developers] (no subject)

Chuq Von Rospach chuqui@plaidworks.com
Mon, 11 Dec 2000 23:43:07 -0800


At 11:15 PM -0800 12/11/00, J C Lawrence wrote:
>
>I'm working on the principle that there is no core process, and there
>are no musical conductors or other time beaters, just discrete nodes
>and processes competing for resources.

there has to be one, somewhere. It may be a policeman, directing 
traffic around the place, or it may be an overseer, of the inetd 
mode, or maybe even a watchdog, similar to init. But I doubt you can 
really build something (or want to) where you don't have something 
that comes first and decides who does what. Every orchestra has a 
conductor... it doesn't necessarily have to be a heavyweight from a 
code point of view, but something has to be there to make sure 
everyone else does their job.

>  > some of that is the MUA's problem, actually, but they get tied
>>  together. you don't, for instance, want an MLM who will dump 50K
>>  pieces of email an hour into the queues of an MUA that can only
>>  process 40K...
>
>I think you mean MTA above and below.

sigh. Yup. Sorry.


>
>My intent so far is just "deliver no more than N messages per minute"
>per outbound queue runner.  It knocks the peaks off the problem, and
>the base structure is easy to extend from there (and I don't want to
>think about that now).

and leaves it up to the admin to tune. That's probably fine for 3.0. 
full queue watching and self-throttling can wait.  it's nice to have, 
but we probably shouldn't try to do everything at once. Just to leave 
the hooks for later...
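
A minimal sketch of that "no more than N per minute" cap for one
queue runner. The class name, the sliding 60-second window, and the
injectable clock are all assumptions for illustration, not anything
that exists in Mailman today:

```python
import time


class MinuteThrottle:
    """Allow at most `limit` deliveries in any 60-second window (hypothetical helper)."""

    def __init__(self, limit, clock=time.monotonic):
        self.limit = limit    # N, the knob the admin tunes
        self.clock = clock    # injectable so it can be tested or replaced later
        self.sent = []        # timestamps of deliveries still inside the window

    def wait_time(self):
        """Seconds the runner should sleep before the next delivery (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have aged out of the 60-second window.
        self.sent = [t for t in self.sent if now - t < 60.0]
        if len(self.sent) < self.limit:
            return 0.0
        # Window is full: wait until the oldest delivery ages out.
        return 60.0 - (now - self.sent[0])

    def record(self):
        """Note that one message was just handed to the MTA."""
        self.sent.append(self.clock())
```

The runner would call wait_time(), sleep if it is non-zero, deliver,
then call record(). Swapping in a smarter wait_time() later is the
kind of hook that queue watching could plug into.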

>I should note that my base design is very heavy in terms of process
>forks (which happen to be quite light weight under Linux, but that's
>another matter).

There are definitely places for threads, but to be honest, I see some 
tendency of people to go thread-happy. it's the "new puppy", so 
everything needs to be designed around threads... Given the amount of 
I/O we have going on, the fork overhead is going to get lost in the 
noise in most cases.

>   There's a directory full of scripts/programs. 
>
>   Run them all, in directory sort order, on this message to
>   determine if we should do XXX with it.

and who does this? this missing core policeman process, of course (grin).

but -- I'd suggest against this approach. There are problems. to 
start, the approach is pretty darn I/O heavy. you'd be better off 
loading all of this stuff into an internal database, and making it a 
memory-resident table, not a disk-based system. Administratively, it 
has some issues as well, since you're more or less requiring that 
someone with a CLI deal with a lot of the configuration -- or opening 
you up to all sorts of web-based attacks. Instead, you store scripts, 
and the CLI admin manages that process, but configuration is within 
Mailman, and web based.
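
Roughly what I mean by a memory-resident table, as a sketch only.
The names here are hypothetical, and "first check that returns a
verdict wins" is my assumption about how the results get combined:

```python
class CheckTable:
    """Memory-resident, ordered table of message checks (hypothetical structure)."""

    def __init__(self):
        self._rows = []  # list of (order, name, func)

    def register(self, order, name, func):
        """Add a check; `func` takes a message and returns a verdict or None."""
        self._rows.append((order, name, func))
        # Sort by order, then name, mirroring "directory sort order".
        self._rows.sort(key=lambda row: (row[0], row[1]))

    def decide(self, message, default="accept"):
        """Run checks in order; the first non-None verdict wins."""
        for _order, name, func in self._rows:
            verdict = func(message)
            if verdict is not None:
                return verdict, name
        return default, None
```

The table gets loaded once from Mailman's own configuration, so
deciding what to do with a message costs no directory scans, and the
web interface only edits rows, never files.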

i've been working on a new API for the 
moderator/autobounce/admin/anti-spam stuff. I'll post that in a day 
or so, what I have, because I think the way I'm putting it together 
is relevant to how I think the overall control system could be done.

>Now the default case could have those directories empty, meaning
>that Mailman will default to internal/cheap implementations, but it's
>much easier to just have default implementations of the scripts for
>those directories and then punt normally.

again, I'd make as much as possible separate scripts, but have a 
default processing logic suite in the control data structures in the 
Mailman system internals.

You want to embed nothing (IMHO), because it reduces the complexity 
of all of the pieces and forces you to keep the interfaces clean and 
rigorous.

>There are a couple other list servers that demand that approach.
>The problem is that it really doesn't fit well with people/sites
>that don't control their own DNS.

yah. that's the rub.

>So, want LDAP?  Want SQL?  Want local DBM?  Want all three?  No
>problem.

I sure wouldn't mind being able to plug in someone else's code in 
that server -- but the reality is, it can't use 99% of a typical MLM, 
since it's all controlled upstairs by a corporate system, so it's 
overkill. That MLM is basically two scripts, one to eat a data set 
and generate the list setup, and another to authenticate and resend. 
that's all it does, so it's quite lightweight.

>I don't see the different queues needing markedly different designs,
>but needing to be able to have their processes supports cleanly
>divisible.  The base structures end up markedly similar after that.

Other than, say, imagining a system where archives are on a different 
machine (or two), and the search engine on a third (or fourth), so 
you want to be able to distribute the processing cleanly.... And the 
realization that archives and digest stuff can be held in a 
low-priority queue and turned into idle-time processing tasks. A big 
plus if you've got a busy system a little closer to the edge than you 
like.
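
A sketch of the low-priority idea using a single in-memory heap. The
priority numbers and class name are made up for illustration, and a
real system would have this on disk and possibly across machines:

```python
import heapq
import itertools

# Hypothetical priority levels: lower numbers are served first.
DELIVERY, DIGEST, ARCHIVE = 0, 5, 9


class TaskQueue:
    """Priority queue where archive/digest work only runs when delivery is idle."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def put(self, priority, task):
        heapq.heappush(self._heap, (priority, next(self._seq), task))

    def get(self):
        """Return the most urgent task, or None when the queue is empty."""
        if not self._heap:
            return None
        _priority, _seq, task = heapq.heappop(self._heap)
        return task
```

Archive and digest tasks sit at the bottom of the heap until nothing
more urgent is waiting, which is the idle-time behavior described
above.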

>  > It also assumes that these non-delivery things are separate
>  > processes from the act of making them available to those things,
>  > to keep (6) lightweight as possible.
>
>Process fork overhead is a problem I've not confronted yet.

And I wouldn't worry about it much. I don't think it's going to be a 
problem, other than in the MLM->MTA interface where you might be 
doing a lot of spawning and forking to parallelize, VERP, or 
whatever. And that can be minimized and avoided with some careful 
design. In the rest of the system, don't bother. When I'm talking 
about lightweight, I was meaning code complexity and feature creep. 
You want to stuff as much as possible into external code pieces that 
are brought in via queueing and messaging, and keep it out of the 
control piece.

>That and distributed lock contention.
>Both are pretty ugly with my current model.

Locks are a b-tch. period. Both because they don't go multi-machine 
well at all, and because whatever you choose it'll be missing or 
broken on various releases of various OSes.

>BTW I'd like to have the MLM archive messages such that a member can
>request, "SEND ME POST XXX" and have the MLM send it to him.  Ditto
>for digests.  This is in addition to any web archiving.

and another flavor of digest, what I call the HTML-TOC. Simply a 
message full of digest info (poster, subject, maybe the first couple 
of lines), and a URL to pull it out of archives. Some folks want a 
digest to skim, some folks only want header data -- so why send all 
those bytes that won't be read?
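
Something like this, as a rough sketch. The dict keys, the URL
layout, and the plain-text output are assumptions; a real version
would presumably emit HTML and use whatever the archiver's URLs
actually look like:

```python
def toc_digest(posts, base_url, preview_lines=2):
    """Build a table-of-contents digest: headers, a short preview, and a link.

    `posts` is a list of dicts with hypothetical keys: id, poster, subject, body.
    """
    entries = []
    for post in posts:
        # Skip blank lines so the preview shows real text.
        preview = [line for line in post["body"].splitlines() if line.strip()]
        entry = ["From: " + post["poster"], "Subject: " + post["subject"]]
        entry += ["  " + line for line in preview[:preview_lines]]
        entry.append("Full text: " + base_url.rstrip("/") + "/" + str(post["id"]))
        entries.append("\n".join(entry))
    return "\n\n".join(entries)
```

Setting preview_lines=0 gives the header-only flavor for the folks
who just want to skim subjects.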

>I've been thinking about this.  I *REALLY* don't think there's much
>time sensitive code in a MLM.

The process of sending list mail is time sensitive, but most of the 
issues involving time tend to be in the MTA. On a typical MLM, a user 
might not notice if messages don't turn around in 5 minutes or 15, 
but if they're consistently turning around at 30 minutes, many will. 
they may not even recognize why they're unhappy -- but many get 
unhappy.

and the worst aspects of this are out of everyone's control, since 
the biggest delays are caused by receiving sites, not the sending 
site -- so you end up, if you need to, spending a lot of time 
minimizing the pain those sites cause you, through parallelism, 
domain sorting, etc.
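
Domain sorting in its simplest form, as a sketch. The function name
is hypothetical, and how the batches then get handed to the MTA (one
connection per domain, several in parallel, whatever) is left out:

```python
from collections import defaultdict


def group_by_domain(recipients):
    """Bucket recipient addresses by (lower-cased) domain."""
    buckets = defaultdict(list)
    for address in recipients:
        # Everything after the last "@" is the domain.
        domain = address.rpartition("@")[2].lower()
        buckets[domain].append(address)
    # Sorted so the delivery order is deterministic.
    return sorted(buckets.items())
```

With recipients batched this way, one slow receiving site only
stalls its own batch, and the other batches can go out in parallel.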

>  > We're visiting the relatives. Cover us.
>
>I missed you.  Please wait while I reload.

Kevlar is your friend. bake at 350 for an hour with a little garlic 
and garnish with chives.

-- 
Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com)
Apple Mail List Gnome (mailto:chuq@apple.com)

We're visiting the relatives. Cover us.