[Mailman-Developers] (no subject)

Mon, 11 Dec 2000 20:49:36 -0800

At 7:51 PM -0800 12/11/00, J C Lawrence wrote:

>ObTheme: All config files should be human readable unless those
>files are dynamically created and contain data which will be easily
>and automatically recreated.

ObTheme: All configuration should be possible via the web, even if 
the system is misconfigured and non-functional. Anything that can NOT 
be safely reconfigured without breaking the system should not be 
configurable via the web. (In other words, anything you can change, 
you should be able to change remotely, unless you can break the 
system. If you can break the system, you shouldn't be allowed near it 
trivially...)

>   1) Using multiple simultaneous processes/threads to parallelise
>      a given task.
>
>   2) Using multiple systems running parallel to parallelise a given
>      task.
>
>   3) Using multiple systems, each one dedicated to some portion(s)
>      or sub-set of the overall task (might be all working in
>      parallel on the entire problem (lock contention! failure
>      modes!)).

That's my model perfectly, although I think 2 and 3 are reversed. 
It's cleaner architecturally to divest and distribute functionality 
before 'clustering'. In fact, I suspect clustering (which I'll use to 
mean multiple Mailman systems running in parallel) only becomes 
necessary for a really, really large system, once you realize that 
the primary resource eaters (like delivery) can effectively be 
infinitely distributed. I'm not sure how big a Mailman system you'd 
need before you had to parallelize the core process, as long as you 
can divest the other pieces to a farm that can grow without bounds. 
So maybe we don't need that next (complicated) step: make everything 
parallelized and distributable except the core control process, and 
manage the complexity of that control process by keeping everything 
out of it except the absolute necessities.
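
A rough sketch of what "divesting" delivery might look like from the
core's side. This is only an illustration: the function name, the
worker names, and the choice to group recipients by domain are all my
assumptions, not anything Mailman does today. The core only decides
who gets which batch; the farm does the actual SMTP work.

```python
from collections import defaultdict


def plan_delivery(recipients, workers):
    """Group recipients by domain, then spread the groups over workers.

    `workers` is a list of hypothetical delivery-host names. Keeping a
    domain on one worker is an assumption made for this sketch.
    """
    by_domain = defaultdict(list)
    for addr in recipients:
        domain = addr.rpartition("@")[2].lower()
        by_domain[domain].append(addr)

    plan = {w: [] for w in workers}
    # Largest domain groups first, each to the least-loaded worker so far.
    for domain in sorted(by_domain, key=lambda d: (-len(by_domain[d]), d)):
        target = min(workers, key=lambda w: len(plan[w]))
        plan[target].extend(by_domain[domain])
    return plan
```

Adding capacity then means adding a name to `workers`; nothing in the
core control process has to change.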

>Observation: MLMs are primarily IO bound devices, and are
>specifically IO bound on output.  Internal processing on mail
>servers, even given crypto authentication and expensive membership
>generation processes (eg heavy SQL DB joins etc) are an order of
>magnitude smaller problem than just getting the outbound mail off
>the system.

Some of that is the MTA's problem, actually, but they get tied 
together. You don't, for instance, want an MLM that will dump 50K 
pieces of email an hour into the queues of an MTA that can only 
process 40K...

But in general, you're correct. Especially if you define DNS delays 
and SMTP protocol delays caused by the receiving machine to be 
"output" (grin)

>Sites with large numbers of lists with large numbers of members (and
>presumably large numbers of messages per list) are the pessimal
>case, and is not one Mailman is currently targeting to solve.

But if you define the distribution capabilities correctly, this case 
is solved by throwing even more hardware at it, and the owners of 
this pessimal case presumably have a budget for it. If you see 
someone trying to run Sourceforge on a 486 and a 128K DSL line, you 
laugh at them.

>Observation: Traffic bursts are bad.  Minimally the MLM should
>attempt to smooth out delivery rates to a given MTA to be no higher
>than N messages/time.

The obverse of that is that end-users seriously dislike delays, 
especially on conversational lists. It turns into the old "user 
expectation" problem -- it's better to hold ALL mail for 15 minutes 
so users come to expect it than to normally deliver mail in 2 
minutes, except during the worst bulges... But in general, the MLM 
should deliver as fast as it reasonably can without overloading the 
MTA, which implies either some kind of monitoring setup for the MTA 
or some user-controlled throttling system. The latter, unfortunately, 
implies teaching admins how to monitor and adjust, a support issue. 
The former implies writing an interface for every MTA -- a 
development AND support issue.
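
For the user-controlled flavor, a minimal sketch of a throttle, here
written as a token bucket. The class and parameter names are
hypothetical; `rate` and `burst` are the two knobs an admin would have
to learn to adjust, which is exactly the support issue.

```python
import time


class DeliveryThrottle:
    """Token bucket: at most `rate` messages/second, bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)      # sustained messages per second
        self.burst = float(burst)    # most messages allowed back to back
        self.clock = clock           # injectable so it can be tested
        self.tokens = float(burst)
        self.last = clock()

    def _refill(self):
        now = self.clock()
        earned = (now - self.last) * self.rate
        self.tokens = min(self.burst, self.tokens + earned)
        self.last = now

    def try_send(self):
        """Return True if one message may be handed to the MTA right now."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def wait_time(self):
        """Seconds until the next message may go out (0.0 if one may go now)."""
        self._refill()
        if self.tokens >= 1.0:
            return 0.0
        return (1.0 - self.tokens) / self.rate
```

A delivery loop would call try_send() before each handoff and sleep
for wait_time() when it returns False. A quiet list sees no added
delay; only the bulges get flattened.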

>20Million messages sitting in the outbound queue), that the MLM will
>give the MTA the opportunity to try and react intelligently rather
>than overwhelming it near instantly with all 20M messages dumped in
>the MTA spool over 30 seconds while the spool filesystem gags.

I will not make comments about qmail. I will not make comments about 
qmail. I will be good. I will be good. (grin)

>   1) Receipt of message by local MTA

1a) passthrough of message via a security wrapper from MTA to list 
server... (I think it's important we remember that, because we can't 
lose it, and it involves a layer of passthrough and a process 
spawning, so it's somewhat heavyweight -- but indispensable)

>   2) Receipt by list server
>   3) Approval/editing/moderation
>   4) Processing of message and emission of any resultant message(s)
>   5) Delivery of message to MTA for final delivery.

	6) delivery of message to non-MTA recipients (the archiver, the
	   logging thing, the digester, the bounce processor....)

>#1 is significant only because we can rely on the MTA to
>distinguish between valid list-related addresses and non-list
>addresses.

Although one thing I've toyed with is to give a subdomain to the MLM, 
and simply pass everything to it (in sendmail terms, using 
virtusertable to pass @list.foo.bar to mailman@foo.bar). Then you 
take the MTA out of having to know what lists exist, along with the 
administrative work needed to keep that interface in sync. The 
downside is that it doesn't fit the design of some users (but that 
can be fixed by education if we can prove why it's better), and the 
MLM ends up having to handle some MTA functions, such as 
DSN-compatible bounce messages. I've more or less decided that when I 
rewrite my internal corporate mail list server, I'll do that rather 
than generate alias listings (for, oh, 12,000 groups) with the 
hassles and overhead of all that. That'll be especially useful if we 
do what I hope, which is to set it up so the server has no data at 
all, but authenticates via LDAP to get list information on demand out 
of the corporate databases. There are some definite advantages to not 
knowing whether something exists until the need to know exists -- and 
as Mailman starts edging towards interfacing to non-Mailman data 
sources for list information, that ability grows in importance.
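
To make the on-demand idea concrete, here is a small sketch. All the
names are hypothetical, and `lookup` is just a stand-in for whatever
LDAP or database query would really be made; the "-request" style
suffixes are an assumed convention for this example.

```python
# Assumed suffix convention for administrative addresses.
ROLE_SUFFIXES = (("-request", "request"), ("-owner", "owner"))


def resolve_recipient(address, list_domain, lookup):
    """Map an incoming address to (list_name, role), or None if unknown.

    `lookup` takes a list name and returns list data or None. It is
    only called when mail actually arrives, so nothing has to be
    generated or synced ahead of time.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or domain.lower() != list_domain.lower():
        return None
    local = local.lower()
    for suffix, role in ROLE_SUFFIXES:
        base = local[: -len(suffix)]
        if local.endswith(suffix) and lookup(base) is not None:
            return base, role
    if lookup(local) is not None:
        return local, "post"
    return None
```

A None result is the case where the MLM, not the MTA, has to generate
the bounce, which is the cost mentioned above.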

6) is the processing needed to support other functions that act on 
messages. The idea is that instead of delivering to the MTA, we have 
a suite of functions that deliver the message to whatever needs to 
process it. Those can be asynchronous and don't need to be as timely 
as (5), and have different enough design needs that I split them out 
from the MTA delivery (although traditionally, stuff like digests is 
managed by doing an MTA transfer out of the MLM and back in to a 
different program...)

It also assumes that these non-delivery things are separate processes 
from the act of making messages available to them, to keep (6) as 
lightweight as possible.

>Note: Bounce processing and request processing are not detailed at
>this point as their rate of occurrence outside of DoS attacks is
>comparatively low and are far cheaper than list broadcasts in
>general.

And besides, they are basically independent, asynchronous processes 
that don't need to be managed by any of the core logic, other than 
handing messages into their queue and making sure they stay running. 
Same with, IMHO, storing messages for archives, storing messages for 
digests, updating archives, processing digests (but the processed 
digest is fed back into the core logic for delivery), and whatever 
else we decide it needs to do that isn't part of the core, 
time-sensitive code base. (In fact, there's no reason why you 
couldn't have multiple flavors of these things: one archiver feeding 
an mbox, another feeding mhonarc or pipermail, something that updates 
the search engine indexes, and text and MIME digesters... By turning 
them into their own logic streams with their own queues, you have 
effectively just made them all plug-in swappable, because you're 
writing to a queue, and not worrying about what happens once it's 
there. You merely need to make sure it goes in the right queue, in 
the approved format.)
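
Something like this, as a bare-bones sketch. The class and method
names are made up, and in-process queues stand in for whatever spool
directories or queue files a real system would use.

```python
import queue


class Fanout:
    """Core-side handoff: copy each message into every registered queue."""

    def __init__(self):
        self._queues = {}

    def register(self, name):
        """Add a consumer (archiver, digester, ...) and return its queue."""
        q = queue.Queue()
        self._queues[name] = q
        return q

    def unregister(self, name):
        """Drop a consumer; the core does not care whether it existed."""
        self._queues.pop(name, None)

    def publish(self, message):
        """Queue the message for every consumer; return how many got it."""
        for q in self._queues.values():
            q.put(message)
        return len(self._queues)
```

Swapping pipermail for mhonarc is then just a different consumer
reading the same queue; publish() never changes.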

>We don't want an over-arching API, or the attempt to solve the
>entire problem with either one hammer, or one sort of hammer.

I like hammers! My thumb doesn't, not since the divorce, at least...

kewl. good stuff here.

-- 
Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com)
Apple Mail List Gnome (mailto:chuq@apple.com)

We're visiting the relatives. Cover us.