[Mailman-Developers] Problem with qrunner and too much incoming mail

Barry A. Warsaw barry@wooz.org
Mon, 6 Nov 2000 23:59:18 -0500 (EST)


[Removed mailman-users from the recips.]

Okay, well that went over like a lead balloon... :)

[Aside: it's time to start capturing these notes in a more useful
place than the mail archives.  I've started a ZWiki page which will
serve as "design central" for Mailman.  I did this partly to have
experience with Wikis, which everybody at my new employer simply raves
about.  Please visit

    http://www.zope.org/Members/bwarsaw/MailmanDesignNotes

and remember, you too can edit and contribute to these pages!  Don't
make me the bottleneck.  You can learn the little bit you need to know
about Wikis from that URL too.  Think collaborative web pages.]

I completely agree that we want a pluggable architecture for MLM->MTA
handoff.  We /almost/ have that now with the delivery pipeline, but my
mistake was in making the delivery module part of that pipeline.  What
Mailman's pipeline ought to do is the prep-work on the message only:
spam and privacy filtering, setting headers, updating per-list
counters, appending to digests, etc.  Anything that does not require
writing list-specific data could be pulled out of the pipeline.  I'm
thinking specifically about NNTP posting and the MTA handoff.
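To make the split concrete, here is a minimal sketch, assuming a toy message and list object (every class, function, and header value below is hypothetical, not Mailman's real handler API).  The pipeline holds only prep stages; delivery and NNTP posting are pluggable handoffs invoked after it finishes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    headers: Dict[str, str]
    body: str


@dataclass
class MailingList:
    name: str
    post_count: int = 0
    digest: List[Message] = field(default_factory=list)


# Prep-only stages: each may touch the message or list-specific state,
# but none of them talks to the MTA or an NNTP server.
def set_headers(mlist: MailingList, msg: Message) -> None:
    msg.headers["X-BeenThere"] = f"{mlist.name}@example.com"


def update_counters(mlist: MailingList, msg: Message) -> None:
    mlist.post_count += 1


def append_to_digest(mlist: MailingList, msg: Message) -> None:
    mlist.digest.append(msg)


Stage = Callable[[MailingList, Message], None]

PREP_PIPELINE: List[Stage] = [set_headers, update_counters, append_to_digest]


def process(mlist: MailingList, msg: Message, handoffs: List[Stage]) -> None:
    """Run the prep pipeline, then pass the result to pluggable handoffs."""
    for stage in PREP_PIPELINE:
        stage(mlist, msg)
    # MTA delivery and NNTP posting live outside the pipeline and are
    # chosen per installation.
    for handoff in handoffs:
        handoff(mlist, msg)
```

Swapping the MTA handoff for an outgoing-only mailer would then mean changing the `handoffs` list, not the pipeline.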

An API for the handoff is A Good Thing, and of course given that,
there's no reason why someone looking for a project couldn't write an
external, outgoing-only whizzymailer <wink> along the lines I
outlined.  Based on results that others are getting writing
pure-Python servers for other protocols, I think you might be able to
get some fairly impressive performance there, especially because this
would be outgoing only (it doesn't need to handle any incoming smtp
connections).  But I definitely didn't envision this to be the /only/
or even the /primary/ way for mail to be delivered, just another
option.

One of the things such an approach would give us is the ability
to do more direct bounce detection and handling, eliminating some of
the error-prone bounce message parsing.  E.g. our whizzymailer would
know the details of Mailman, so when it got errors during the SMTP
transaction, it could update the databases directly.  This isn't as
likely to happen when we hand off to a localhost MTA, unless it
supports DSN and we run it synchronously (which clobbers the current
architecture, as we're seeing).
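A rough sketch of what "update the databases directly" might look like, assuming the mailer uses Python's standard `smtplib`.  `SMTP.sendmail()` returns a dict of refused recipients mapping each address to `(code, response)`, and raises `SMTPRecipientsRefused` (with the same mapping in `.recipients`) when every recipient is refused.  The `bounce_db` dict and the permanent/temporary split are hypothetical stand-ins for Mailman's bounce bookkeeping.

```python
import smtplib
from typing import Dict, List, Sequence, Tuple, Union

# address -> (SMTP reply code, server response), as returned by
# smtplib.SMTP.sendmail() for refused recipients.
Refused = Dict[str, Tuple[int, Union[bytes, str]]]
BounceDB = Dict[str, List[Tuple[str, int, str]]]  # hypothetical store


def record_refusals(refused: Refused, bounce_db: BounceDB) -> None:
    """Record per-address failures; 5xx is permanent, anything else temporary."""
    for addr, (code, resp) in refused.items():
        kind = "permanent" if 500 <= code < 600 else "temporary"
        text = resp.decode("utf-8", "replace") if isinstance(resp, bytes) else resp
        bounce_db.setdefault(addr, []).append((kind, code, text))


def deliver(conn: smtplib.SMTP, sender: str, recips: Sequence[str],
            text: str, bounce_db: BounceDB) -> None:
    try:
        refused = conn.sendmail(sender, list(recips), text)
    except smtplib.SMTPRecipientsRefused as exc:
        # Raised when every recipient was refused; same mapping shape.
        refused = exc.recipients
    record_refusals(refused, bounce_db)
```

This only catches refusals at RCPT time on the connection we hold; failures after the remote server accepts the message still come back as bounce mail, and other exceptions (e.g. `SMTPSenderRefused`) are left to the caller.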

Any API we come up with for MLM->MTA handoff should give us the
benefits of DSN without the problems.  I.e. it should be two-way.
Some combination of API and better architecture is probably
necessary.

>>>>> "JCL" == J C Lawrence <claw@kanga.nu> writes:

    JCL> What would be a really good approach without concern for code
    JCL> impact?  I suspect a modified form of the hash tree for queue
    JCL> storage (cf QMail's implementation minus the silly (for this
    JCL> use) inode specifics) with a slightly perverted form of your
    JCL> (Barry's) long running bulkmailer to process that hash queue.

Let's flesh that out a little.  What data does Qmail hash on?  Would
the hash tree be in-memory?  Would there be any disk persistence in
case of system failure?  Each message currently has two parts: message
content and metadata.  Would both be stored in the hash tree?  That
might get expensive for really big messages.  Maybe the message
content should be stored in a file and the metadata in the hash tree.
Then again, since most messages don't live for very long in the queue,
maybe the elimination of the disk i/o is worth a little instability or
larger memory footprint.
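One way to try the "content in a file, metadata in the hash tree" variant, as a minimal sketch only: this does not claim to reproduce Qmail's scheme.  It assumes we hash the message ID with SHA-1 and pick one of 23 subdirectories (an arbitrary small prime); the class and its methods are hypothetical.

```python
import hashlib
import os
from typing import Dict, Tuple


class HashedQueue:
    """Message content on disk in hashed subdirectories; metadata in memory."""

    def __init__(self, root: str, buckets: int = 23) -> None:
        self.root = root
        self.buckets = buckets
        self.meta: Dict[str, dict] = {}  # lost on a crash: the trade-off above

    def _path(self, msgid: str) -> str:
        digest = hashlib.sha1(msgid.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % self.buckets
        return os.path.join(self.root, str(bucket), digest)

    def enqueue(self, msgid: str, content: bytes, metadata: dict) -> None:
        path = self._path(msgid)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())  # content survives a system failure
        os.replace(tmp, path)     # atomic rename publishes the file
        self.meta[msgid] = metadata

    def dequeue(self, msgid: str) -> Tuple[bytes, dict]:
        path = self._path(msgid)
        with open(path, "rb") as f:
            content = f.read()
        os.remove(path)
        return content, self.meta.pop(msgid)
```

Keeping the metadata in memory avoids a second write per message, but after a crash the content files would be orphaned unless the metadata were also written out (or recoverable from the message headers).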

    JCL> I'd tend to make the bulkmailer actually an intermittently
    JCL> running item to help support for intermittently connected
    JCL> nodes.  Say something like:

    JCL>   Cron launches the bulkmailer.  The bulkmailer forks N
    JCL> children processing the queue.  The bulkmailer exits upon an
    JCL> empty queue.  Should cron launch a new bulkmailer when the
    JCL> previous incarnation hasn't exited yet, the new instance
    JCL> merely exits immediately.

Forking is pretty heavyweight, and threading has its problems too.
One of the things I like about the select-and-continuations-based
servers is that for i/o bound tasks, they aren't very difficult to
code efficiently.  Cron could be used to watchdog the process though.
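For comparison with forking N children, here is a sketch of the single-process, event-loop style, using Python's `asyncio` (which multiplexes I/O over `select`/`epoll`-style primitives).  It keeps JCL's "exit on an empty queue" behaviour: a fixed number of worker coroutines drain the queue and return once it is empty.  `fake_deliver` is a hypothetical stand-in for a real SMTP conversation.

```python
import asyncio
from typing import Awaitable, Callable, List


async def drain(queue: "asyncio.Queue[str]",
                deliver: Callable[[str], Awaitable[None]],
                concurrency: int) -> None:
    """Deliver everything in `queue` with at most `concurrency` in flight."""

    async def worker() -> None:
        while True:
            try:
                item = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # queue drained: this worker exits
            try:
                await deliver(item)
            finally:
                queue.task_done()

    await asyncio.gather(*(worker() for _ in range(concurrency)))


async def main() -> List[str]:
    q: "asyncio.Queue[str]" = asyncio.Queue()
    for i in range(10):
        q.put_nowait(f"msg-{i}")
    delivered: List[str] = []

    async def fake_deliver(item: str) -> None:
        await asyncio.sleep(0.01)  # stands in for network I/O
        delivered.append(item)

    await drain(q, fake_deliver, concurrency=4)
    return delivered
```

Running `asyncio.run(main())` delivers all ten items over four concurrent workers and then returns; a cron job could relaunch the process if it ever dies.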

>>>>> "CVR" == Chuq Von Rospach <chuqui@plaidworks.com> writes:

    CVR> Instead of reinventing the MTA wheel, I think we're much
    CVR> better off coming up with an MTA -> MLM interface that's very
    CVR> flexible and highly configurable (most especially in how to
    CVR> deliver and how much to parallelize the infeed to the MLM),
    CVR> and then focus on how to tune the MTA and MLM through
    CVR> documentation.

    CVR> Splitting the inbound and outbound queue would be my first
    CVR> thing here, and probably split bounces into a third
    CVR> queue.

Great idea.  Each queue has its own requirements, e.g. there have
definitely been complaints about the minimum 1-minute delay for
outgoing messages.
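A minimal sketch of split queues, assuming three in-memory queues named "in", "out", and "bounces" (the names and helpers are hypothetical).  Each runner blocks on its own queue instead of polling on a timer, so an outgoing message is picked up as soon as it arrives; the timeout only bounds how quickly the runner notices a shutdown request.

```python
import queue
import threading
from typing import Any, Callable, Dict

# Hypothetical queue names: one per kind of traffic.
QUEUE_NAMES = ("in", "out", "bounces")


class SplitQueues:
    """Separate queues so each runner can be tuned on its own."""

    def __init__(self) -> None:
        self.queues: Dict[str, "queue.Queue[Any]"] = {
            name: queue.Queue() for name in QUEUE_NAMES
        }

    def put(self, name: str, item: Any) -> None:
        self.queues[name].put(item)


def run_runner(q: "queue.Queue[Any]", handle: Callable[[Any], None],
               stop: threading.Event, idle_timeout: float) -> None:
    """Process items from one queue until `stop` is set."""
    while not stop.is_set():
        try:
            # Wakes immediately when an item is put; no fixed polling delay.
            item = q.get(timeout=idle_timeout)
        except queue.Empty:
            continue
        handle(item)
```

Each queue would get its own runner thread (or process), so a backlog of bounces never delays posts going out.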
    
    CVR> That's a pretty quick, easy optimization that makes
    CVR> sure the end user sees fast response without being bogged
    CVR> down by deliveries, and that's a huge perception issue. Then
    CVR> focus on parallelizing the delivery from mailman into the
    CVR> MTA, and make that configurable so each admin can tune it to
    CVR> their system and needs.

Agreed.  I also want that feedback for list-bound messages so that
Mailman can be notified directly from the MTA about certain types of
delivery failures.  I still worry about bottlenecks in synchronous
mode, even with a high degree of parallelism and shallow buckets.
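Roughly what "a high degree of parallelism and shallow buckets" means, sketched with Python's `concurrent.futures`.  This is not SMTPDirect's actual code; `send_bucket` is a hypothetical callback that pushes one bucket to the MTA and returns refused recipients in the same `address -> (code, response)` shape that `smtplib.SMTP.sendmail()` uses.  Bucket size and parallelism are the per-site tuning knobs CVR describes; the defaults are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# address -> (SMTP code, server response)
Refused = Dict[str, Tuple[int, bytes]]


def buckets(recips: Sequence[str], size: int) -> Iterator[List[str]]:
    """Split the recipient list into shallow buckets of at most `size`."""
    for start in range(0, len(recips), size):
        yield list(recips[start:start + size])


def parallel_deliver(recips: Sequence[str],
                     send_bucket: Callable[[List[str]], Refused],
                     bucket_size: int = 20,
                     parallelism: int = 4) -> Refused:
    """Hand buckets to the MTA over `parallelism` concurrent calls."""
    failures: Refused = {}
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for refused in pool.map(send_bucket, buckets(recips, bucket_size)):
            failures.update(refused)
    return failures
```

Even with this, each call still waits on the MTA synchronously, which is exactly the bottleneck worry above.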

Thinking out loud: what if the API had two channels, mlm->mta and
mta->mlm, let's call them outbound and inbound respectively.  The
outbound channel needs to contain the message text (or a disk file,
ownership of which is passed to the mta), a list of recipients, and a
set of metadata to associate with the message.  Metadata may include:
the list name, the list of error codes to report back to us, a VERP
flag, and possibly other opaque data.

Inbound is limited to error reporting, e.g. a list of failed
addrs and their error codes, plus the metadata reflected back for that
message.
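The two channels could be pinned down as plain data types.  This is a sketch under assumptions: every class and field name is hypothetical, and "report only the requested error codes" is one reading of "the list of error codes to report back to us".

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, List, Optional


@dataclass
class Metadata:
    listname: str
    report_codes: FrozenSet[int]  # error codes the MLM wants reported back
    verp: bool = False
    opaque: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Outbound:
    """MLM -> MTA: message, recipients, and metadata."""
    recipients: List[str]
    metadata: Metadata
    text: Optional[bytes] = None      # message text, or...
    spool_path: Optional[str] = None  # ...a file whose ownership passes to the MTA

    def __post_init__(self) -> None:
        if (self.text is None) == (self.spool_path is None):
            raise ValueError("exactly one of text or spool_path must be given")


@dataclass
class Inbound:
    """MTA -> MLM: failed addresses with codes, plus the reflected metadata."""
    failures: Dict[str, int]
    metadata: Metadata


def report_failures(out: Outbound, errors: Dict[str, int]) -> Inbound:
    """Keep only the error codes the list asked for; echo its metadata."""
    wanted = {a: c for a, c in errors.items() if c in out.metadata.report_codes}
    return Inbound(failures=wanted, metadata=out.metadata)
```

Because the metadata comes back untouched, the MLM needs no lookup table keyed on message IDs to know which list a failure belongs to.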

    CVR> If someone wants a rhetoric on how to scale mail list servers
    CVR> infinitely, I'd be happy to explain how, since I've had to
    CVR> develop an architecture to do so.

If you write it up, I'll add it to the documentation.  At the very
least, let's add it to the ZWiki.

    CVR> I think we can build a Mailman that does this, at least for,
    CVR> oh, 95% of the universe out there, and the other 5% are going
    CVR> to have custom solutions anyway (or should!). What we don't
    CVR> want to do is screw up Mailman for the "typical" user to make
    CVR> it work for the big site; but we also don't want Mailman to
    CVR> get a reputation as a "small server only" system, because
    CVR> it'll cause people to reject it in
    CVR> implementations. Fortunately, I don't think you need to do
    CVR> that. It just needs some tweaking.

Completely agree.

    CVR> On reasonable hardware, definitely. That's basically how my
    CVR> current custom system works. Right now, the number of
    CVR> parallel infeeds from mailman is 1. I'm willing to bet the
    CVR> delivery MTA is basically idle and bored.

Have you played at all with the threaded delivery in SMTPDirect?
Admittedly it's not integrated correctly with the rest of Mailman, but
I'm still curious if the notion is salvageable.
    
-Barry