[Mailman-Developers] Huge lists

Chuq Von Rospach chuqui@plaidworks.com
Wed, 24 May 2000 20:47:21 -0700


At 6:38 PM -0700 5/24/2000, J C Lawrence wrote:

>True.  My curiosity however is what MTAs do MX sorting, and more
>particularly, MX collapsing (eg for two different targets that share
>an MX among their lowest level).  The potential gains there are
>likely not huge, but could be (guesstimate) noticeable for high
>volume servers with broad standard deviations in their target lists.
>
>I'll have to check into that some time.


But, as the experts say, the first $500 buys you 90% of the stereo
response, and the rest of the money goes into getting you as close to
100% as you can get. MX sorting is definitely well past that 90%
mark: it's expensive both computationally and in time, and lots of
other stuff can be done first, with more gain and less effort. For
most lists, the difference in performance between domain sorting and
MX sorting is probably not statistically meaningful.
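
To make the distinction concrete, here is a minimal Python sketch of
the two approaches. It is an illustration only, not Mailman's code:
group_by_domain, group_by_mx, the lookup_mx callback and the FAKE_MX
table are all made-up names, and the MX data is invented so the
example runs without DNS. It also simplifies "collapsing" to grouping
on the single best-preference MX host.

```python
from collections import defaultdict

def group_by_domain(recipients):
    """Domain sorting: bucket addresses by domain part. No DNS needed."""
    groups = defaultdict(list)
    for addr in recipients:
        domain = addr.rsplit("@", 1)[1].lower()
        groups[domain].append(addr)
    return dict(groups)

def group_by_mx(recipients, lookup_mx):
    """MX sorting: bucket addresses by their best-preference MX host.

    lookup_mx(domain) is a hypothetical callback returning a list of
    (preference, host) pairs. A real one costs a DNS query per domain,
    which is where the extra expense comes from.
    """
    groups = defaultdict(list)
    for domain, addrs in group_by_domain(recipients).items():
        records = lookup_mx(domain)
        # Lowest preference number wins; with no MX record at all,
        # fall back to the domain itself.
        best = min(records)[1].lower() if records else domain
        groups[best].extend(addrs)
    return dict(groups)

# Invented MX data, for illustration only.
FAKE_MX = {
    "example.com": [(10, "mx.hosting.example"), (20, "backup.example.com")],
    "example.org": [(5, "mx.hosting.example")],
    "example.net": [(10, "mail.example.net")],
}

recipients = ["a@example.com", "b@example.org", "c@example.net", "d@example.com"]
by_domain = group_by_domain(recipients)
by_mx = group_by_mx(recipients, lambda d: FAKE_MX.get(d, []))
print(len(by_domain), "domain groups ->", len(by_mx), "MX groups")
```

In this toy data two domains share a mail host, so MX grouping saves
one connection. Whether that saving is worth a DNS lookup per domain
on a real list is exactly the tradeoff in question.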

Maybe one thing we need is a definition of what Mailman is and what 
it isn't. Some kind of target for the size of lists it wants to 
reasonably support. If it's 5,000 users, it doesn't matter what you 
do. If it's 50,000, or 500,000, you definitely have different 
requirements.

So defining what Mailman wants to solve can help us clear these 
things up. "Every list in the universe" is a laudable goal, but it'll 
probably delay shipping 2.0 for a decade or so... So I'd like to 
suggest some performance goals be defined, and then program to those, 
so we're all on the same page.

(being able to handle a moderately busy 25,000 user list, say 15-30 
messages a day, would probably cover 95% of the mailing lists in the 
universe, and still be technologically well within reach... It'd be nice 
to be able to say "5 million subscribers in 2 minutes!", but focus on 
a solid "do most things for most folks" now, and add the high 
performance/huge list support in 2.5. But leave the hooks in, so we 
don't have to rewrite later....)
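
For a sense of scale, the arithmetic behind that target can be
sketched in a few lines of Python. The function names are made up for
illustration, and the sketch assumes every message becomes one
delivery per subscriber (no digests) and that traffic is spread
evenly over 24 hours, which real list traffic is not.

```python
def deliveries_per_day(subscribers, messages_per_day):
    """Total individual deliveries if every message goes to every subscriber."""
    return subscribers * messages_per_day

def average_rate_per_hour(subscribers, messages_per_day):
    """Average hourly delivery rate, assuming an even spread over 24 hours."""
    return deliveries_per_day(subscribers, messages_per_day) / 24

# The 25,000-user list at 15 and at 30 messages a day.
for msgs in (15, 30):
    total = deliveries_per_day(25_000, msgs)
    rate = average_rate_per_hour(25_000, msgs)
    print(f"{msgs} msgs/day: {total:,} deliveries/day, about {rate:,.0f}/hour")
```

That works out to 375,000-750,000 deliveries a day, or roughly 15,600
to 31,250 an hour on average. Peaks will be higher than the average.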

>  > Definitely. Since most of the "performance" issues involve the
>>  MTA, and the MLM only affects it based on how it stuffs things
>>  into the MTA.
>
>There gets to be a point however where it really exceeds Mailman's
>charter.

True, but a one-page README.<mta> in the distro for each
reasonably supported MTA isn't a bad thing, and better than what
anyone else has. Because one reality is that most MTAs are configured
(especially out of the box) to manage incoming mail, and efficient
handling of outgoing mail is very different. Some hints on dealing
with those optimizations and tradeoffs can't hurt, and wouldn't have
to be significant or huge efforts.

>   Mailman is a list server, not a training course on how to
>build and configure a high volume mail system.  While I don't think
>we've crossed or even approached that line, in general I'd rather
>spend time on Mailman than high end server considerations which are
>adequately (?) documented elsewhere.

I tend to agree -- but performance of Mailman is inextricably tied to
the performance of, and the interface with, the MTA. If you ignore the
MTA, your chances of making Mailman work well are very small, and
users will tend to blame Mailman, because "sendmail worked fine before
we installed Mailman, so...."

>  > Right now, I generally recommend sites doing a lot of mail-list
>>  traffic...
>
>I generally recommend heartily against Sendmail for such sites.  I
>just don't see it as worth the extra effort (or obscurity) when
>newer MTAs such as Exim (wot I use currently), QMail or Postfix in
>general offer the same or better performance and configurability
>with the added benefit of human readable/auditable config files.
>
>While it's a cheap logic, it's easy to note that none of the very high
>volume commercial email sites out there are based on Sendmail
>(Critical Path, Hotmail, Onelist, EGroups, etc).

Valid points. But sendmail is a default install in many
installations, and so it's going to be what's available. So helping
people figure out how to make the best use of it is important;
refusing to would be sort of like refusing to let AOL users on a
list. Yes, some AOL users can be problems, but AOL users also tend to
be a huge part of an audience (on my machines, 15% isn't uncommon).

Postfix looks like a *real* win, but until I run it through its
paces, I won't use it. But the people I know who do use it love it.
And I've got other fish to fry before moving to Postfix (and right
now, I'm doing 400,000-500,000 an hour out of my mail system without
trying too hard, using sendmail 8.9.3, with peaks approaching 900K.
So eking out more performance by swapping MTAs is not a priority.)

>  > As someone who deals with email for a living...
>
>I should probably note at this point that I'm working for Critical
>Path on their mail systems.

As long as we're into disclosure, I run a bunch of hobby lists at 
plaidworks.com, but I also do most of the mail list stuff at Apple, 
where there's a combination of off the shelf (or actually, heavily 
hacked) majordomo and custom jobs, so my lists range from really tiny 
(10-12) to very, very large. The large system is custom coded, with 
the exception of the last remnant, which is bulk_mailer. I've 
completely replaced everything else, and bulk_mailer's replacement is 
going into test as soon as I finish it (and it'll fully VERP; 
although I had a bit of a scare last week when I was doing some 
throughput estimates and got some zeros wrong, and thought for a 
while that my total delivery was going to range into the terabytes. I 
was wrong, thank ghu, and it's merely in the range of 40-60 gigabytes 
per mailing....)
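
For readers unfamiliar with VERP (variable envelope return paths):
each recipient gets a unique envelope sender with their own address
encoded in it, so a bounce identifies the failing subscriber no matter
how mangled the bounce body is. Below is a minimal Python sketch of
the idea. It is not the code described above; the function names and
the "<list>-bounces+user=domain" layout are assumptions chosen for
illustration, and it assumes the list's own local part contains no
"+".

```python
def verp_sender(list_local, list_domain, recipient):
    """Build a per-recipient envelope sender.

    verp_sender("mylist", "lists.example.org", "user@example.com")
    gives "mylist-bounces+user=example.com@lists.example.org".
    """
    local, domain = recipient.rsplit("@", 1)
    return f"{list_local}-bounces+{local}={domain}@{list_domain}"

def verp_recipient(envelope_sender):
    """Recover the original recipient from a VERP address, or None."""
    local_part = envelope_sender.rsplit("@", 1)[0]
    if "+" not in local_part:
        return None
    # Everything after the first "+" is the encoded recipient.
    encoded = local_part.split("+", 1)[1]
    if "=" not in encoded:
        return None
    # Domains cannot contain "=", so the last "=" marks the old "@".
    user, domain = encoded.rsplit("=", 1)
    return f"{user}@{domain}"

sender = verp_sender("mylist", "lists.example.org", "user@example.com")
print(sender)
print(verp_recipient(sender))
```

The cost is that every recipient needs a separate envelope, so one
message body can no longer be handed to the MTA with a long list of
recipients. That is why full VERP multiplies the volume a bulk mailer
has to push.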

>Sorry, entirely different orders of magnitude there.  Notes is bad,
>certainly, and there are few things even close to being as bad as Notes
>or CC Mail (tho they've gotten a lot better in recent years (which
>isn't saying much)), but Exchange/Outlook make them look positively
>angelic in comparison.

Notes is obnoxious, especially since return-receipt is an
administrator-controlled option, and not smart enough to NOT r-r
mailing lists (or anything else), and I've found Notes administrators
about as obnoxious as their software when you point things like that
out. The only word I can use for Exchange is brutal. There are
Exchange sites out there whose idea of a bounce message is to return
the mail to the "to:" line with only the Message-ID changed. You can
imagine how much fun THAT is.

Those sites (fortunately rare, all broken, but at least two of them 
have been broken that way for four bloody years) my site simply 
blackholes.

-- 
Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com)
Apple Mail List Gnome (mailto:chuq@apple.com)

And they sit at the bar and put bread in my jar
and say 'Man, what are you doing here?'"