[Mailman-Developers] Huge lists

Thu, 25 May 2000 10:32:41 +0100

[Personal Ccs deleted... list only this time]

[It's nice to see you folks have been enjoying yourselves whilst I sleep.
However I now have the advantage, so will respond to the dozen or so
messages in the last batch :-) ]

chuqui@plaidworks.com said:
> throwing hardware at a problem isn't always possible. but the place
> where rolling your own internal MTA starts becoming useful is when
> the list is big enough that the disk I/O involving the MTA starts
> becoming the significant limiter. With sendmail 8.9.x, that's fairly
> easy to run into. With sendmail 8.10, it seems to be better, and the
> multiple queue stuff solves a multitude of problems involving huge
> directory structures.

Wietse had some figures on MTA performance analysis which he used as
part of the design process for Postfix.  He concluded that disk I/O was
*the* limiting factor for an MTA - remember that to comply with the
RFCs you have to commit incoming data to stable storage before
acknowledging receipt (i.e. the positive reply to SMTP end of data). In
all current mainstream MTAs that means that the queue file has to be
closed and synced.  Pushing data down to the rust and ensuring it's
there stably limits things drastically.  Wietse's tests should be on
www.postfix.org
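
For anyone who has not looked at what "closed and synced" costs, here is a
minimal sketch in Python of the commit step.  It is not how any particular
MTA does it; the function name and the "q-" file prefix are made up for
illustration.  The point is only that the fsync has to complete before the
positive reply goes back.

```python
import os
import tempfile


def commit_to_queue(spool_dir: str, data: bytes) -> str:
    """Write one message to a queue file and force it to disk.

    Hypothetical helper: the name and the "q-" prefix are illustrative,
    not taken from any real MTA.
    """
    fd, path = tempfile.mkstemp(dir=spool_dir, prefix="q-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to push it to the disk
    except BaseException:
        os.unlink(path)           # do not leave a half-written queue file
        raise
    # Only after this point would it be safe to send the positive
    # reply to SMTP end of data.
    return path
```

Every accepted message pays for at least one such sync, which is why the
number of queue files matters so much more than the number of bytes.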

> VERP exacerbates the problem, since # of batches sent to the MTA
> equals the # of addresses, which explodes the number of control
> files, which... So at some point, it makes sense to deliver direct to
> recipient rather than build batches into the MTA, and completely
> avoid the disk I/O and deliver right out of the database to the
> receiving SMTP client. You could strongly parallelize the delivery
> setup because you'd do away with all of the MTA overhead, and do all
> sorts of fun things, like prioritize your delivery sorting and the
> like. 

If we have a million-user list... and a message of a few KB, I'm not
sure I want to have a few GB of queue space taken up.  If some idiot
sends a 1 MB attachment I doubt many of us have the TB of spool space.
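
The arithmetic, as a quick sketch.  This assumes one queued copy per
recipient (which is what per-recipient batches imply) and takes 4 KB as a
stand-in for "a few KB"; both are assumptions for illustration.

```python
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12


def spool_bytes(recipients: int, message_bytes: int) -> int:
    """Queue space needed if every recipient gets its own queued copy."""
    return recipients * message_bytes


# A million subscribers, a 4 KB message (assumed size):
print(spool_bytes(1_000_000, 4 * KB) / GB)  # 4.0
# The same list, a 1 MB attachment:
print(spool_bytes(1_000_000, 1 * MB) / TB)  # 1.0
```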

Having said that I *really* would like the possibility of the
occasional message being sent out using VERP (maybe even just the
password reminders... although I'd prefer a method where some messages
go out that way whenever the list has recently seen bounces that it
cannot tie to a particular subscriber).  However then we also need to
recode the MTA incoming handling to take that - aliases don't cut it
any more.
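
To show why aliases stop being enough, here is a minimal sketch of VERP
encoding and decoding.  The "+" and "=" separators are one common
convention, assumed here for illustration, and the function names are
made up; this is not a statement of what Mailman does.

```python
def verp_sender(list_bounce_addr: str, recipient: str) -> str:
    """Build a per-recipient envelope sender from the list's bounce address."""
    local, domain = list_bounce_addr.rsplit("@", 1)
    r_local, r_domain = recipient.rsplit("@", 1)
    return f"{local}+{r_local}={r_domain}@{domain}"


def verp_recipient(envelope_to: str) -> str:
    """Recover the subscriber address from the address a bounce came back to."""
    local, _ = envelope_to.rsplit("@", 1)
    _, encoded = local.split("+", 1)            # drop the list's own local part
    r_local, r_domain = encoded.rsplit("=", 1)  # last "=" separates the domain
    return f"{r_local}@{r_domain}"
```

Each subscriber gets a different return address, so the incoming side
has to accept anything under the bounce prefix rather than a fixed set
of alias names.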

------

The queueing stuff is interesting, although boxes focused on big lists
are unlikely to be the primary users of Mailman - however if the Exim
list is anything to go by those (big list) users will be among the most
vocal and contribute most ideas and code.  [I have worked on big mail
systems, but not really big list systems]

claw@kanga.nu said:
> Sorting the RCPT TO list by domain costs us very little (esp if we
> sort on insertion), and can help users of dumb MTAs considerably.

Yup...

chuqui@plaidworks.com said:
> You could make a good argument that the best way to optimize is to
> create one mail batch per unique hostname, up to SMTP-MAX-RCPTS, at
> which point you split it into num_addrs/SMTP-MAX-RCPTS batches for
> that hostname, and then let the MTA sort if out from there.

Counter examples are always problems....  The biggest UK ISP group
(several "virtual" ISPs use the same bulk ISP service set) has a few
million users, each of whom has their own domain name - so you will
find that *.freeserve.co.uk (around 2 million domains) all goes to the
same batch of MXes.   This means that a good approach (for this type of
account naming) would be to pack in sets of addresses in reverse domain
order until you had a batch of SMTP-MAX-RCPTS (obviously you
additionally optimise this by also making sure that a single domain is
not split over 2 batches unless the number of addresses in that domain
is larger than a batch).
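
A rough sketch of that packing scheme, in case it helps the discussion.
The function names are invented, and max_rcpts stands in for
SMTP-MAX-RCPTS; treat it as one possible reading of the idea rather than
a proposed patch.

```python
from itertools import groupby


def reverse_domain_key(addr: str) -> tuple:
    """Sort key: domain labels reversed, so x.freeserve.co.uk sorts
    next to y.freeserve.co.uk."""
    domain = addr.rsplit("@", 1)[1].lower()
    return tuple(reversed(domain.split(".")))


def batch_recipients(addrs, max_rcpts):
    """Pack addresses into batches of at most max_rcpts, in reverse
    domain order, without splitting a domain unless it is too big."""
    batches, current = [], []
    ordered = sorted(addrs, key=reverse_domain_key)
    for _, group in groupby(ordered, key=reverse_domain_key):
        group = list(group)
        if len(group) > max_rcpts:
            # Domain is larger than a batch: it has to be split.
            if current:
                batches.append(current)
                current = []
            for i in range(0, len(group), max_rcpts):
                batches.append(group[i:i + max_rcpts])
            continue
        if len(current) + len(group) > max_rcpts:
            # Adding this domain would overflow: start a new batch.
            batches.append(current)
            current = []
        current.extend(group)
    if current:
        batches.append(current)
    return batches
```

With this ordering the subdomains of one ISP end up adjacent, so they
tend to land in the same batch even though every address has a
different domain.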

As for a quick description of exim queueing practices:-
  - Queues are processed in a basically random order... incoming 
    messages however *normally* have a delivery process invoked for
    them immediately after end-of-smtp-data (there is policy
    associated here - can be tweaked)
  - Each domain/address/message has retry hints associated with it;
    if the retry time for a message/domain/address has not been hit
    then it is not taken further - so often a group of messages in
    the queue are skipped on each queue run because their retry
    time has not arrived
  - Exim resolves all undelivered addresses in a message
    and groups them by MX (let's ignore alternative delivery schemes
    here)
  - Each MX set has delivery attempted (there may be parallelism here)
  - If the MX set can be contacted then the message is shoved down the
    pipe, then the hints database is checked for other messages
    outstanding on that MX set - if so then the pipe is passed to
    another delivery process invoked on one of the waiting messages
  - If MX set was *not* successful then the hints are updated to say
    this message has addresses outstanding on that MX
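
As a toy model of the steps above (and only that - this is not Exim
code, and every name in it is made up), one delivery process for one
message might look like the sketch below.  The mx_of and try_smtp
callables, the hints layout and the 900 second retry interval are all
assumptions for illustration.

```python
from collections import defaultdict


def group_by_mx(addresses, mx_of):
    """Group addresses by the MX set their domain resolves to."""
    groups = defaultdict(list)
    for addr in addresses:
        domain = addr.rsplit("@", 1)[1]
        groups[mx_of(domain)].append(addr)
    return dict(groups)


def deliver_message(msg_id, addresses, mx_of, try_smtp, hints, now,
                    retry_after=900):
    """Attempt delivery of one message; return addresses still undelivered.

    hints maps an MX set to {"retry_at": time, "waiting": set of msg ids}.
    """
    remaining = []
    for mx, addrs in group_by_mx(addresses, mx_of).items():
        hint = hints.setdefault(mx, {"retry_at": 0, "waiting": set()})
        if now < hint["retry_at"]:
            # Retry time not reached: skip this MX set for now.
            hint["waiting"].add(msg_id)
            remaining.extend(addrs)
        elif try_smtp(mx, addrs):
            # Delivered.  A real MTA could now hand the open connection
            # to another message listed in hint["waiting"].
            hint["waiting"].discard(msg_id)
        else:
            # Failed: note that this message is outstanding on that MX.
            hint["retry_at"] = now + retry_after
            hint["waiting"].add(msg_id)
            remaining.extend(addrs)
    return remaining
```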

So in the normal case each delivery process delivers only to the
addresses in the message it's dealing with - each message is independent
so you may have several SMTP sessions to the same place for different
messages.  If things clog up then hints help make things more efficient.
[These are hints - sometimes they are ignored, and trashing the hints db
is quite OK].  This all works pretty well in practice.  If you want a
particular type of efficiency you can rearrange things - e.g. make all
messages resolve, but only deliver on queue runs, which means that
messages for the same destination host are nearly always batched down a
single SMTP session.

[On per-MTA documentation]
Let's start bullying^Wpersuading people to contribute some documentation
on this stuff or pointers to existing MTA documentation that addresses 
this.  The question of MTA configuration for medium size lists is 
pretty common, so there must be tuning data around.   I guess I could 
collate if needed [sigh]

Big lists are a different issue - you need to *choose* your MTA and 
hardware within your constraints for that.   Tuning is probably a 
consultancy job for those.

chuqui@plaidworks.com said:
> There are  exchange sites out there who's idea of a bounce message is
> to return  the mail to the "to:" line with only the Message-ID
> changed. you can  imagine how much fun THAT is. 

More special bounce filters needed :-)
I *like* the way that Mailman is now dealing with an impressive
proportion of bounces.  I need to write an extra filter to make it drop
delay warning messages; other than that there's very little stuff
getting through to me in the way of bounces.

That particular one you mention should be blocked from the net - 
presumably their upstream is clueless too.

	Nigel.
-- 
[ - Opinions expressed are personal and may not be shared by VData - ]
[ Nigel Metheringham                  Nigel.Metheringham@VData.co.uk ]
[ Phone: +44 1423 850000                         Fax +44 1423 858866 ]