[Mailman-Users] Mailman performance / sends per hour

Sat Jul 26 12:03:57 CEST 2003

At 7:43 PM -0400 2003/07/25, Jon Carnes wrote:

>  Actually Brad, it looks like your knowledge of Sendmail is rather dated.
>  Sendmail has been doing this since 2001.
>
>    http://www.sendmail.org/~ca/email/doc8.12/RELEASE_NOTES

	This is old.  Check the RELEASE_NOTES for version 8.12.9 (which 
has a major security fix, and you are advised not to use any older 
version of 8.12), or 8.12.10.Beta2 (which is dated Jul 1 05:08, and 
which I quote here).  The only references I can find to the word 
"sort" anywhere in this file with regard to version 8.12 or later are:

8.12.7/8.12.7   2002/12/29
         Do not lookup MX records when sorting the MSP queue.  The MSP
                 only needs to relay all mail to the MTA.  Problem found
                 by Gary Mills of the University of Manitoba.
         Avoid problems with QueueSortOrder=random due to problems with
                 qsort() on Solaris (and maybe some other operating systems).
                 Problem noted by Stephan Schulz of Gruner+Jahr..

8.12.0/8.12.0   2001/09/08
         If the new option FastSplit (defaults to one) has a value greater
                 than zero, it suppresses the MX lookups on addresses when they
                 are initially sorted which may result in faster envelope
                 splitting.  If the mail is submitted directly from the
                 command line, then the value also limits the number of
                 processes to deliver the envelopes; if more envelopes are
                 created they are only queued up and must be taken care of
                 by a queue run.
         QueueSortOrder=Random sorts the queue randomly, which is useful if
                 several queue runners are started by hand to avoid contention.
         QueueSortOrder=Modification sorts the queue by the modification time
                 of the qf file (older entries first).

	Note that none of these makes any mention whatsoever of tracking 
previous average delivery times for a recipient, using them as a 
predictor of future delivery times, and sorting the current input on 
that basis.

	But please check again to make sure I didn't miss something.  You 
know me, I've only been mucking about with sendmail since ~1991, my 
name only comes up in the full RELEASE_NOTES four times, I was only 
the sendmail FAQ maintainer from ~1995 to ~1997, and I could easily 
have forgotten or missed something.

>  Postfix has some very interesting features that make it much better to
>  use than Sendmail, but the one that sets it most apart in added
>  efficiency is its default queueing structure.

	You mean the hashed queues?  Yes, that's good, but sendmail can 
do better with the optional multiple queue structure.  With this 
option, sendmail gives you more control over how many queues are 
created at what depth, instead of giving you an arbitrary number of 
sixteen queue directories per hash level.  Since most filesystems 
start flaking out with more than about 1000 directory entries at a 
single level, you can flatten the sendmail queue structure 
significantly and still have fewer files per leaf directory node than 
postfix would allow.
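
	To put rough numbers on that comparison, here is a minimal 
sketch of the arithmetic.  Only the fan-out of sixteen comes from the 
paragraph above; the backlog size, the hash depth of two, and the 
count of 500 flat queue directories are assumptions picked purely for 
illustration, and the helper names are hypothetical.

```python
def leaf_count(fanout: int, depth: int) -> int:
    """Number of leaf directories in a tree with `fanout` subdirectories per level."""
    return fanout ** depth


def files_per_leaf(queued_files: int, leaves: int) -> float:
    """Average number of queue files per leaf, assuming an even spread."""
    return queued_files / leaves


queued = 200_000  # assumed backlog size, purely illustrative

hashed_two_levels = leaf_count(16, 2)  # sixteen per level, assumed two levels deep
flat_queues = leaf_count(500, 1)       # assumed 500 queue directories at one level

print(hashed_two_levels)                          # 256
print(files_per_leaf(queued, hashed_two_levels))  # 781.25
print(files_per_leaf(queued, flat_queues))        # 400.0
```

	Under these assumptions the flat layout stays well below 1000 
entries at every level and still ends up with fewer files per leaf 
directory.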

	Moreover, it is the hashed queue structure that postfix uses, 
together with the way it manages the queue by moving files from one 
directory structure to another, that causes the fundamental 
performance limitations which sendmail allows you to exceed.  Note 
that sendmail never moves files around on-disk, and therefore does 
not incur additional unnecessary synchronous meta-data updates.


	Indeed, with the safe asynchronous writes feature, sendmail can 
safely avoid causing any synchronous meta-data updates at all in 
most cases, as the mail messages are small enough that they can be 
buffered in memory and delivered on the initial delivery attempt. 
Only large messages or messages that fail the initial delivery 
attempt end up getting written to disk at all, which means that 
sendmail can approach pure RAM/network I/O throughput speeds whereas 
postfix will always be bound by disk I/O.
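
	The general idea can be sketched as follows.  This is not 
sendmail's implementation, only an illustration of the principle: 
the message is acknowledged only after it has either been delivered 
from memory or been forced to stable storage.  The size threshold, 
the `deliver` callable and all function names are assumptions made 
up for this example.

```python
import os
import tempfile

MAX_BUFFERED_BYTES = 64 * 1024  # assumed threshold, not a sendmail default


def queue_to_disk(message: bytes, queue_dir: str) -> str:
    """Write the message into queue_dir and fsync it before returning its path."""
    fd, path = tempfile.mkstemp(dir=queue_dir, prefix="qf")
    with os.fdopen(fd, "wb") as handle:
        handle.write(message)
        handle.flush()
        os.fsync(handle.fileno())  # force the data to stable storage
    return path


def accept(message: bytes, deliver, queue_dir: str) -> str:
    """Make the message safe, then report how; only then may "250 OK" be sent.

    `deliver` is a hypothetical callable that returns True when the
    initial delivery attempt succeeded.
    """
    if len(message) <= MAX_BUFFERED_BYTES and deliver(message):
        return "delivered-from-memory"  # small and delivered: no disk write
    queue_to_disk(message, queue_dir)   # large, or first attempt failed
    return "queued-on-disk"
```

	Calling `accept(b"hello", lambda m: True, "/tmp")` would return 
"delivered-from-memory" without creating a queue file, while a failed 
first attempt or an oversized message falls back to the disk queue.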

>  I do agree with you though, that if the MTA (or Mailman) could
>  periodically sweep the MTA delivery logs and sort the domains from
>  fastest to slowest, there would be an increase in efficiency.

	This is the feature *I* was talking about, although I'd be 
inclined to do it on an individual basis and not a domain basis, 
since some individuals might have .procmailrc or other processing 
scripts on the remote end that might be significantly slower to 
process than other recipients within the same domain.

	For situations where this is not an issue at the remote end, the 
problem would largely solve itself because all those recipients would 
tend to sort together anyway.
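
	A minimal sketch of what such per-recipient sorting might look 
like in Python.  Nothing like this exists in Mailman or sendmail as 
far as this thread establishes; the (recipient, seconds) history 
format, the default of 60 seconds for unknown recipients, and the 
function names are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_delays(history):
    """Map each recipient to its mean delivery time in seconds.

    `history` is an iterable of (recipient, seconds) pairs, a hypothetical
    format that would have to be extracted from the MTA delivery logs.
    """
    samples = defaultdict(list)
    for recipient, seconds in history:
        samples[recipient].append(seconds)
    return {rcpt: mean(times) for rcpt, times in samples.items()}


def sort_recipients(recipients, averages, default=60.0):
    """Order recipients fastest first; unknown recipients get `default` seconds."""
    return sorted(recipients, key=lambda r: averages.get(r, default))


history = [
    ("alice@example.com", 2.0),
    ("alice@example.com", 4.0),
    ("bob@example.com", 45.0),   # same domain, but slow processing at the far end
    ("carol@example.net", 10.0),
]
averages = average_delays(history)
ordered = sort_recipients(
    ["bob@example.com", "dave@example.org", "carol@example.net", "alice@example.com"],
    averages,
)
print(ordered)
# ['alice@example.com', 'carol@example.net', 'bob@example.com', 'dave@example.org']
```

	Note how the two recipients at example.com end up far apart, 
which a purely domain-based sort could not express.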

>  For larger lists and Mailman, I have found that nothing beats using a
>  RAM disk and accessing the list database files via the mounted RAM disk.
>  The speed increase can be 100x faster.

	If you're going to be a professional spammer, then I would 
suggest using the professional spammer tools.

	Otherwise, if you're going to run a mailing list for normal 
people, then I would suggest that you pay attention to sections 5.3.3 
and 5.3.4 of RFC 1123, "Requirements for Internet Hosts -- Application 
and Support", which is also part of STD 3:

5.3.3  Reliable Mail Receipt

          When the receiver-SMTP accepts a piece of mail (by sending a
          "250 OK" message in response to DATA), it is accepting
          responsibility for delivering or relaying the message.  It must
          take this responsibility seriously, i.e., it MUST NOT lose the
          message for frivolous reasons, e.g., because the host later
          crashes or because of a predictable resource shortage.

          If there is a delivery failure after acceptance of a message,
          the receiver-SMTP MUST formulate and mail a notification
          message.  This notification MUST be sent using a null ("<>")
          reverse path in the envelope; see Section 3.6 of RFC-821.  The
          recipient of this notification SHOULD be the address from the
          envelope return path (or the Return-Path: line).  However, if
          this address is null ("<>"),  the receiver-SMTP MUST NOT send a
          notification.  If the address is an explicit source route, it
          SHOULD be stripped down to its final hop.

          DISCUSSION:
               For example, suppose that an error notification must be
               sent for a message that arrived with:
               "MAIL FROM:<@a,@b:user@d>".  The notification message
               should be sent to: "RCPT TO:<user@d>".

               Some delivery failures after the message is accepted by
               SMTP will be unavoidable.  For example, it may be
               impossible for the receiver-SMTP to validate all the
               delivery addresses in RCPT command(s) due to a "soft"
               domain system error or because the target is a mailing
               list (see earlier discussion of RCPT).

          To avoid receiving duplicate messages as the result of
          timeouts, a receiver-SMTP MUST seek to minimize the time
          required to respond to the final "." that ends a message
          transfer.  See RFC-1047 [SMTP:4] for a discussion of this
          problem.


	In particular, this means that you can't use a RAM disk for this 
application, since its contents vanish on a crash or power loss.  You 
*could* use a battery-backed solid-state disk, so long as you could 
guarantee that it is configured in such a way that it will survive 
power loss, reboots, remounting, filesystem checks, etc....  Of 
course, a proper SSD is much, much more expensive than a simple RAM 
disk.

	The alternative is using sendmail with the above-mentioned safe 
asynchronous writes feature, which allows you to get full use of your 
RAM, at nearly RAM disk speeds, but to do so safely.
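
	As an aside on the notification rules in the quoted section 
5.3.3, the choice of bounce recipient can be sketched in a few lines 
of Python.  The function name is hypothetical, and the parsing is 
deliberately simplistic (no quoted local parts, no validation); it 
only covers the null reverse path and the source-route case from the 
RFC's own example.

```python
from typing import Optional


def bounce_recipient(reverse_path: str) -> Optional[str]:
    """Return the address to notify, or None if no notification may be sent.

    `reverse_path` is the argument of MAIL FROM, with or without angle brackets.
    """
    path = reverse_path.strip()
    if path.startswith("<") and path.endswith(">"):
        path = path[1:-1]
    if path == "":
        return None  # null reverse path: MUST NOT send a notification
    if path.startswith("@") and ":" in path:
        # explicit source route such as "@a,@b:user@d": keep only the final hop
        path = path.split(":", 1)[1]
    return path


print(bounce_recipient("<@a,@b:user@d>"))  # user@d
print(bounce_recipient("<>"))              # None
```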

-- 
Brad Knowles, <brad.knowles at skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
     -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)