[Mailman-Developers] b6, postfix/qrunner super disaster

Barry A. Warsaw barry@wooz.org
Fri, 27 Oct 2000 15:40:35 -0400 (EDT)


>>>>> "CK" == Christopher Kolar <ckolar@admin.aurora.edu> writes:

    CK> I am wondering if qrunner got the error message and kept the
    CK> item in qfiles, but postfix also deferred delivery of the
    CK> message and kept it in the MTA mqueue -- growing by one copy a
    CK> minute until the server was able to successfully find the
    CK> recipients' hosts.

You're using SMTPDirect.py, right?  Let's look at how deliver() works:

- It tries to create an smtplib.SMTP instance, passing in the hostname
  and port that you've specified in mm_cfg.py (or inherited from
  Defaults.py).

  This step could raise a socket.error or a general SMTPException.
  The assumption is that if that happens, the MTA never got the
  message and essentially delivery failed for all recipients.

- Next, the SMTP.sendmail() method is called to send the message text
  to the list of recipients.  One of two things could happen here:

  a. an SMTPRecipientsRefused is raised, meaning that every one of the
     recipients was refused.  The exception object has a recipients
     attribute, a dictionary mapping each failing address to its SMTP
     error code and message.  The assumption here is that delivery
     failed to those recipients.

  b. the sendmail() method could return normally when only some of the
     recipients were refused.  In that case it returns a dictionary of
     the failing recipients, in the same form as (a) above.

- Each failed recipient has a corresponding error code describing why
  that recipient failed.  Each failed recipient is processed in turn:

  a. If the error code is >= 500 but not 552, then the failure is
     deemed permanent according to RFC 821 and DRUMS.  That address is
     passed to RegisterBounce() and discarded.

  b. Otherwise the failure is deemed temporary, so Mailman remembers
     the address for retry.

- If there are any retryable addresses, the message remains in the
  qfiles queue and is retried with the temporary failure recipients.
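Here's a minimal Python 3 sketch of that flow, using only the standard smtplib module.  It is not the actual SMTPDirect.py code: classify() and deliver() are hypothetical names, the host/port defaults stand in for whatever mm_cfg.py or Defaults.py supplies, and the RegisterBounce() step is only a comment.  Exceptions from sendmail() other than SMTPRecipientsRefused are simply allowed to propagate.

```python
import smtplib
import socket


def classify(refused):
    """Split refused recipients into permanent and temporary failures.

    `refused` maps address -> (code, message), the shape returned by
    SMTP.sendmail() and stored in SMTPRecipientsRefused.recipients.
    """
    permanent, retry = [], []
    for addr, (code, _msg) in refused.items():
        if code >= 500 and code != 552:
            # Permanent failure: Mailman would RegisterBounce() this address.
            permanent.append(addr)
        else:
            # Temporary failure (552 included): keep for the next queue run.
            retry.append(addr)
    return permanent, retry


def deliver(msgtext, sender, recipients, host="localhost", port=25):
    """Hypothetical helper returning (permanent, retry) address lists."""
    try:
        conn = smtplib.SMTP(host, port)
    except (socket.error, smtplib.SMTPException):
        # The MTA never got the message: every recipient must be retried.
        return [], list(recipients)
    try:
        try:
            # Returns {address: (code, message)} for recipients it refused.
            refused = conn.sendmail(sender, recipients, msgtext)
        except smtplib.SMTPRecipientsRefused as e:
            # Raised only when *all* recipients were refused.
            refused = e.recipients
    finally:
        try:
            conn.quit()
        except (socket.error, smtplib.SMTPException):
            pass  # a failed QUIT does not change any recipient's status
    return classify(refused)
```

Note that a 4xx reply lands in the retry list, so whatever the MTA does with that recipient afterwards, the message will be offered to it again on the next qrunner pass.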

So, the only thing I can guess at is that Postfix is returning a
temporary failure code for recipients to which it still attempts
delivery.  Simon Coggins reports similar symptoms with sendmail, but
I've never seen them, and I suspect that the situation causing these
must be pretty rare.
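To see why that guess would match the "one more copy a minute" symptom, here's a toy simulation.  QueueingMTA and run_queue() are hypothetical, and the MTA's behavior is exactly the guess above, not anything confirmed about Postfix or sendmail: it keeps a copy of the message for each recipient yet reports a 4xx code, so every queue run hands it another copy.

```python
class QueueingMTA:
    """Hypothetical MTA that queues the message for a recipient but still
    reports a temporary (4xx) failure for that recipient."""

    def __init__(self):
        self.copies = {}  # address -> number of copies the MTA holds

    def sendmail(self, recipients):
        refused = {}
        for addr in recipients:
            # The MTA keeps a copy and will deliver it eventually...
            self.copies[addr] = self.copies.get(addr, 0) + 1
            # ...but tells the client the failure is only temporary.
            refused[addr] = (450, "deferred")
        return refused


def run_queue(mta, recipients, passes):
    """Simulate repeated qrunner passes (e.g. one per minute)."""
    pending = list(recipients)
    for _ in range(passes):
        if not pending:
            break
        refused = mta.sendmail(pending)
        # Same rule as above: >= 500 and not 552 is permanent, else retry.
        pending = [a for a, (code, _msg) in refused.items()
                   if not (code >= 500 and code != 552)]
    return mta.copies
```

After five passes, run_queue(QueueingMTA(), ["a@x"], 5) reports five copies held for the one recipient, one per pass.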

So that's the idea behind SMTPDirect.py, but I still don't know enough
to understand what's causing the dups.  Could it be some
misunderstanding of the RFC 821 error codes?

-Barry