[Mailman-Developers] Handling Bouncebacks

Tue, 21 Aug 2001 23:02:27 -0400

>>>>> "JL" == James Leone <james_leone@yahoo.com> writes:

    JL> I'm interested in understanding how Mailman handles
    JL> bounces and seeing if my previous MLM experience will
    JL> allow me to make additions.

Cool!

    JL> I'm fresh to Python, but am familiar with Perl as well
    JL> as Java and feel comfortable looking at the code, but
    JL> need some reinforcement at the higher level of
    JL> architecture.

Given your experience, you should find Python a breeze.  I predict
total Pythonic conversion in, oh, two weeks. :)

    JL> These questions should be easy.  The email blast goes out and
    JL> the bad ones come back to a specified address.  Does Mailman
    JL> run as a daemon or are bouncebacks handled by piping through
    JL> the aliases file?

    JL> Basically, where is the entry point for dissecting the message
    JL> and determining whether it looks like a bounced email or the
    JL> user's mailbox is just full.

First, some background.  I'm going to describe Mailman 2.1 here, since
that's where the architecture is most robust.

Every mailing list has a number of aliases installed which direct the
MTA to pipe the message through one of the Mailman scripts (via a thin
C security wrapper).  These scripts all dump the message into a queue,
and a daemon `qrunner' picks the messages up from the queue and
processes them.  It's done this way to ensure robustness, avoiding
nasty timeout issues and uncatchable kill signals from certain MTAs in
certain situations.
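
To make the queue idea concrete, here's a rough sketch.  It is not
Mailman's actual code: the names enqueue() and run_once(), the file
naming scheme, and the use of pickle files are all assumptions made up
for illustration.  The point is only that the alias script does almost
nothing before exiting, and the long-running daemon does the real work.

```python
import itertools
import os
import pickle
import tempfile
import time

_counter = itertools.count()


def enqueue(queue_dir, msgtext, **metadata):
    """What an alias script would do: store the message and exit quickly."""
    base = "%020d-%06d" % (time.time_ns(), next(_counter))
    tmp = os.path.join(queue_dir, base + ".tmp")
    final = os.path.join(queue_dir, base + ".pck")
    # Write under a temporary name, then rename, so the runner never
    # picks up a half-written entry.
    with open(tmp, "wb") as fp:
        pickle.dump((msgtext, metadata), fp)
    os.rename(tmp, final)
    return final


def run_once(queue_dir, handler):
    """One pass of a hypothetical qrunner: handle and remove each entry."""
    count = 0
    for name in sorted(os.listdir(queue_dir)):
        if not name.endswith(".pck"):
            continue
        path = os.path.join(queue_dir, name)
        with open(path, "rb") as fp:
            msgtext, metadata = pickle.load(fp)
        handler(msgtext, metadata)
        os.remove(path)
        count += 1
    return count


# Demo: two deliveries from the MTA, then one pass of the runner.
queue_dir = tempfile.mkdtemp()
enqueue(queue_dir, "a bounce notice", toadmin=1)
enqueue(queue_dir, "a note for the owners", toowner=1)
seen = []
processed = run_once(queue_dir, lambda text, data: seen.append((text, data)))
```

Because the script only writes a file, the MTA gets its exit status
back almost immediately, no matter how slow the later processing is.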

The list of aliases is roughly:

    mylist         - all user postings go here
    mylist-request - the command processor robot (e.g. `help')
    mylist-admin   - the bounce processor robot, then to owners
    mylist-owner   - list of human operators of the list
    mylist-join    - auto-subscribe robot
    mylist-leave   - auto-unsubscribe robot

(BTW, it's easy to add others if we want to.)

So, all messages to mylist@dom.ain are resent with errors pointing to
mylist-admin@dom.ain.  Messages addressed to mylist-admin get dropped
in the "command" queue, and processed by the CommandRunner.  For
historical reasons, the CommandRunner processes all messages sent to
mylist-owner and mylist-admin, and there is a metadata entry to tell
CommandRunner which address the message was sent to (i.e. `toowner' or
`toadmin' respectively).

All messages with the `toadmin' key, i.e. mylist-admin, first get run
through the bounce detector.  If the bounce format matches one of the
known patterns, any extracted addresses are entered into the bounce
database.  If no match was found, the message is handled exactly as if
it were addressed to -owner, i.e. it's sent to the list owners
directly.
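
In code, that decision looks something like the following.  This is a
simplified sketch, not the real CommandRunner: the function name
handle_command_message() and the two callables passed in are
hypothetical, and what happens to messages with neither key is my
assumption for the sake of a complete example.

```python
def handle_command_message(msg, msgdata, scan_for_bounce, forward_to_owners):
    """Return a short label saying what was done with the message."""
    if msgdata.get("toadmin"):
        # -admin traffic goes through the bounce detector first.
        if scan_for_bounce(msg):
            return "bounce"
        # Not a recognized bounce: treat it exactly like -owner mail.
        forward_to_owners(msg)
        return "owners"
    if msgdata.get("toowner"):
        forward_to_owners(msg)
        return "owners"
    # Assumption: anything else in this queue is an ordinary command.
    return "command"


# Demo with stub callables standing in for the real machinery.
forwarded = []
looks_like_bounce = lambda msg: "delivery failed" in msg

r1 = handle_command_message("delivery failed for x@dom.ain",
                            {"toadmin": 1}, looks_like_bounce,
                            forwarded.append)
r2 = handle_command_message("please help with my list",
                            {"toadmin": 1}, looks_like_bounce,
                            forwarded.append)
r3 = handle_command_message("delivery failed for x@dom.ain",
                            {"toowner": 1}, looks_like_bounce,
                            forwarded.append)
```

Note that in the third call the bounce detector is never consulted,
because the message was addressed to -owner rather than -admin.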

The extensible bounce detection subsystem is implemented as modules in
Mailman/Bouncers, with Mailman/Bouncers/BouncerAPI.py as the main
entry point.  Specifically, the function BouncerAPI.ScanMessages() is
handed the message and the MailList instance.  It returns 1 if a
bounce was detected and registered, and 0 if not.

    JL> I see that there is a Bouncer class, and an example
    JL> api, but the comments are rather sparse.

The Bouncer class is really a mixin class for the MailList.  It
manages the bounce database and a few other bouncer functions, such as
sending out the messages when a bouncing address is disabled, etc.
It's fairly old code, and fairly crufty, and has been on my hit list
for a rewrite for a while now.  Note that the Bouncer class (the one in
Mailman/Bouncer.py) isn't the API for the bounce detection system.

Look at BouncerAPI.py for how the bounce detector gets run.  Python
doesn't (yet) have a widely accepted interface specification
formalism, but here's how the bounce detector (i.e. ScanMessages())
works.

You'll see a list in the local variable `pipeline'.  Each of these
names a module in the Mailman.Bouncers package, and ScanMessages()
imports each dynamically in turn.  Each module must contain a process()
function which takes a Message object.  The process() function either
returns a list of addresses -- which signals a successful detection of
this bounce format -- or it returns a false value (note that in
Python, an empty list is considered false), in which case that
particular detector didn't find a match, and the next detector is
imported and run.  The first detector to extract an address wins.
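
Here's a stripped-down sketch of that pipeline so you can see the
shape of it.  It is not the code in BouncerAPI.py: the module names,
the two toy detectors, and the `register' callable are all invented
for the example, and the fake modules are planted in sys.modules only
so the snippet runs on its own without a Mailman install.

```python
import importlib
import re
import sys
import types
from email import message_from_string


def _failed_header(msg):
    # Toy detector: addresses listed in an X-Failed-Recipients header.
    value = msg.get("x-failed-recipients")
    if not value:
        return []
    return [a.strip() for a in value.split(",") if a.strip()]


_ANGLE_RE = re.compile(r"^\s*<(?P<addr>[^>\s]+@[^>\s]+)>:", re.MULTILINE)


def _angle_body(msg):
    # Toy detector: body lines of the form "<user@dom.ain>:".
    payload = msg.get_payload()
    if not isinstance(payload, str):
        return None
    return [m.group("addr") for m in _ANGLE_RE.finditer(payload)]


# Stand-ins for real detector modules; each exposes process(msg).
for _name, _func in [("sketch_failed_header", _failed_header),
                     ("sketch_angle_body", _angle_body)]:
    _mod = types.ModuleType(_name)
    _mod.process = _func
    sys.modules[_name] = _mod

PIPELINE = ["sketch_failed_header", "sketch_angle_body"]


def scan_message(msg, register, pipeline=PIPELINE):
    """Return 1 if some detector found addresses (and register them), else 0."""
    for modname in pipeline:
        module = importlib.import_module(modname)
        addrs = module.process(msg)
        if addrs:
            # First detector to extract an address wins.
            for addr in addrs:
                register(addr)
            return 1
    return 0


bounced = []
msg = message_from_string(
    "X-Failed-Recipients: a@example.com, b@example.com\n\nsome text\n")
result = scan_message(msg, bounced.append)
```

Adding support for a new bounce format then means writing one more
module with a process() function and adding its name to the list.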

Note also that BouncerAPI.py is designed to run as a script for
testing purposes, but that's proven awkward, and the fledgling Mailman
unit tests provide a better way to test the individual bounce
detectors.

I hope that helps!
-Barry