Automated mail bounce handling.

Fri Dec 3 07:53:55 EST 2004

Hi All,

I'm writing an application that sends out emails, for workflow-item 
tracking purposes.

I am using a VERP-style addressing mechanism, whereby I send each 
message from a uniquely generated email address, so that I can relate 
bounces, notification failures, etc., to the original address to which 
the email was sent.
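
To make the addressing scheme concrete, here is a minimal sketch of how 
such an address might be built and later decoded. The domain, the 
"bounce-" prefix, the signing key and both function names are 
assumptions made up for illustration, not part of any standard; the 
short HMAC suffix is an optional extra so that forged addresses can be 
rejected.

```python
import hashlib
import hmac

BOUNCE_DOMAIN = "bounces.example.com"  # hypothetical bounce-handling domain
SECRET = b"change-me"                  # hypothetical signing key


def _sign(token: str) -> str:
    """Short hex signature over the encoded part of the address."""
    return hmac.new(SECRET, token.encode(), hashlib.sha256).hexdigest()[:8]


def make_verp_sender(recipient: str, item_id: int) -> str:
    """Build a unique envelope sender that encodes item id and recipient."""
    local, _, domain = recipient.rpartition("@")
    token = f"{item_id}-{local}={domain}"  # '=' stands in for '@'
    return f"bounce-{token}-{_sign(token)}@{BOUNCE_DOMAIN}"


def parse_verp_sender(address: str):
    """Return (item_id, recipient) for a valid bounce address, else None."""
    local, _, domain = address.rpartition("@")
    if domain != BOUNCE_DOMAIN or not local.startswith("bounce-"):
        return None
    # The signature is hex, so it is everything after the last hyphen.
    token, _, sig = local[len("bounce-"):].rpartition("-")
    if not hmac.compare_digest(sig.encode(), _sign(token).encode()):
        return None
    item_id, _, encoded = token.partition("-")
    rcpt_local, _, rcpt_domain = encoded.rpartition("=")
    if not item_id.isdigit() or not rcpt_local or not rcpt_domain:
        return None
    return int(item_id), f"{rcpt_local}@{rcpt_domain}"
```

Anything that arrives at an address for which parse_verp_sender() 
returns a tuple can then be tied back to one workflow item and one 
recipient, whatever the body of the returned mail looks like.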

The problem is differentiating between the different types of message 
that can come back to that address. For example, if the message was 
undeliverable for "permanent" reasons, e.g. user moved on, invalid email 
address, etc., then I get back an email to the unique address, which 
needs to be parsed in order to find the reason for failure. But I also 
get back vacation emails to the same address, e.g. "I'm out of the 
office at the moment, I'll read your email when I get back". The former 
should be recognised by the application, so that appropriate action can 
be taken, i.e. ask the admin for a new address. But the latter should be 
essentially ignored, because it doesn't affect whether the recipient 
actually received the email.

I've been researching a little, and found the following approaches:

1. RFC 1894 (since obsoleted by RFC 3464), which specifies 
delivery-status fields giving easy-to-parse reason-codes for the return 
mail. But I'm uncertain as to how widely supported it is "in the wild".

2. The Mailman approach, which is essentially the linear application of 
algorithmic matchers, each of which is given a chance to recognise 
bounced mails, in order to determine the nature of the bounce. AFAICT, 
this method involves a significant amount of coding, as it requires 
writing a new matcher for each MTA/MUA in existence (surprise, surprise, 
they all do it slightly differently). I see that the Mailman 
distribution comes with a couple of dozen matchers, which is obviously 
far from complete. I'd prefer not to go down this path, since I could 
end up writing hundreds of matchers, as I incrementally discover the 
different styles that real-world MTAs/MUAs use. And Mailman is GPL, 
which is a problem in this case.
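
The two approaches above can be combined, and a rough sketch of that 
follows. It is not Mailman's code: the matcher names and the verdict 
strings are invented for illustration. The first matcher relies on the 
standard delivery status notification format (RFC 3464, which replaced 
RFC 1894), the second on the Auto-Submitted header from RFC 3834, on the 
assumption that the sending MTA or vacation responder actually emits 
them; anything else falls through as "unknown".

```python
import email


def match_dsn(msg):
    """Recognise a multipart/report bounce with a delivery-status part."""
    if msg.get_content_type() != "multipart/report":
        return None
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        blocks = part.get_payload()
        if not isinstance(blocks, list):
            continue
        # The parser exposes each block of status fields as a sub-message:
        # per-message fields first, then one block per recipient.
        for block in blocks:
            status = block.get("Status", "").strip()
            action = block.get("Action", "").strip().lower()
            if action == "failed" and status.startswith("5"):
                return "permanent"
            if action == "delayed" or status.startswith("4"):
                return "transient"
    return None


def match_auto_reply(msg):
    """Recognise vacation-style responses that label themselves as such."""
    value = msg.get("Auto-Submitted", "").strip().lower()
    return "ignore" if value.startswith("auto-replied") else None


# Each matcher gets a chance in turn; the first verdict wins.
MATCHERS = [match_dsn, match_auto_reply]


def classify(raw_message: str) -> str:
    msg = email.message_from_string(raw_message)
    for matcher in MATCHERS:
        verdict = matcher(msg)
        if verdict is not None:
            return verdict
    return "unknown"
```

With this layout the standards-based matchers handle whatever they can, 
and MTA-specific matchers only need to be appended to MATCHERS for the 
mails that still come out as "unknown".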

I am trying to think of more robust and less costly (in coding time) 
approaches. Maybe some form of text-matching algorithm, such as

1. Bayesian classification?
2. Keyword recognition?
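
By keyword recognition I mean something along these lines. This is only 
a sketch: the phrase lists are guesses of mine rather than anything 
harvested from real bounces, and the function name and labels are 
invented for illustration.

```python
import re

# Hypothetical phrase lists; a real deployment would need far more.
KEYWORDS = {
    "permanent": ["user unknown", "no such user", "mailbox unavailable",
                  "address rejected"],
    "vacation": ["out of the office", "on vacation", "when i get back",
                 "auto-reply"],
}


def keyword_classify(text: str, min_hits: int = 1) -> str:
    """Label text by whichever phrase list matches most often."""
    # Lower-case and collapse whitespace so phrases match across line breaks.
    normalised = re.sub(r"\s+", " ", text.lower())
    scores = {
        label: sum(phrase in normalised for phrase in phrases)
        for label, phrases in KEYWORDS.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, hits), (_, runner_up) = ranked[0], ranked[1]
    # Too few hits, or a tie between labels, counts as unrecognised.
    if hits < min_hits or hits == runner_up:
        return "unknown"
    return best
```

A Bayesian classifier would replace the hand-written phrase lists with 
word statistics learned from a pile of already-labelled bounces and 
vacation messages, at the cost of having to collect that pile first.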

I'd be grateful for any pointers or suggestions for existing Python 
solutions to this problem.

TIA,

-- 
alan kennedy
------------------------------------------------------
email alan:              http://xhaus.com/contact/alan