[Mailman-Users] Giving away the secrets of 99.3% email delivery

Wed May 9 22:01:57 CEST 2012

Is this an appropriate place to discuss the broader topic of how to best
use Mailman? Now that we have it running well, we would like to take
additional steps to ensure that the list's emails are delivered as well as
they can be.

The 37Signals article caught my attention. I would enjoy knowing others'
thoughts about how to apply these (or other) suggestions to Mailman.

It seems to me that Mailman provides at least some of the intelligence (via
logs) that 37Signals custom developed on top of Postfix. Am I right? The
core suggestions seem to be universal: SPF records, DKIM signing, reverse
DNS entries, etc. (And, btw, I don't yet know how to implement any of
those things except SPF records.)

Giving away the secrets of 99.3% email delivery
<http://37signals.com/svn/posts/3096-giving-away-the-secrets-of-993-email-delivery>

We send a lot of mail for Basecamp <http://basecamphq.com/?source=svn_post>,
Highrise <http://highrisehq.com/?source=svn_post>,
Backpack <http://backpackit.com/?source=svn_post>,
and Campfire <http://campfirenow.com/?source=svn_post> (and some for
Sortfolio <http://sortfolio.com>, the Jobs Board <http://jobs.37signals.com>,
Writeboard <http://writeboard.com>, and Tadalist <http://tadalist>). One of
the most frequently asked questions we get is about how we handle mail
delivery and ensure that emails are making it to people’s inboxes.

Some statistics

First, some numbers to give a little context to what we mean by “a lot” of
email. In the last 7 days, we’ve sent just shy of 16 million emails, with
approximately 99.3% of them being accepted by the remote mail server.

Email delivery rate is a little bit of a tough thing to benchmark, but by
most accounts we’re doing pretty well at those rates (for comparison, the
tiny fraction of email that we use a third party for has had between a
96.9% and 98.6% delivery rate for our most recent mailings).

How we send email

We send almost all of our outgoing email from our own servers in our data
center located just outside of Chicago. We use Campaign Monitor
<http://campaignmonitor.com> for our mailing lists, but all of the email
that’s generated by our applications is sent from our own servers.

We run three mail-relay servers running Postfix that take mail from our
application and jobs servers and queue it for delivery to tens of thousands
of remote mail servers, sending from about 15 unique IP addresses.

How we monitor delivery

We have developed some instrumentation so we can monitor how we are doing
on getting messages to our users’ inboxes. Our applications tag each outgoing
message with a unique header with a hashed value that gets recorded by the
application before the message is sent.
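
The article does not show how the tagging is done. A minimal Python sketch
of the idea, assuming a hypothetical header name `X-Delivery-Id` and a
SHA-256 hash of a random UUID as the value (neither is confirmed by the
article), might look like this:

```python
import hashlib
import uuid
from email.message import EmailMessage

# Hypothetical header name; the article does not say which header is used.
TRACKING_HEADER = "X-Delivery-Id"


def build_tracked_message(sender, recipient, subject, body):
    """Return (message, tracking_id) with a hashed ID in a custom header."""
    # Hash a random UUID so the ID reveals nothing about the recipient.
    tracking_id = hashlib.sha256(uuid.uuid4().bytes).hexdigest()
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg[TRACKING_HEADER] = tracking_id
    msg.set_content(body)
    return msg, tracking_id
```

The application would record `tracking_id` next to its own record of the
message before handing the message to the mail relay.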

To gather delivery information, we run a script that tails the Postfix logs
and extracts the delivery time and status for each piece of mail, including
any error message received from the receiving mail server, and links it
back to the hash the application stored. We store this information for 30
days so that our fantastic support team <http://smiley.37signals.com> is
able to help customers track down why they may not have received an email.
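
The log-tailing script itself is not published in the article. As a sketch
of the extraction step only, the Python below parses one delivery line in
the format the Postfix smtp client normally logs. The sample host names and
the keys of the returned dictionary are made up for illustration, and
linking the queue ID back to the application's hash is left out.

```python
import re

# Matches Postfix delivery lines such as:
#   ... postfix/smtp[123]: QUEUEID: to=<addr>, relay=..., delay=1.2, ...,
#   status=sent (250 2.0.0 OK)
DELIVERY_RE = re.compile(
    r"postfix/\w+\[\d+\]: (?P<queue_id>[0-9A-Za-z]+): "
    r"to=<(?P<recipient>[^>]*)>, .*?"
    r"delay=(?P<delay>[\d.]+), .*?"
    r"status=(?P<status>\w+) \((?P<response>.*)\)$"
)


def parse_delivery_line(line):
    """Return a dict describing one delivery attempt, or None if no match."""
    match = DELIVERY_RE.search(line.rstrip("\n"))
    if match is None:
        return None
    record = match.groupdict()
    record["delay"] = float(record["delay"])  # seconds spent in the queue
    return record
```

A caller would feed each new log line to `parse_delivery_line` and store
the non-`None` results for as long as support needs them.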

We also send these statistics to our statsd server so they can be reported
through our metrics dashboard. This “live” and historical information can
then be used by our operations team to check how we’re doing on aggregate
mail delivery for each application.
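
For readers who have not used statsd: a counter is a short text datagram of
the form `name:value|c`, sent over UDP (port 8125 by default). The sketch
below shows that wire format in Python. The metric names are hypothetical,
and the article does not say which metrics 37signals actually reports.

```python
import socket


def format_counter(metric, value=1):
    """Build the statsd wire format for a counter increment."""
    return f"{metric}:{value}|c"


def send_counter(metric, value=1, host="127.0.0.1", port=8125):
    """Send one counter increment to a statsd server over UDP."""
    payload = format_counter(metric, value).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

For example, `send_counter("mail.basecamp.sent")` after each accepted
message and `send_counter("mail.basecamp.bounced")` after each rejection
would be enough to graph a per-application delivery rate.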

Why run your own mail servers?

Over the last few years, at least a dozen services that specialize in
sending email have popped up, ranging from the bare-bones to the
full-service. Despite all these “email as a service” startups we’ve kept
our mail delivery in-house, for a couple of reasons:

   - *We don’t know anyone who could do it better.* With a 99.3% delivery
   rate, we haven’t found a third party provider that actually does better in
   a way they’re willing to guarantee.
   - *Setup hassle.* Most of the third party services require that you
   verify each address that sends email by clicking a link that gets sent to
   that address. We send email from thousands and thousands of email addresses
   for our products, and the hassle of automatically registering and
   confirming them is significant. Automating the process still introduces
   unnecessary delivery delays.

Given all this, why should we pay someone tens of thousands of dollars to
do it? We shouldn’t, and we don’t.

How we keep our mail delivery rates up

Let’s be honest from the get-go. Mail delivery is more of an art than a
science. We’ve found that even when you “play by the rules”, there are still
times when a major provider will reject all your mail without notice.
Usually it takes a couple of emails to the provider’s abuse address, and
things get resolved. In spite of these “out of our control” issues, we’ve
found a few things help us keep delivery rates up:

   1. *Constantly monitor spam blacklists*
   <https://raw.github.com/37signals/37s_cookbooks/edefbd17eeb8f25f6d42a01e7f207848fb23e49f/nagios/files/default/plugins/check_bl_async.pl>.
   We have a set of Nagios alerts that regularly check if we’re listed
   on any delivery blacklists, and whenever they go off we take whatever
   corrective action we need to get back off the blacklist.
   2. Have valid SPF <http://en.wikipedia.org/wiki/Sender_Policy_Framework>
   records. Don’t impersonate your users. When running a web app like
   Basecamp <http://basecamphq.com/?source=svn_post>, which sends emails
   that are generated by another user, it can be tempting to send the email
   from that user (e.g., so that a comment I wrote on Basecamp would appear to
   come from noah at 37signals dot com), which might make people feel more
   comfortable. Unfortunately, this is a surefire way to end up on spam lists,
   since you’ll likely be sending from an IP address that does not have
   valid SPF records. And chances are, if the user’s domain does have an SPF
   record, it doesn’t include your application’s IP.
   3. Sign the mail! DKIM and Domain Keys <http://www.dkim.org/>. Yahoo and
   Gmail both score signed email higher.
   4. Dedicated and conditioned email sending IPs.
   5. Configure reverse DNS entries
   <http://en.wikipedia.org/wiki/Reverse_DNS_lookup>.
   Most of the “big boys” won’t accept mail from your servers if your reverse
   DNS entries don’t match. You might need your IP provider to help with
   setting up these records.
   6. Enroll in feedback loops
   <http://en.wikipedia.org/wiki/Feedback_loop_%28email%29>.
   We haven’t automated our parsing of feedback, but a daily / weekly review
   of feedback loop emails helps us know when there’s an unhappy user, or
   other problem. Too many complaints and you’ve got trouble.
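
Two of the checks in the list above are easy to illustrate. A DNS blacklist
is conventionally queried by reversing the octets of the IPv4 address and
appending the list's zone (an address answer means "listed", no such name
means "not listed"), and an SPF record authorizes senders through
mechanisms such as `ip4:`. The Python below is only a sketch of those two
ideas: the zone name `dnsbl.example` and the sample record are placeholders,
the actual DNS lookup is omitted, and only `ip4:` mechanisms are examined
(`include:`, `a`, `mx`, and the rest of SPF are ignored).

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def spf_ip4_allows(spf_record, ip):
    """Return True if ip falls inside any ip4: mechanism of the record."""
    addr = ipaddress.ip_address(ip)
    for term in spf_record.split():
        if term.lower().startswith(("ip4:", "+ip4:")):
            network = ipaddress.ip_network(term.split(":", 1)[1], strict=False)
            if addr in network:
                return True
    return False
```

With a record of `v=spf1 ip4:192.0.2.0/24 -all`, a sending address of
192.0.2.10 is covered, while a sender outside that range is not, which is
the situation item 2 warns about when mail is sent "from" a user's own
domain.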

A problem we haven’t solved

By far the biggest cause of failed email delivery we see is bad
email addresses that were entered into the system: problems like
‘joe at gmal.com’ or ‘sue at yahooo.com’. By and large, these pass a regular
expression check for email addresses, but aren’t actually valid addresses.
There’s no perfect solution here, but we’ve been experimenting with
checking for valid DNS records or actually attempting to connect to the
mail server as part of the validation of an email address, and with
notifying people within the application when we aren’t able to deliver mail
to them.
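
One complementary approach, which is not the DNS or SMTP check the article
describes, is to compare the domain against a short list of common providers
and suggest a correction when it is a near miss. The Python sketch below
uses the standard library's `difflib` for this. The domain list and the 0.8
similarity cutoff are arbitrary assumptions chosen for illustration.

```python
import difflib

# Illustrative list only; a real deployment would choose its own.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com"]


def suggest_domain(address, known=COMMON_DOMAINS, cutoff=0.8):
    """Return a likely intended domain for a mistyped address, or None."""
    _, _, domain = address.rpartition("@")
    domain = domain.lower()
    if not domain or domain in known:
        return None  # nothing to check, or already a known domain
    matches = difflib.get_close_matches(domain, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For the two addresses quoted above (written with `@`), this suggests
`gmail.com` and `yahoo.com` respectively. The application could then ask
"did you mean ...?" at signup rather than rejecting the address outright.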

A few tools

   - MX Toolbox <http://www.mxtoolbox.com> is a great site for doing a
   quick check on your mail servers and your customers’ mail servers.
   - Sender Score <https://www.senderscore.org/> is really a marketing tool
   for Return Path, but it can be used to get insight about how some of the
   “big boys” are scoring your sending IPs.
   - Postmark <http://spamcheck.postmarkapp.com/> offers a web tool and API
   to get the SpamAssassin score for a message, which can be helpful for
   identifying things you can improve to boost delivery rates.