[Mailman-Developers] DNS and mailing lists -- an idea

Jay R. Ashworth jra@baylink.com
Sun, 25 Nov 2001 00:22:39 -0500


This may well properly be a proposal that is germane to *all* MTAs,
not just those that support MLMs, but it will probably help the
latter more (and I hang out here :-), so I figured I'd bounce it off
you guys first.

I just got caught in a DNS crash.  My personal domain's hidden primary
nameserver lives on a leased loop at the office of the guy I subcontract
for (as does his domain's, and a couple other things, but that's not
material).

He's closing the office, and we're going back to working out of our
houses, and the ISP apparently yanked the loop a week and a half early.
Without notice.

I didn't notice it until the public secondary for my domain timed the
zone out and started returning SERVFAIL to people sending me mail... as
a consequence of which, I didn't get any mail for three and a half days.

Luckily, it was a slow week, thanks to Thanksgiving (:-), so I only had
200 messages to sort through when most of it came in this afternoon --
but even that was only because I finally got hold of the secondary DNS
server's operator and had him plug in the last copy of my master file
(which he used to maintain -- luckily, he's a *good* op, and he hadn't
nuked it).

The *reason* that that fix worked, and my having set up a new server
and fixed my zone glue *didn't*, of course, is those ugly things called
"caching servers", and that's why you're all listening to this
(assuming, of course, that you still are... ;-)

I've been told that it's good practice for heavily trafficked mail and
web servers to run local (i.e., on the same machine) caching DNS servers
to speed up the mailer's DNS lookups, and therefore load the network less.

It occurred to me, riding home tonight from a concert and doing the
after-action review of the whole fiasco in my head, that there might be
something productive in a script that could crawl the pending mail log
looking for signs that some large batch of mail is stuck for bad
reasons.  Caching accidents happen to everyone occasionally... and if
one happens to your list with 4000 people on AOL, it could get ugly.
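Here's a minimal sketch of the crawling half, assuming a *hypothetical*
log format of one deferred message per line (queue ID, recipient,
"deferred:", reason text).  Every MTA logs deferrals differently, so the
regex and the DNS hint strings below are assumptions to adapt, not what
any particular mailer actually writes.  It just counts, per recipient
domain, the deferred messages whose reason looks like a resolver failure:

```python
import re
from collections import Counter

# Hypothetical log format, one deferred message per line:
#   <queue-id> <user@domain> deferred: <reason text>
LINE_RE = re.compile(r"^(\S+)\s+\S+@(\S+)\s+deferred:\s*(.*)$")

# Assumed substrings suggesting a DNS-side failure, as opposed to a
# remote mail server that refused or dropped the connection.
DNS_HINTS = (
    "servfail",
    "name server timeout",
    "host not found",
    "temporary failure in name resolution",
)


def dns_deferrals_by_domain(lines):
    """Count deferred messages per recipient domain whose reason looks DNS-related."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that don't fit the assumed format
        _queue_id, domain, reason = m.groups()
        if any(hint in reason.lower() for hint in DNS_HINTS):
            counts[domain.lower()] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "A1 bob@aol.com deferred: Name server timeout",
        "A2 sue@aol.com deferred: SERVFAIL from resolver",
        "A3 joe@example.org deferred: Connection refused",
    ]
    print(dns_deferrals_by_domain(sample))
```

On those three sample lines it reports two DNS-looking deferrals for
aol.com and ignores the refused connection, which is the point: a pile
of mail stuck on *name resolution* for one big domain is the symptom
worth reacting to.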

If such a script could restart the associated name daemon, flushing
its cache, it would save a lot of delay and concern for various groups
of people, I think.  Assuming you could create a good heuristic for
deciding when to dump the cache... which shouldn't be too hard, since
a false dump isn't too painful (the worst case is a burst of extra
lookups while the cache refills).
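One way that heuristic might look, again only a sketch: given per-domain
counts of DNS-looking deferrals, flush when any single domain crosses a
threshold, but never more often than a cooldown allows.  The threshold
and cooldown values are *hypothetical* defaults, not tuned numbers.  The
cooldown matters because a genuinely dead zone (like mine this week)
won't be fixed by flushing, and you don't want the script bouncing the
name daemon every few minutes:

```python
import time


def should_flush_cache(counts, threshold=50, last_flush=None,
                       cooldown=1800.0, now=None):
    """Decide whether to flush the local resolver's cache.

    counts     -- mapping of domain -> number of DNS-looking deferrals
    threshold  -- hypothetical per-domain count that triggers a flush
    last_flush -- epoch seconds of the previous flush, or None if never
    cooldown   -- minimum seconds between flushes, so a persistent
                  outage doesn't turn into a restart loop
    """
    if now is None:
        now = time.time()
    # Too soon since the last flush: another one won't help.
    if last_flush is not None and now - last_flush < cooldown:
        return False
    # A false positive only costs some cache refills, so a simple
    # "any domain over threshold" test is good enough here.
    return any(n >= threshold for n in counts.values())


if __name__ == "__main__":
    print(should_flush_cache({"aol.com": 4000}, now=1000.0))
    print(should_flush_cache({"aol.com": 4000}, last_flush=900.0, now=1000.0))
```

The actual flush is left to the operator; on BIND 9, for example,
`rndc flush` empties the server's cache without restarting `named`.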

So, am I missing something obvious here?  Will an MTA *without* a
caching server actually be affected less (because it's making direct
lookup calls to the zone servers in question)?  It's an area,
admittedly, in which I'm a touch weak... which is why I'm asking
y'all.

Cheers,
-- jra
-- 
Jay R. Ashworth                                                jra@baylink.com
Member of the Technical Staff     Baylink                             RFC 2100
The Suncoast Freenet         The Things I Think
Tampa Bay, Florida        http://baylink.pitas.com             +1 727 804 5015

   "If you don't have a dream; how're you gonna have a dream come true?"
     -- Captain Sensible, The Damned (from South Pacific's "Happy Talk")