[Mailman-Developers] Several big changes slated for 2.1.5 (long)

Mon Feb 9 11:30:41 EST 2004

I have some rather big changes ready for MM2.1.5 that I want to
describe and get your feedback on.  While I have this stuff working and
ready to be checked in, we will definitely need some beta testing before
unleashing it on the world.  I hope you'll be able to help with that.  I
think these changes are important enough to put into 2.1 rather than
waiting for any future major release.

The first big change is the most externally visible one.  I believe the
current scheme for bounce processing is unusable in today's world of
MyDoom sender forgeries, anti-virus front-ends on remote SMTPs and the
like.  On python.org we've seen many cases where people are getting
unexpectedly bounce disabled, even though they receive all legitimate
traffic to a mailing list.  What's happening?  It's simply that, while
we have spam and virus defenses in place on python.org, some crap still
gets through.  Imagine I'm on a busy list and I forward barry at python.org
through my home ISP, which runs a virus and spam detector on that
address.  Now say that list gets 100 msgs/day and 1% of those messages
are false-negative spams.  Each of those gets onto the list, but my ISP
catches it and rejects it, which triggers a bounce, and my score has
just been increased by 1.  I only need one sneaky spam per day to get
bounce disabled, even though most of the mail is legit and gets
through.
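
To make the arithmetic concrete, here is a back-of-the-envelope sketch.
It is not Mailman code: the function name is made up, the threshold of
5.0 is an assumed value for illustration, and it assumes the score goes
up by at most 1 per day and ignores any expiry of old bounce
information.

```python
import math


def days_until_threshold(msgs_per_day, false_negative_rate, threshold=5.0):
    """Estimate days until a member's bounce score reaches the threshold.

    Hypothetical helper: assumes at most one scored bounce per day and a
    score that never decays between bounces.
    """
    rejected_per_day = msgs_per_day * false_negative_rate
    # Cap the daily increase at 1, per the assumption above.
    daily_increment = min(1.0, rejected_per_day)
    if daily_increment <= 0:
        return math.inf
    return math.ceil(threshold / daily_increment)


# The example from the text: 100 msgs/day, 1% slip past the list's filters.
print(days_until_threshold(100, 0.01))
```

Under those assumptions, one rejected spam per day is enough to reach
the threshold within a handful of days, even though the other 99
messages each day are delivered fine.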

So I've implemented a revised scheme that we've talked about before,
based on what I believe ezmlm does.  All the bounce parameters are still
in effect; however, when a member's bounce score reaches the threshold,
we now send a specially prepared probe message containing a VERP'd
sender with an unguessable token.  When we send the probe, the member's
bounce score is reset.  If the probe bounces, then we disable the member
and do the normal reminders.  If the probe doesn't bounce, the member
stays enabled and their score starts accumulating from zero again.
A benefit of this rewrite is that we can include the last bouncing
message in the probe as a sample, so the user can start to get a clue
as to why they're getting bounce scored.
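
The probe flow can be sketched roughly as follows.  This is only an
illustration in modern Python, not the code being checked in: the
Member class, the function names, the default threshold, and the
"-bounces+token" address layout are all assumptions made up for the
example.

```python
import secrets


class Member:
    """Hypothetical stand-in for a list member's bounce state."""

    def __init__(self, address):
        self.address = address
        self.score = 0.0
        self.enabled = True
        self.probe_token = None  # set while a probe is outstanding


def probe_sender(list_name, domain, token):
    # Illustrative VERP-style envelope sender carrying the token.
    return f"{list_name}-bounces+{token}@{domain}"


def register_bounce(member, threshold=5.0, weight=1.0):
    """Score an ordinary bounce; return a probe token once the
    threshold is reached, otherwise None."""
    member.score += weight
    if member.score < threshold:
        return None
    # Threshold reached: reset the score and probe instead of disabling.
    member.score = 0.0
    member.probe_token = secrets.token_hex(16)
    return member.probe_token


def register_probe_bounce(member, token):
    """Disable the member only if the bounce carries the probe's token."""
    if member.probe_token is None or token != member.probe_token:
        return False
    member.enabled = False
    member.probe_token = None
    return True
```

The point of the unguessable token is that only a bounce of the probe
itself can disable the member; a forged or stray bounce carrying some
other token is ignored by register_probe_bounce().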

This change has prompted an internal rewrite of the pending database. 
Previously the entire site had a single pending.pck file for all actions
requiring confirmation by the user -- held subscription cancellation,
subscription, unsub, and change of address confirmations, and bounce
re-enable confirmations.  This was a problem for several reasons,
including that every list had to block on acquiring the lock for this
file.

Now, each list has its own pending.pck file and while the list lock must
be acquired to update this database, at least this doesn't block other
lists from doing things.  The upgrade script attempts to migrate the
single shared pending.pck file to the individual list files, but the
conversion is difficult because the associated list is not stored with
most of the records in that file.  I do my best, but it's possible that
some pending actions may get lost.

The other big change is a purely internal one, but it may affect the
work flow for some admins.  I've changed the qfiles file format so that
only one file is used per message.  Previously we had one file for the
message and one file for the metadata.  Now, a single pickle file is
used with the first object in the pickle being the message object and
the second being the metadata dictionary.  This approach has several
advantages.  The code is simpler, there are no opportunities for race
conditions, we can't possibly have orphaned data files, and probably
most importantly we now only need half the inodes we did before.  In
addition, I've decided to turn on fsync'ing for this new qfile all the
time, so storage should be more reliable too.
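
A minimal sketch of the single-file layout, assuming modern Python and
hypothetical enqueue()/dequeue() helpers (the real queue code is not
shown here).  Writing to a temporary file and renaming it into place
is an extra assumption of this sketch, not something described above.

```python
import os
import pickle


def enqueue(path, msg, metadata):
    """Write the message and its metadata as two consecutive pickles
    in one file, forcing the data to disk before publishing it."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as fp:
        pickle.dump(msg, fp, protocol=2)       # first object: the message
        pickle.dump(metadata, fp, protocol=2)  # second object: metadata dict
        fp.flush()
        os.fsync(fp.fileno())
    # Assumption of this sketch: rename so readers never see a partial file.
    os.replace(tmp, path)


def dequeue(path):
    """Read the two pickles back in the order they were written."""
    with open(path, "rb") as fp:
        msg = pickle.load(fp)
        metadata = pickle.load(fp)
    return msg, metadata
```

Because both objects live in the same file, there is nothing to pair
up at dequeue time and no way to end up with a message that has lost
its metadata or vice versa.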

The downside is that I've removed the ability to set a METADATA_FORMAT. 
We use Python pickles and that's it.  I doubt many people have been
using (or were even aware of) the alternatives, although I've had the
occasional bug report on them so I know that number is non-zero.  The
other downside for some people is that the behavior of
SAVE_MSGS_AS_PICKLES=False will change.  When that non-standard setting
is used, we'll still write everything to a pickle file, but we'll use
text pickles instead of the more efficient (but not human readable)
binary pickles.  Also, we'll write the message object as a pickled
string instead of a pickled object.  Again, this will be less efficient
because we'll have to parse the message every time it's dequeued, but
this option will still allow people to edit queued messages with a
normal text editor, albeit less conveniently.
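
The difference between the two modes might look something like this.
Again, this is a sketch in modern Python with made-up function names;
only the idea (binary pickle of the message object versus text pickle
of the message string, reparsed on dequeue) comes from the description
above.

```python
import email
import io
import pickle


def dumps_qentry(msg, metadata, save_as_pickle=True):
    """Serialize one queue entry as two consecutive pickles."""
    if save_as_pickle:
        # Default: binary pickles of the parsed message object.
        return pickle.dumps(msg, 2) + pickle.dumps(metadata, 2)
    # Non-standard setting: text pickles (protocol 0), with the message
    # stored as a plain string rather than an object.
    return pickle.dumps(msg.as_string(), 0) + pickle.dumps(metadata, 0)


def loads_qentry(data):
    """Load one queue entry, reparsing the message if it was a string."""
    fp = io.BytesIO(data)
    obj = pickle.load(fp)
    metadata = pickle.load(fp)
    if isinstance(obj, str):
        # Extra cost of the text mode: parse on every dequeue.
        obj = email.message_from_string(obj)
    return obj, metadata
```

Either way the caller gets back a message object and a metadata
dictionary; the text mode just pays for a parse each time.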

I think this trade-off is worth it.  The upgrade script will combine any
existing qfiles so you won't have to clear your queue when upgrading. 
To be safe, you /will/ have to stop Mailman, your MTA, and your web
server before upgrading (but this was always recommended practice).

I intend to commit these changes to CVS within a week and will probably
release a 2.1.5 alpha.  This will touch a lot of files, but it will
hopefully make the system more efficient and usable.  Once this is done
I hope to have more time to start addressing other bugs and issues in
the 2.1 branch.

Again, when everything's checked in, please test things out as much as
possible, especially if you are using older Python versions.  I've
tested primarily with Python 2.3.3 but I was careful not to use any
feature that isn't supported in Python 2.1.3.  I might have missed
something though.

-Barry