[Mailman-Developers] Easing the burden of upgrades

Barry A. Warsaw bwarsaw@CNRI.Reston.Va.US (Barry A. Warsaw)
Wed, 24 Jun 1998 18:39:40 -0400 (EDT)


We recently had a problem on python.org while upgrading to a new
release of Mailman that started me thinking about a better way.  I'll
describe what I think is the problem, sketch out a proposed solution,
and throw it out to y'all to discuss.  Ken is as much responsible for
the good ideas in this message as I am (blame me for the lousy ones);
thanks to him for sitting down and thrashing this out first.

The fundamental problem with a system like Mailman is that it is
extremely difficult to test.  The project is obviously not mature
enough to have much of a test suite (if any), and writing one that
tests all the interactions between MUA, MTA, Web browser, server,
Python, and Mailman will be daunting to say the least.  It'll be
fantastic when we have even the framework for such a beast, but until
then...

So most of our testing involves creating and managing little toy
lists, with us flogging the most noticeable features of Mailman to
make sure the common stuff hasn't broken.  The problem is that for
sites using Mailman in an operational system, flag day (i.e. the day
the upgrade to a new version occurs) can be pretty traumatic if we
missed something crucial but peculiar to a site.  So I'm really
concerned with how to make life less stressful for the operations
folks who are relying on Mailman for their bread and butter.  New
features, fixes, etc. are strong incentive for those people to
upgrade, but the fear of breakage (resulting in thousands of angry
members) probably far outweighs that incentive.  Can we make the
transition to a new version more controlled?

You could go with a low tech approach of installing new versions
temporarily with a different $prefix, using symbolic links to share
list databases and templates, and hacking /etc/aliases as lists are
converted to the new version.  The one insurmountable problem that I
see is the CGI URL.  You can't share two different CGI bin dirs
without exposing this to users through the list URLs.  This is, IMO, a
showstopper; the most visible aspect of the system should be the most
stable.

Ken and I came up with the following architecture, and I'd like to see 
what y'all think:

From a site administrator's point of view, every Mailman installation
has two `parities': a current parity and a future parity.  Every list
is associated with one of these parities; at any stable point in time
every list is under the current parity.  The system itself is
associated with a parity, also the current parity at stable time.

Now a new release comes out and Mailman automatically installs into
the future parity (more on this below).  None of the lists are
automatically switched though.  The site admin can switch the parity
of individual lists.  Let's say there's a command called `upgrade'.
So

    upgrade toylist

would switch `toylist' to the future parity.  The site admin could run 
with this for a while, and get all warm and fuzzy about the new
release.  He would then

    upgrade reallist1

and repeat the process.  Let's say he's upgraded three of his thirty
lists and now has a lot of confidence in the new version.  He then
does

    upgrade *

This converts all the lists to the future parity.  There's one more
twist however: the "system" itself is still running on the current
parity even though all the lists are running on the future parity.
One more command

    commit-parity

would now commit to the new release; the future parity becomes the
current parity and the current parity becomes the future parity.
Maybe this auto-upgrades any still current parity lists.  Data such as
the list databases and templates would live outside the installed
parity source code subdirs (more below).

If at any time before the commit-parity is run, the site admin gets
cold feet, he can

    downgrade reallistx

or

    downgrade *

to return the list or lists to the current parity.
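
To make the upgrade / downgrade / commit-parity cycle concrete, here is
a rough Python sketch of the state involved.  None of this exists in
Mailman; the class name, the method names, and the tree labels "A" and
"B" are made up for illustration, and the auto-upgrade on commit is
only the "maybe" floated above.

```python
class ParityState:
    """Hypothetical in-memory model of system and per-list parities."""

    def __init__(self, list_names):
        # "A" and "B" are invented labels for the two installed code trees.
        self.current = "A"
        # At a stable point every list runs under the current parity.
        self.lists = {name: self.current for name in list_names}

    @property
    def future(self):
        # The future parity is simply whichever tree is not current.
        return "B" if self.current == "A" else "A"

    def parity_of(self, name):
        return "current" if self.lists[name] == self.current else "future"

    def _select(self, pattern):
        # "*" selects every list, as in `upgrade *`.
        if pattern == "*":
            return list(self.lists)
        if pattern not in self.lists:
            raise KeyError("no such list: %s" % pattern)
        return [pattern]

    def upgrade(self, pattern):
        for name in self._select(pattern):
            self.lists[name] = self.future

    def downgrade(self, pattern):
        for name in self._select(pattern):
            self.lists[name] = self.current

    def commit_parity(self):
        # The future tree becomes the current one (and vice versa).
        new_current = self.future
        # Assumption: lists still on the old tree are auto-upgraded.
        for name in self.lists:
            self.lists[name] = new_current
        self.current = new_current
```

With this model, `upgrade("toylist")` moves only that list to the
future tree, `downgrade("*")` backs everything out, and commit_parity()
leaves every list reporting the current parity again.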

Now, when Mailman is installed, it always installs to the future
parity, BUT ONLY IF ALL THE LIST PARITIES AND THE SYSTEM PARITY ARE
CURRENT.  If it ever notices that some lists are set to the future
parity, but the system is still at the current parity, Mailman refuses 
to install.
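
The install-time check could be as small as the following Python
sketch.  The function name and the plain "current"/"future" strings are
assumptions for illustration, not an existing Mailman interface.

```python
def can_install(system_parity, list_parities):
    """Return (ok, offenders) for a proposed install.

    system_parity is "current" or "future"; list_parities maps list
    names to the same two strings.  Both representations are assumed.
    """
    # Any list not on the current parity blocks the install.
    offenders = sorted(
        name for name, parity in list_parities.items()
        if parity != "current"
    )
    ok = system_parity == "current" and not offenders
    return ok, offenders
```

Returning the offending list names lets the installer tell the site
admin exactly which lists to downgrade or commit before trying again.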

There would probably be a command to view the parities for lists and
the system.  I think the implementation would not be that difficult.
A single file containing the parity status for each list and the
system would be about the only database you'd need.  The installed
tree would change a bit.  You'd probably have two directories inside
$prefix, one that contains the current parity code and one that
contains the future parity code.  The list databases would live
outside these two trees, directly under $prefix.  List specific
templates would be moved out of the templates directory, into
$prefix/lists.
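
As a rough illustration of the single parity file and the two code
trees, here is a Python sketch.  The file format (one "name parity"
pair per line), the "@system" entry for the system parity, and the
directory names "current" and "future" under $prefix are all
assumptions made for the example.

```python
import os


def load_parities(path):
    """Read 'name parity' pairs, one per line, from the status file."""
    parities = {}
    with open(path) as fp:
        for line in fp:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            name, parity = line.split()
            parities[name] = parity
    return parities


def save_parities(path, parities):
    """Write the status file via a temp file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fp:
        for name in sorted(parities):
            fp.write("%s %s\n" % (name, parities[name]))
    os.replace(tmp, path)


def code_dir(prefix, parities, listname):
    """Pick the code tree for a list; directory names are hypothetical."""
    return os.path.join(prefix, parities[listname])
```

A command to view the parities would then amount to calling
load_parities() and printing the result.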

I'd love to get some feedback from people.  Is this really a problem
that needs to be solved?  Does this proposal solve the problem in a
useful way?  Is the abstraction clear?  Is this just total overkill?

-Barry