[Mailman-Users] After migrating list to new server, request.pck database gets truncated by web UI

Thu Dec 29 11:16:47 EST 2016

(Note PS at bottom!)

Hi.  I'm prepping to migrate a bunch of lists (one at a time, due to
huge number of lists and huge size of archives) from one server to
another, and I've hit a snag with the first list I'm trying.  After
migrating the list (as described below), I can go to the list's admindb
page on the new server and get the list of pending requests that was on
the old server, but immediately the request database gets truncated.
(It stays a valid pickle file, it just gets all the requests emptied out
of it, so the file itself is much shorter but non-zero length.)

This happens *when I load* the admindb page, not when I submit the form.
That seems really weird to me, since I'd expect a page load to only
read the pickle, not write it.
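
One way to pin down exactly which entries disappear is to copy the
pickle aside before and after loading the page and compare the two
copies.  The following is only a minimal sketch: it assumes the file
unpickles to a plain dict keyed by request id, that it is run under
Python 3 against copies of the file, and the function names and paths
are hypothetical.  (On the Mailman host itself, Mailman's own bin/dumpdb
is the stock way to look inside a .pck file.)

```python
import pickle


def load_requests(path):
    """Load a pickle file and return its top-level object (assumed to be a dict)."""
    with open(path, "rb") as fp:
        # encoding="latin1" lets Python 3 read str objects pickled by Python 2.
        return pickle.load(fp, encoding="latin1")


def dropped_keys(before, after):
    """Return keys present in `before` but missing from `after`.

    Sorting by repr avoids comparison errors when keys mix ints and strings.
    """
    return sorted((k for k in before if k not in after), key=repr)


# Hypothetical usage, with copies taken before and after the page load:
#   before = load_requests("/tmp/request.pck.before")
#   after = load_requests("/tmp/request.pck.after")
#   print(dropped_keys(before, after))
```

Comparing keys rather than file sizes shows whether every old entry is
removed or only some of them.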

The old server is running Debian and Mailman 2.1.13 (from the Debian
package).  The new server is running Ubuntu and Mailman 2.1.16 (from the
Ubuntu Trusty package; we need to run Trusty for now for complex and
uninteresting reasons; I'd rather run 2.1.18, and may look into running
that on Trusty once I get the basic migration issues resolved).

Relevant UIDs and GIDs (www-data:www-data and list:list) are the same on
both systems.

Short version:  I run

    rsync -aSHov /var/lib/mailman/lists/$listname/ new-server:/var/lib/mailman/lists/$listname

and similarly copy the public and private archives (preserving symlinks
as needed).
check_perms on both systems reveals similar errors which look cosmetic
(things like rotated logs, temporary directories where I've copied
things, and the like), but I haven't yet let it run to completion
because of the volume of our archives.  Then I change host_name via the
web interface, set m.web_page_url interactively with withlist (using
fix_url seems not to work when changing http: to https:), and call
m.Save().
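
For illustration, the http: to https: change to the URL can be expressed
as a small helper.  This is a sketch only: the function name and the
example URL are hypothetical, and it uses Python 3's urllib.parse
(Mailman 2's withlist runs under Python 2, where the equivalent module
is urlparse).  The result would still have to be assigned to
m.web_page_url and saved with m.Save() inside withlist.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url):
    """Return `url` with an http scheme switched to https; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    # SplitResult is a namedtuple, so _replace swaps only the scheme field.
    return urlunsplit(parts._replace(scheme="https"))


# Hypothetical usage:
#   new_url = to_https("http://lists.example.org/cgi-bin/mailman/")
```

Only the scheme changes; the host and path are passed through as they
were.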

One *possibly* relevant detail is that the new host doesn't currently
have a valid certificate.  (It's using the old host's cert, and I
manually allow the exception in my web browser for testing.)  But for
Mailman 2, the only http{,s} traffic should be sent from my browser, right?

This kind of has the feel of a permissions problem, but clearly the CGI
scripts can read from and write to the request.pck database.  (And
changes to the list config data in config.pck seem to be working
normally.)  As I said, check_perms hasn't run to completion yet because
it's plowing through the (already pre-rsync'ed) archives, but it got
through the things in /var/lib/mailman/lists and didn't find anything
wrong with this list.

There's nothing interesting in the Mailman logs (which Debian/Ubuntu put
in /var/log/mailman), and the only thing in the Apache error logs is a
warning that the cert it has configured doesn't match its hostname.

Anybody have any ideas?

Jay

PS -- I composed this all last night.  Today, the behavior has changed:
This morning, a new message was received by the list (forwarded from the
old list server to the new list server, and added to request.pck on the
new server by the new Mailman installation).  Now, when I load the
admindb page, the old requests (which were in the request.pck copied
from the old server) are all immediately thrown away (although displayed
in the admindb form) but the new request which came in this morning
remains.  So it kind of looks like something about the old requests
causes the list to think they're invalid and discard them when it loads
them.  I initially saw this behavior with "require_explicit_destination"
on and "acceptable_aliases" empty, but turning off
"require_explicit_destination" and putting just the local part of the
list address in "acceptable_aliases" doesn't make any difference.

-- 
Jay Sekora
Linux system administrator and postmaster,
The Infrastructure Group
MIT Computer Science and Artificial Intelligence Laboratory