[Mailman-Users] Is it possible that a list loses its members?

Mark Sapiro mark at msapiro.net
Tue May 20 22:52:46 CEST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sebastian Hagedorn wrote:
| -- Mark Sapiro <mark at msapiro.net> is rumored to have mumbled on 20. Mai
| 2008 10:52:34 -0700 regarding Re: [Mailman-Users] Is it possible that a
| list loses its members?:
|
|> | I'm at a loss to explain what might have happened. Any ideas?
|>
|>
|> I can't explain what happened, but the only way I can see something like
|> this happening is if there was a failure in list locking.
|
| Right, that's what we thought ourselves. The member adding is logged and
| we verified that the notification mails got sent, but it appears that
| both those actions occur *before* the members are actually added.


Actually, no. The members are added before the log is written and the
notices sent. See the ApprovedAddMember method in Mailman/MailList.py.
What has to happen is after the members are added and the config.pck is
saved, some other process has to overwrite the config.pck with one that
doesn't have the added members (at least assuming you don't have a
custom member adaptor that keeps membership elsewhere).


|> I.e. if some
|> other process was updating the list at the same time that add_members
|> was running and the other process locked the list first, then
|> add_members locked the list, added the members and saved and unlocked
|> the list, and finally the other process saved the list without the
|> members.
|
| The other process would probably(?) be the config_list command. Some
| more detail. We found the following just before the list was created:
|
| May 09 14:54:25 2008 (10586) admin.py access for non-existent list:
| listname
| May 09 14:54:25 2008 (11122) admin.py access for non-existent list:
| listname


These are apparently from attempted web access to the admin interface
for the list before it is created.


| The member adding happened just two minutes later:
|
| May 09 14:56:29 2008 (13495) listname: new xxx.xxx at uni-koeln.de,


Two minutes is forever in terms of what we're talking about. Even two
seconds is a long time.
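
For scale, the gap between the last admin.py log entry and the add_members
entry quoted above can be worked out directly. This is a small Python
sketch that assumes the timestamp layout shown in those log lines:

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted Mailman log lines,
# e.g. "May 09 14:54:25 2008".
LOG_TIME_FORMAT = "%b %d %H:%M:%S %Y"

def seconds_between(first: str, second: str) -> float:
    """Return the number of seconds elapsed from `first` to `second`."""
    t1 = datetime.strptime(first, LOG_TIME_FORMAT)
    t2 = datetime.strptime(second, LOG_TIME_FORMAT)
    return (t2 - t1).total_seconds()

gap = seconds_between("May 09 14:54:25 2008", "May 09 14:56:29 2008")
print(gap)  # 124.0
```

That is 124 seconds between the two logged events.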


| Now the script we use to create new lists does this:
|
| new_list();
| write_config_file();
| config_list();
| add_members();
| demod_members();
| send_mail();
|
| new_list() calls newlist, config_list() calls config_list, and
| add_members() calls add_members - not much of a surprise, I guess.
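
One way to guarantee that each step finishes before the next one starts is
to run the Mailman command-line tools through Python's subprocess.run,
which blocks until the child process exits. This is only a sketch under
stated assumptions: it assumes a Mailman 2.1 bin/ directory at a
hypothetical path, and the list name, owner address, password and file
names are placeholders. The demod and mail steps are left out because the
thread does not say what they do.

```python
import subprocess
from pathlib import Path

# Hypothetical install location; adjust to the local Mailman 2.1 prefix.
MAILMAN_BIN = Path("/usr/local/mailman/bin")

def creation_steps(listname, owner, password, config_file, members_file):
    """Build the command lines for the list-creation steps, in order."""
    return [
        [str(MAILMAN_BIN / "newlist"), "-q", listname, owner, password],
        [str(MAILMAN_BIN / "config_list"), "-i", config_file, listname],
        [str(MAILMAN_BIN / "add_members"), "-r", members_file, listname],
    ]

def run_steps(commands, dry_run=False):
    """Run each command to completion before starting the next one."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        # subprocess.run waits for the child to exit; check=True raises
        # CalledProcessError on a non-zero exit, so later steps never start.
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    steps = creation_steps("listname", "owner@example.com", "secret",
                           "listname.cfg", "members.txt")
    run_steps(steps, dry_run=True)
```

If the original script starts any of these tools in the background (for
example with a trailing & in a shell wrapper), two of them could be
running against the same list at once.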


Are these synchronous calls?  What would have to happen is something
like the following:

config_list is called. It instantiates the list with a lock. It then
reads its input and updates its in memory copy of the list.

While this is going on, add_members is called. It instantiates the list
with a lock, but somehow this fails to wait for config_list to
relinquish its lock. add_members adds the members and saves and unlocks
the updated list.

Then config_list finishes processing and saves its updated list which
doesn't have the members.
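
The interleaving above is a classic lost update. The following
self-contained Python sketch replays it deterministically against a plain
pickle file standing in for config.pck. The file contents and the
"members" and "description" keys are illustrative only and do not reflect
the real layout of Mailman's pickle.

```python
import os
import pickle
import tempfile

def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)

def save(path, data):
    with open(path, "wb") as f:
        pickle.dump(data, f)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.pck")  # stand-in for the list's config.pck
    save(path, {"description": "", "members": []})

    # config_list loads its copy first and starts updating it in memory ...
    config_copy = load(path)
    config_copy["description"] = "Updated by config_list"

    # ... add_members loads, adds members and saves while config_list is busy ...
    members_copy = load(path)
    members_copy["members"] += ["a@example.com", "b@example.com"]
    save(path, members_copy)

    # ... then config_list saves its stale copy, overwriting the new members.
    save(path, config_copy)

    final = load(path)
    print(final)  # {'description': 'Updated by config_list', 'members': []}
```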

This scenario is hypothetical. It depends on list locking failing
somehow at a low level. It can't result from a coding error in one of
the processes as long as they use standard list methods because a
process can't lock a list without obtaining/refreshing the latest list
data, and a process can't save a list that isn't locked.
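
The invariant described here (lock first, then re-read the latest data,
modify, save, unlock) can be sketched in Python as follows. Note the swap:
this uses fcntl.flock on a separate lock file, not Mailman's own
LockFile-based list locking, and the file names and keys are hypothetical.
It is Unix-only because of fcntl.

```python
import fcntl
import os
import pickle
import tempfile
from contextlib import contextmanager

@contextmanager
def locked(lock_path):
    """Hold an exclusive flock on lock_path for the duration of the block."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until no other holder
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

def update(path, change):
    """Lock, re-read the latest data, apply `change`, save, then unlock."""
    with locked(path + ".lock"):
        with open(path, "rb") as f:
            data = pickle.load(f)  # always refresh after acquiring the lock
        change(data)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(data, f)
        os.replace(tmp, path)  # rename over the old file so readers never see a partial write

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "config.pck")
    with open(path, "wb") as f:
        pickle.dump({"description": "", "members": []}, f)

    update(path, lambda data: data["members"].extend(["a@example.com", "b@example.com"]))
    update(path, lambda data: data.update(description="Updated by config_list"))

    with open(path, "rb") as f:
        final = pickle.load(f)
    print(final)
```

Because each update re-reads the file only after it holds the lock, the
second update starts from data that already contains the new members, so
nothing is overwritten.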

Another possible cause is if there was a temporary read error on
config.pck by the process immediately following add_members causing it
to fall back to config.pck.last, but this would be logged in Mailman's
error log.
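
The fallback idea can be illustrated like this: try the primary pickle,
and if it cannot be read, log the failure and load the backup copy. This
is a hedged Python illustration using the standard logging module, not
Mailman's own loading code; the function name, logger name and log
message are hypothetical.

```python
import logging
import os
import pickle
import tempfile

log = logging.getLogger("mailman.error")  # hypothetical logger name

def load_with_fallback(directory):
    """Return (data, filename) from config.pck, falling back to config.pck.last."""
    for name in ("config.pck", "config.pck.last"):
        path = os.path.join(directory, name)
        try:
            with open(path, "rb") as f:
                return pickle.load(f), name
        except (OSError, EOFError, pickle.UnpicklingError) as exc:
            # Every failed read is logged, so a fallback leaves a trace.
            log.error("error loading %s: %s", path, exc)
    raise RuntimeError("no readable config file in " + directory)

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "config.pck.last"), "wb") as f:
        pickle.dump({"members": ["old@example.com"]}, f)
    with open(os.path.join(d, "config.pck"), "wb") as f:
        f.write(b"")  # simulate a truncated, unreadable primary file
    data, used = load_with_fallback(d)
    print(used, data)
```

If the older copy lacks recently added members, a process that then saves
it would drop them, but the logged read error would show that this
happened.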

What do demod_members() and send_mail() do? Is there anything there that
manipulates config.pck other than by standard MailList.MailList methods?


|> I don't think this can result from a simple failure of some process to
|> lock the list since an unlocked list cannot be saved, and locking a list
|> refreshes the data. If there is an issue, it has to be in the locking
|> mechanism itself, but this seems sound and there are no known issues
|> with this, although coincidentally (and I'm sure it's just a rare and
|> strange coincidence) I saw an apparent locking failure last week. See
|> <http://mail.python.org/pipermail/mailman-developers/2008-May/020190.html>.
|>
|> So now we have two reports of possible locking failures. We'll have to
|> keep watching.
|
| There were instances before where list owners reported strange phenomena,
| but there never was any proof. I admit I never believed them ...

- --
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)

iD8DBQFIMzoeVVuXXpU7hpMRAmGPAJ9gBdnMFqUsIxqre3iIkEZR9UaykwCgr5Xb
cNC/t7HWoeRzG777k5kGmyM=
=s98Q
-----END PGP SIGNATURE-----

