[Mailman-Developers] Faulty Member Subscribe/Unsubscribes

Mark Sapiro mark at msapiro.net
Thu Sep 29 22:06:46 CEST 2011


Andrew Case wrote:
>
>Traceback (most recent call last):
>  File "/usr/mailman/bin/add_members", line 258, in <module>
>    main()
>  File "/usr/mailman/bin/add_members", line 238, in main
>    addall(mlist, nmembers, 0, send_welcome_msg, s)
>  File "/usr/mailman/bin/add_members", line 135, in addall
>    mlist.ApprovedAddMember(userdesc, ack, 0)
>  File "/usr/mailman/Mailman/MailList.py", line 948, in ApprovedAddMember
>    assert self.Locked()
>AssertionError


The above is definitely a problem, but I can't see how it can occur
unless there is some race condition at the level of the file system.


[...]
>There was a locking issue with testlist2 foo4, which is fine since it
>doesn't report back as successful.  But you'll also notice that foo10
>wasn't listed as a subscriber even though it appeared as though that
>subscribe was successful.


Yes, I see that and both are cause for concern. I am more concerned
about the AssertionError with testlist2 foo4, because I can't see how
that can happen without some file system anomaly.


>Here's some errors on the very next run where I'm subscribing 10 people to
>each list as well:
[...]
>That's my (1-5%) failure.  But when I also comment out the following:
>
>>>             #if mtime < self.__timestamp:
>>>             #    # File is not newer
>>>             #    return None, None
>
>It seems to work each time (I ran 3 tests in a row, all with expected
>results).


The code above is a decision about whether we need to reload the list
object from the file system based on the file system time stamp vs.
our internal time stamp.

The original code said "if mtime <= self.__timestamp:". Since these
time stamps are in whole seconds, that test (to skip loading) could
succeed even if the file time stamp was a fraction of a second newer
than the internal time stamp. Thus, the bug.

The fix is to make the test "if mtime < self.__timestamp:" meanin we
only skip loading if the file time stamp is strictly less than the
internal time stamp, but in your case, if the clock on the MTA/qrunner
machine is a bit faster than that on the machine running the script,
the internal time stamp of the qrunner process could be a second ahead
of the file time stamp set by the add_members process even though on
an absolute scale it is older.

So, I think we're concerned here about clock skew between the machines,
and in that case, commenting out the code completely as you have done
makes sense.

I have filed bug reports at
<https://bugs.launchpad.net/mailman/+bug/862675> and
<https://bugs.launchpad.net/mailman/+bug/862683> about these issues in
preparation to fixing them.

Now I don't know whether to stick with the "if mtime <
self.__timestamp:" test which will work on a single server or to
reload unconditionally as you have done which seems to be necessary in
a shared file system situation with possible time skews.

As far as the AssertionError is concerned, I don't know what to make of
it. It appears to be a file system issue outside of Mailman, so I
don't know how to deal with it. I think the code in Mailman's LockFile
module is correct. If you are willing to try debugging this further,
you could set

LIST_LOCK_DEBUGGING = True

in mm_cfg.py and restart your qrunners and try to reproduce the
exception. This will log copious information to Mailman's 'locks' log
which may help to understand what happened.

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



More information about the Mailman-Developers mailing list