[Mailman-Developers] problem with view other subscriptions..

Harald Meland Harald.Meland@usit.uio.no
15 Jun 2000 02:56:41 +0200


[David Smead]

> Greetings,
> 
> I believe that the mv command is atomic.  Instead of unlinking a
> lock file, why not move it?

Both unlink(2) and rename(2) are (supposed to be) atomic system calls,
AFAIK.  The problem occurs whenever there is a stale lock that needs
to be broken.

An example of how the race condition can manifest itself might help
explain:


Say that two processes, A and B, want lock L.  They both discover that
the lock is so old that it is stale.

First, process A breaks the lock.  Whether this is done by means of
unlink(2) or rename(2) is not relevant; the point is that after the
breaking there is no longer any file occupying the lockfile's position
in the file system.

Next, process A tries to get the lock, and succeeds.

Now the problem arises: Process B still thinks the lock is stale, and
breaks it.  Process A's lock attempt has already returned
successfully, so process A still thinks it is the lock owner.

Finally, process B tries to get the lock -- and succeeds.

Now, both processes believe they own the lock.  We should aim at
reducing the probability of such situations.
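
To make the interleaving above concrete, here is a rough sketch that
replays it step by step inside a single Python process.  It is not
Mailman's locking code: the helper names, the STALE_AFTER threshold and
the use of open(2) with O_CREAT|O_EXCL to take the lock are all
assumptions made for the illustration.

```python
import os
import tempfile
import time

STALE_AFTER = 1.0  # hypothetical staleness threshold, in seconds


def is_stale(path, now):
    """Return True if the lock file exists and is older than STALE_AFTER."""
    try:
        return now - os.stat(path).st_mtime > STALE_AFTER
    except FileNotFoundError:
        return False


def break_lock(path):
    """Remove the lock file; a lock that is already gone is not an error."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass


def try_acquire(path, owner):
    """Create the lock file exclusively; return True on success."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(owner)
    return True


def simulate_race():
    """Replay the A/B interleaving and report what each process saw."""
    with tempfile.TemporaryDirectory() as d:
        lock = os.path.join(d, "L.lock")

        # A dead process left a lock behind; backdate it so it is stale.
        try_acquire(lock, "dead")
        old = time.time() - 10 * STALE_AFTER
        os.utime(lock, (old, old))

        # A and B both inspect the lock and both conclude it is stale.
        now = time.time()
        a_saw_stale = is_stale(lock, now)
        b_saw_stale = is_stale(lock, now)

        # A breaks the stale lock and then acquires a fresh one.
        break_lock(lock)
        a_owns = try_acquire(lock, "A")

        # B acts on its earlier observation: it removes A's fresh lock,
        # and then its own acquisition succeeds too.
        if b_saw_stale:
            break_lock(lock)
        b_owns = try_acquire(lock, "B")

        return a_saw_stale, b_saw_stale, a_owns, b_owns


if __name__ == "__main__":
    print(simulate_race())
```

Each individual call is atomic; the trouble is the gap between B's
staleness check and B's break, during which A's new lock appears.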


One thing which would (slightly) reduce the risk of this race is
to put a long(ish) delay right after a process has broken a lock, so
that the process doing the lock breaking is less likely to obtain the
lock.  The idea is that such a delay would cause B's breaking attempt
to occur before A retries lock retrieval.

However, this would help only with race conditions involving exactly two
processes -- with e.g. three processes, process A could break, process
B get the lock, and then process C could break B's fresh lock.
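
As a rough sketch of the delay idea (again with hypothetical names and
parameters, and assuming O_CREAT|O_EXCL is used to take the lock), the
breaker could sleep between breaking and retrying:

```python
import os
import time


def acquire_with_break_delay(path, owner, stale_after, break_delay,
                             sleep=time.sleep):
    """Make one attempt at the lock, breaking it first if it is stale.

    After breaking a stale lock, wait break_delay seconds before
    retrying, so that other processes which also judged the old lock
    stale have time to finish their own break.  This narrows the race
    window; it does not close it.
    """
    def try_create():
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        with os.fdopen(fd, "w") as f:
            f.write(owner)
        return True

    if try_create():
        return True

    try:
        age = time.time() - os.stat(path).st_mtime
    except FileNotFoundError:
        # The lock vanished between our attempt and the stat; retry once.
        return try_create()

    if age <= stale_after:
        return False  # a live lock; the caller should retry later

    try:
        os.unlink(path)  # break the stale lock
    except FileNotFoundError:
        pass  # somebody else broke it first

    sleep(break_delay)  # let other breakers finish before we retry
    return try_create()
```

The sleep parameter is only there so the delay can be replaced in
tests.  As noted above, a third process arriving after the delay can
still break a fresh lock, so this lowers the odds rather than fixing
the problem.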


Thanks for trying, though -- I would love it if we were able to come
up with a failsafe filesystem-based locking scheme (with support for
breaking stale locks).
-- 
Harald