[Mailman-Developers] about qrunner and locking

Thomas Wouters thomas@xs4all.net
Fri, 8 Dec 2000 09:36:11 +0100


On Thu, Dec 07, 2000 at 05:46:26PM -0800, Marc MERLIN wrote:
> On Thu, Dec 07, 2000 at 04:22:34PM -0800, Marc MERLIN wrote:
> > and the fact that 3 mails (out of 2000) didn't make it to my mailbox.

> Oh, I get it, I use the exim stat config.db to look for a list config, and
> exim happened to stat for the file exactly when mailman deleted config.db
> and replaced it with config.db.last I guess.
> Can mailman ensure that config.db is always here? (I suppose mv isn't really
> atomic, is it?)

Over NFS, almost nothing is atomic. And even if you do grab an atomic
operation, you can get ....

> (IOError :  [Errno 116] Stale NFS file handle:
> '/var/local/mailman/lists/test/config.db')

I'm afraid that there isn't a good solution to your problem, right now. In
all honesty, and I say this with all my professional years of experience in
this area, NFS sucks large granite elephant testicles through a very thin
straw. (To butcher a Pratchett quote, "NFS is like a vampire; it bites, it
sucks, and it leaves you lifeless")

Which is really a pity, since Barry did a lot of work to get NFS locking
right. But that isn't going to help much unless the other parts of Mailman
can handle this properly as well. And replacing a file by another is a
very tricky operation over NFS. It's entirely OS-dependent (or rather
NFS-implementation-dependent) what will lead to a stale NFS handle. Some
OSes do it if a file is moved or deleted (or replaced) and another process
still has the file open. Some do it when they have the file cached. Some do
it when the moon's full, or at least something I haven't been able to figure
out :)
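
To make the file-replacement problem concrete, here is a minimal sketch in
modern Python 3. It is not Mailman's code. The helper names
replace_atomically and read_with_retry are made up for illustration. The
writer renames a temp file over the target, which is atomic on a local POSIX
filesystem, so the name never disappears there. Over NFS a reader can still
hit ESTALE or a brief gap, so the reader simply re-opens and tries again.

```python
import errno
import os
import tempfile
import time


def replace_atomically(path, data):
    """Write data to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # rename(2) semantics: atomic on a local POSIX filesystem.
        os.replace(tmp, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def read_with_retry(path, attempts=3, delay=0.1):
    """Re-open and re-read if the handle is stale or the file is briefly missing."""
    for attempt in range(attempts):
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError as e:
            retryable = e.errno in (errno.ESTALE, errno.ENOENT)
            if not retryable or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

The retry only papers over the symptom. It does nothing about attribute or
data caching on the client, so a reader may still see an old copy for a
while.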

I was able to get NFS-locking to work properly, but I was only locking on
one machine, because I didn't have a machine to spare to run the
web-interface. When you work with different machines, you can get all kinds
of problems with attribute-caching and data-caching and what not. Not to
mention OS bugs, which pop up especially under heavy load. I've seen quite a
few of those, too. At work here, we're really hoping NFSv4 will be
universally adopted, though it'll probably bankrupt our supplier of black
goat's blood and headless chickens. (But hey, there's still SCSI!)
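
For readers who haven't seen NFS-safe locking, the classic approach (it is
described in the open(2) man page) is to hard-link a uniquely named file to
the lock name and then check the link count, rather than trust the return
value of link(). The sketch below is a simplified illustration of that
technique in modern Python 3, not Mailman's actual locking code. The names
acquire and release are assumptions, and it does nothing about stale locks
or lock lifetimes.

```python
import os
import socket
import time
import uuid


def acquire(lockfile, timeout=10.0, poll=0.25):
    """Take lockfile via link(2); return the unique temp name on success."""
    # A name that no other process or host should pick.
    unique = "%s.%s.%d.%s" % (
        lockfile, socket.gethostname(), os.getpid(), uuid.uuid4().hex)
    with open(unique, "w") as f:
        f.write(unique)
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.link(unique, lockfile)
        except OSError:
            # Over NFS the reply can be lost or wrong; the link count decides.
            pass
        if os.stat(unique).st_nlink == 2:
            return unique  # our file now has two names, so we hold the lock
        if time.monotonic() >= deadline:
            os.unlink(unique)
            raise TimeoutError("could not acquire %s" % lockfile)
        time.sleep(poll)


def release(lockfile, unique):
    """Drop the lock name first, then the unique file."""
    os.unlink(lockfile)
    os.unlink(unique)
```

Even with this, the stat() result can come from a client-side attribute
cache, which is exactly the kind of multi-machine trouble described above.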

Probably the best solution is to write a network daemon to do the locking &
config info, rather than rely on NFS and locking over NFS... I'm not sure
how much work that is, though. Actually, maybe an even better solution, and
probably about as much work, is to put all the mailman config stuff into a
separate database, and just allow connections from several points.
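
As a rough idea of what the locking half of such a daemon could look like,
here is a small sketch in modern Python 3. Everything in it is an
assumption made for illustration: the line protocol ("LOCK name" and
"UNLOCK name", answered with OK or NO), the class names, and the choice to
drop a client's locks when its connection closes. It keeps locks in memory
only and has no authentication, timeouts, or config storage.

```python
import socket
import socketserver
import threading


class LockTable:
    """In-memory table mapping a lock name to the client that holds it."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._owners = {}

    def acquire(self, name, owner):
        with self._mutex:
            if self._owners.get(name, owner) != owner:
                return False  # held by someone else
            self._owners[name] = owner
            return True

    def release(self, name, owner):
        with self._mutex:
            if self._owners.get(name) == owner:
                del self._owners[name]
                return True
            return False

    def release_all(self, owner):
        with self._mutex:
            for name in [n for n, o in self._owners.items() if o == owner]:
                del self._owners[name]


class LockHandler(socketserver.StreamRequestHandler):
    def handle(self):
        owner = self.client_address
        table = self.server.table
        try:
            for raw in self.rfile:
                parts = raw.decode("ascii", "replace").split()
                if len(parts) == 2 and parts[0] == "LOCK":
                    ok = table.acquire(parts[1], owner)
                elif len(parts) == 2 and parts[0] == "UNLOCK":
                    ok = table.release(parts[1], owner)
                else:
                    ok = False
                self.wfile.write(b"OK\n" if ok else b"NO\n")
        finally:
            # A client that goes away gives up everything it held.
            table.release_all(owner)


def start_server(host="127.0.0.1", port=0):
    """Start the daemon in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), LockHandler)
    server.daemon_threads = True
    server.table = LockTable()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request(sock, line):
    """Send one command and return the one-line reply."""
    sock.sendall(line.encode("ascii") + b"\n")
    return sock.makefile("rb").readline().strip()
```

Tying a lock to the TCP connection means a crashed qrunner or CGI process
frees its locks automatically, at the price of making the daemon a single
point of failure.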

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!