[Mailman-Developers] Locking (oh no!)

David Smead smead@amplepower.com
Thu, 22 Jun 2000 22:00:53 -0700 (PDT)


Dan,

I exchanged a few emails with Harald Meland on the lock problem.  I've
used the fcntl module under Python with success; however, I understand
that Mailman is using the link() system call as an atomic operation.  BTW,
I read recently that the Linux kernel is going to change its behavior for
link() in the next release, so that may be a problem waiting to arrive.
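
For anyone not familiar with the link() trick, here is a minimal sketch
of the idea.  It is not Mailman's actual locking code; the function names
and the temp-file naming are made up for illustration.  The point is only
that link() fails if the target already exists, so exactly one process
can win.

```python
import os


def acquire_link_lock(lockfile):
    """Try once to take the lock; return True if we got it."""
    # Write a per-process temp file next to the lock file.
    # (The ".<pid>" suffix is an assumption made for this sketch.)
    tmp = "%s.%d" % (lockfile, os.getpid())
    with open(tmp, "w") as f:
        f.write(str(os.getpid()))
    try:
        # link() is atomic: it fails with EEXIST if lockfile exists.
        os.link(tmp, lockfile)
        return True
    except FileExistsError:
        return False
    finally:
        # The temp name is no longer needed either way.
        os.unlink(tmp)


def release_link_lock(lockfile):
    """Drop the lock by removing the lock file."""
    os.unlink(lockfile)
```

Note that nothing here removes the lock file if the holder dies before
calling release_link_lock(), which is exactly the failure in question.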

I've been meaning to take a look at the locking code, but after losing
the disk drive on my prime workbox I've been wrapped around the axle
instead.  My first suggestion, which may already be the case, is to have
one place in the code where locks and unlocks occur.
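
As a rough illustration of "one place", assuming a current Python and
any lock object with acquire() and release() methods, a single helper
could own both calls.  The name `held` is hypothetical and this is not
taken from the Mailman source.

```python
import contextlib
import threading


@contextlib.contextmanager
def held(lock):
    """The only place in the program that acquires and releases."""
    lock.acquire()
    try:
        yield lock
    finally:
        # Runs on normal exit and on any exception in the body.
        lock.release()


# Usage: threading.Lock stands in for a real list lock here.
list_lock = threading.Lock()
with held(list_lock):
    pass  # work on the list while holding the lock
```

This covers exceptions raised inside the block, but not a process that
is killed outright, since the finally clause never gets to run.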

But since a process can always be killed, or die from who knows what, a
final solution has to be a `lock server': a process that connects to the
server can request a lock or unlock, and a granted lock has an expiration
time after which it is released, with any subsequent unlock call from the
former owner ignored.  As long as stale locks are cleaned out when the
lock server is started or restarted (for instance, when the system is
rebooted), things should be about as robust as possible.
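
The bookkeeping such a server would need can be sketched as below.  This
is only the in-memory lock table, with no networking, and the class and
method names are invented for the example.  Because the table lives in
the server's memory, it starts empty on every restart, so stale locks
do not survive one.

```python
import time


class LeaseTable:
    """Lock table with expiring leases (hypothetical sketch)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._leases = {}  # lock name -> (owner, expiry time)

    def lock(self, name, owner, ttl):
        """Grant the lock for ttl seconds unless someone else holds it."""
        now = self._clock()
        current = self._leases.get(name)
        if current is not None:
            holder, expiry = current
            if expiry > now and holder != owner:
                return False  # still validly held by another owner
        # Free, expired, or a renewal by the same owner.
        self._leases[name] = (owner, now + ttl)
        return True

    def unlock(self, name, owner):
        """Release the lock; late or foreign unlocks are ignored."""
        current = self._leases.get(name)
        if current is None:
            return False
        holder, expiry = current
        if holder != owner or expiry <= self._clock():
            return False
        del self._leases[name]
        return True
```

With this, a CGI process that dies while holding a lock only blocks
others until its lease runs out, at the cost of choosing a ttl long
enough for legitimate work.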

Performance will take a hit with the server approach, but doesn't it
always when robust behavior is demanded?

If I get my workbox here back up to speed tomorrow, I'll try to look into
the lock problem some more.


Sincerely,

David Smead
http://www.amplepower.com.
http://www.ampletech.com.

On Thu, 22 Jun 2000, Dan Mick wrote:

> So, with the latest CVS (really, I swear this time), it seems
> trivial for me to create a locking deadlock.  This has happened
> four times in the last five minutes.  
> 
> Scenario:
> 
> 1) go to admin/ link
> 2) while admin page is downloading, go to admindb/ link (on same
> page)
> 
> The admin process dies, but still holds the lock; the admindb process
> spins waiting for the lock.
> 
> Granted, it's easier for me than some to get to admindb while admin
> is still downloading, but...
> 
> Is something about the browser/webserver combination killing admin
> in such a way so that its lock cleanup stuff isn't working?
> Who's supposed to clean up the lock if the process dies an early
> death for some reason?  (Or is this where the lock has to be 
> broken?)
> 
> 
> 
> _______________________________________________
> Mailman-Developers mailing list
> Mailman-Developers@python.org
> http://www.python.org/mailman/listinfo/mailman-developers
>