[Mailman-Users] Big problems with stale lockfiles on large list...

Sat Apr 28 00:19:32 CEST 2001

On Friday 27 April 2001 13:46, Gergo Soros wrote:
> > Running Mailman-2.0.1, with Python 1.5.2 on a RedHat 6.2 machine along
> > w/Apache-1.3.14 and Sendmail-8.11.0, and am having some serious grief
> > with stale lockfiles on one of our lists.  List contains ~60k
> > addresses on it, and has constant traffic to the WWW administration
> > pages.  Not high volume for sending msgs though (only one or two a
> > day) as it's a broadcast/announce list.
>
> We also experienced similar problems in the past with large lists. We
> found that if the admin is accessing the long and slow-loading members
> admin pages, and does not wait for the page to complete (e.g. clicks one
> of the admin links or hits refresh or stop) the lock will remain. As a
> temporary solution we have instructed our admins to wait for all pages
> to completely load, no stale locks since then. This is on a Cobalt RaQ4i
> (Redhat) with Apache 1.3.x. Hope this helps,

This is probably related to what we're experiencing, but I'm finding that 
the volume of hits on the "subscribe" page alone is enough to cause this 
problem.  While testing, I made sure that there were _no_ accesses to the 
admin pages, and that every "subscribe" request that came through waited 
for the complete response without timing out or having the "stop" button 
pressed.  Even in this scenario, I still ended up with stale locks 
lingering around.

From the info that I'm seeing in the archives of this list, I get the 
impression that the general response is "remove the stale lock files". 
Although this might work in freeing up the WWW forms so that they can run 
again, it doesn't really address the issue of _how_ the locks end up 
stale in the first place.  For now I'm able to go in and remove the locks 
manually, but at this point I'm doing so at least once an hour (if not 
more), and while the list is stalled, people trying to access it just 
can't get in (which isn't good).
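The manual clean-up could be automated as a stopgap. Below is a minimal sketch in modern Python, not Mailman code. It assumes, hypothetically, that each lock file holds its owner's PID as plain text. Mailman's real lock file naming and format are not assumed here, and `is_stale`, `sweep`, and the `max_age` default are illustrative names only. A lock counts as stale if it is older than `max_age` seconds or if its recorded owner is no longer running on this host. Like removing locks by hand, this does nothing to explain why they go stale, and `max_age` must exceed the longest legitimate hold, or a live lock could be broken.

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this PID exists on this host."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def is_stale(path, max_age=600.0, now=None):
    """Hypothetical check: stale if older than max_age or owner PID is gone."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
        with open(path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        return False  # already released; nothing to clean up
    if age > max_age:
        return True
    try:
        pid = int(text)
    except ValueError:
        return False  # unknown contents: rely on the age test only
    return pid > 0 and not pid_alive(pid)


def sweep(lock_dir, max_age=600.0):
    """Remove stale lock files in lock_dir; return the names removed."""
    removed = []
    for name in os.listdir(lock_dir):
        path = os.path.join(lock_dir, name)
        if os.path.isfile(path) and is_stale(path, max_age):
            try:
                os.remove(path)
                removed.append(name)
            except FileNotFoundError:
                pass  # owner released it between the check and the remove
    return removed
```

The PID test is only meaningful when the lock was taken on the same machine. For locks shared over NFS, only the age test applies.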

Would it be correct to say that if the CGI process dies for some unforeseen 
reason (e.g. Apache kills it off because the user pressed the "stop" 
button or the HTTP connection timed out), the lock held by that process 
gets left behind as a stale lock?
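The following sketch in modern Python (not Mailman 2.0 or Python 1.5.2 code) shows why that can happen. A child process creates a hypothetical `list.lock` file and deletes it in a `finally` block. Python installs no handler for SIGTERM, so with the default disposition a SIGTERM ends the process without running `finally`. SIGKILL can never be caught at all. Only when SIGTERM is converted into an exception does the clean-up run. Which signal Apache actually sends to an aborted CGI is an assumption here, not something verified. The helper `lock_survives` is hypothetical.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Child: optionally turn SIGTERM into SystemExit, take the lock,
# then "work" while holding it and release it in a finally block.
CHILD = r'''
import os, signal, sys, time
lock, handle_term = sys.argv[1], sys.argv[2] == "1"
if handle_term:
    def on_term(signum, frame):
        raise SystemExit(1)  # unwinds the stack, so finally blocks run
    signal.signal(signal.SIGTERM, on_term)
try:
    with open(lock, "w") as f:
        f.write(str(os.getpid()))
    print("ready", flush=True)  # tell the parent the lock is held
    time.sleep(60)              # stand-in for a slow CGI request
finally:
    if os.path.exists(lock):
        os.remove(lock)         # release the lock
'''


def lock_survives(sig, handle_term):
    """Kill a lock-holding child with sig; report whether the lock remains."""
    with tempfile.TemporaryDirectory() as d:
        lock = os.path.join(d, "list.lock")
        child = subprocess.Popen(
            [sys.executable, "-c", CHILD, lock, "1" if handle_term else "0"],
            stdout=subprocess.PIPE, text=True)
        child.stdout.readline()  # block until the child holds the lock
        child.send_signal(sig)
        child.wait()
        child.stdout.close()
        return os.path.exists(lock)
```

On a POSIX system, `lock_survives(signal.SIGTERM, False)` and `lock_survives(signal.SIGKILL, True)` both return `True`, so the lock is orphaned. `lock_survives(signal.SIGTERM, True)` returns `False`. In other words, a CGI that is killed mid-request leaves its lock behind unless it handles the signal itself, and even then a hard kill leaves the lock in place.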

-- 
Graham TerMarsch