[Mailman-Users] mailman still wedges on FreeBSD

Justin Wells jread at fever.semiotek.com
Tue Sep 21 19:45:53 CEST 1999


I've upgraded to the most recent snapshot, and it's a big improvement: 
the wedged processes no longer cause a deadlock. They still get wedged,
and eat up my memory, but now the webpage continues to function, and 
mail gets through. 

This is much better, but I would like to have no wedged processes. 

They wedge in the "select" state, along with a zombie, just as before.
The difference now is that there are no longer 10-15 additional
processes wedged in the "lockf" state, so it looks like at least
these wedged processes aren't holding onto the lock now. 

There's nothing in the log indicating anything unusual at that time. 

Here is what I see in my process table during a typical wedge:

  fever:~$ ps -auxwww | grep python
  daemon 16440 31.8  0.0     0    0  ??  Z    -         0:00.00  (python)
  daemon 17202 29.8  0.0     0    0  ??  Z    -         0:00.00  (python)
  daemon 16439  0.0  1.0  2952 1228  ??  I    11:48AM   0:00.31 /usr/local/bin/python /local/mailman/scripts/post webmacro
  daemon 17201  0.0  1.0  2956 1232  ??  I     1:21PM   0:00.31 /usr/local/bin/python /local/mailman/scripts/post webmacro
 
There are also two zombies present, in addition to the wedged processes.

Here are the files the older wedged process (16439) has open:

  bash-2.03# lsof -p 16439
  COMMAND   PID   USER   FD   TYPE     DEVICE SIZE/OFF   NODE NAME
  python  16439 daemon  cwd   VDIR   0,131072      512      2 /
  python  16439 daemon  rtd   VDIR   0,131072      512      2 /
  python  16439 daemon  txt   VREG   0,131077   426448 255570 /usr/local/bin/python1.5
  python  16439 daemon  txt   VREG   0,131077    63652 182570 /usr/libexec/ld-elf.so.1
  python  16439 daemon  txt   VREG   0,131077   151057 277905 /usr/lib/libreadline.so.3
  python  16439 daemon  txt   VREG   0,131077    15084 277800 /usr/lib/libtermcap.so.2
  python  16439 daemon  txt   VREG   0,131077   115780 277779 /usr/lib/libm.so.2
  python  16439 daemon  txt   VREG   0,131077    12965 278069 /usr/lib/libdescrypt.so.2
  python  16439 daemon  txt   VREG   0,131077   583043 277814 /usr/lib/libc_r.so.3
  python  16439 daemon  txt   VREG   0,131077    13176 104686 /usr/local/lib/python1.5/lib-dynload/cStringIO.so
  python  16439 daemon  txt   VREG   0,131077    49516 104687 /usr/local/lib/python1.5/lib-dynload/cPickle.so
  python  16439 daemon    0u  PIPE 0xc720d540    16384        
  python  16439 daemon    1w  VREG   0,131076        0 158753 /var (/dev/wd0s1e)
  python  16439 daemon    2w  VREG   0,131076        0 158753 /var (/dev/wd0s1e)
  python  16439 daemon    3u  PIPE 0xc720dea0    16384        ->0xc720ee40
  python  16439 daemon    4u  PIPE 0xc720ee40    16384        ->0xc720dea0
  python  16439 daemon    5r  VREG   0,131079     3775 317843 /local/mailman/scripts/post
  python  16439 daemon    6u  VREG   0,131079    43749  24292 /local/mailman/logs/error
  python  16439 daemon    7u  VREG   0,131079    10390  24298 /local/mailman/logs/post
  bash-2.03# 


My guess is that the process is deadlocked, waiting on its pipe for
the zombie (16440).
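That guess can be illustrated with a small sketch (modern Python 3 on a POSIX system, not the Python 1.5.2 in use here; the function name `parent_sees_eof` is hypothetical). A parent blocked in select() on a pipe only sees EOF once every copy of the write end has been closed, including its own. If the parent never closes its copy, the child can exit, and then sit as a zombie until it is reaped, while the parent waits indefinitely. In the lsof listing above, fds 3 and 4 are the two ends of the same pipe, both held by the wedged process, which is consistent with this pattern but does not prove it.

```python
import os
import select


def parent_sees_eof(close_write_end: bool, timeout: float = 1.0) -> bool:
    """Fork a child that exits at once, then wait on the pipe's read end.

    Returns True if select() reports the read end readable (EOF) within
    `timeout` seconds, and False if it times out.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately without writing anything.
        os._exit(0)
    # Parent: the child stays a zombie until we reap it with waitpid().
    if close_write_end:
        os.close(write_fd)  # drop the parent's own copy of the write end
    try:
        ready, _, _ = select.select([read_fd], [], [], timeout)
        return bool(ready)
    finally:
        os.close(read_fd)
        if not close_write_end:
            os.close(write_fd)
        os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
```

With `close_write_end=False`, select() simply times out even though the child is long gone. Without a timeout, it would block forever, which is what a wedged process in "select" state would look like.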

Here is the sequence of events:
 
After several messages to the list had been processed successfully,
a post at 11:48AM wedged, apparently after delivering mail to some
(possibly all) of the subscribers. The next post to the list came
through at 1:21PM and also wedged, again after delivering mail to
part of the subscriber list.

I think that if I do nothing, every subsequent post to the list will
leave a wedged process behind, after apparently working. If I kill the
wedged processes, subsequent posts work without leaving anything
behind for a while, and then it happens again.

What can I do to investigate this and find out what is going on?
I'm assuming no one can tell what's happening from this alone,
or I would have got an answer last time. 
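One place to start investigating is to confirm which wedged process each zombie belongs to, since the `ps -auxwww` listing above does not show parent PIDs. A minimal sketch in modern Python 3, assuming a `ps` that accepts `-axo pid,ppid,stat,command` (FreeBSD's does); `zombies_with_parents` and `current_ps_output` are hypothetical helper names:

```python
import subprocess


def zombies_with_parents(ps_text):
    """Map each zombie PID to (parent PID, parent command line).

    Expects a header line followed by rows in the format produced by
    `ps -axo pid,ppid,stat,command`.
    """
    rows = {}
    zombies = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue
        pid, ppid, stat = int(fields[0]), int(fields[1]), fields[2]
        command = fields[3] if len(fields) > 3 else ""
        rows[pid] = (ppid, command)
        if stat.startswith("Z"):  # "Z" marks a zombie in the STAT column
            zombies.append(pid)
    result = {}
    for pid in zombies:
        ppid = rows[pid][0]
        # The parent may have exited between ps snapshots; report "?" then.
        result[pid] = (ppid, rows.get(ppid, (None, "?"))[1])
    return result


def current_ps_output():
    # Assumes a BSD-style ps that understands these -o keywords.
    return subprocess.run(["ps", "-axo", "pid,ppid,stat,command"],
                          capture_output=True, text=True, check=True).stdout
```

Calling `zombies_with_parents(current_ps_output())` returns a dict keyed by zombie PID. If each zombie's parent turns out to be one of the wedged `scripts/post` processes, that points at a parent that has stopped reading from, or reaping, its child.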

I'm running FreeBSD 3.2 with Python 1.5.2, and Mailman updated from
the CVS archive and reinstalled last night.

Justin



On Fri, Sep 17, 1999 at 03:34:27PM -0400, Barry A. Warsaw wrote:
> 
> >>>>> "JW" == Justin Wells <jread at fever.semiotek.com> writes:
> 
>     JW> Sorry to be so persistent about this, but every 2nd day I've
>     JW> had to unwedge mailman, and I still haven't got an answer from
>     JW> this list.
> 
>     JW> platform: FreeBSD 3.2, Python 1.5.2, mailman 1.0
> 
>     JW> This sounds a lot like the locking problem described in
>     JW> README.BSD, however my version of python is up to date.
> 
> Try upgrading to the current CVS snapshot of Mailman.  It uses a more
> portable (and consistent) locking mechanism.
> 
> -Barry
> 
> ------------------------------------------------------
> Mailman-Users maillist  -  Mailman-Users at python.org
> http://www.python.org/mailman/listinfo/mailman-users
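Barry's reply does not say what the new locking mechanism is. One widely used portable technique, shown here only as an assumption about the general approach and not as Mailman's actual code, is link()-based locking: each process writes a uniquely named temp file and tries to hard-link it to the shared lock file, which succeeds atomically for exactly one process and needs no lockf() support. A minimal sketch in modern Python 3; `acquire_link_lock` and `release_link_lock` are hypothetical names:

```python
import errno
import os
import socket
import time


def acquire_link_lock(lockfile, timeout=10.0, poll=0.1):
    """Take `lockfile` via hard-linking; return our unique temp file name."""
    # Unique per host and process, so concurrent lockers never collide.
    tmpname = "%s.%s.%d" % (lockfile, socket.gethostname(), os.getpid())
    with open(tmpname, "w") as f:
        f.write(tmpname)
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.link(tmpname, lockfile)  # atomic: fails if lockfile exists
            return tmpname
        except OSError as e:
            if e.errno != errno.EEXIST:
                os.unlink(tmpname)
                raise
        # On NFS, link() may report failure even though it succeeded, so
        # a link count of 2 on our temp file is the authoritative check.
        if os.stat(tmpname).st_nlink == 2:
            return tmpname
        if time.monotonic() >= deadline:
            os.unlink(tmpname)
            raise TimeoutError("could not lock %s" % lockfile)
        time.sleep(poll)


def release_link_lock(lockfile, tmpname):
    """Drop the lock by removing both names for our temp file."""
    os.unlink(lockfile)
    os.unlink(tmpname)
```

Because the lock is an ordinary file, a process that dies while holding it leaves the file behind. A real implementation would also need some way to break stale locks, which this sketch leaves out.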



