[Mailman-Users] Mailman no longer working or working very slowly

Mark Sapiro mark at msapiro.net
Thu Feb 18 00:46:28 CET 2010


Steven Jones wrote:
>
>Is there a way forward? I have to hand this on, as I have to fix something else. I will come back to it in about 4 hours from now.
>


OK. As you see, I too have been offline for a while.


>
>-----Original Message-----
>From: Mark Sapiro [mailto:mark at msapiro.net]
>Sent: Thursday, 18 February 2010 3:56 a.m.
>To: Steven Jones; mailman-users at python.org
>Subject: Re: [Mailman-Users] Mailman no longer working or working very slowly
>
>Steven Jones wrote:
>>
>>Yesterday, after 5 years of operation, our Mailman application "died". It seems to be barely running, taking 3 or more hours to process lists. The previous load was <0.2; now it's around 1, with a python process constantly consuming one CPU.
>
>
>Which Python process, i.e. which qrunner?
>
>============
>How do I identify such a "qrunner"?
>============


  ps -fwu mailman

will list all the Mailman processes and the commands that invoked them
which include the qrunner names.
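A minimal Python 3 sketch that picks the qrunner names out of that listing, busiest first. It assumes the Mailman 2.1 convention that mailmanctl starts each qrunner with a `--runner=Name:slice:count` argument, and that `ps -f` prints PID and C (CPU utilization) in its second and fourth columns; `qrunners_from_ps` is a hypothetical helper name, not part of Mailman.

```python
import re

# Assumption: Mailman 2.1's mailmanctl starts every qrunner with an argument
# of the form --runner=<RunnerName>:<slice>:<count>.
RUNNER_RE = re.compile(r"--runner=(\w+):(\d+):(\d+)")


def qrunners_from_ps(ps_output):
    """Hypothetical helper: return (pid, cpu, runner_name) tuples parsed from
    `ps -fwu mailman` output, sorted so the busiest qrunner comes first."""
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the "UID PID PPID C ..." header
        match = RUNNER_RE.search(line)
        fields = line.split()
        if match and len(fields) >= 4:
            # ps -f columns: UID PID PPID C STIME TTY TIME CMD
            pid, cpu = int(fields[1]), int(fields[3])
            found.append((pid, cpu, match.group(1)))
    return sorted(found, key=lambda entry: entry[1], reverse=True)
```

It can be fed the text from `subprocess.run(["ps", "-fwu", "mailman"], capture_output=True, text=True).stdout` (Python 3.7 or later); the mailmanctl master process itself has no `--runner=` argument and is skipped.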


>Also look in Mailman's qfiles/* directories to see which one has a large
>number of messages.
>
>==========
>It's empty:
>
>[root at vuwunicosmtp004 mailman]# cd /var/spool/mailman/
>[root at vuwunicosmtp004 mailman]# ls -al
>total 12
>drwxr-xr-x    3 root     root         4096 Sep  6  2005 .
>drwxr-xr-x   16 root     root         4096 Feb 17 15:12 ..
>drwxrwsr-x    2 root     mailman      4096 Mar 22  2007 qfiles
>[root at vuwunicosmtp004 mailman]# cd qfiles/
>[root at vuwunicosmtp004 qfiles]# ls -al
>total 8
>drwxrwsr-x    2 root     mailman      4096 Mar 22  2007 .
>drwxr-xr-x    3 root     root         4096 Sep  6  2005 ..
>[root at vuwunicosmtp004 qfiles]# ls -al
>total 8
>drwxrwsr-x    2 root     mailman      4096 Mar 22  2007 .
>drwxr-xr-x    3 root     root         4096 Sep  6  2005 ..
>[root at vuwunicosmtp004 qfiles]#
>
>==========


Then your qfiles are somewhere else. Even if the queues are empty,
there will still be a directory per queue:

archive  bad  bounces  commands  in  news  out  retry  shunt  virgin

(well, maybe not 'bad', but all the others)
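Once the real qfiles directory is found, a sketch like this counts what is waiting in each of those queues. It assumes queue entries are `.pck` files, as in the shunt path quoted in the error log further down; `queue_counts` is a hypothetical helper name.

```python
import os

# The per-queue directories listed above; 'bad' may be absent on some installs.
QUEUES = ["archive", "bad", "bounces", "commands", "in",
          "news", "out", "retry", "shunt", "virgin"]


def queue_counts(qfiles_dir):
    """Hypothetical helper: map each queue name to its number of .pck files,
    or to None when that queue directory does not exist under qfiles_dir."""
    counts = {}
    for name in QUEUES:
        path = os.path.join(qfiles_dir, name)
        if os.path.isdir(path):
            # Assumption: each queued message is stored as one .pck file.
            counts[name] = sum(1 for f in os.listdir(path) if f.endswith(".pck"))
        else:
            counts[name] = None  # queue directory not present at this location
    return counts
```

For example, `queue_counts("/var/mailman/qfiles")` checks the directory that appears in that shunt path; a result full of `None` values means the wrong directory was given.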


>
>>We are running on 32-bit RHEL3, and the errors are:
>>
>>============
>>
>>Error log for Mailman (vuwunicosmtp004.vuw.ac.nz:/var/log/mailman/error)
>>
>>says:
>>
>>
>>
>>RuntimeError: maximum recursion depth exceeded
>>
>>
>>
>>Feb 17 07:48:26 2010 (13368) SHUNTING: 1266281245.229378+bd3f1d42e27ad38cf532b809460a0b0a8aef00e7
>>
>>
>>
>>The last number is a message ID in the Mailman queue
>>
>>/var/mailman/qfiles/shunt/1266281245.229378+bd3f1d42e27ad38cf532b809460a0b0a8aef00e7.pck
>>============
>
>
>How many of these are there?
>
>========
>Seems like a lot.
>
>The error log is significantly bigger because of them:
>
>[root at vuwunicosmtp004 mailman]# pwd
>/var/log/mailman
>[root at vuwunicosmtp004 mailman]# ls -l
>total 81400
>8><-----
>-rw-rw-r--    1 root     mailman  79321195 Feb 18 08:05 error
>-rw-rw-r--    1 root     mailman      6997 Feb 13 16:42 error.1
>-rw-rw-r--    1 root     mailman     11798 Feb  7 04:02 error.2
>-rw-rw-r--    1 root     mailman      7142 Jan 31 03:19 error.3
>-rw-rw-r--    1 root     mailman      3313 Jan 24 01:54 error.4
>8><-----
>========
>
>
>This may be unrelated. Is there a
>traceback with the above error? What is it?
>
>===========
>This?
>===========
>  File "/usr/lib64/python2.2/copy.py", line 186, in deepcopy
>    y = copierfunction(x, memo)
>  File "/usr/lib64/python2.2/copy.py", line 283, in _deepcopy_inst
>    state = deepcopy(state, memo)
>  File "/usr/lib64/python2.2/copy.py", line 186, in deepcopy
>    y = copierfunction(x, memo)
>  File "/usr/lib64/python2.2/copy.py", line 246, in _deepcopy_dict
>    y[deepcopy(key, memo)] = deepcopy(x[key], memo)
>  File "/usr/lib64/python2.2/copy.py", line 186, in deepcopy
>    y = copierfunction(x, memo)
>  File "/usr/lib64/python2.2/copy.py", line 219, in _deepcopy_list
>    y.append(deepcopy(a, memo))
[...]
>RuntimeError: maximum recursion depth exceeded
>
>Feb 18 08:05:23 2010 (22075) SHUNTING: 1266281265.9402289+108ada57b8e680da231d7a75a9bf50e08bbce3fe
>[root at vuwunicosmtp004 mailman]#


Yes, that. Something is really hosed. Have you tried just restarting
Mailman?

What's at the start of that traceback leading up to the first call to
deepcopy?
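With the full traceback from the error log, the frame just before the first `copy.py` `deepcopy` frame is the caller in question. A minimal sketch, assuming standard Python traceback formatting (`  File "...", line N, in func` followed by the source line); `parse_frames` and `caller_of_deepcopy` are hypothetical helper names.

```python
import os
import re

# Matches a standard traceback frame header such as:
#   File "/usr/lib64/python2.2/copy.py", line 186, in deepcopy
FRAME_RE = re.compile(r'^\s*File "([^"]+)", line (\d+), in (\S+)')


def parse_frames(text):
    """Hypothetical helper: return (filename, lineno, function, source) tuples."""
    lines = text.splitlines()
    frames = []
    for i, line in enumerate(lines):
        m = FRAME_RE.match(line)
        if not m:
            continue
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        # The source line is missing when the next line is already another frame.
        source = "" if FRAME_RE.match(nxt) else nxt.strip()
        frames.append((m.group(1), int(m.group(2)), m.group(3), source))
    return frames


def caller_of_deepcopy(text):
    """Hypothetical helper: return the frame that made the first call into
    copy.deepcopy, or None if that caller is not present in the text."""
    previous = None
    for frame in parse_frames(text):
        filename, _, function, _ = frame
        if os.path.basename(filename) == "copy.py" and function == "deepcopy":
            return previous
        previous = frame
    return None
```

On a fragment that begins inside `copy.py`, like the excerpt quoted above, it returns `None`, since the caller is simply not in the text.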

This could also be some kind of list object corruption leading to a
circular reference. It's hard to say without more information.
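To see how a damaged object structure can produce this error, here is a minimal sketch in modern Python 3 (the traceback above is from Python 2.2, where the exception was a plain RuntimeError; RecursionError is its Python 3 subclass). It shows that `copy.deepcopy` copies a simple self-referencing list without trouble, because its memo dict remembers objects it has already copied, but fails on a chain of nested lists much deeper than the recursion limit. `build_chain` and `deepcopy_fails` are hypothetical names for illustration only.

```python
import copy
import sys


def build_chain(depth):
    """Hypothetical helper: build [[[...]]] nested `depth` levels deep, iteratively."""
    root = []
    node = root
    for _ in range(depth - 1):
        child = []
        node.append(child)
        node = child
    return root


def deepcopy_fails(obj):
    """Hypothetical helper: True if copy.deepcopy(obj) exhausts the recursion limit."""
    try:
        copy.deepcopy(obj)
    except RecursionError:  # Python 2 raised RuntimeError instead
        return True
    return False


# A self-referencing list: deepcopy's memo dict handles the cycle.
cyclic = []
cyclic.append(cyclic)
clone = copy.deepcopy(cyclic)
print(clone[0] is clone)  # True: the copy refers to itself, not to the original

# Each nesting level costs at least one Python frame inside deepcopy, so a chain
# far deeper than sys.getrecursionlimit() cannot be copied.
print(deepcopy_fails(build_chain(10)))                           # False
print(deepcopy_fails(build_chain(5 * sys.getrecursionlimit())))  # True
```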

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


