[Mailman-Users] Problem with archrunner using large %'s of cpu (read faq & archives)

Scott Lambert lambert at lambertfam.org
Fri Oct 31 21:52:34 CET 2003


On Fri, Oct 31, 2003 at 09:40:11AM -0500, Jon Carnes wrote:
> On Fri, 2003-10-31 at 09:26, Jay West wrote:
> > I'm using Mailman 2.1.2 on FreeBSD v4.8-Release, built using the port. MTA
> > is sendmail 8.12.8p1
> > 
> > Very frequently I will see the ArchRunner process using 99+ % of cpu. I have
> > searched the archives and found lots of messages about qrunners using large
> > percentages of cpu, but they all seem to talk about the fixes being related
> > to actual mail processing (sendmail), not archRunner. I am assuming that if
> > the problem was mail delivery or reception I would be seeing the large cpu
> > use on a different qrunner process. My issue is specific to the archrunner
> > process which I don't find much on in the archives/faq.
> > 
> Well you've pegged it.  That was a bug in version 2.1.2 which is fixed
> in 2.1.3.  The patch for 2.1.2 should still be available - you could
> probably patch your running system and just leave it at that (an upgrade
> will bring the patch in anyway).

I still see this problem with Mailman 2.1.3 for a high-volume list.

  PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
66428 mailman   64   0   168M   147M CPU1   0 376.7H 99.02% 99.02% python2.3

That's the archiver process.  There are 1318 messages in the archive
queue...
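To check whether the archiver is draining its queue at all, you can count the queue files and repeat the count a few minutes later. Below is a minimal sketch. The default path is an assumption based on where the FreeBSD port normally installs Mailman, so point it at your own installation's archive queue directory. It only counts files and never reads or modifies them.

```python
import os
from collections import Counter

# Hypothetical default: the FreeBSD port usually installs under
# /usr/local/mailman; adjust to your installation's archive queue dir.
ARCHIVE_QDIR = "/usr/local/mailman/qfiles/archive"

def queue_summary(qdir=ARCHIVE_QDIR):
    """Count the files in a queue directory, grouped by file extension."""
    counts = Counter()
    for name in os.listdir(qdir):
        ext = os.path.splitext(name)[1] or "(none)"
        counts[ext] += 1
    return counts
```

For example, `for ext, n in sorted(queue_summary().items()): print(ext, n)`. If two snapshots taken a few minutes apart show the same (or a growing) count while ArchRunner sits at 99% CPU, the runner is spinning rather than working through the backlog.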

12:00:28 Fri Oct 31 # truss -p 66428
break(0x114f6000)                                = 0 (0x0)
break(0x1302c000)                                = 0 (0x0)
break(0x114f8000)                                = 0 (0x0)
break(0x13030000)                                = 0 (0x0)
break(0x114fa000)                                = 0 (0x0)
break(0x13034000)                                = 0 (0x0)
break(0x114fc000)                                = 0 (0x0)
break(0x13038000)                                = 0 (0x0)
break(0x114fe000)                                = 0 (0x0)
break(0x1303c000)                                = 0 (0x0)
break(0x11500000)                                = 0 (0x0)
break(0x13040000)                                = 0 (0x0)
break(0x11502000)                                = 0 (0x0)
break(0x13044000)                                = 0 (0x0)
break(0x11504000)                                = 0 (0x0)
break(0x13048000)                                = 0 (0x0)
break(0x11506000)                                = 0 (0x0)
break(0x1304c000)                                = 0 (0x0)
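Every call in that excerpt is break(), which is how FreeBSD's truss shows the system call behind brk()/sbrk(). The process does no reads or writes and only keeps extending its data segment. One plausible reading, though not a diagnosis, is a CPU-bound loop that keeps allocating memory. The small parser below groups the break() targets and measures how far each series moved. Grouping by the top 8 bits of the address is an assumption that happens to separate the two interleaved series in this trace.

```python
import re
from collections import defaultdict

# Matches truss lines such as:
#   break(0x114f6000)                                = 0 (0x0)
BREAK_RE = re.compile(r"^break\((0x[0-9a-fA-F]+)\)")

def break_growth(lines):
    """Summarise break() calls per address region.

    Assumption: addresses sharing the same top 8 bits (addr >> 24) belong
    to the same series, which fits the 0x11... / 0x13... pattern above.
    """
    series = defaultdict(list)
    for line in lines:
        m = BREAK_RE.match(line.strip())
        if m:
            addr = int(m.group(1), 16)
            series[addr >> 24].append(addr)
    report = {}
    for region, addrs in series.items():
        # Distinct step sizes between consecutive break() targets.
        steps = {b - a for a, b in zip(addrs, addrs[1:])}
        report[region] = {
            "calls": len(addrs),
            "grown": addrs[-1] - addrs[0],  # bytes added during the trace
            "steps": steps,
        }
    return report
```

On the 18 lines above, one series grows in 8 KB (0x2000) steps and the other in 16 KB (0x4000) steps. So during the short capture the process added 64 KB and 128 KB respectively without ever touching a file.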

Once I killed off the Mailman queue runners and cleaned up the several lock
files for this mailing list, the archiver ran just fine and emptied the
archive queue.
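Before deleting anything by hand, it helps to see which lock files exist for the list. The sketch below only lists files. It assumes that the lock directory is /usr/local/mailman/locks (adjust for your installation) and that the list's lock files are named `<listname>.lock` or `<listname>.lock.<something>`. Both are assumptions, not guarantees about Mailman's naming. Stop the runners first (for example with `mailmanctl stop`) so that no live process still holds the lock you are about to remove.

```python
import os

# Hypothetical default lock directory for the FreeBSD port; adjust as needed.
LOCK_DIR = "/usr/local/mailman/locks"

def list_locks(listname, lock_dir=LOCK_DIR):
    """Return (filename, mtime) pairs for files that look like this list's
    lock files. Assumed naming: '<listname>.lock' or '<listname>.lock.*'.
    Read-only: nothing is deleted."""
    prefix = listname + ".lock"
    found = []
    for name in sorted(os.listdir(lock_dir)):
        if name == prefix or name.startswith(prefix + "."):
            path = os.path.join(lock_dir, name)
            found.append((name, os.path.getmtime(path)))
    return found
```

Treat the modification times with some caution. A lock implementation may bump a lock file's timestamps to track its lifetime, so mtime is not necessarily the time the lock was taken.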

Two days' worth of Mailman cron jobs were still stuck in the process list.

Supposition: Maybe they were blocked by the list's lockfile?
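That supposition fits how hard-link-based locks behave. Mailman 2.1's own LockFile module is built on link(). With such a lock, a process that wants the lock creates a unique temporary file and tries to hard-link it to the shared lock name. While another process holds the lock, the link fails and the waiter just sleeps and retries. If the holder is wedged and never releases the lock, every later job piles up behind it. Below is a generic sketch of the technique. It is not Mailman's code, and the function names are hypothetical.

```python
import os
import time

def acquire_link_lock(lockfile, timeout):
    """Try to take a link()-based lock, polling until `timeout` seconds pass.

    Returns the path of our temporary file on success, or None on timeout.
    """
    tmp = "%s.%d.%d" % (lockfile, os.getpid(), time.monotonic_ns())
    open(tmp, "w").close()  # unique per attempt
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.link(tmp, lockfile)  # atomic; fails if lockfile already exists
            return tmp
        except FileExistsError:
            if time.monotonic() >= deadline:
                os.unlink(tmp)  # give up and clean up our temp file
                return None
            time.sleep(0.05)  # someone else holds it; wait and retry

def release_link_lock(lockfile, tmp):
    """Release a lock taken by acquire_link_lock()."""
    os.unlink(lockfile)
    os.unlink(tmp)
```

A second caller with a short timeout gets None while the first still holds the lock. With no timeout at all, it would wait forever, which is what a stuck process list full of cron jobs looks like from the outside.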

So it seems that the ArchRunner process went off the deep end somewhere
between two and three days ago.

I have the htdig patches for 2.1.3 installed, which might be germane...

-- 
Scott Lambert                    KC5MLE                       Unix SysAdmin
lambert at lambertfam.org      




