[Mailman-Developers] FYI -- problems with my new install...

Chuq Von Rospach chuqui@plaidworks.com
Sun, 29 Oct 2000 16:22:57 -0800


This is more a cautionary tale than a real problem, but it brings up 
a couple of issues to chew on.

I started having major problems with mailman when I brought 
lists.apple.com live (I'll have more to say about this site later, 
since there are a couple of things I need to look into more fully 
before I core-dump on that install...)

The main problem was that I was getting huge numbers of messages to 
the -admin addresses that were blank. Zero. First, I thought it was a 
corrupted list database. Then I thought it was a corrupted request 
database. Then I thought it was a corrupted message in the qfile dir 
that was causing corrupted messages to multiply. Then I didn't know 
what to think, so I just started taking the system apart piece by 
piece and running qfile messages through ONE AT A TIME to see where 
the problem came from. My favorite way to spend a weekend, that's for 
sure... (grin)

End result -- one minor configuration error in the mailer. One of the 
hostnames I use wasn't set up as a local name, so sendmail kept 
erroring out trying to talk to itself in one special case. But the 
bigger issue was -- the system was doing exactly what I told it to do.

I use demime to strip incoming e-mail to the text part. This works 
really pretty well. At some point, however, instead of just attaching 
demime to the posting and -request addresses, I also added it to the 
-admin address.

Most incoming bounces now are in MIME format. End result: they come 
to the -admin address, the MIME gets stripped, and an empty message 
results. Since it no longer looks like a bounce message, it gets sent 
to the admin. Load in a fairly dirty subscriber list and start sending 
messages -- and you get 10K blank messages in your mailbox in the 
morning.
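
One way to guard against this is to check whether a message looks like a bounce before handing it to any stripping filter. The following is only a rough sketch using Python's standard email package; it is not how demime or Mailman work, and the function name looks_like_dsn is made up for illustration. It assumes the bounce is a standard delivery status notification (multipart/report), which not every MIME-format bounce will be.

```python
from email import message_from_string


def looks_like_dsn(raw_text):
    """Return True if the message is a multipart/report delivery status
    notification. Hypothetical helper, for illustration only."""
    msg = message_from_string(raw_text)
    if msg.get_content_type() != "multipart/report":
        return False
    report_type = msg.get_param("report-type", "")
    # RFC 2231 encoded parameters come back as a (charset, lang, value) tuple.
    if isinstance(report_type, tuple):
        report_type = report_type[2]
    return report_type.lower() == "delivery-status"


# Made-up sample messages, only to exercise the check.
SAMPLE_BOUNCE = (
    "From: MAILER-DAEMON@example.com\n"
    "Content-Type: multipart/report; report-type=delivery-status;\n"
    " boundary=\"b\"\n"
    "\n"
    "--b\n"
    "Content-Type: text/plain\n"
    "\n"
    "Delivery failed.\n"
    "--b--\n"
)
SAMPLE_POST = "From: someone@example.com\nSubject: hi\n\nhello\n"
```

A wrapper in front of the -admin address could pass anything for which looks_like_dsn returns True through untouched, so the bounce handling still sees the original message.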

Cautionary note: after you double-check all your configuration files 
for problems, make sure you also double-check all the custom stuff you 
added and confirm you did it right. The "good" thing about this particular 
problem is that while I was busy mailbombing myself and my admins all 
weekend fighting this beast, to the end user, the site worked fine... 
If you HAVE to have problems, problems that aren't visible to the end 
user are preferable...

But it brings up a couple of issues I see with qrunner.

First, it seems like qrunner re-stats the qfiles dir and reloads its 
idea of what needs to be run. This creates a problem when you have 
lots of messages, since it's not processing things FIFO -- I found 
that some older messages were simply NEVER being run, because however 
qrunner was choosing messages out of qfiles, it wasn't choosing them. 
On a busy system, this can be a problem. I suggest instead that 
qrunner start up, grab the list of messages to run, and run them, 
oldest first, then exit. Let the next qrunner handle what comes in in 
the meantime. That way, things are run in more of a FIFO order, and you 
don't get into the lost-stepchild queue file problem.
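
The snapshot-and-exit idea could look roughly like the sketch below. This is not qrunner's actual code; the function names are invented, it treats every plain file in the directory as one queue entry (a simplification), and it uses file modification time as a stand-in for message age.

```python
import os


def snapshot_oldest_first(queue_dir):
    """List the queue directory once and return file names, oldest first."""
    entries = []
    for name in os.listdir(queue_dir):
        path = os.path.join(queue_dir, name)
        if os.path.isfile(path):
            entries.append((os.path.getmtime(path), name))
    entries.sort()  # sorts by mtime, then by name for ties
    return [name for _mtime, name in entries]


def run_once(queue_dir, handle):
    """Process the snapshot in order, then return. Files that arrive while
    this pass is running are left for the next invocation."""
    for name in snapshot_oldest_first(queue_dir):
        path = os.path.join(queue_dir, name)
        if os.path.exists(path):  # may have been removed in the meantime
            handle(path)
```

Because the directory is listed only once per run, an old file cannot keep getting passed over in favor of newer arrivals; the worst case is that it waits for the current pass to finish.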

Second, qrunner isn't good at letting me know what it's doing. If I'm 
trying to figure out what it's processing, it's not telling me. When 
trying to debug a possible corrupted file, that's a real hassle. It'd 
be nice if it put something in qfiles that told me what fileset it 
was working on, just so I can whack at it if I need to.
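
As a sketch of what such a marker could look like: write a small file naming the current fileset before processing it, and remove it afterward. The marker file name qrunner.current and the helper below are purely hypothetical, not anything Mailman provides.

```python
import os
from contextlib import contextmanager


@contextmanager
def processing_marker(queue_dir, fileset_name, marker_name="qrunner.current"):
    """Record which fileset is being worked on in a marker file inside the
    queue directory, and remove the marker when processing ends.
    Hypothetical helper; the marker file name is an assumption."""
    marker_path = os.path.join(queue_dir, marker_name)
    with open(marker_path, "w") as fp:
        fp.write("pid=%d fileset=%s\n" % (os.getpid(), fileset_name))
    try:
        yield marker_path
    finally:
        # On a clean exit or an exception the marker goes away; if the
        # process is killed outright, it stays behind and names the
        # fileset that was in flight.
        if os.path.exists(marker_path):
            os.remove(marker_path)
```

If the runner dies on a corrupted file, the leftover marker points straight at the fileset to whack. A real runner would also need to skip the marker file when scanning the queue.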

All in all, it's been a, um, fun weekend. But I now have demime doing 
what it's supposed to be doing, and it is working a LOT better. And 
it explains (in retrospect) why, knowing the subscriber lists were 
dirty, I wasn't seeing very many bounces... (grumble. That should 
have been a hint. Hindsight is fun...)

*now* it's stable... (I think)

-- 
Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com)
Apple Mail List Gnome (mailto:chuq@apple.com)

Be just, and fear not.