[Mailman-Users] Postfix/Mailman breaks (somewhat) upon sending out to list of 300+ users

Ted M Harapat ted-mailman at mob.net
Fri Nov 9 21:04:11 CET 2001


Hi Jon. 

(These emails always get so damned long, but it's hard to explain in detail
what's going wrong or how to fix anything otherwise. Sorry.)

Well.... my father entered them into the list, and he did it through the web
interface. I think that's considered manually. I don't know what the 54th one
is, and I didn't look, because the list was entered alphabetically and when a
message goes out, Mailman (and eventually postfix) sends it out in chunks based
on domain name (as all good MTAs should). And those domains are alphabetically
all over the list (since it wasn't sorted by domain).
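
To illustrate why the 54th address in an alphabetical list need not be the 54th
address actually delivered, here is a minimal sketch of grouping recipients by
domain. It is not Mailman's or postfix's actual chunking code; the function name
group_by_domain and the sample addresses are made up for the example.

```python
from collections import defaultdict


def group_by_domain(addresses):
    """Group addresses by domain, keeping input order inside each group."""
    groups = defaultdict(list)
    for addr in addresses:
        # Split on the last "@" so odd local parts do not confuse the split.
        _local, _, domain = addr.rpartition("@")
        groups[domain.lower()].append(addr)
    return dict(groups)


# Hypothetical members, sorted alphabetically the way the list was entered.
members = [
    "alice@example.org",
    "bob@example.com",
    "carol@example.org",
    "dave@example.net",
]

chunks = group_by_domain(members)
for domain, recipients in chunks.items():
    print(domain, recipients)
```

In this toy list carol is third alphabetically but would go out in the same
chunk as alice, ahead of bob, so position in the member list says little about
delivery order.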

As for the system: it's brand new (sort of). It's a brand new Linux install (of
a current distro released less than 2 months ago) on an old dual Pentium Pro 200
box with over 75GB of disk (almost all using ReiserFS), only about 7GB being
used right now. And over half the disk (especially where the system files
reside) is UltraWide SCSI 10k rpm. Memory is at 320MB with 256MB swap. The
system runs lots of services but doesn't even come close to running out of CPU,
disk space, I/O, memory, or administrator patience (most of the time).

Here's what else I've come up with. (And I wish more people would report their
fixes back to the list, instead of posting a question, figuring it out, and
never reporting back.) This has to do with the qrunner lock files. It appears
that, as for so many others, Mailman (or something) messes something up and
leaves a lockfile in /home/mailman/locks that corresponds to a running qrunner
process. And you can't process any more of the queue till that's completed.

So you can do one of two things: wait until it finishes, which could take
close to forever (I have never waited it out), or kill it manually, delete the
qrunner lock files, and rerun qrunner. Sometimes it processes the smaller (non
300+ user) lists and those go out, but the messages for the remaining 257 users
of my dad's list are still in those hard-to-read qfiles. My hunch is that those
are what is slowing up the system, so the qrunner processes never finish and
the lock files never go away.
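
Before killing anything, it can help to see which lock files have been sitting
around for a while. The sketch below only lists candidates and deletes nothing.
The directory is the one named above; the 15 minute threshold and the function
name stale_lock_files are assumptions, and an old modification time is only a
hint, not proof that the owning qrunner is stuck.

```python
import os
import time


def stale_lock_files(lock_dir, max_age_seconds, now=None):
    """Return regular files in lock_dir last modified over max_age_seconds ago."""
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(lock_dir)):
        path = os.path.join(lock_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_seconds:
            stale.append(path)
    return stale


if __name__ == "__main__":
    lock_dir = "/home/mailman/locks"  # path taken from the message above
    if os.path.isdir(lock_dir):
        # 15 minutes is an arbitrary threshold chosen for illustration.
        for path in stale_lock_files(lock_dir, max_age_seconds=15 * 60):
            print("possibly stale:", path)
```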

So I stayed up much later than I should have last night and here's what I did:

I first adjusted the smtpd_recipient_limit variable in /etc/postfix/main.cf
from 100 to 1000. That didn't appear to help. So then I set root's cron to run
the Postfix supplied "/etc/rc.d/init.d/postfix restart" command every 5
minutes. This reloads everything and seems to allow that qrunner process to
complete(?) or just die. But seeing as the lock file goes away and qrunner
restarts the next minute when cron calls it, there's a small chance that it
will process mail going to the other smaller lists on my server. (This is part
of the mystery - how does restarting postfix release that qrunner lock file so
it can finally send out its mail!?) At that point I was tired and wanted sleep,
so I went to bed and left root's cron doing that restart. In the morning all
the mail going to the small lists (which I'm subscribed to) had gone out and I
had received it all. That was my plan, and I was happy. I was hoping someone
would reply to my messages with some magical fix for my dad's list.
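
For checking what a setting such as smtpd_recipient_limit is actually set to in
the file, here is a rough sketch of reading "name = value" lines from
main.cf-style text. It handles only '#' comment lines and continuation lines
that begin with whitespace; the function name read_main_cf and the sample text
are invented for the example. On a live system, running
"postconf smtpd_recipient_limit" is the more reliable way to see the effective
value.

```python
def read_main_cf(text):
    """Parse 'name = value' lines from main.cf-style text into a dict."""
    params = {}
    current = None
    for raw in text.splitlines():
        # Skip blank lines and lines whose first non-blank character is '#'.
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue
        # A line starting with whitespace continues the previous parameter.
        if raw[0].isspace() and current is not None:
            params[current] += " " + raw.strip()
            continue
        name, sep, value = raw.partition("=")
        if not sep:
            continue
        current = name.strip()
        params[current] = value.strip()
    return params


# Hypothetical excerpt, not the actual file from this server.
sample = """\
# example main.cf fragment
smtpd_recipient_limit = 1000
mydestination = localhost,
    example.org
"""

settings = read_main_cf(sample)
print(settings["smtpd_recipient_limit"])
```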

Then my dad called me all excited this morning saying that both emails to his
list of 300 went out to the remaining 257 list members (that's 514 emails
total). But he said they didn't go out till nearly 7am. And I hadn't changed
anything since around 1:30am. So it took 5.5 hours for it to process all of
that?! If so, was it just that I was stopping that stuck qrunner process every
five minutes, or was it a combination of that plus the new
smtpd_recipient_limit value that made it go?
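
As a back-of-the-envelope check on those numbers, the sketch below works out
the average rate. It assumes the deliveries were spread evenly over the whole
5.5 hours, which nothing above establishes; they may just as well have gone out
in one burst near 7am.

```python
remaining_members = 257
messages = 2
total_deliveries = remaining_members * messages  # the 514 emails mentioned above

elapsed_hours = 5.5           # roughly 1:30am to 7am
restart_interval_minutes = 5  # cron interval for the postfix restart

# Number of restart cycles that fit into the elapsed time.
restarts = int(elapsed_hours * 60 / restart_interval_minutes)

# Average rates under the even-spread assumption.
per_hour = total_deliveries / elapsed_hours
per_restart = total_deliveries / restarts

print(total_deliveries, restarts, round(per_hour, 1), round(per_restart, 1))
```

If the even-spread assumption held, that would be fewer than ten deliveries per
restart cycle, which is far slower than the small lists normally go out.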

Oh, and Jon, I did find your shell script (the one about Jeff B and a
misconfigured browser) in a list archive, with the for loop showing how to
kill the qrunner processes and delete their locks. I touched up some of the
syntax for my OS and ls output. But that didn't work for me. It just seemed to
never process anything. Perhaps I was impatient.

So.... I'm still stuck with qrunner leaving lock files, even though I do have
this temporary workaround. Oh, and the lock files are gone now that the qfiles
for my entire list have been processed. I imagine that I could stop restarting
postfix so often.



Living with a mysterious fix,

-ted



Quoting Jon Carnes <jonc at haht.com>:

> Just out of left field, but when you put in the mailing addresses for your
> dad's list, how did you do it? Manually, or did you feed them in via
> "add_members"?  Check the list and see what the 53rd and 54th address are,
> then go back to your import list and check out those addresses for errors...
> 
> Might not help, but it's certainly something to look at while you are
> waiting for inspiration.
> 
> Another thought - how much space does your server have available (df)? How
> is your memory on your server (top)?  Could you be running out of
> resources?
> 
> Jon Carnes
> ----- Original Message -----
> From: "Ted M Harapat" <ted-mailman at mob.net>
> To: <mailman-users at python.org>
> Sent: Thursday, November 08, 2001 3:17 PM
> Subject: [Mailman-Users] Postfix/Mailman breaks (somewhat) upon sending out to list of 300+ users
> 
> 
> > Hello all. Any suggestions and ideas on this problem are welcome.
> >
> > First of all, I recently switched to postfix after many years with sendmail
> > and qmail. I really like it. So then I got majordomo working with it, then
> > someone suggested I try Mailman and I really love this system. Thank you FSF
> > & Python!
> >
> > So it's easy enough to set up with a little looking around. I set it up and
> > set up 4 lists on it. 3 of them with under 20 people on it, and one (for my
> > father) over 300 people. The smaller lists work perfectly. That is, until my
> > father sends out to his list of 300+ people. According to the mail logs, the
> > incoming mail to the server is received to the list and it is sent back to
> > my father for approval (as I have set it up intentionally). So he approves
> > it, and it sends emails out to exactly 53 of the 300+ members. Then it stops
> > sending with no errors (I've checked all of postfix's and syslog's logs).
> > Not only that, but everything else with Mailman is then foobarred. Now when
> > any of the smaller lists send to it, postfix records receiving it but then
> > it doesn't do the next step such as sending it out to the users on the list
> > or going to the admin for approval. It (Mailman) just stops dead cold. No
> > more outgoing traffic or errors explaining why.
> >
> > So, upon examining every log I could think of, I finally just Reload postfix
> > using the included scripts. Then, all of a sudden, everything starts
> > processing. All mail waiting to go to the admins for approval or waiting to
> > go to the end users on all the different lists are suddenly sent out as
> > quickly and efficiently as everything normally goes with Mailman. All mail
> > except the remaining 240+ people on my dad's list. Mail to those listmembers
> > has disappeared.
> >
> > So.... I decided to try this again. Same thing. Dad sends mail out to the
> > list, it goes to exactly 53 people and dies again, and makes Postfix go
> > goofy again. Outside of this combined Postfix/Mailman problem, the mail
> > server acts as normal, processing all other traffic. I almost suspect that
> > this is much more of a Mailman than postfix problem. I think I'm good enough
> > with postfix to know if its a problem there and it doesn't appear to be. But
> > I can't be sure because it is partially resolved just by my reloading the
> > MTA.
> >
> > Strangely enough, these 53 users are the exact same 53 from the first time.
> >
> > I've checked over mailing lists for the last few months reading a lot of
> > things (since what an email of this subject would be called) and didn't find
> > anything. And nothing on this is in the FAQs or Manuals that I can tell.
> >
> > Anyone have any idea? Anything at all?
> >
> >
> > HELP!
> >
> >
> > -ted




