[Mailman-Developers] Re: Slow Performance on semi-large lists

David Champion dgc@uchicago.edu
Wed, 13 Dec 2000 20:28:18 -0600


I shifted this to mailman-developers because I want to talk about
changes to qrunner that would address the problem D.J. Atkinson
brought up.


On 2000.12.13, in <Pine.SOL.4.05.10012131455360.22660-100000@babu.pcisys.net>,
	"D.J. Atkinson" <dj@pcisys.net> wrote:
>
> I posted a message over the weekend where I saw qrunner only processing
> part of the queue.  It turned out that there were three messages in the
> queue with 3 unresolvable names each.  (3 messages to the same list)
> Each of these queued files took 400 seconds to time out, by which time, we
> were past the default max qrunner process length (15 minutes), and qrunner
> exited.
>
> I've of course now increased the process length to 30 minutes, and
> everything seems to be OK.  But that's only temporary, I'm sure.  As list
> volume builds, it will become a problem again.  It would be great if there
> were a more graceful way of dealing with this than currently exists.

How about altering qrunner's algorithm to split the queue on timeout,
appending the head of the queue to the tail?  Say a run goes like this:

A - fails
B - succeeds
C - fails
D - fails/unprocessed; qrunner times out
E - unprocessed
F - unprocessed

With this change, B has been delivered and dropped, and your next
queue runner will process this queue:

E
F
A
C
D

Eventually (ahem) the queue will contain only those batches which are
hard to deliver, and they'll be re-ordered with each run to give equal
attempts over time.
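
Here is a rough sketch of that rotation in Python.  It is not taken
from the qrunner source (which I haven't read yet); the function name
and its arguments are made up for illustration, and batches are just
letters standing in for queued files.

```python
def rotate_on_timeout(queue, delivered, stopped_at):
    """Return the queue the next run should see.

    queue      -- batch names in the order this run saw them
    delivered  -- set of batches that succeeded and leave the queue
    stopped_at -- index of the batch being processed at timeout
    """
    # Everything up to and including the current batch is the "head";
    # delivered batches drop out of it.
    head = [b for b in queue[:stopped_at + 1] if b not in delivered]
    # Batches this run never reached are the "tail".
    tail = queue[stopped_at + 1:]
    # Untried batches go first; the ones already attempted go last.
    return tail + head


# The example above: B succeeded, qrunner timed out on D (index 3).
print(rotate_on_timeout(list("ABCDEF"), {"B"}, 3))
# ['E', 'F', 'A', 'C', 'D']
```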

Actually, that's not true if the queue is reduced to containing only A,
C, and D, and qrunner always times out on D: the rotation then leaves
the order unchanged, so D always runs last and never gets the same
time as A and C.  Leaving D at the head of the queue (that is,
splitting the queue ahead of the current batch, rather than behind it)
solves that problem until the case occurs in which D contains enough
bad or slow addresses to stop the queue even though it's first.  Two
solutions to this: 1) never stop qrunner during the first queued batch
(always wait for it to exit); or 2) randomly split the queue either
ahead of or behind the current batch.
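
The variants in the paragraph above could look something like this.
Again, only a sketch under my own assumptions: split_queue, may_stop
and split_randomly are hypothetical names, not existing qrunner
functions, and the time limit is assumed to be in seconds.

```python
import random


def split_queue(queue, delivered, stopped_at, ahead=False):
    """Rotate the queue at the batch being processed when time ran out.

    ahead=False cuts behind that batch, so it ends up last next run.
    ahead=True cuts ahead of it, so it ends up first next run.
    """
    cut = stopped_at if ahead else stopped_at + 1
    head = [b for b in queue[:cut] if b not in delivered]
    return queue[cut:] + head


def may_stop(index, elapsed, limit):
    """Option 1: never stop during the first batch of a run."""
    return index > 0 and elapsed >= limit


def split_randomly(queue, delivered, stopped_at, rng=random):
    """Option 2: cut ahead of or behind the current batch at random."""
    return split_queue(queue, delivered, stopped_at,
                       ahead=rng.random() < 0.5)


# Queue reduced to A, C, D, with qrunner timing out on D (index 2).
print(split_queue(list("ACD"), set(), 2))              # ['A', 'C', 'D']
print(split_queue(list("ACD"), set(), 2, ahead=True))  # ['D', 'A', 'C']
```

Cutting behind D leaves the order as it was, which is the starvation
case; cutting ahead of D puts it first for the next run.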

Does this seem to anyone else to solve the problem?  I haven't looked
at the code yet, so this is just cursory thought.

--
 -D.	dgc@uchicago.edu	NSIT	University of Chicago