[Mailman-Developers] Problems if shunting fails

Mark Sapiro msapiro at value.net
Fri Feb 23 00:24:15 CET 2007


Mark Sapiro wrote:
>
>Because of the changes in 2.1.9 to prevent message loss in case of
>disaster, there is now a .bak file left in the 'in' queue. When
>IncomingRunner restarts, it recovers the .bak file and the whole
>scenario repeats until the master reaches MAX_RESTARTS on
>IncomingRunner and we are left with no IncomingRunner and the .bak
>file still in the 'in' queue.
>
>In order to fix this, I suggest we protect the shunt enqueue in a try.
>I have worked up a patch for this which is attached. This patch also
>adds a 'preserve' argument to Switchboard.finish such that if it is
>called with preserve=True, instead of removing the .bak file, it just
>renames it with a .psv extension. These changes ensure that
>IncomingRunner doesn't exit, and no .bak file is left to cause a
>subsequent problem while still preserving the original queue entry for
>further analysis if possible.
>
>I would like some feedback on whether or not this is the right approach.


Since posting the above a couple of weeks ago, another situation has
come up on mailman-users with a different scenario. The thread is at
<http://mail.python.org/pipermail/mailman-users/2007-February/055809.html>.
In this case, the problem was an unparseable message, and part of the
issue was due to the user not having email 2.5.8 installed in
pythonlib. Correcting this avoided the problem in her case, but there
is still an underlying issue here in that the only exception that we
catch is email.Errors.MessageParseError, and at least in this case,
email.message_from_string threw a ValueError.

The other part of this is that we used to just give up on the message
when the exception occurred, as there was nothing else we could do.
Starting with Mailman 2.1.9, we still have the .bak queue entry, which
can be 'preserved' as above. Thus, I have modified the previously
suggested patch, and a new patch is attached which catches any
exception from switchboard.dequeue(), logs it, and preserves the queue
entry.

Thus, with this patch, we have two different ways in which a queue
entry can be 'preserved' for analysis. One is when dequeue() throws an
exception and the other is when the attempt to enqueue() in the shunt
queue throws an exception.
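
As a rough illustration of those two paths (this is not the attached
patch), here is a minimal, self-contained sketch. The names finish(),
run_one(), dequeue, handle, shunt and log are hypothetical stand-ins,
and the sketch assumes the queue entry is a single .bak file on disk;
the real Switchboard and Runner code does more than this.

```python
import os


def finish(bakfile, preserve=False):
    """Dispose of a .bak entry: delete it, or rename it to .psv."""
    if preserve:
        psvfile = os.path.splitext(bakfile)[0] + ".psv"
        os.rename(bakfile, psvfile)
        return psvfile
    os.unlink(bakfile)
    return None


def run_one(bakfile, dequeue, handle, shunt, log):
    """Process one entry; the callables are hypothetical stand-ins."""
    # Path 1: dequeue() itself fails (e.g. a ValueError from parsing).
    try:
        msg = dequeue(bakfile)
    except Exception as exc:
        log("dequeue failed for %s: %r" % (bakfile, exc))
        finish(bakfile, preserve=True)
        return "preserved-dequeue"
    try:
        handle(msg)
    except Exception as exc:
        log("handler failed for %s: %r" % (bakfile, exc))
        # Path 2: the attempt to enqueue in the shunt queue fails too.
        try:
            shunt(msg)
        except Exception as exc2:
            log("shunting failed for %s: %r" % (bakfile, exc2))
            finish(bakfile, preserve=True)
            return "preserved-shunt"
        finish(bakfile)
        return "shunted"
    finish(bakfile)
    return "done"
```

In both failure paths the runner keeps going and no .bak file is left
behind to be recovered on the next restart; only a .psv file remains.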

The question I would like to discuss is: what is the best way to
preserve the queue entry for analysis? The patch just renames the .bak
file to .psv and leaves it in the original queue. Over time, this
could accumulate a lot of .psv files in the 'in' or other queues and
impact processing.

We can't shunt the entry in the normal way because, in some cases at
least, shunting has already thrown an exception. I can think of three
things to do.

1) Just rename the entry .psv and leave it in the original queue.
2) Rename it .psv and attempt to move it to the shunt queue.
3) Rename it .psv and attempt to move it to the bad queue.
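
To make the difference between these options concrete, here is a small
sketch, not taken from the patch. The function preserve() and its
dest_dir argument are hypothetical; passing no dest_dir corresponds to
option 1, and passing the shunt or bad queue directory corresponds to
options 2 and 3. It assumes a plain file rename is enough to move an
entry, and that a failed move should simply leave the .psv file where
it is.

```python
import os


def preserve(bakfile, dest_dir=None):
    """Rename bakfile to .psv and optionally try to move it to dest_dir.

    Returns the final path of the preserved entry.
    """
    psvfile = os.path.splitext(bakfile)[0] + ".psv"
    os.rename(bakfile, psvfile)  # option 1: stays in the original queue
    if dest_dir is None:
        return psvfile
    target = os.path.join(dest_dir, os.path.basename(psvfile))
    try:
        os.makedirs(dest_dir, exist_ok=True)
        os.rename(psvfile, target)  # options 2 and 3: shunt or bad queue
    except OSError:
        # The move is only an attempt; fall back to leaving the entry
        # in the original queue rather than raising again.
        return psvfile
    return target
```

With options 2 and 3 the original queue stays clean in the common
case, while the fallback keeps the entry from being lost if the move
itself fails.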

I would like to get other people's thoughts on this.

-- 
Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


