[Mailman-Users] Stuck OutgoingRunner

Yasuhito FUTATSUKI futatuki at poem.co.jp
Tue Feb 6 22:43:18 EST 2018


On 02/07/18 01:01, Mark Sapiro wrote:
> On 02/06/2018 03:51 AM, Sebastian Hagedorn wrote:
>>
>> --On 4. Februar 2018 um 12:54:43 +0900 Yasuhito FUTATSUKI
>> <futatuki at poem.co.jp> wrote:
>>>
>>> As far as I read the code, if OutgoingRunner catch SIGINT during waiting
>>> for response from the MTA, the signal handler for SIGINT in qrunner set
>>> flag to exit from loop, then socket module raise socket.error for EINTR,
>>> but SMTP module retry to read from socket and waiting for response until
>>> receiving response or connection closing (from MTA side or by error).
>>> Thus it cannot reach to the code to exit if the connection is kept alive
>>> and MTA send no data.

I'm sorry, the above is partly wrong: it is not the smtplib.SMTP object
that keeps reading, but the socket module itself (on Python 2.7.14,
socket._fileobject.readline()). This does not affect the main point, though.
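
A minimal sketch of that behaviour, under some assumptions: it runs on a
POSIX system in the main thread, SIGALRM stands in for the SIGINT/SIGTERM
sent by the master, a local listener that never sends data stands in for the
silent MTA, and a safety timeout is set only so that the demonstration ends.
All function and variable names are hypothetical. On Python 3.5 and later
the interrupted read is retried inside the interpreter (PEP 475) rather than
in socket._fileobject.readline(), but the visible effect is the same: the
handler sets its flag and the read carries on waiting.

```python
import signal
import socket
import time

stop_flag = {"set": False}


def on_signal(signum, frame):
    # Only records the request and returns, like a flag-setting handler.
    stop_flag["set"] = True


def read_while_signalled(signal_after=0.2, safety_timeout=1.0):
    # Listener that completes the TCP handshake but never sends anything.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(5)
    cli = socket.create_connection(srv.getsockname(), timeout=safety_timeout)
    reader = cli.makefile("rb")
    previous = signal.signal(signal.SIGALRM, on_signal)
    # Deliver the signal while readline() is blocked.
    signal.setitimer(signal.ITIMER_REAL, signal_after)
    started = time.time()
    try:
        reader.readline()
        outcome = "got data"
    except socket.timeout:
        # Reached only because of the safety timeout, not because of the signal.
        outcome = "timed out"
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
        elapsed = time.time() - started
        reader.close()
        cli.close()
        srv.close()
    return outcome, stop_flag["set"], elapsed
```

Calling read_while_signalled() should return only after the safety timeout
has passed, with the flag already set. Without that timeout, the read would
keep waiting for as long as the connection stays open and silent.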

>> Thanks. I think that might be a possible explanation, but what could
>> cause a SIGINT to be sent to the OutgoingRunner?
> 
> 
> The above is an explanation of why the runner doesn't exit when it
> receives a SIGINT or SIGTERM from the master when you restart or stop
> Mailman and why you have to SIGKILL it. It suggests that what's
> happening when it's hung is it's waiting for a response from the MTA.

Thanks for explaining my intention.

In fact,

On 02/02/18 19:26, Sebastian Hagedorn wrote:
> root at mailman3/usr/lib/mailman/bin]$ strace -p 1677
> Process 1677 attached
> recvfrom(10, ^CProcess 1677 detached

indicates that the OutgoingRunner process 1677 was still in the recvfrom(2)
system call (perhaps called via recv(2)) on FD 10, and

> [root at mailman3/usr/lib/mailman/bin]$ lsof -p 1677
> COMMAND    PID    USER   FD   TYPE   DEVICE SIZE/OFF   NODE NAME
> python2.7 1677 mailman  cwd    DIR    253,0     4096 173998 /usr/lib/mailman
> python2.7 1677 mailman  rtd    DIR    253,0     4096      2 /
> ...
> python2.7 1677 mailman   10u  IPv6 46441320      0t0    TCP mailman3.rrz.uni-koeln.de:55764->smtp-out.rrz.uni-koeln.de:smtp (ESTABLISHED)

indicates that its FD 10 was an ESTABLISHED connection to the MTA.


If the MTA hangs (or makes very slow progress) at the application layer
while the TCP connection is kept alive at the lower layer, a client that
uses smtplib without specifying a timeout, as the current SMTPDirect handler
in Mailman does, must wait until it gets a response or the MTA dies.

Unfortunately, smtplib in Python 2 before 2.6 has no way to specify a
timeout. It uses a socket in blocking mode unless a default timeout is set
with socket.setdefaulttimeout() before calling smtplib.SMTP.connect().
For Python 2.6 and above, the timeout can be specified when creating the
smtplib.SMTP object.
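
The two approaches can be sketched as follows. This is only an illustration,
not Mailman code: the helper names are hypothetical, and a local listener
that accepts the TCP handshake but never sends a greeting stands in for a
hung MTA. With a timeout in place, smtplib gives up instead of waiting
forever. Depending on the Python version, the failure surfaces as
socket.timeout or as smtplib.SMTPServerDisconnected, so both are caught.

```python
import smtplib
import socket


def make_silent_listener():
    """Stand-in for a hung MTA: completes the handshake, sends nothing."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(5)               # never accept(); the kernel still connects
    return srv


def try_connect(host, port, timeout):
    """Python 2.6+: pass the timeout when creating the SMTP object."""
    try:
        conn = smtplib.SMTP(host, port, timeout=timeout)
    except (socket.timeout, smtplib.SMTPServerDisconnected) as exc:
        return "gave up: %s" % exc
    conn.quit()
    return "connected"


def try_connect_default_timeout(host, port, timeout):
    """Older style: set a process-wide default timeout before connect()."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout)
    try:
        conn = smtplib.SMTP()
        conn.connect(host, port)
    except (socket.timeout, smtplib.SMTPServerDisconnected) as exc:
        return "gave up: %s" % exc
    finally:
        # The default applies to every new socket, so restore it.
        socket.setdefaulttimeout(previous)
    conn.quit()
    return "connected"
```

For example, with srv = make_silent_listener() and host, port =
srv.getsockname(), both helpers return a "gave up: ..." string after about
the given timeout rather than blocking. Note that socket.setdefaulttimeout()
affects every socket created afterwards in the process, which is why the
sketch restores the previous value.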

-- 
Yasuhito FUTATSUKI <futatuki at poem.co.jp>


