[Python-Dev] socket.SOL_REUSEADDR: different semantics between Windows vs Unix (or why test_asynchat is sometimes dying on Windows)
Trent Nelson
tnelson at onresolve.com
Fri Apr 4 22:24:49 CEST 2008
Interesting results! I committed the patch to test_socket.py in r62152. I was expecting every platform except Windows to behave consistently (i.e. pass). That is, given the following:
import socket
host = '127.0.0.1'
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind((host, 0))
port = sock.getsockname()[1]
sock.close()
del sock
sock1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock1.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock1.bind((host, port))
sock2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock2.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock2.bind((host, port))
^^^^
....the second bind should fail with EADDRINUSE, at least according to the 'SO_REUSEADDR and SO_REUSEPORT Socket Options' part of section 7.5 of Stevens' UNIX Network Programming Volume 1 (2nd Ed):
"With TCP, we are never able to start multiple servers that bind
the same IP address and same port: a completely duplicate binding.
That is, we cannot start one server that binds 198.69.10.2 port 80
and start another that also binds 198.69.10.2 port 80, even if we
set the SO_REUSEADDR socket option for the second server."
The results: both Windows *and* Linux fail the patched test; none of the buildbots for either platform encountered an EADDRINUSE socket.error after the second bind(). FreeBSD, OS X, Solaris and Tru64 pass the test -- EADDRINUSE is raised on the second bind. (Interesting that all the ones that passed have a BSD lineage.)
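For anyone who wants to check a given machine without going through the buildbots, the experiment above can be wrapped in a small probe. This is only a sketch in Python 3 syntax (the thread itself predates it); the function name duplicate_bind_allowed is hypothetical, and it assumes the loopback interface is available and that nothing else grabs the port between the probe and the two binds.

```python
import errno
import socket


def duplicate_bind_allowed(host="127.0.0.1"):
    """Return True if two TCP sockets with SO_REUSEADDR can bind the same port."""
    # Ask the OS for an ephemeral port, then release it again.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))
        port = probe.getsockname()[1]

    sock1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        for sock in (sock1, sock2):
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock1.bind((host, port))
        try:
            sock2.bind((host, port))
        except OSError as exc:
            if exc.errno != errno.EADDRINUSE:
                raise
            return False  # the duplicate bind was rejected
        return True  # the duplicate bind was accepted
    finally:
        sock1.close()
        sock2.close()


if __name__ == "__main__":
    print("duplicate bind allowed:", duplicate_bind_allowed())
```

Going by the buildbot results above, this would be expected to report True on Windows and Linux and False on the BSD-derived platforms, but the answer is whatever the local stack does.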
I've just reverted the test in r62156 as planned. The real issue now is that there are tests that are calling test_support.bind_port() with the assumption that the port returned by this method is 'unbound', when in fact, the current implementation can't guarantee this:
def bind_port(sock, host='', preferred_port=54321):
    for port in [preferred_port, 9907, 10243, 32999, 0]:
        try:
            sock.bind((host, port))
            if port == 0:
                port = sock.getsockname()[1]
            return port
        except socket.error, (err, msg):
            if err != errno.EADDRINUSE:
                raise
            print >>sys.__stderr__, \
                ' WARNING: failed to listen on port %d, trying another' % port
This logic is only correct for platforms other than Windows and Linux. I haven't looked into all the networking test cases that rely on bind_port(), but I would think an implementation such as this would be much more reliable than what we've got for returning an unused port:
def bind_port(sock, host='127.0.0.1', *args):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host, 0))
    port = s.getsockname()[1]
    s.close()
    del s
    sock.bind((host, port))
    return port
Actually, FWIW, I just ran a full regrtest.py against trunk on Win32 with this change in place and all the tests still pass.
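A self-contained rendition of the same idea, for experimenting outside the test suite, might look like the following. It is a sketch in Python 3 syntax, not the code that was run against trunk; the name bind_unused_port is hypothetical, and it assumes the caller's socket is an unbound AF_INET TCP socket.

```python
import socket


def bind_unused_port(sock, host="127.0.0.1"):
    """Bind sock to a port the OS has just reported as free; return the port."""
    # Let the OS pick an ephemeral port on a throwaway socket.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))
        port = probe.getsockname()[1]
    # The probe is closed at this point. Another process could in principle
    # claim the port before the next line runs, so this narrows the window
    # for a collision but does not remove it.
    sock.bind((host, port))
    return port


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        print("bound to port", bind_unused_port(server))
```

The point of the design is that the port number comes from the OS rather than from a fixed list, so the helper no longer relies on bind() raising EADDRINUSE to detect a port that is already taken.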
Thoughts?
Trent.
________________________________________
From: python-dev-bounces+tnelson=onresolve.com at python.org [python-dev-bounces+tnelson=onresolve.com at python.org] On Behalf Of Trent Nelson [tnelson at onresolve.com]
Sent: 04 April 2008 17:07
To: python-dev at python.org
Subject: Re: [Python-Dev] socket.SOL_REUSEADDR: different semantics between Windows vs Unix (or why test_asynchat is sometimes dying on Windows)
I've raised issue 2550 to track this problem. I've also provided a patch on the tracker to test_socket.py that reproduces the issue. Anyone mind if I commit this to trunk? I'd like to observe if any other platforms exhibit different behaviour via buildbots. It'll cause all the Windows slaves to fail on test_socket though. (I can revert it once I've seen how the buildbots behave until I can come up with an actual patch for Windows that fixes the issue.)
http://bugs.python.org/issue2550
http://bugs.python.org/file9939/test_socket.py.patch
Trent.
________________________________________
From: python-dev-bounces+tnelson=onresolve.com at python.org [python-dev-bounces+tnelson=onresolve.com at python.org] On Behalf Of Trent Nelson [tnelson at onresolve.com]
Sent: 03 April 2008 22:40
To: python-dev at python.org
Subject: [Python-Dev] socket.SOL_REUSEADDR: different semantics between Windows vs Unix (or why test_asynchat is sometimes dying on Windows)
I started looking into this:
http://www.python.org/dev/buildbot/all/x86%20W2k8%20trunk/builds/289/step-test/0
Pertinent part:
test_asyncore
<snip>
test_asynchat
command timed out: 1200 seconds without output
SIGKILL failed to kill process
using fake rc=-1
program finished with exit code -1
remoteFailed: [Failure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process
]
I tried to replicate it on the buildbot in order to debug, which, surprisingly, I could do consistently by just running rt.bat -q -d -uall test_asynchat. As the log above indicates, the python process becomes completely and utterly wedged, to the point that I can't even attach a remote debugger and step into it.
Digging further, I noticed that if I ran the following code in two different python consoles, EADDRINUSE was *NOT* being raised by socket.bind():
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(('127.0.0.1', 54322))
However, take out the setsockopt line, and voilà, the second s.bind() will raise EADDRINUSE, as expected. This manifests as a really bizarre issue with test_asynchat in particular, as subsequent sock.accept() calls on the socket put python into the uber wedged state (can't even ctrl-c out at the console, need to kill the process directly).
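The baseline case (no SO_REUSEADDR) can also be checked from a single process instead of two consoles. The following is a sketch in Python 3 syntax; the function name second_bind_errno is hypothetical, and an OS-assigned port is used in place of 54322 so the check does not depend on that port being free.

```python
import errno
import socket


def second_bind_errno(host="127.0.0.1"):
    """Bind two plain TCP sockets to one address; return the second bind's errno."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # No setsockopt() call on either socket.
        first.bind((host, 0))
        address = first.getsockname()
        try:
            second.bind(address)
        except OSError as exc:
            return exc.errno
        return None  # the second bind unexpectedly succeeded
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("EADDRINUSE raised:", second_bind_errno() == errno.EADDRINUSE)
```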
Have to leave the office and head home so I don't have any more time to look at it tonight -- just wanted to post here for others to mull over.
Trent.
_______________________________________________
Python-Dev mailing list
Python-Dev at python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/tnelson%40onresolve.com