The joys and jilts of non-blocking sockets
Robert Amesz
rcameszREMOVETHIS at dds.removethistoo.nl
Sat May 5 09:46:16 EDT 2001
I've recently been doing a little work with sockets, in particular with
non-blocking sockets, and I'm sorry to say the standard Python
documentation isn't really too helpful here. I feel this is a mistake:
without documentation people like me will experiment to find out how
things work, and we may end up relying on features which are either
different for different platforms, or not guaranteed to work with
different versions of Python, or both. This is not good. I've
documented my experiments in the hope that they will be useful to others and
also to elicit some comments, in particular where other platforms or
versions of Python are concerned.
I've also studied timeoutsocket.py for some hints and pointers about
socket behaviour, and this is a good source of information about some
of the quirks of non-blocking sockets, so I'd like to thank
Timothy O'Malley for that.
Even so I'd like to take the opportunity to point out a few bugs and a
design flaw in version 1.15 (the latest version I was able to find).
One of the bugs is a set of missing commas in lines 142, 143, 144
and 147: without those commas tuples aren't tuples, I'm afraid. (My
guess is those were lists originally.)
The other, slightly larger bug is that error code 10022 (in
TimeoutSocket.connect()) is taken as an indication that the connection
has been made, while in fact the connection has been refused (see below
for more details about that).
The design flaw is that the module makes non-blocking sockets behave
like blocking ones: this just doesn't make sense to me. Arguably, using
both types of sockets in a single application shouldn't be too common,
but as it - very cleverly - replaces the normal socket-module once
imported, it really should handle the non-blocking case too. Not that
it's hard: in fact, it's almost trivial, but it should be done.
But let's concentrate on socket behaviour itself. The observations
below were made on a Windows 98 machine; they might be different
on other Windows versions, and they certainly *will* be different on a
different OS, like UNIX or Mac OS. I'm using Python version:
Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
Exceptions are shown on an indented line as they are displayed by
'print' or the stack trace, and on the next line you'll find the
symbolic name(s) for that code, as defined in module 'errno'. The codes
starting with 'WSA' are from the Windows sockets .dll.
CONNECTING THE SOCKET
---------------------
If the host exists and can be reached, connecting to it using a non-
blocking call *always* leads to this exception:
    (10035, 'The socket operation could not complete without blocking')
    10035 = EWOULDBLOCK WSAEWOULDBLOCK
This message does not give you *any* information about the status of
the connection: the machine may be busy connecting, the connection may
have been made, or the connection may have been refused.
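One common way to find out how a non-blocking connect ended is to wait
for the socket with select() and then read the pending error with
getsockopt(SO_ERROR). What follows is only a sketch of that technique,
not something tested on the Windows 98 setup described here. It assumes
a current Python 3, and the names nonblocking_connect and IN_PROGRESS
are made up for this example.

```python
import errno
import select
import socket

# Codes that only mean "the connect has been started". Windows reports
# EWOULDBLOCK (10035 there), most Unix systems report EINPROGRESS.
IN_PROGRESS = {errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EALREADY}


def nonblocking_connect(address, timeout=5.0):
    """Return (sock, 0) on success, or (None, error_code) on failure."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    # connect_ex() returns the error code instead of raising an exception.
    code = sock.connect_ex(address)
    if code != 0 and code not in IN_PROGRESS:
        sock.close()
        return None, code
    # A finished connect makes the socket writable. Windows reports a
    # failed connect in the exceptional set, so watch that one as well.
    _, writable, failed = select.select([], [sock], [sock], timeout)
    if not writable and not failed:
        sock.close()
        return None, errno.ETIMEDOUT
    # SO_ERROR holds the real outcome: 0 if connected, else the error code.
    code = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if code != 0:
        sock.close()
        return None, code
    return sock, 0
```

Called as nonblocking_connect(("127.0.0.1", port)), this returns a
connected socket and 0, or None and the error code of the failure, so
the caller never has to interpret the EWOULDBLOCK exception itself.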
If the connection is refused (i.e. there's either no service listening
to the port you're trying to connect to, or no new connections are
being accepted on that port), trying to receive (or send) data through
that socket will, once again, produce the same exception (10035), so
that won't really help you to find out what your connection status is.
Using getpeername() is more helpful:
    (10057, 'Socket is not connected')
    10057 = ENOTCONN WSAENOTCONN
Unfortunately, the manual says this function doesn't exist on all
platforms, so portable code should try to avoid it.
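Where getpeername() is available, the check can be wrapped in a small
helper. This is a minimal sketch assuming a current Python 3, where
socket.error is an alias of OSError. The name peer_or_none is
hypothetical.

```python
import socket


def peer_or_none(sock):
    """Return the peer address, or None if the socket isn't connected."""
    try:
        return sock.getpeername()
    except OSError:
        # ENOTCONN (10057 on Windows) and similar errors end up here.
        return None
```

A result of None only says that there's no connection right now. It
doesn't tell a refused connection from one that is still being set up.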
But don't despair: if a connection is refused trying to connect again
to the same host using the same socket will yield the following
surprising exception:
    (10022, 'Invalid argument')
    10022 = WSAEINVAL
Well, it accepted the parameter(s) before, so what's that all about?
Furthermore, it doesn't make any difference if you change the port
number, you'll get the same result. Using a different hostname which
points to the same IP-address doesn't change anything either, but using
a hostname which points to a different IP-address *does*, strangely
enough.
As sockets can't be re-used anyway this isn't something to look into
too deeply. After a close() all further operations on that socket are
expressly forbidden, and in fact impossible. Yes, I just had to
try! Although the exception you get when you try to reconnect is pretty
puzzling:
    AttributeError: 'int' object has no attribute 'connect'
What happens here is that the internal socket object has been replaced
by the int 0. But please don't rely on behaviour like that.
Ok, back to connecting. Because I did my testing on a single machine I
wasn't able to catch the socket system in the middle of the connection
handshake, so I can't tell if trying to connect at that time will yield
another exception, but trying it after establishing the connection will
result in this very predictable exception:
    (10056, 'Socket is already connected')
    10056 = EISCONN WSAEISCONN
Hurrah, we're connected! Or are we? Well, not necessarily: the
connection may have been broken already. The system doesn't seem to be
able to tell an idle connection from a broken one (this may be part of
the nature of TCP/IP), and you need to do something with the stream to
find that out, as you'll see below.
On the other hand, if you try to connect to a non-existent IP-address
you'll see this exception:
    (10065, 'No route to host')
    10065 = EHOSTUNREACH WSAEHOSTUNREACH
If the hostname couldn't be resolved, this is what you get:
    ('host not found',)
What, no error code? That's right, and this could be an issue if you
expect a number in the first position of this tuple-like exception, or
anything at all in the second position.
SENDING DATA
------------
Pretty straightforward, really. Just do MySocket.send(data), and if
there's nothing wrong with the connection it will either work, or
raise:
    (10035, 'The socket operation could not complete without blocking')
    10035 = EWOULDBLOCK WSAEWOULDBLOCK
In the documentation it states that the function returns the number of
bytes actually sent, but I've never observed this number to be
different from the amount you're trying to send, even when it's a big
chunk of data. When trying to flood a connection with small bits of
data (I didn't read the data on the receiving end) it would
raise the above exception after about 18K of data was 'sent', but if
you try to send() larger (even much larger) chunks of data the first
call always works, and only subsequent calls raise the exception. This
behaviour might not be portable, though.
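Since the documentation allows send() to accept only part of the data,
portable code probably shouldn't count on everything going out in one
call. The loop below is a sketch of one way to cope with both partial
sends and the would-block condition. It assumes a current Python 3,
where that condition is raised as BlockingIOError. The name send_all is
invented for this example.

```python
import select
import socket


def send_all(sock, data, timeout=5.0):
    """Send data on a non-blocking socket; return the bytes accepted."""
    view = memoryview(data)
    sent_total = 0
    while sent_total < len(view):
        try:
            # send() may take only part of what it is given.
            sent_total += sock.send(view[sent_total:])
        except BlockingIOError:
            # The send buffer is full: wait until there's room again.
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                break  # still blocked after the timeout, give up
    return sent_total
```

If the return value is smaller than len(data), the receiving end didn't
drain the connection within the timeout and the remainder is still
unsent.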
If you try to send data when the connection has been broken the
following exception is raised:
    (10054, 'Connection reset by peer')
    10054 = ECONNRESET WSAECONNRESET
RECEIVING DATA
--------------
Doing a MySocket.recv(max_length) can certainly result in some
unexpected behaviour: if the connection is good, and there's some data
waiting, the data will be returned. That's not the surprising bit. When
the connection is good, and there's no data waiting, you'll get the
ubiquitous
    (10035, 'The socket operation could not complete without blocking')
    10035 = EWOULDBLOCK WSAEWOULDBLOCK
exception. That, too, isn't surprising. What *is* surprising, however, is
that when the connection has been broken on the other end, no exception
is raised whatsoever, but the recv() function will keep returning zero-
length strings. I wonder if that behaviour is intentional? Or portable,
for that matter. As this is the only way that I know of telling a dead
connection from a live one when receiving data, we're forced to rely on
this strange behaviour, but I'd prefer the ECONNRESET-exception to
be raised.
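For what it's worth, in the BSD sockets API a zero-length result from
recv() is the conventional sign that the peer has closed the connection
in an orderly way, so the three outcomes described above can be told
apart in one place. The following is a sketch assuming a current
Python 3, where the would-block condition arrives as BlockingIOError and
resets as subclasses of ConnectionError. The name poll_recv and its
status strings are made up.

```python
import socket


def poll_recv(sock, max_length=4096):
    """Return ("data", bytes), ("idle", b"") or ("closed", b"")."""
    try:
        data = sock.recv(max_length)
    except BlockingIOError:
        # Connection is fine, there's just nothing to read right now.
        return "idle", b""
    except ConnectionError:
        # The connection was reset or aborted.
        return "closed", b""
    if not data:
        # Zero-length result: the other end has closed the connection.
        return "closed", b""
    return "data", data
```

A caller can then loop on poll_recv() and stop as soon as the status is
"closed", without caring which of the two ways the connection ended.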
SOCKET EXCEPTIONS
-----------------
Sockets raise exceptions of type socket.error, and like any other
exception that's a class. But you might be forgiven for thinking that
it's a tuple because for all intents and purposes it behaves like one.
(I presume this is for historic reasons, to make sure older code will
keep working as expected.) It looks that way in the traceback, and if e
is the exception you've caught you can look at e[0] (the number of the
error) and e[1] (the associated message). This rule has one exception,
however, and that is the ('host not found',) exception, which has the
error message in the first position, and doesn't have a second position.
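Code that wants to cope with both shapes of the exception could
normalise them first. This is a small sketch, assuming a current
Python 3, where exceptions can no longer be indexed directly and the
tuple is found in e.args. The helper name error_parts is hypothetical.

```python
def error_parts(exc):
    """Return (code, message); code is None if there's no error number."""
    args = exc.args
    if len(args) >= 2 and isinstance(args[0], int):
        # The usual shape: (number, message)
        return args[0], str(args[1])
    if args:
        # The odd one out: only a message, as in ('host not found',)
        return None, str(args[0])
    return None, ""
```

With this, error_parts(e) always yields a pair, and a code of None
marks the exceptions that carry no number.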
Strange beasts, those sockets. Under Windows, anyway.
Robert Amesz
--
APPENDIX - socket error codes from the 'errno' module
10004 = WSAEINTR
10009 = WSAEBADF
10013 = WSAEACCES
10014 = WSAEFAULT
10022 = WSAEINVAL
10024 = WSAEMFILE
10035 = EWOULDBLOCK WSAEWOULDBLOCK
10036 = EINPROGRESS WSAEINPROGRESS
10037 = EALREADY WSAEALREADY
10038 = ENOTSOCK WSAENOTSOCK
10039 = EDESTADDRREQ WSAEDESTADDRREQ
10040 = EMSGSIZE WSAEMSGSIZE
10041 = EPROTOTYPE WSAEPROTOTYPE
10042 = ENOPROTOOPT WSAENOPROTOOPT
10043 = EPROTONOSUPPORT WSAEPROTONOSUPPORT
10044 = ESOCKTNOSUPPORT WSAESOCKTNOSUPPORT
10045 = EOPNOTSUPP WSAEOPNOTSUPP
10046 = EPFNOSUPPORT WSAEPFNOSUPPORT
10047 = EAFNOSUPPORT WSAEAFNOSUPPORT
10048 = EADDRINUSE WSAEADDRINUSE
10049 = EADDRNOTAVAIL WSAEADDRNOTAVAIL
10050 = ENETDOWN WSAENETDOWN
10051 = ENETUNREACH WSAENETUNREACH
10052 = ENETRESET WSAENETRESET
10053 = ECONNABORTED WSAECONNABORTED
10054 = ECONNRESET WSAECONNRESET
10055 = ENOBUFS WSAENOBUFS
10056 = EISCONN WSAEISCONN
10057 = ENOTCONN WSAENOTCONN
10058 = ESHUTDOWN WSAESHUTDOWN
10059 = ETOOMANYREFS WSAETOOMANYREFS
10060 = ETIMEDOUT WSAETIMEDOUT
10061 = ECONNREFUSED WSAECONNREFUSED
10062 = ELOOP WSAELOOP
10063 = WSAENAMETOOLONG
10064 = EHOSTDOWN WSAEHOSTDOWN
10065 = EHOSTUNREACH WSAEHOSTUNREACH
10066 = WSAENOTEMPTY
10067 = WSAEPROCLIM
10068 = EUSERS WSAEUSERS
10069 = EDQUOT WSAEDQUOT
10070 = ESTALE WSAESTALE
10071 = EREMOTE WSAEREMOTE
10091 = WSASYSNOTREADY
10092 = WSAVERNOTSUPPORTED
10093 = WSANOTINITIALISED
10101 = WSAEDISCON
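A list in this format can be generated from the errno module itself.
The sketch below assumes a current Python 3, and the function names are
invented for this example. On a non-Windows platform the WSA names are
absent and the numbers differ, so the output won't match the list above.

```python
import errno


def names_for(code):
    """Return every symbolic name in the errno module bound to code."""
    return sorted(
        name for name in dir(errno)
        if name.isupper() and getattr(errno, name) == code
    )


def format_line(code):
    """Format one line of the table: the code, ' = ', then its names."""
    return "%d = %s" % (code, " ".join(names_for(code)))


if __name__ == "__main__":
    # errno.errorcode maps each known code to one of its names.
    for code in sorted(errno.errorcode):
        print(format_line(code))
```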