The joys and jilts of non-blocking sockets

Robert Amesz rcameszREMOVETHIS at dds.removethistoo.nl
Sat May 5 09:46:16 EDT 2001


I've recently been doing a little work with sockets, non-blocking 
sockets in particular, and I'm sorry to say the standard Python 
documentation isn't really too helpful here. I feel this is a mistake: 
without documentation people like me will experiment to find out how 
things work, and we may end up relying on features which are either 
different for different platforms, or not guaranteed to work with 
different versions of Python, or both. This is not good. I've 
documented my experiments in the hope that they will be useful to others
and also to elicit some comments, in particular where other platforms or 
versions of Python are concerned.

I've also studied timeoutsocket.py for some hints and pointers about 
socket behaviour, and this is a good source of information about some 
of the quirks of non-blocking sockets, so I'd like to thank 
Timothy O'Malley for that.

Even so I'd like to take the opportunity to point out a few bugs and a 
design flaw in version 1.15 (the latest version I was able to find). 
The first bug is a set of missing commas in lines 142, 143, 144 and 
147: without those commas tuples aren't tuples, I'm afraid. (My guess 
is those were lists originally.)

The other, slightly larger bug is that error code 10022 (in 
TimeoutSocket.connect()) is taken as an indication that the connection 
has been made, while in fact the connection has been refused (see below 
for more details about that).

The design flaw is that the module makes non-blocking sockets behave 
like blocking ones: this just doesn't make sense to me. Arguably, using 
both types of sockets in a single application shouldn't be too common, 
but as it - very cleverly - replaces the normal socket-module once 
imported, it really should handle the non-blocking case too. Not that 
it's hard: in fact, it's almost trivial, but it should be done.

But let's concentrate on socket behaviour itself. The observations 
below were made on a Windows 98 machine; they might be different 
on other Windows versions, and they certainly *will* be different on a 
different OS, like UNIX or Mac OS. I'm using Python version:

   Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32

Exceptions are shown on an indented line as they are displayed by 
'print' or the stack trace, and on the next line you'll find the 
symbolic name(s) for that code, as defined in module 'errno'. The codes 
starting with 'WSA' are from the Windows sockets .dll. 



CONNECTING THE SOCKET
---------------------

If the host exists and can be reached, connecting to it using a non-
blocking call *always* leads to this exception:

   (10035, 'The socket operation could not complete without blocking')
   10035 = EWOULDBLOCK     WSAEWOULDBLOCK

This message does not give you *any* information about the status of 
the connection: the machine may be busy connecting, the connection may 
have been made, or the connection may have been refused.
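
One commonly used way to find out how a non-blocking connect ended is to
wait with select() until the socket becomes writable (or, on Windows,
shows up in the exceptional set) and then read the pending error with
getsockopt(SO_ERROR). The sketch below assumes a modern Python 3 rather
than the Python 2.1 used for these observations, it has not been checked
on Windows 98, and the helper name connect_status is made up for
illustration.

```python
import errno
import select
import socket


def connect_status(address, timeout=5.0):
    """Start a non-blocking connect and return its final error code.

    Returns 0 on success, otherwise an errno value such as
    errno.ECONNREFUSED. Hypothetical helper, for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # connect_ex() returns the error code instead of raising it.
        code = sock.connect_ex(address)
        in_progress = (errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EAGAIN)
        if code in in_progress:
            # Writable means the attempt has finished, well or badly;
            # Windows reports a failed attempt in the exceptional set.
            _, writable, failed = select.select([], [sock], [sock], timeout)
            if not (writable or failed):
                return errno.ETIMEDOUT
            # SO_ERROR holds the real outcome: 0 means connected.
            code = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        return code
    finally:
        # A real client would keep the socket open on success; this
        # sketch only reports the status.
        sock.close()
```

For example, connect_status(("127.0.0.1", 80)) should return 0 only if
something is listening on that port; a refused connection is expected to
come back as a non-zero code such as errno.ECONNREFUSED.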

If the connection is refused (i.e. there's either no service listening 
to the port you're trying to connect to, or no new connections are 
being accepted on that port), trying to receive (or send) data through 
that socket will, once again, produce the same exception (10035), so 
that won't really help you to find out what your connection status is. 
Using getpeername() is more helpful:

   (10057, 'Socket is not connected')
   10057 = ENOTCONN        WSAENOTCONN

Unfortunately, the manual says this function doesn't exist on all 
platforms, so portable code should try to avoid it. 

But don't despair: if a connection is refused trying to connect again 
to the same host using the same socket will yield the following 
surprising exception:

   (10022, 'Invalid argument')
   10022 =                 WSAEINVAL

Well, it accepted the parameter(s) before, so what's that all about? 
Furthermore, it doesn't make any difference if you change the port 
number: you'll get the same result. Using a different hostname which 
points to the same IP-address doesn't change anything either, but using 
a hostname which points to a different IP-address *does*, strangely 
enough.

As sockets can't be re-used anyway this isn't something to look into 
too deeply. After a close() all further operations on that socket are 
expressly forbidden, and in fact impossible. Yes, I just had to 
try! Although the exception you get when you try to reconnect is pretty 
puzzling:

   AttributeError: 'int' object has no attribute 'connect'

What happens here is that the internal socket object has been replaced 
by the int 0. But please don't rely on behaviour like that.


Ok, back to connecting. Because I did my testing on a single machine I 
wasn't able to catch the socket system in the middle of the connection 
handshake, so I can't tell if trying to connect at that time will yield 
another exception, but trying it after establishing the connection will 
result in this very predictable exception:

   (10056, 'Socket is already connected')
   10056 = EISCONN         WSAEISCONN


Hurrah, we're connected! Or are we? Well, not necessarily: the 
connection may have been broken already. The system doesn't seem to be 
able to tell an idle connection from a broken one (this may be part of 
the nature of TCP/IP), and you need to do something with the stream to 
find that out, as you'll see below.

On the other hand, if you try to connect to a non-existent IP-address 
you'll see this exception:

   (10065, 'No route to host')
   10065 = EHOSTUNREACH    WSAEHOSTUNREACH


If the hostname couldn't be resolved, this is what you get:

   ('host not found',)

What, no error code? That's right, and this could be an issue if you 
expect a number in the first position of this tuple-like exception, or 
anything at all in the second position.



SENDING DATA
------------

Pretty straightforward, really. Just do MySocket.send(data), and if 
there's nothing wrong with the connection it will either work, or 
raise:

   (10035, 'The socket operation could not complete without blocking')
   10035 = EWOULDBLOCK     WSAEWOULDBLOCK

The documentation states that the function returns the number of 
bytes actually sent, but I've never observed this number to be 
different from the amount you're trying to send, even when it's a big 
chunk of data. When I tried to flood a connection with small bits of 
data (I didn't read the data on the receiving end), it would raise the 
above exception after about 18K of data was 'sent', but if you try to 
send() larger (even much larger) chunks of data the first call always 
works, and only subsequent calls raise the exception. This behaviour 
might not be portable, though.
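
Because send() is documented to return the number of bytes actually
sent, portable code probably shouldn't count on the whole buffer being
accepted in one call. A minimal sketch of a send loop under that
assumption follows. It targets a modern Python 3, where the 'would
block' condition surfaces as BlockingIOError, and send_all is a
hypothetical helper name, not something from the socket module.

```python
import select
import socket


def send_all(sock, data, timeout=5.0):
    """Send all of data on a non-blocking socket; return the byte count."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        try:
            # send() may accept only part of what it is offered.
            total += sock.send(view[total:])
        except BlockingIOError:
            # Outgoing buffer is full: wait until the socket is writable.
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("peer is not draining the connection")
    return total
```

The timeout only bounds each wait for buffer space, not the transfer as
a whole; a broken connection still shows up as an exception from send().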

If you try to send data when the connection has been broken the 
following exception is raised:

   (10054, 'Connection reset by peer')
   10054 = ECONNRESET      WSAECONNRESET



RECEIVING DATA
--------------

Doing a MySocket.recv(max_length) can certainly result in some 
unexpected behaviour: if the connection is good, and there's some data 
waiting, the data will be returned. That's not the surprising bit. When 
the connection is good, and there's no data waiting, you'll get the 
ubiquitous

   (10035, 'The socket operation could not complete without blocking')
   10035 = EWOULDBLOCK     WSAEWOULDBLOCK

exception. That, too, isn't surprising. What *is* surprising, however, is 
that when the connection has been broken on the other end, no exception 
is raised whatsoever, but the recv() function will keep returning zero-
length strings. I wonder if that behaviour is intentional? Or portable, 
for that matter. As this is the only way that I know of telling a dead 
connection from a live one when receiving data, we're forced to rely on 
this strange behaviour, but I'd prefer the ECONNRESET exception to be 
raised.
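
For what it's worth, a zero-length result from recv() is how the BSD
socket API reports that the peer has closed the connection in an orderly
way, so the three outcomes described above can be told apart in code.
The following is a sketch only: it assumes a modern Python 3 (where
recv() returns bytes and 'would block' is raised as BlockingIOError),
poll_recv and its state names are invented for illustration, and it
lumps a reset connection together with a closed one.

```python
import socket


def poll_recv(sock, max_length=4096):
    """Try one recv() on a non-blocking socket.

    Returns ("data", payload), ("idle", b"") when nothing is waiting,
    or ("closed", b"") once the peer has gone away.
    """
    try:
        data = sock.recv(max_length)
    except BlockingIOError:
        # Connection is fine, there is just nothing to read yet.
        return "idle", b""
    except ConnectionResetError:
        # The peer dropped the connection abruptly.
        return "closed", b""
    if not data:
        # Zero bytes: the peer closed its end of the connection.
        return "closed", b""
    return "data", data
```

Once "closed" has been returned, further calls will keep returning it,
so the caller should close its own socket and stop polling.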



SOCKET EXCEPTIONS
-----------------

Sockets raise exceptions of type socket.error, and like any other 
exception that's a class. But you might be forgiven for thinking that 
it's a tuple because for all intents and purposes it behaves like one. 
(I presume this is for historic reasons, to make sure older code will 
keep working as expected.) It looks that way in the traceback, and if e 
is the exception you've caught you can look at e[0] (the number of the 
error) and e[1] (the associated message). This rule has one exception, 
however, and that is the ('host not found',) exception, which has the 
error message in the first position, and doesn't have a second position.
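
Code that wants to cope with both shapes can unpack the exception
defensively instead of assuming a number comes first. A small sketch,
assuming a modern Python 3, where exceptions can no longer be indexed
with e[0] and the same values are read from e.args; the helper name
split_socket_error is hypothetical.

```python
def split_socket_error(exc):
    """Return (code, message) for a socket exception.

    code is None when the exception carries only a message, as with
    ('host not found',).
    """
    args = exc.args
    if len(args) >= 2 and isinstance(args[0], int):
        # The usual shape: (error number, message).
        return args[0], str(args[1])
    if len(args) == 1:
        # Message only, no error number.
        return None, str(args[0])
    # Anything else: fall back to the exception's own text.
    return None, str(exc)
```

With this, an exception built from (10035, 'would block') yields the
pair (10035, 'would block'), while one built from ('host not found',)
yields (None, 'host not found').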


Strange beasts, those sockets. Under Windows, anyway.


Robert Amesz
-- 

APPENDIX - socket error codes from the 'errno' module

10004 =                 WSAEINTR
10009 =                 WSAEBADF
10013 =                 WSAEACCES
10014 =                 WSAEFAULT
10022 =                 WSAEINVAL
10024 =                 WSAEMFILE
10035 = EWOULDBLOCK     WSAEWOULDBLOCK
10036 = EINPROGRESS     WSAEINPROGRESS
10037 = EALREADY        WSAEALREADY
10038 = ENOTSOCK        WSAENOTSOCK
10039 = EDESTADDRREQ    WSAEDESTADDRREQ
10040 = EMSGSIZE        WSAEMSGSIZE
10041 = EPROTOTYPE      WSAEPROTOTYPE
10042 = ENOPROTOOPT     WSAENOPROTOOPT
10043 = EPROTONOSUPPORT WSAEPROTONOSUPPORT
10044 = ESOCKTNOSUPPORT WSAESOCKTNOSUPPORT
10045 = EOPNOTSUPP      WSAEOPNOTSUPP
10046 = EPFNOSUPPORT    WSAEPFNOSUPPORT
10047 = EAFNOSUPPORT    WSAEAFNOSUPPORT
10048 = EADDRINUSE      WSAEADDRINUSE
10049 = EADDRNOTAVAIL   WSAEADDRNOTAVAIL
10050 = ENETDOWN        WSAENETDOWN
10051 = ENETUNREACH     WSAENETUNREACH
10052 = ENETRESET       WSAENETRESET
10053 = ECONNABORTED    WSAECONNABORTED
10054 = ECONNRESET      WSAECONNRESET
10055 = ENOBUFS         WSAENOBUFS
10056 = EISCONN         WSAEISCONN
10057 = ENOTCONN        WSAENOTCONN
10058 = ESHUTDOWN       WSAESHUTDOWN
10059 = ETOOMANYREFS    WSAETOOMANYREFS
10060 = ETIMEDOUT       WSAETIMEDOUT
10061 = ECONNREFUSED    WSAECONNREFUSED
10062 = ELOOP           WSAELOOP
10063 =                 WSAENAMETOOLONG
10064 = EHOSTDOWN       WSAEHOSTDOWN
10065 = EHOSTUNREACH    WSAEHOSTUNREACH
10066 =                 WSAENOTEMPTY
10067 =                 WSAEPROCLIM
10068 = EUSERS          WSAEUSERS
10069 = EDQUOT          WSAEDQUOT
10070 = ESTALE          WSAESTALE
10071 = EREMOTE         WSAEREMOTE
10091 =                 WSASYSNOTREADY
10092 =                 WSAVERNOTSUPPORTED
10093 =                 WSANOTINITIALISED
10101 =                 WSAEDISCON
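
A table like this one can be regenerated on any platform by walking the
errno module. How the list above was actually produced isn't stated, so
the following is only one plausible way of doing it; errno_table is a
hypothetical helper name, and the codes and names it finds will differ
from the Windows values shown here when run on another OS.

```python
import errno


def errno_table():
    """Map each numeric error code to its sorted list of symbolic names."""
    table = {}
    for name, value in vars(errno).items():
        # Symbolic codes are the upper-case integer attributes.
        if name.isupper() and isinstance(value, int):
            table.setdefault(value, []).append(name)
    return {code: sorted(names) for code, names in sorted(table.items())}
```

Looping over errno_table().items() and printing each code followed by
its names gives one line per code, with every alias for that code
listed together.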


