The joys and jilts of non-blocking sockets

Timothy O'Malley timo at alum.mit.edu
Wed May 9 01:10:27 EDT 2001


[[ This message was both posted and mailed: see
   the "To," "Cc," and "Newsgroups" headers for details. ]]

hola.

In article <Xns90989FD82A446rcamesz at 127.0.0.1>, Robert Amesz wrote:
> I've also studied timeoutsocket.py for some hints and pointers about 
> socket behaviour, and this is a good source of information about some 
> of the quirks of non-blocking sockets, so I'd like to thank 
> Timothy O'Malley for that.

Thanks.

> Even so I'd like to take the opportunity to point out a few bugs and a 
> design flaw in version 1.15 (the latest version I was able to find). 
> One of the bugs is/are a set of missing commas in lines 142, 143, 144 
> and 147: without those commas tuples aren't tuples, I'm afraid. (My 
> guess is those were lists originally.)

Again, thanks.  Those shortcuts were recently added to remove
the name lookups (and platform dependency) from the connect()
and accept() routines.  Glad to hear of the problem and make the
fixes.
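
A minimal sketch of the missing-comma problem described above. The names
here are hypothetical and are not taken from timeoutsocket.py; the point is
only that parentheses alone do not make a tuple.

```python
import errno

# Hypothetical names, for illustration only.
not_a_tuple = (errno.EISCONN)    # parentheses only: this is a plain int
one_tuple = (errno.EISCONN,)     # the trailing comma makes it a 1-tuple


def is_connected_error(code, codes=one_tuple):
    """Return True if 'code' is one of the error codes in 'codes'."""
    return code in codes
```

With the comma missing, a test such as "code in not_a_tuple" raises a
TypeError, because an int cannot be searched with "in".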

> The design flaw is that the module makes non-blocking sockets behave 
> like blocking ones: this just doesn't make sense to me.

It's true.  When I first wrote TimeoutSocket, I had it in my
mind that it would *only* be used with blocking sockets. (Why would
anyone need timeouts when none of their operations block?)  This
assumption broke down when I added the silent shim effect, whereby
it surreptitiously replaces the real socket module.

Fixing this has been one of my "back burner" projects.  I haven't
got around to it -- largely because no one has noticed it before.
In general, people do not use this module with non-blocking sockets.

> but as it - very cleverly - replaces the normal socket-module once 
> imported, it really should handle the non-blocking case too. Not that 
> it's hard: in fact, it's almost trivial, but it should be done.

All true.  In version 1.16, ok?

> If the host exists and can be reached, connecting to it using a non-
> blocking call *always* leads to this exception:
> 
>    (10035, 'The socket operation could not complete without blocking')
>    10035 = EWOULDBLOCK     WSAEWOULDBLOCK
> 
> This message does not give you *any* information about the status of 
> the connection: the machine may be busy connecting, the connection may 
> have been made, or the connection may have been refused.

You lose me here.  By using a non-blocking socket, you have told
the operating system that it should never block on this socket.
In that case, I expect this exception.  The operating system is
informing you that the operation would block (because it wasn't
finished) when you made the connect() call.  This is all standard
and routine behavior for non-blocking sockets.
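
As a rough illustration of that behavior, here is a sketch that starts a
non-blocking connect and reports the code the operating system hands back.
The function name and the IN_PROGRESS tuple are assumptions made for this
example. connect_ex() is used so the code comes back as a return value
rather than as an exception.

```python
import errno
import socket

# Codes that mean "the connect has started but is not finished yet",
# as opposed to "the connect failed".  Which one you see depends on
# the platform.
IN_PROGRESS = (errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EALREADY)


def start_nonblocking_connect(address):
    """Begin a TCP connect without blocking; return (sock, code)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    # connect_ex() returns the error code instead of raising socket.error.
    code = sock.connect_ex(address)
    return sock, code
```

A code in IN_PROGRESS is the normal answer here, not a sign of trouble.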

> If the connection is refused (i.e. there's either no service listening 
> to the port you're trying to connect to, or no new connections are 
> being accepted on that port), trying to receive (or send) data through 
> that socket will, once again, produce the same exception (10035), so 
> that won't really help you to find out what your connection status is. 

You should use select() and wait for the socket to be writeable to
determine when a socket is connected.  Again, standard procedure.
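
A sketch of that select() step, assuming a socket on which a non-blocking
connect has already been started. The function name is made up for this
example. The socket is also passed in the third (exceptional) list because
some platforms, Windows among them, report a failed connect there rather
than as writeable.

```python
import select
import socket


def connect_has_settled(sock, timeout):
    """Wait up to 'timeout' seconds for a pending connect to finish.

    Returns True once the attempt has finished (successfully or not),
    False if the timeout expired first.
    """
    _, writable, failed = select.select([], [sock], [sock], timeout)
    return bool(writable or failed)
```

Note that a True result only says the attempt is over; it does not yet say
whether the connection was made or refused.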


> But don't despair: if a connection is refused trying to connect again 
> to the same host using the same socket will yield the following 
> surprising exception:
> 
>    (10022, 'Invalid argument')
>    10022 =                 WSAEINVAL

In fact, once the socket is 'writeable' (using the above test), you
are supposed to call connect() again.  The result of the second
connect() call determines the result of the overall connection
operation.
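
A sketch of that second connect() call, using connect_ex() so the result
comes back as a code. The function name and the returned strings are
assumptions for this example. What a repeated connect() reports differs
between platforms (some return 0, some EISCONN, on success), so both are
treated as success here. Many references read the pending error with
getsockopt(SOL_SOCKET, SO_ERROR) instead.

```python
import errno
import socket


def second_connect(sock, address):
    """Call connect again on a socket that select() reported as ready."""
    code = sock.connect_ex(address)
    if code in (0, errno.EISCONN):
        return "connected"
    if code in (errno.EINPROGRESS, errno.EALREADY, errno.EWOULDBLOCK):
        return "pending"       # handshake still under way; select() again
    # Anything else (ECONNREFUSED, for instance) means the attempt failed.
    return "failed: " + errno.errorcode.get(code, str(code))
```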

I admit, I do not know what this Windows error message means.  Even
so, that does not mean that it is a failure of the interface.
Sometimes connections have errors -- this is all part of normal
operation.

> As sockets can't be re-used anyway this isn't something to look into 
> too deeply. After a close() all further operations on that socket are 
> expressly forbidden, and in fact impossible. Yes, I just had to 
> try! Although the exception you get when you try to reconnect is pretty 
> puzzling:
> 
>    AttributeError: 'int' object has no attribute 'connect'

Ooops.  That looks like a TimeoutSocket bug.


> Ok, back to connecting. Because I did my testing on a single machine I 
> wasn't able to catch the socket system in the middle of the connection 
> handshake, so I can't tell if trying to connect at that time will yield 
> another exception, but trying it after establishing the connection will 
> result in this very predictable exception.
> 
>    (10056, 'Socket is already connected')
>    10056 = EISCONN         WSAEISCONN

As expected.  This is the correct response.

> Hurrah, we're connected! Or are we? Well, not necessarily: the 
> connection may have been broken already. The system doesn't seem to be 
> able to tell an idle connection from a broken one (this may be part of 
> the nature of TCP/IP),

It is -- at least to a first approximation.  Before worrying about the
limitations of TCP, it'd be good to have a strong understanding of the
"normal" case.



To summarize this long message:
  - Thanks for the bug report.
  - You are right about the non-blocking inadequacies of TimeoutSocket.
  - Much of what you have complained about is standard procedure
    when using sockets.  A whirlwind tour through BSD sockets would
    explain some of the history.

good luck.


