[Python-Dev] Socket timeout patch

Fri, 07 Jun 2002 12:29:40 -0400

First, a few new issues in this thread:

- On Windows, setting a timeout on a socket and then using s.makefile()
  works as expected (the I/O operations on the file will time out
  according to the timeout set on the socket).  This is because
  makefile() returns a pseudo-file that calls s.recv() etc. on the
  socket object.  But on Unix, s.makefile() on a socket with a timeout
  is a disaster: because the socket is internally set to nonblocking
  mode, all I/O operations will fail if they cannot be completed
  immediately (effectively setting a timeout of 0 on the file).  I have
  currently documented around this, but maybe it would be better if
  makefile() used a pseudo-file on all platforms, for uniform behavior.
  Thoughts?  I'm also thinking of implementing the socket wrapper (which
  is currently a Python class that has a "real" socket object as an
  instance variable) as a subclass of the real socket class instead.

- The original timeout socket code (in Python) by Tim O'Malley had a
  global timeout which you could set so that *all* sockets
  *automatically* had their timeout set.  This is nice if you want it
  to affect library modules like urllib or ftplib.  That feature is
  currently missing.  Should we add it?  (I have some concerns about
  it, in that it might break other code -- and it doesn't seem wise to
  do this for server-end sockets in general.  But it's a nice hack for
  smaller programs.)

Now on to my reply to Michael Gilfix:

>   Good stuff. The module needed a little work as I discovered as well
> :)

...and it still needs more.  There are still way too many #ifdefs in
the code.

>   Er, hopefully Bernard is still watching this thread as he wrote
> the test_timeout.py. He's been pretty quiet though as of late... I'm
> willing to rewrite the tests if he doesn't have the time. 

Either way would be good.

>   I think the tests should follow the same pattern as the
> test_socket.py.  While adding my regression tests, I noted that the
> general socket test suite could use some re-writing but I didn't feel
> it appropriate to tackle it at that point. Perhaps a next patch?

Yes, please!

> > - Cross-platform testing.  It's possible that the cleanup broke things
> >   on some platforms, or that select() doesn't work the same way.  I
> >   can only test on Windows and Linux; there is code specific to OS/2
> >   and RISCOS in the module too.
> 
>   This was a concern from the beginning but we had some chat on the
> dev list and concluded that any system supporting sockets has to
> support select or some equivalent (hence the initial reason for using
> the select module, although I agree it was expensive).

But that doesn't mean there aren't platform-specific tweaks necessary
to import the definition of select() and the FD_* macros.  We'll find
out soon enough, this is what alpha releases are for. :-)

> > - I'm not sure that the handling of timeout errors in accept(),
> >   connect() and connect_ex() is 100% correct (this code sniffs the
> >   error code and decides whether to retry or not).
> 
>   I've tested these on linux (manually) and they seem to work just
> fine. I didn't do as much testing with connect_ex but the code is
> very similar to connect, so confidence is high-er. The reason for the
> two-pass is because the initial connect needs to be made to start the
> process and then try again, based on the error codes, for non-blocking
> connects. It's weird like that.

I'll wait for the unit tests.  These should test all three modes
(blocking, non-blocking, and timeout).

Can you explain why on Windows you say that the socket is connected
when connect() returns a WSAEINVAL error?

Also, your code enters this block even in non-blocking mode, if a
timeout was also set.  (Fortunately I fixed this for you by setting
the timeout to -1.0 in setblocking(). Unfortunately there's still a
later test for !sock_blocking in the same block that cannot succeed
any more because of that.)

The same concerns apply to connect_ex() and accept(), which have very
similar logic.

I believe it is possible on some Unix variants (maybe on Linux) that
when select indicates that a socket is ready, if the socket is in
nonblocking mode, the call will return an error because some kernel
resource is unavailable.  This suggests that you may have to keep the
socket in blocking mode except when you have to do a connect() or
accept() (for which you can't do a select without setting the socket
in nonblocking mode first).

Looking at connect_ex, it seems to be missing the "res = errno" bit
in the case where it says "we're already connected".  It used to
return errno here, now it will return -1.  Maybe the conex_finally
label should be moved up to before the "if (res != 0) {" line?

> > - Should sock.settimeout(0.0) mean the same as sock.setblocking(0)?
> >   Currently it sets a timeout of zero seconds, and that behaves pretty
> >   much the same as setting the socket in nonblocking mode -- but not
> >   exactly.  Maybe these should be made the same?
> 
>   I thought about this and whether or not I wanted to address this.  I
> kinda decided to leave them separate though. I don't think setting a
> timeout means anything equivalent to setblocking(0). In fact, I can't
> see why anyone should ever set a timeout of zero and the immediate
> throwing of the exception is a good alert as to what's going on. I
> vote, leave them separate and as they are now...

OTOH, a timeout of 0 behaves very similar to nonblocking mode --
similar enough that a program that uses setblocking(0) would probably
also work when using settimeout(0).  I kind of like the idea of having
only a single internal flag value, sock_timeout, rather than two
(sock_timeout and sock_blocking).

> > - The socket.py module has been changed too, changing the way
> >   buffering is done on Windows.  I haven't reviewed or tested this
> >   code thoroughly.
> 
>   I added a regression test to test_socket.py to test this, that works
> on both the old code (I used 2.1.3) and the new code. Hopefully, this
> will be instrumental for those testing it and it reflects my manual
> tests.

The tests don't look very systematic.  There are many cases (default
bufsize, unbuffered, bufsize=1, large bufsize; combine with read(),
readline(), read a line larger than the buffer size, etc.).  We need a
more systematic approach to unit testing here.

--Guido van Rossum (home page: http://www.python.org/~guido/)