[Python-Dev] Socket timeout patch

Guido van Rossum guido@python.org
Fri, 07 Jun 2002 16:03:10 -0400


[Jeremy, please skip forward to where it says "Stevens" or "Jeremy"
and comment.]

>   Glad to hear that _fileobject works well.

I didn't say that.  I was hoping it does though. :-)

> Is there any benefit to having the file code in C?

Some operations (notably pickle.dump() and .load()) only work with
real files.  Other operations (e.g. printing a large list or dict) can
be faster to real files because they don't have to build the str() or
repr() of the whole thingk as a string first.

> I bet the python code isn't that much slower.

You're on.  Write a benchmark.  I notice that httplib uses makefile(),
and often reads very small chunks.

> It does seem a shame to have to switch
> between the two. Maybe one solution is that a makefile should set
> blocking back on if a timeout exists?

That's not very nice.  It could raise an exception.  But you could set
the timeout on the socket *after* calling makefile(), and then you'd
be hosed.  But if we always use the Python makefile(), the problem is
solved.

> That would solve the problem
> and is a consistent change since it only checks timeout behavior (=
> 2 lines of code). I'd vote for using the python fileobject for
> both. Some profiling of the two would be nice if you have the time.

I don't, maybe you do? :-)

> >   I'm also thinking of implementing the socket wrapper (which
> >   is currently a Python class that has a "real" socket object as an
> >   instance variable) as a subclass of the real socket class instead.

>   Er, what's the difference between that and _socketobject in
> socket.py? Why not just use the python bit consistently?

Making it a subclass should be faster.  But I don't know yet if it can
work -- I probably have to change the constructor at the C level to be
able to support dup() (which is the other reason for the Python
wrapper).

>   Is it really so painful for apps to keep track of all their sockets
> and then do something like:
> 
>      for sock in sock_list:
>         sock.settimeout (blah)
> 
>   Why keep track of them in the socket module, unless there's already code
> for this.

Steve Holden already answered this one.  Also, you don't have to keep
track of all sockets -- you just have to apply the timeout if one is
set in a global variable.

>   Well, we agreed to do some clean-up in a separate patch but you seem
> anxious to get it in there :)

I am relaxing now, waiting for you to pick up again.

>   Well, this was the initial reason to use the selectmodule.c code.
> There's got to be a way to share code between the two for bare access
> to select, since someone else might want to use such functionality one
> day (and this has set the precendent). Why not make a small change to
> selectmodule.c that opens up the code in a C API or some sort? And
> then have select_select use that function internally.

If two modules are both shared libraries, it's really painful to share
entry points (see the interface between _ssl.c and socketmodule.c for
an example -- it's written down in a large comment block in
socketmodule.h).  I really think one should be able to use select() in
more than one file.  At least it works on Windows. ;-)

> > > > - I'm not sure that the handling of timeout errors in accept(),
> > > >   connect() and connect_ex() is 100% correct (this code sniffs the
> > > >   error code and decides whether to retry or not).
> 
>   This is how the original timeoutsocket.py did it and it seems
> to be the way to do blocking connects.

I understand all that.  My comment is that your patch changed the
control flow in non-blocking mode too.  I think I accidentally fixed
it by setting sock_timeout to -1.0 in setblocking().  But I'm not 100%
sure so I'd like this aspect to be unit-tested thoroughly.

>   Ok.. Should I merge the test_timeout.py and test_socket.py as well
> then?

No, you can create as many (or as few) unit test files as you need.

> A little off-topic, while I was thinking of restructuring these
> tests, I was wondering what might be the best way to structure a unit
> test where things have to work in seperate processes/threads. What
> I'd really like to do is:
> 
>   * Have the setup function set up server and client sockets
>   * Have the tear-down function close them
>   * Have some synchronization function or simple message (this is
>     socket specific)
>   * Then have a test that has access to both threads and can insert
>     call-backs to run in each thread.
> 
>   This seems tricky with the current unit-test frame work. The way I'd
> do it is using threading.Event () and have the thing block until the
> server-side and client-side respectively submit their test callbacks.
> But it might be nice to have a general class that can be added to the
> testing framework. Thoughts?

If I were you I'd worry about getting it right once first.  Then we
can see if there's room for generalization.  (You might want to try to
convert test_socketserver.py to your proposed framework to see how
well it works.)

> > Can you explain why on Windows you say that the socket is connected
> > when connect() returns a WSAEINVAL error?
> 
>   This is what timeoutsocket.py used as the unix equivalent error
> codes, and since I'm not set up to test windows and since it was
> working code, I took their word for it.

Well, but WSAEINVAL can also be returned for other conditions.  See

http://msdn.microsoft.com/library/en-us/winsock/wsapiref_8m7m.asp

it seems that the *second* time you call connect() WSAEINVAL can only
mean that you're already connected.  But if this socket is in a
different state, and the connect() is simply not appropriate, I don't
like the fact that connect() would simply return "success" rather than
reporting an error.  E.g. I could do this:

  s = socket()
  s.settimeout(100)
  s.connect((host, port))
  .
  .
  .
  # By mistake:
  s.connect((otherhost, otherport))

I want the latter connect() to fail, but I think your code will make
it succeed.

> > Also, your code enters this block even in non-blocking mode, if a
> > timeout was also set.  (Fortunately I fixed this for you by setting
> > the timeout to -1.0 in setblocking(). Unfortunately there's still a
> > later test for !sock_blocking in the same block that cannot succeed
> > any more because of that.)
> 
>   This confusion is arising because of the restructuring of the code.
> Erm, this check applies if we have a timeout but are in non-blocking
> mode. Perhaps you changed this? To make it clearer, originally before
> the v2 of the patch, the socket was always in non-blocking mode, so
> it was necessary to check whether we were examining error code with
> non-blocking in mind, or whether we were checking for possible timeout
> behavior. Since we've changed this, it now checks if non-blocking has
> been set while a timeout has been set. Seems valid to me...

But I changed that again: setblocking() now always disables the
timeout.  Read the new source in CVS.

> > The same concerns apply to connect_ex() and accept(), which have very
> > similar logic.
> > 
> > I believe it is possible on some Unix variants (maybe on Linux) that
> > when select indicates that a socket is ready, if the socket is in
> > nonblocking mode, the call will return an error because some kernel
> > resource is unavailable.  This suggests that you may have to keep the
> > socket in blocking mode except when you have to do a connect() or
> > accept() (for which you can't do a select without setting the socket
> > in nonblocking mode first).
> 
>   Not sure about this. Checking the man pages, the error codes seem
> to be the thing to check to determine what the behavior is. Perhaps
> you could clarify?

When a timeout is set, the socket file descriptor is always in
nonblocking mode.  Take sock_recv() for example.  It calls
internal_select(), and if that returns >= 1, it calls recv().  But
according to the Stevens books, it is still possible (under heavy
load) that the recv() returns an EWOULDBLOCK error.

(We ran into this while debugging a high-performance application based
on Spread.  The select() succeeded, but the recv() failed, because the
socket was in nonblocking mode.  Well, I'm *almost* sure that this was
the case -- Jeremy Hylton should know the details.)

> > Looking at connect_ex, it seems to be missing the "res = errno" bit
> > in the case where it says "we're already connected".  It used to
> > return errno here, now it will return -1.  Maybe the conex_finally
> > label should be moved up to before the "if (res != 0) {" line?
> 
>   Ah yes. I didn't look closely enough at the windows bit. On linux
> it isn't necessary. Let's move it up.

OK, done.

> > OTOH, a timeout of 0 behaves very similar to nonblocking mode --
> > similar enough that a program that uses setblocking(0) would probably
> > also work when using settimeout(0).  I kind of like the idea of having
> > only a single internal flag value, sock_timeout, rather than two
> > (sock_timeout and sock_blocking).
> 
>   But one throws an exception and one doesn't.

What do you mean?  In nonblocking mode you get an exception when the
socket isn't ready too.

> It seems to me that
> setting a timeout of 0 is sort of an error, if anything. It'll be an
> easy way to do a superficial test of the functionality in the regr
> test.

OK, we don't seem to be able to agree on this.  I'll let your wisdom
prevail.

> > The tests don't look very systematic.  There are many cases (default
> > bufsize, unbuffered, bufsize=1, large bufsize; combine with read(),
> > readline(), read a line larger than the buffer size, etc.).  We need a
> > more systematic approach to unit testing here.
> 
>   Ok, so to recap which tests we want:
> 
>      * Default read()
>      * Read with size given
>      * unbuffered read
>      * large buffer size
>      * Mix read and realine
>      * Do a realine
>      * Do a readline larger than buffer size.
> 
>   Any others in the list?

Check the socket.py source code and make sure that every code path
through every method is taken at least once.

--Guido van Rossum (home page: http://www.python.org/~guido/)