More urllib timeout issues.

John Nagle nagle at animats.com
Fri Apr 27 15:03:19 EDT 2007


   I thought I had all the timeout problems with urllib worked around,
but no.

   socket.setdefaulttimeout is useful, but not always effective.
I'm setting that to 15 seconds.
If the host end won't open the connection within 15 seconds,
urllib times out.  But if the host end opens the connection,
then never sends anything, urllib waits for many minutes before
timing out.  Any idea how to deal with this?  And don't just
say "use urllib2" unless you KNOW it works better there and
can explain why.  I finally have M2Crypto and urllib playing
well together, and don't want to mess with that.
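
For anyone who wants to reproduce the "opens the connection, then never
sends anything" case without hitting a live site, here is a minimal sketch.
It assumes a local stand-in server on 127.0.0.1; the helper names
(start_silent_server, time_stalled_read) are made up for illustration.
It uses a per-socket settimeout rather than socket.setdefaulttimeout so
it doesn't disturb the global default. It only exercises a plain socket,
not urllib or M2Crypto, so it says nothing about where the long wait
actually comes from.

```python
import socket
import threading
import time


def start_silent_server():
    """Listen on an ephemeral localhost port; accept but never send."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    accepted = []  # hold accepted sockets so the connection stays open

    def accept_one():
        try:
            conn, _ = listener.accept()
            accepted.append(conn)
        except OSError:
            pass  # listener was closed before anyone connected

    threading.Thread(target=accept_one, daemon=True).start()
    return listener, accepted


def time_stalled_read(timeout):
    """Connect to the silent server, try to read, report what happened."""
    listener, accepted = start_silent_server()
    port = listener.getsockname()[1]
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # applies to both connect and recv
    start = time.time()
    try:
        sock.connect(("127.0.0.1", port))
        data = sock.recv(1024)  # the server never sends, so this stalls
        outcome = "data" if data else "closed"
    except socket.timeout:
        outcome = "timeout"
    finally:
        elapsed = time.time() - start
        sock.close()
        for conn in accepted:
            conn.close()
        listener.close()
    return outcome, elapsed
```

Calling time_stalled_read(15) mirrors the 15 second setting above; a
smaller value just makes the experiment quicker.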

   For some weird reason, several UK academic sites have this
behavior, including "soton.ac.uk".  If you try to open that
in a browser, the browser just sits there, and eventually,
after several minutes, displays "The site is taking too
long to respond".

   What's the current status in this area?  Some patches to sockets
were proposed a while back.  There's a long history of trouble
in this area, and some fixes, but nothing that just works.
The socket module has two timeout settings (socket.setdefaulttimeout and
sock.settimeout), the M2Crypto module has two (sock.set_socket_read_timeout and
sock.set_socket_write_timeout), and none of them play well together
or with the urllib/urllib2/httplib level and the blocking/non-blocking
socket distinction.

   What we really should have is something like this:

Sockets should have
	set_socket_connect_timeout
	set_socket_read_timeout
	set_socket_write_timeout

which set an upper limit on how long a socket can go with a request for
a connect, read or write pending but without progress on the connection.
This needs to be independent of select/poll timeouts, and these timeouts
should work on blocking sockets.
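
As a rough sketch of what that interface could look like, here is a thin
wrapper over a standard socket. The class name TimeoutSocket and its
attributes are hypothetical; only the three set_socket_* names come from
the list above. It approximates the idea by calling the existing
settimeout before each operation, so each limit applies per call rather
than being a true "no progress" timer.

```python
import socket


class TimeoutSocket(object):
    """Hypothetical wrapper: separate connect/read/write timeouts."""

    def __init__(self, sock=None):
        if sock is None:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._sock = sock
        # Start all three from whatever timeout the socket already has.
        current = sock.gettimeout()
        self.connect_timeout = current
        self.read_timeout = current
        self.write_timeout = current

    def settimeout(self, seconds):
        """Set all three limits at once."""
        self.connect_timeout = seconds
        self.read_timeout = seconds
        self.write_timeout = seconds

    def set_socket_connect_timeout(self, seconds):
        self.connect_timeout = seconds

    def set_socket_read_timeout(self, seconds):
        self.read_timeout = seconds

    def set_socket_write_timeout(self, seconds):
        self.write_timeout = seconds

    def connect(self, address):
        self._sock.settimeout(self.connect_timeout)
        self._sock.connect(address)

    def recv(self, bufsize):
        self._sock.settimeout(self.read_timeout)
        return self._sock.recv(bufsize)

    def sendall(self, data):
        self._sock.settimeout(self.write_timeout)
        self._sock.sendall(data)

    def close(self):
        self._sock.close()
```

Something like ts = TimeoutSocket(); ts.settimeout(15);
ts.set_socket_read_timeout(5) would then give a 15 second connect and
write limit with a 5 second read limit. A wrapper like this does nothing
for SSL or M2Crypto sockets, which is part of the problem.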

The existing socket function

	settimeout

should set all of the above, and

	socket.setdefaulttimeout

should set the default value for settimeout to be used on new sockets.
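
That last part already works at the plain socket level. A small sketch
to show it (the function name default_timeout_demo is made up for
illustration): new sockets pick up the module default, and settimeout
overrides it per socket.

```python
import socket


def default_timeout_demo(seconds=15.0):
    """Return (inherited, overridden) timeouts for two fresh sockets."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        inherited = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        overridden = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        overridden.settimeout(5.0)  # per-socket value wins over the default
        result = (inherited.gettimeout(), overridden.gettimeout())
        inherited.close()
        overridden.close()
    finally:
        # Restore the global default so other code is unaffected.
        socket.setdefaulttimeout(previous)
    return result
```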

SSL and M2Crypto, which wrap socket functionality,
should understand all the above functions.

httplib, urllib, and urllib2 objects should understand

	settimeout

Making the connect/read/write timeout distinction at that level
probably isn't worth the trouble.
	
Then we'd have a reasonable network timeout system.
We have about half of the above now, but it's not consistent.

Comments?

				John Nagle


