Problem with slow httplib connections on Windows (and maybe other platforms)

Christoph Zwerschke cito at online.de
Sun Feb 1 10:33:51 EST 2009


It took me a while to analyze the cause of the following problem.

The symptom was that testing a local web app with twill was fast
on Python 2.3, but very slow on Python 2.4-2.6 on a Win XP box.

This boiled down to the problem that if you run a SimpleHTTPServer
for localhost like this,

   import BaseHTTPServer, SimpleHTTPServer
   BaseHTTPServer.HTTPServer(('localhost', 8000),
       SimpleHTTPServer.SimpleHTTPRequestHandler).serve_forever()

and access it using httplib.HTTPConnection on the same host like this:

   import httplib
   httplib.HTTPConnection('localhost', 8000).connect()

then this call is fast using Py 2.3, but slow with Py 2.4-2.6.
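A minimal sketch of the same setup in Python 3 terms (assuming http.server and http.client, the Python 3 names for BaseHTTPServer/SimpleHTTPServer and httplib). It runs the IPv4-only server in a background thread and times a single connect() to 'localhost'. Using port 0 so the OS picks a free port, and the timing code itself, are additions for illustration; whether a noticeable delay shows up depends on the OS and resolver configuration.

```python
import http.client
import http.server
import threading
import time

# IPv4-only server, as in the post: HTTPServer inherits
# address_family = AF_INET from socketserver.TCPServer.
# Port 0 (an assumption of this sketch) lets the OS pick a free port.
server = http.server.HTTPServer(
    ("localhost", 0), http.server.SimpleHTTPRequestHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Time one connect() to 'localhost'; http.client tries every address
# returned by getaddrinfo in order until one accepts the connection.
start = time.perf_counter()
conn = http.client.HTTPConnection("localhost", port)
conn.connect()
elapsed = time.perf_counter() - start
conn.close()
print("connect() to localhost:%d took %.3f s" % (port, elapsed))

server.shutdown()
server.server_close()
```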

I found that this was caused by a mismatch between the IP versions used
by SimpleHTTPServer and HTTPConnection for a "localhost" argument.

What actually happens is the following:

* BaseHTTPServer binds only to the IPv4 address of localhost, because
   it's based on TCPServer which has address_family=AF_INET by default.

* HTTPConnection.connect(), however, tries the IP addresses of localhost
   one after the other, in the order returned by
   socket.getaddrinfo('localhost', 8000), until one of them succeeds.

   With Py 2.3 (without IPv6 support) this is only the IPv4 address,
   but with Py 2.4-2.6 the order is (on my Win XP host) the IPv6 address
   first, then the IPv4 address. Since the IPv6 address is tried first
   and nothing listens on it, that attempt only fails after a timeout,
   which makes the connect() call slow. The order in which getaddrinfo
   returns IPv4/IPv6 addresses under Linux seems to vary with the glibc
   version, so this may be a problem on other platforms, too.
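To check the order on a given machine, the following Python 3 sketch lists what getaddrinfo returns for 'localhost' (the helper name resolution_order is hypothetical). It requests only SOCK_STREAM results, which is what httplib's connect loop iterates over. If an IPv6 entry such as ('::1', 8000, 0, 0) is listed before ('127.0.0.1', 8000), a client connecting to an IPv4-only server will try the IPv6 address first.

```python
import socket

def resolution_order(host, port):
    """Return (label, sockaddr) pairs in the order getaddrinfo yields them.

    Only SOCK_STREAM (TCP) results are requested, matching what a TCP
    client iterates over when connecting.
    """
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    results = []
    for family, _, _, _, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        results.append((labels.get(family, str(family)), sockaddr))
    return results

for label, sockaddr in resolution_order("localhost", 8000):
    print(label, sockaddr)
```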

You can see the cause of the slow connect() like this:

   import httplib
   conn = httplib.HTTPConnection('localhost', 8000)
   conn.set_debuglevel(1)
   conn.connect()

This is what I get:

   connect: (localhost, 8000)
   connect fail: ('localhost', 8000)
   connect: (localhost, 8000)

The first (failing) connect is the attempt to connect to the IPv6
address, which BaseHTTPServer doesn't listen on. (This is the debug
output of Py 2.5, which really should be improved to show the IP address
that is actually used. Unfortunately, in Py 2.6 the debug output when
connecting has even fallen prey to a refactoring. I think it should
be added back; otherwise set_debuglevel() is now pretty meaningless.)
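Until the debug output shows real addresses, the connect loop can be reproduced by hand to see which address fails. The Python 3 sketch below (connect_verbose is a hypothetical helper) follows the same pattern as socket.create_connection(), which httplib uses since 2.6: try each getaddrinfo result in order and return the first socket that connects, but log the concrete sockaddr for every attempt.

```python
import socket

def connect_verbose(host, port, timeout=None):
    """Connect to host:port, printing every sockaddr that is tried.

    Mirrors the try-each-address loop of socket.create_connection(),
    but the log lines show the actual IP (and thus IPv4 vs. IPv6).
    """
    last_error = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            if timeout is not None:
                sock.settimeout(timeout)
            print("connect: %r" % (sockaddr,))
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            print("connect fail: %r (%s)" % (sockaddr, exc))
            if sock is not None:
                sock.close()
            last_error = exc
    if last_error is None:
        raise OSError("getaddrinfo returned no addresses for %r" % (host,))
    raise last_error
```

Called as connect_verbose('localhost', 8000) against the server above, it prints each sockaddr tried, so a failing IPv6 attempt before the IPv4 one would be visible directly.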

Can we do something about the mismatch where SimpleHTTPServer only serves
IPv4, but HTTPConnection tries to connect via IPv6 first?

I guess other people have also stumbled over this, maybe without even
noticing the cause and just wondering about the slow performance. E.g.:
http://schotime.net/blog/index.php/2008/05/27/slow-tcpclient-connection-sockets/

One possible solution would be to improve TCPServer in the standard
lib so that it determines the address_family and the real server_address
from the first result of socket.getaddrinfo, like this:

   class TCPServer(BaseServer):
       ...

       def __init__(self, server_address, RequestHandlerClass):
           if server_address and len(server_address) == 2:
               # Use the family and the resolved address of the first
               # getaddrinfo result, so that the server binds to the same
               # IP version that a client like HTTPConnection tries first.
               (self.address_family, dummy, dummy, dummy,
                   server_address) = socket.getaddrinfo(*server_address)[0]
           else:
               raise TypeError("server_address must be a 2-tuple")
           BaseServer.__init__(self, server_address, RequestHandlerClass)
           ...

That way, whether you serve on or connect to 'localhost', both sides
will consistently use either IPv4 or IPv6, depending on what is
preferred on your platform.
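The proposal can be tried out without patching the standard library by subclassing. This is a Python 3 sketch (SocketServer is called socketserver there, and TCPServer.__init__ takes an extra bind_and_activate argument, which exists since 2.6). The class name AutoFamilyTCPServer is hypothetical, and the getaddrinfo call is restricted to SOCK_STREAM results, a small deviation from the *server_address call above.

```python
import socket
import socketserver

class AutoFamilyTCPServer(socketserver.TCPServer):
    """TCPServer that takes address_family from getaddrinfo (hypothetical).

    Resolves server_address first and binds to the family and sockaddr
    of the first TCP result, so a server on 'localhost' uses whichever
    IP version the resolver lists first.
    """

    def __init__(self, server_address, RequestHandlerClass,
                 bind_and_activate=True):
        if not (server_address and len(server_address) == 2):
            raise TypeError("server_address must be a 2-tuple")
        host, port = server_address
        family, _, _, _, sockaddr = socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)[0]
        # Must be set before TCPServer.__init__ creates the socket,
        # because the socket is created with self.address_family.
        self.address_family = family
        super().__init__(sockaddr, RequestHandlerClass, bind_and_activate)
```

For example, AutoFamilyTCPServer(('localhost', 8000), http.server.SimpleHTTPRequestHandler) would bind to ::1 on a host where getaddrinfo lists the IPv6 address first, and to 127.0.0.1 otherwise.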

Does this sound reasonable? Any better ideas?

-- Christoph
