[ python-Bugs-874842 ] httplib fails on Akamai URLs

Mon Apr 12 16:32:47 EDT 2004

Bugs item #874842, was opened at 2004-01-11 06:16
Message generated for change (Comment added) made by gvanrossum
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=874842&group_id=5470

Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Leif Hedstrom (zwoop)
Assigned to: Nobody/Anonymous (nobody)
Summary: httplib fails on Akamai URLs

Initial Comment:
Using Python 2.3.2 and httplib, reading from Akamai
URLs will always hang at the end of the transaction.
As common as this must be, I couldn't find anything
related to it on any search engines, nor on the bug
list here.

The problem is that Akamai returns an HTTP/1.0
response, with a header like:

   Connection: keep-alive

httplib does not recognize this response properly (the
Connection: header parsing is only done for HTTP/1.1
responses). I'm not sure exactly what the right
solution is, but I'm supplying one alternative solution
that does solve the problem. I'm attaching a diff
against httplib.py.
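
To make the rule concrete, here is a minimal sketch of the decision
described above. It is not the attached diff and not the httplib source;
the function name connection_will_close and the 10/11 version encoding
are assumptions made for illustration, and it is written for a current
Python rather than 2.3.

```python
def connection_will_close(version, headers):
    """Guess whether the peer closes the connection after the body.

    version: 10 for HTTP/1.0, 11 for HTTP/1.1 (illustrative encoding).
    headers: dict mapping lower-cased header names to their values.
    """
    raw = headers.get("connection", "")
    tokens = [t.strip().lower() for t in raw.split(",")]
    if version == 11:
        # HTTP/1.1 connections persist unless the peer says "close".
        return "close" in tokens
    # HTTP/1.0 connections close by default, but an explicit
    # "keep-alive" (as in the Akamai response) asks to stay open.
    return "keep-alive" not in tokens
```

With this rule, an HTTP/1.0 response carrying "Connection: keep-alive"
is treated as persistent, so the reader should not wait for EOF.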

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2004-04-12 16:32

Message:
Logged In: YES 
user_id=6380

Hmm...  Indeed. read() checks will_close and apparently
setting that to False will do the right thing.

I don't know HTTP and this code well enough to approve this
fix though. Also, the comment right above your patch should
probably be fixed; it claims that connection headers on
HTTP/1.0 are due to confused proxies. (Maybe that's what
Akamai servers are? :-)
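
For illustration only, a small sketch of why the will_close flag matters
to read(). It is not the httplib code: KeepAliveStream and read_body are
hypothetical names, and the stream raises an error where a real socket
would simply block. It assumes a current Python (the io module is not in
2.3).

```python
import io


class KeepAliveStream(io.BytesIO):
    """Stand-in for a socket that the server never closes.

    Reading past the buffered bytes would block forever on a real
    connection, so this fake raises RuntimeError instead.
    """

    def read(self, size=-1):
        remaining = len(self.getvalue()) - self.tell()
        if size is None or size < 0 or size > remaining:
            raise RuntimeError("would block waiting for EOF")
        return io.BytesIO.read(self, size)


def read_body(fp, length, will_close):
    """Read a response body from fp.

    If will_close is true, read until EOF; otherwise read exactly
    `length` bytes (the Content-Length value).
    """
    if will_close:
        return fp.read()
    return fp.read(length)
```

When the server keeps the connection open, only the length-based read
returns; the read-until-EOF path is the one that stalls.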

----------------------------------------------------------------------

Comment By: Leif Hedstrom (zwoop)
Date: 2004-04-12 16:13

Message:
Logged In: YES 
user_id=480913

Yeah, that works for me too. But the problem is in the
HTTPResponse class from the httplib.py module. For example,
this code (butchered from my application) will hang on
Akamai URLs:

#!/usr/bin/python

import httplib
import logging
import socket

# Module-level logger replaces the self._log left over from the application.
logging.basicConfig()
log = logging.getLogger("testHTTPlib")

def testHTTPlib(host, url):
    http = httplib.HTTPConnection(host)
    try:
        http.request('GET', url)
        response = http.getresponse()
    except socket.timeout:
        # Must come before socket.error, which is its base class.
        log.warning("Timeout connecting to %s", url)
        return False
    except socket.error:
        log.error("Socket error retrieving %s", url)
        return False
    except IOError:
        log.warning("Can't connect to %s", url)
        return False
    else:
        try:
            # Per the report, this read stalls on Akamai URLs.
            data = response.read()
            return True
        except socket.timeout:
            log.warning("Timeout reading from %s", url)
            return False
    return False

print testHTTPlib("www.ogre.com", "/")
print testHTTPlib("www.akamai.com", "/")

Granted, I think Akamai aren't strictly following the
protocols, but it's inconvenient that this piece of code
stalls here (and only for akamai.com domains, I've tried a
lot of them).

Thanks!

-- Leif

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2004-04-12 15:36

Message:
Logged In: YES 
user_id=6380

Can you give a complete program that reproduces this? I've 
tried this:

>>> import urllib
>>> urllib.urlopen("http://www.akamai.com").read()

and it doesn't hang for me. I tried a number of Python 
versions from 2.2 through 2.4a0.

----------------------------------------------------------------------

Comment By: Leif Hedstrom (zwoop)
Date: 2004-01-11 14:37

Message:
Logged In: YES 
user_id=480913

Oh, I forgot: this is most easily reproduced by simply
requesting the URL

   http://www.akamai.com/

Fortunately they Akamai their home page as well. :-)

----------------------------------------------------------------------
