[ python-Bugs-874842 ] httplib fails on Akamai URLs

Mon Apr 19 20:57:34 EDT 2004

Bugs item #874842, was opened at 2004-01-11 03:16
Message generated for change (Comment added) made by zwoop
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=874842&group_id=5470

Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Leif Hedstrom (zwoop)
Assigned to: Jeremy Hylton (jhylton)
Summary: httplib fails on Akamai URLs

Initial Comment:
Using Python 2.3.2 and httplib, reading from Akamai
URLs will always hang at the end of the transaction.
As common as this must be, I couldn't find anything
related to it on any search engines, nor on the bug
list here.

The problem is that Akamai returns an HTTP/1.0
response, with a header like:

   Connection: keep-alive

httplib does not recognize this response properly (the
Connection: header parsing is only done for HTTP/1.1
responses). I'm not sure exactly what the right
solution is, but I'm supplying one alternative solution
that does solve the problem. I'm attaching a diff
against httplib.py.
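
The decision described above can be sketched as a small standalone
function. This is an illustration only, not the attached diff (which is
not reproduced in this message) and not httplib's actual code: the name
will_close_after_response and the plain dict of lower-cased header names
are assumptions made for the example.

```python
def will_close_after_response(version, headers):
    """Decide whether the server will close the socket after this response.

    version: 10 for HTTP/1.0, 11 for HTTP/1.1.
    headers: dict mapping lower-cased header names to values (hypothetical
             representation, chosen only to keep the sketch self-contained).
    """
    conn = headers.get("connection", "").lower()

    if version == 11:
        # HTTP/1.1 connections are persistent unless "close" is requested.
        return "close" in conn

    # HTTP/1.0 connections close by default. The lenient branch below is
    # the workaround under discussion: honor "keep-alive" even though the
    # response claims to be HTTP/1.0.
    if "keep-alive" in conn:
        return False
    return True


if __name__ == "__main__":
    # The Akamai-style case from this report: HTTP/1.0 plus keep-alive.
    print(will_close_after_response(10, {"connection": "keep-alive"}))
```

Without the lenient branch, the HTTP/1.0 case always answers "will
close", so a client reads until EOF on a socket the server is holding
open.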

----------------------------------------------------------------------

>Comment By: Leif Hedstrom (zwoop)
Date: 2004-04-19 17:57

Message:
Logged In: YES 
user_id=480913

As I said, no matter what we do, it's a hack on something
that's broken on the web (now there's a shocker :-). I don't
feel terribly strongly on this issue, I merely filed the bug
report because I had this problem, and it took me several
hours to figure out why my daemon would stall on Akamai
URLs. I'm guessing other users of httplib.py might run into
the same problem.

As for the patch, the comments would of course have to
change; I didn't want to impose more changes in the diff
than necessary.

Besides the suggested patch, an alternative solution is to
provide a specialized implementation of the HTTPResponse
class, which works with Akamai. The users of the httplib.py
module would then have to explicitly request that
httplib.HTTPConnection should instantiate that class instead
of the default one. Preferably this would be passed as a new
argument to the constructor for HTTPConnection.
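
A rough sketch of that alternative follows. It does not add a new
constructor argument; instead it assumes HTTPConnection looks up its
response class through a response_class class attribute, as current
versions of the module do. The class names KeepAliveResponse and
AkamaiConnection are hypothetical, and the sketch relies on the
version, length and will_close attributes of HTTPResponse, which are
implementation details rather than documented API.

```python
try:
    import httplib                   # Python 2 name, as used in this report
except ImportError:
    import http.client as httplib    # the same module under its Python 3 name


class KeepAliveResponse(httplib.HTTPResponse):
    """Hypothetical response class that honors keep-alive on HTTP/1.0."""

    def begin(self):
        # Let the stock class parse the status line and headers first.
        httplib.HTTPResponse.begin(self)
        conn = (self.getheader("connection", "") or "").lower()
        # ASSUMPTION: version, length and will_close are internal
        # attributes. Only override when a Content-Length was parsed;
        # otherwise there is no way to know where the body ends.
        if (self.version == 10 and "keep-alive" in conn
                and self.length is not None):
            self.will_close = False


class AkamaiConnection(httplib.HTTPConnection):
    """Hypothetical connection class that opts in to the lenient response."""
    response_class = KeepAliveResponse


# Usage (not executed here, since it needs network access):
#     conn = AkamaiConnection("www.akamai.com")
#     conn.request("GET", "/")
#     data = conn.getresponse().read()
```

The opt-in keeps the default HTTPResponse strict, so only callers who
know they are talking to such a server get the lenient behavior.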

And I agree that it's a hack to have to code around poor
server implementations. But I'm not sure what our odds are of
getting Akamai to fix their servers any time soon, since pretty
much any web browser in existence works with their broken
implementation.

Cheers,

-- leif

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2004-04-19 15:32

Message:
Logged In: YES 
user_id=6380

I won't reject the patch on that basis. Like HTML, it's more
useful to be able to handle what we see in the real world
than to stick to the standard. Clearly the OP needs to be
able to access Akamai servers. He doesn't have the power to
fix the Akamai servers, so saying "the server is wrong"
doesn't do him any good. (The comment should state clearly
that Akamai *is* wrong though!)

Or do you have a different suggestion for how the poster can
work around the problem?

----------------------------------------------------------------------

Comment By: Greg Stein (gstein)
Date: 2004-04-19 15:26

Message:
Logged In: YES 
user_id=6501

I have a philosophical problem with compensating for servers
that obviously break protocols. The server should be fixed,
not *every* client on the planet. From that standpoint, this
problem/fix should be rejected, though I defer to Guido on
that choice.

That said, the comment right above the patch should be
fixed. The whole point of that comment is, "the header
shouldn't be there, so we shouldn't bother to examine the
thing." Obviously, the new code does, so the two comments
should be merged. The comment about Akamai should also be
strengthened to note that it is violating the HTTP protocol
(see section 8.1.2.1 of RFC 2616).

Summary: I'd reject it, but will leave that to Guido to
choose (i.e. "we'll help users even tho it violates
protocols"). If he wants it, then +1 if the comments are
fixed up.

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2004-04-15 14:59

Message:
Logged In: YES 
user_id=31392

Looks good to me.  I want to see if I can come up with a
simple test module for httplib with the network resource
enabled.  I'll see if I can do that tonight.

----------------------------------------------------------------------

Comment By: Leif Hedstrom (zwoop)
Date: 2004-04-12 13:54

Message:
Logged In: YES 
user_id=480913

Heh, yeah, I'm pretty sure that's the problem, Akamai being
confused about protocols. They claim to be a v1.0 HTTP
proxy, yet they use v1.1 HTTP headers :-/. This is why I
mentioned I wasn't sure exactly what the right solution is.
And no matter what we do, it'll be a hack.  Maybe the
original author of the module has some insight?

Unfortunately, there's a lot of Akamai content out there
that is affected by this.

Cheers,

-- Leif

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2004-04-12 13:32

Message:
Logged In: YES 
user_id=6380

Hmm...  Indeed. read() checks will_close and apparently
setting that to False will do the right thing.
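
Why that flag matters can be sketched with a simplified model of the
choice read() has to make. This is not httplib's code: read_body and
its parameters are hypothetical, and io.BytesIO stands in for the
socket file object.

```python
import io


def read_body(stream, will_close, content_length=None):
    """Simplified model of how a response body is read (illustration only)."""
    if will_close or content_length is None:
        # Read until EOF. On a real socket that the server keeps open
        # (keep-alive), EOF never arrives and this call blocks.
        return stream.read()
    # The connection stays open, so read exactly the declared number of
    # bytes and leave the rest of the stream for the next response.
    return stream.read(content_length)


if __name__ == "__main__":
    # Five body bytes followed by the start of a later response.
    stream = io.BytesIO(b"helloHTTP/1.0 200 OK")
    print(read_body(stream, will_close=False, content_length=5))
```

With will_close left as True on a kept-alive connection, the first
branch is taken and the client waits for a close that the server has
no intention of sending.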

I don't know HTTP and this code well enough to approve this
fix though. Also, the comment right above your patch should
probably be fixed; it claims that connection headers on
HTTP/1.0 are due to confused proxies. (Maybe that's what
Akamai servers are? :-)

----------------------------------------------------------------------

Comment By: Leif Hedstrom (zwoop)
Date: 2004-04-12 13:13

Message:
Logged In: YES 
user_id=480913

Yeah, that works for me too. But the problem is in the
HTTPResponse class from the httplib.py module. For example,
this code (butchered from my application) will hang on
Akamai URLs:

#!/usr/bin/python

import httplib
import logging
import socket

# Module-level logger; the original used self._log from a class.
logging.basicConfig()
log = logging.getLogger("testHTTPlib")

def testHTTPlib(host, url):
    http = httplib.HTTPConnection(host)
    try:
        http.request('GET', url)
        response = http.getresponse()
    except socket.timeout:
        # Must come before socket.error, which it subclasses.
        log.warning("Timeout connecting to %s", url)
        return False
    except socket.error:
        log.error("Socket error retrieving %s", url)
        return False
    except IOError:
        log.warning("Can't connect to %s", url)
        return False
    else:
        try:
            # This read is where the stall on Akamai URLs shows up.
            data = response.read()
            return True
        except socket.timeout:
            log.warning("Timeout reading from %s", url)
            return False
    return False

print testHTTPlib("www.ogre.com", "/")
print testHTTPlib("www.akamai.com", "/")

Granted, I think Akamai aren't strictly following the
protocols, but it's inconvenient that this piece of code
stalls here (and only for akamai.com domains, I've tried a
lot of them).

Thanks!

-- Leif

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2004-04-12 12:36

Message:
Logged In: YES 
user_id=6380

Can you give a complete program that reproduces this? I've 
tried this:

>>> import urllib
>>> urllib.urlopen("http://www.akamai.com").read()

and it doesn't hang for me. I tried a number of Python 
versions from 2.2 through 2.4a0.

----------------------------------------------------------------------

Comment By: Leif Hedstrom (zwoop)
Date: 2004-01-11 11:37

Message:
Logged In: YES 
user_id=480913

Oh, I forgot, this is most easily reproduced by simply
requesting the URL

   http://www.akamai.com/

Fortunately they Akamai their home page as well. :-)

----------------------------------------------------------------------
