[Python-Dev] how to debug httplib slowness

Fri Sep 4 17:02:39 CEST 2009

Simon Cross wrote:
> Well, since the source for _read_chunked includes the comment
> 
>         # XXX This accumulates chunks by repeated string concatenation,
>         # which is not efficient as the number or size of chunks gets big.
> 
> you might gain some speed improvement with minimal effort by gathering
> the read data chunks into a list and then returning "".join(chunks) at
> the end.
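
A minimal sketch of the list-and-join approach suggested above. This is not the actual _read_chunked code; read_chunk is a hypothetical callable, assumed to return the next piece of data, or an empty string once there is nothing left:

```python
def read_all_chunks(read_chunk):
    """Gather chunks in a list and join them once at the end.

    `read_chunk` is a hypothetical stand-in for whatever reads the next
    chunk; it is assumed to return an empty string when the data runs out.
    """
    chunks = []
    while True:
        chunk = read_chunk()
        if not chunk:
            break
        chunks.append(chunk)
    # A single join at the end, instead of `value += chunk` on every pass.
    return "".join(chunks)
```

Appending to a list avoids re-copying the accumulated data on each pass, which is what the XXX comment is warning about when the number of chunks gets large.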

True, I'll be trying that and reporting back. More interestingly, I 
did some analysis with Wireshark (only 200MB-odd of .pcap logs, that was 
fun ;-) to compare the two HTTP conversations, and noticed more 
interestingness...

So, httplib does this:

GET /<blah> HTTP/1.1
Host: <blah>
Accept-Encoding: identity
Authorization: Basic <blah>

HTTP/1.1 200 OK
Date: Fri, 04 Sep 2009 11:44:22 GMT
Server: Apache-Coyote/1.1
ContentLength: 116245504
Content-Type: application/vnd.excel
Transfer-Encoding: chunked

While wget does this:

<snip 401 conversation>
GET /<blah> HTTP/1.0
User-Agent: Wget/1.11.4
Accept: */*
Host: <blah>
Connection: Keep-Alive
Authorization: Basic <blah>

HTTP/1.1 200 OK
Date: Fri, 04 Sep 2009 11:35:19 GMT
Server: Apache-Coyote/1.1
ContentLength: 116245504
Content-Type: application/vnd.excel
Connection: close

Interesting points:

- Apache in this instance responds with HTTP/1.1, even though the wget 
request was HTTP/1.0. Is that allowed?

- Apache responds with a chunked response only to httplib. Why is that?
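
As background rather than a diagnosis: chunked transfer encoding is an HTTP/1.1 feature, and in the captures above the httplib request is HTTP/1.1 while the wget request is HTTP/1.0. A chunked body also takes more work to read than one that simply runs until the connection closes. The sketch below shows the general shape of decoding a chunked body; decode_chunked is a hypothetical name, it is not httplib's implementation, and it ignores chunk extensions and trailers:

```python
import io


def decode_chunked(stream):
    """Decode a chunked body from a binary file-like object.

    Illustrative sketch only: chunk extensions are skipped and any
    trailer headers after the final chunk are left unread.
    """
    parts = []
    while True:
        # Each chunk starts with a line holding its size in hex,
        # optionally followed by ";extension" data.
        size_line = stream.readline()
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # a zero-sized chunk marks the end of the body
        parts.append(stream.read(size))
        stream.readline()  # consume the CRLF that follows the chunk data
    return b"".join(parts)
```

For example, feeding it io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n") yields b"Wikipedia".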

cheers,

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
            - http://www.simplistix.co.uk