[ python-Bugs-1580738 ] httplib hangs reading too much data
SourceForge.net
noreply at sourceforge.net
Sat Oct 28 00:53:51 CEST 2006
Bugs item #1580738, was opened at 2006-10-19 14:06
Message generated for change (Comment added) made by djmitche
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1580738&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Dustin J. Mitchell (djmitche)
Assigned to: Nobody/Anonymous (nobody)
Summary: httplib hangs reading too much data
Initial Comment:
I'm building an interface to Amazon's S3, using httplib. It uses a
single object for multiple transactions. What's happening is this:
HTTP > PUT /unitest-temp-1161039691 HTTP/1.1
HTTP > Date: Mon, 16 Oct 2006 23:01:32 GMT
HTTP > Authorization: AWS <<cough>>:KiTWRuq/6aay0bI2J5DkE2TAWD0=
HTTP > (end headers)
HTTP < HTTP/1.1 200 OK
HTTP < content-length: 0
HTTP < x-amz-id-2: 40uQn0OCpTiFcX+LqjMuzG6NnufdUk/..
HTTP < server: AmazonS3
HTTP < x-amz-request-id: FF504E8FD1B86F8C
HTTP < location: /unitest-temp-1161039691
HTTP < date: Mon, 16 Oct 2006 23:01:33 GMT
HTTPConnection.__state before response.read: Idle
HTTPConnection.__response: closed? False length: 0
reading response
HTTPConnection.__state after response.read: Idle
HTTPConnection.__response: closed? False length: 0
..later in the same connection..
HTTPConnection.__state before putrequest: Idle
HTTPConnection.__response: closed? False length: 0
HTTP > DELETE /unitest-temp-1161039691 HTTP/1.1
HTTP > Date: Mon, 16 Oct 2006 23:01:33 GMT
HTTP > Authorization: AWS <<cough>>:a5OizuLNwwV7eBUhha0B6rEJ+CQ=
HTTP > (end headers)
HTTPConnection.__state before getresponse: Request-sent
HTTPConnection.__response: closed? False length: 0
File "/usr/lib64/python2.4/httplib.py", line 856, in getresponse
raise ResponseNotReady()
If the first request does not precede it, the second request is fine.
To avoid excessive memory use, I'm calling response.read(16384)
repeatedly, instead of just calling response.read(). This seems to be
key to the problem -- if I omit the 'amt' argument to read(), then the
last line of the first request reads
HTTPConnection.__response: closed? True length: 0
and the later call to getresponse() doesn't raise ResponseNotReady.
Looking at the source for httplib.HTTPResponse.read, self.close() gets
called in the latter (working) case, but not in the former
(non-working). It would seem sensible to add 'if self.length == 0:
self.close()' to the end of that function (and, in fact, this change makes
the whole thing work), but this comment makes me hesitant:
# we do not use _safe_read() here because this may be a .will_close
# connection, and the user is reading more bytes than will be provided
# (for example, reading in 1k chunks)
I suspect that either (a) this is a bug or (b) the client is supposed to
either call read() with no arguments or calculate the proper inputs to
read(amt) based on the Content-Length header. If (b), I would think
the docs should be updated to reflect that?
Thanks for any assistance.
----------------------------------------------------------------------
>Comment By: Dustin J. Mitchell (djmitche)
Date: 2006-10-27 17:53
Message:
Logged In: YES
user_id=7446
Excellent -- the first paragraph, where you talk about the .length attribute, makes things quite clear, so I agree that (b) is the correct solution: include the content of that
paragraph in the documentation. Thanks!
----------------------------------------------------------------------
Comment By: Mark Hammond (mhammond)
Date: 2006-10-26 21:21
Message:
Logged In: YES
user_id=14198
The correct answer is indeed (b) - but note that httplib
will itself do the content-length magic for you, including
the correct handling of 'chunked' encoding. If the .length
attribute is not None, then that is exactly how many bytes
you should read. If .length is None, then either chunked
encoding is used (in which case you can call read() with a
fixed size until it returns an empty string), or no
content-length was supplied (which can be treated the same
as chunked, but the connection will close at the end).
Checking ob.will_close can give you some insight into that.
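A minimal sketch of the reading rule described above, assuming only that the response object exposes .length and read(amt) as the comment describes. The helper name read_body and the chunk_size default are hypothetical and not part of httplib:

```python
def read_body(response, chunk_size=16384):
    """Read a response body in bounded chunks (hypothetical helper).

    If response.length is not None, read exactly that many bytes.
    Otherwise (chunked encoding, or no Content-Length supplied),
    read fixed-size chunks until read() returns an empty string.
    """
    parts = []
    remaining = response.length  # snapshot; the library may update .length itself
    if remaining is not None:
        while remaining > 0:
            data = response.read(min(chunk_size, remaining))
            if not data:
                break  # peer sent less than it promised
            parts.append(data)
            remaining -= len(data)
    else:
        while True:
            data = response.read(chunk_size)
            if not data:
                break  # an empty result marks the end of the body
            parts.append(data)
    return b"".join(parts)
```

This sketch collects the chunks into one value for brevity; a caller worried about memory would process each chunk inside the loop instead. It only reads the body and does not close the response.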
It's not reasonable to add 'if self.length==0: self.close()'
- it is perfectly valid to have a zero byte response within
a keep-alive connection - we don't want to force a new
(expensive) connection to the server just because a zero
byte response was requested.
The HTTP semantics are hard to get your head around, but I
believe httplib gets it right, and a ResponseNotReady
exception in particular points at an error in the code
attempting to use the library. Working with connections
that keep alive is tricky - you must jump through hoops to
ensure you maintain the state of the httplib object
correctly - in general, that means you must *always* consume
the entire response (even if it is zero bytes) before
attempting to begin a new request. This requirement doesn't
exist for connections that close - if you fail to read the
entire response it can be thrown away as the next request
will happen on a new connection.
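The "always consume the entire response" rule can be sketched as follows. This is an illustration under stated assumptions, not the library's prescribed usage: it assumes Python 3, where httplib is named http.client, and it talks to a throwaway local server instead of S3. The names EmptyBodyHandler, consume and run_two_requests are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EmptyBodyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server that answers with zero-byte bodies."""

    protocol_version = "HTTP/1.1"  # allows the connection to be kept alive

    def _reply(self):
        # Drain any request body so the next request starts cleanly.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_PUT = _reply
    do_DELETE = _reply

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def consume(response, chunk_size=16384):
    """Read the response to the end in bounded chunks, then close it."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    response.close()  # marks the response finished, even if it was zero bytes
    return total


def run_two_requests():
    """Issue PUT then DELETE on one connection, consuming each response."""
    server = HTTPServer(("127.0.0.1", 0), EmptyBodyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection(
        "127.0.0.1", server.server_address[1], timeout=5
    )
    try:
        conn.request("PUT", "/unitest-temp", body=b"payload")
        first = conn.getresponse()
        consume(first)  # must finish before the next request begins
        conn.request("DELETE", "/unitest-temp")
        second = conn.getresponse()
        consume(second)
        return first.status, second.status
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_two_requests())
```

The explicit response.close() in consume() is a design choice for the sketch: it makes the "this response is finished" step visible to the connection object regardless of whether the chunked reads alone would have signalled it.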
----------------------------------------------------------------------