[ python-Bugs-1486335 ] httplib: read/_read_chunked fails with ValueError sometimes

Wed Mar 14 16:39:02 CET 2007

Bugs item #1486335, was opened at 2006-05-11 04:14
Message generated for change (Comment added) made by altman
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1486335&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: kxroberto (kxroberto)
Assigned to: Greg Ward (gward)
Summary: httplib: read/_read_chunked fails with ValueError sometimes

Initial Comment:
This occasionally shows up in a logged trace, when an
application crashes with a ValueError on a
http(s)_response.read():

(Python 2.3.5, but the relevant httplib code is still the
same in current httplib)

.... \'  File "socket.pyo", line 283,
in read\\n\', \'  File "httplib.pyo", line 389, in
read\\n\', \'  File "httplib.pyo", line 426, in
_read_chunked\\n\', \'ValueError: invalid literal for
int(): \\n\']  :::

It's the line:

chunk_left = int(line, 16)

I don't know what this line is about. Still, it should be
protected: a http_response.read() should not fail
with ValueError, but only with
IOError/EnvironmentError or socket.error. Otherwise
exception handling becomes a random task.
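
For illustration, here is a minimal sketch of how the chunk-size
parsing could be guarded so that callers see an IOError subclass
instead of a bare ValueError. This is not httplib's actual code:
ChunkHeaderError and parse_chunk_size are hypothetical names, and
the sketch is written for Python 3, where httplib is http.client.

```python
class ChunkHeaderError(IOError):
    """Hypothetical exception for a malformed chunk-size line."""


def parse_chunk_size(line):
    """Return the size encoded in a chunk header line (given as bytes)."""
    # Drop optional chunk extensions ("1a;name=value") and the CRLF.
    size_field = line.split(b";", 1)[0].strip()
    try:
        return int(size_field, 16)
    except ValueError:
        # An empty or non-hex size field is a protocol error, so report
        # it as an I/O-style failure rather than a ValueError.
        raise ChunkHeaderError("bad chunk size line: %r" % (line,))
```

With this, code that already catches IOError/EnvironmentError around
read() would also catch a bad chunk header.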

-Robert

Side note regarding IO exception handling: See also FR
#1481036 (IOBaseError): why socket.error.__bases__ is
(<class exceptions.Exception at 0x011244E0>,)  ?

----------------------------------------------------------------------

Comment By: Patrick Altman (altman)
Date: 2007-03-14 10:39

Message:
Logged In: YES 
user_id=405010
Originator: NO

I am attempting to use a HEAD request against Amazon S3 to check
whether a file exists and, if it does, to parse the MD5 hash from
the ETag in the response. Verifying the file's contents this way
saves the bandwidth of uploading files when that is not necessary.

If the file exists, the HEAD works as expected and I get valid headers
back that I can parse, pulling the ETag out of the dictionary using
getheader('ETag')[1:-1] (using the slice to trim off the double quotes
in the string).
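
A rough sketch of that check, assuming (as described above) that the
ETag is the hex MD5 of the object wrapped in double quotes. The names
etag_to_md5 and needs_upload are made up for this example, and the
code is Python 3; the header value would come from
response.getheader('ETag').

```python
import hashlib


def etag_to_md5(etag_header):
    """Trim the surrounding double quotes from an ETag header value."""
    return etag_header.strip('"')


def needs_upload(local_data, etag_header):
    """Return True if the local bytes differ from what the ETag describes."""
    if etag_header is None:
        # No ETag (for example, the object does not exist): upload.
        return True
    local_md5 = hashlib.md5(local_data).hexdigest()
    return local_md5 != etag_to_md5(etag_header).lower()
```

Using strip('"') instead of [1:-1] avoids cutting off real characters
if a server ever sends the value without quotes.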

The problem arises when I send a HEAD request for a file that does
not exist. As expected, Amazon sends back a 404 Not Found response;
however, my test scripts seem to hang. I ran python with
trace.py and it hangs here:

 --- modulename: httplib, funcname: _read_chunked
httplib.py(536):         assert self.chunked != _UNKNOWN
httplib.py(537):         chunk_left = self.chunk_left
httplib.py(538):         value = ''
httplib.py(542):         while True:
httplib.py(543):             if chunk_left is None:
httplib.py(544):                 line = self.fp.readline()
 --- modulename: socket, funcname: readline
socket.py(321):         data = self._rbuf
socket.py(322):         if size < 0:
socket.py(324):             if self._rbufsize <= 1:
socket.py(326):                 assert data == ""
socket.py(327):                 buffers = []
socket.py(328):                 recv = self._sock.recv
socket.py(329):                 while data != "\n":
socket.py(330):                     data = recv(1)

It eventually completes with an exception here:

  File "C:\Python25\lib\httplib.py", line 509, in read
    return self._read_chunked(amt)
  File "C:\Python25\lib\httplib.py", line 548, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: ''

For reference, ethereal captured the following request and response:

HEAD <REMOVED> HTTP/1.1
Host: s3.amazonaws.com
Accept-Encoding: identity
Date: Tue, 13 Mar 2007 02:54:12 GMT
Authorization: AWS <REMOVED>

HTTP/1.1 404 Not Found
x-amz-request-id: E20B4C0D0C48B2EF
x-amz-id-2: <REMOVED>
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Tue, 13 Mar 2007 02:54:16 GMT
Server: AmazonS3 
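
Note that the 404 above advertises Transfer-Encoding: chunked, yet
under the HTTP/1.1 message-length rules (RFC 2616, section 4.4) a
response to HEAD never carries a body, so there are no chunks to
wait for. A small sketch of that rule follows; response_has_body is
a hypothetical helper, not an httplib function.

```python
def response_has_body(method, status):
    """Return False when HTTP/1.1 says the response cannot have a body."""
    if method.upper() == "HEAD":
        # Headers may describe a body, but none is sent for HEAD.
        return False
    if 100 <= status < 200 or status in (204, 304):
        return False
    return True
```

A reader that knows the request method can skip the chunked decoder
entirely when this returns False.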

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2006-08-07 19:23

Message:
Logged In: YES 
user_id=261020

I think it's only worth worrying about bad chunking that a)
has been observed in the wild (though not necessarily by us)
and b) popular browsers can cope with.

Greg: """If there is an error here, it's at EOF, so it's not
that big a deal."""

That's only if the response will be closed at the end of the
current transaction.  Quoting from 1411097:

"""if the connection will not close at the end of the
transaction, the behaviour should not change from what's
currently in SVN (we should not assume that the chunked
response has ended unless we see the proper terminating
CRLF)."""

Perhaps we don't need to be quite as strict as that, but the
point is that otherwise, how do we know the server hasn't
already sent that last CRLF, and that it will turn up in
three weeks' time?-)  If that happens, not sure exactly how
httplib will treat the CRLF and possible chunked encoding
trailers, but I suspect something bad happens.  Perhaps we
could just always close the connection in this case?
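
To make the trade-off concrete, here is a sketch of a chunked reader
that tolerates a missing final chunk only when the connection will be
closed anyway. It is an illustration of the policy discussed here,
not a proposed patch: read_chunked, ChunkedBodyError and the
will_close argument are names invented for the example, trailers are
ignored, and the code is Python 3.

```python
import io


class ChunkedBodyError(IOError):
    """Hypothetical exception for a chunked body that is malformed."""


def read_chunked(fp, will_close):
    """Decode a chunked body from the binary file-like object fp."""
    parts = []
    while True:
        line = fp.readline()
        if not line:
            # EOF where a chunk header was expected.
            if will_close:
                break  # nothing else can arrive on this connection
            raise ChunkedBodyError("EOF when chunk header expected")
        try:
            size = int(line.split(b";", 1)[0].strip(), 16)
        except ValueError:
            raise ChunkedBodyError("bad chunk size line: %r" % (line,))
        if size == 0:
            break  # proper last chunk; trailers are not handled here
        data = fp.read(size)
        if len(data) < size:
            raise ChunkedBodyError("chunk data ended early")
        parts.append(data)
        fp.read(2)  # consume the CRLF that follows the chunk data
    return b"".join(parts)
```

For example, read_chunked(io.BytesIO(b"5\r\nhello\r\n"), True)
returns the data received so far, while the same input with
will_close set to False raises ChunkedBodyError.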

I'm not confident I know yet how best to fix these issues. 
I just tried reading curl's transfer.c and http_chunks.c.  I
discovered only that I have to be fully awake to read a 1200
line function :-/

----------------------------------------------------------------------

Comment By: Greg Ward (gward)
Date: 2006-07-25 21:13

Message:
Logged In: YES 
user_id=14422

OK, I've been working on this some more and I have a very
crude addition to test_httplib.py.  I'm going to attach it
here and solicit feedback on python-dev: I'm not sure how
many kinds of bad response chunking I really want to worry
about.  

----------------------------------------------------------------------

Comment By: Greg Ward (gward)
Date: 2006-07-24 14:38

Message:
Logged In: YES 
user_id=14422

I'm seeing this with Python 2.3.5 and 2.4.3 hitting a PHP
app and getting a large error page.  It looks as though the
server is incorrectly chunking the response: lwp-request at
least gives a better error message than httplib.py:

  $ GET "http://..."
  500 EOF when chunk header expected

I'm unclear on precisely what the server is doing wrong. 
The response looks like this:

HTTP/1.1 200 OK
Date: Mon, 24 Jul 2006 19:18:47 GMT
Server: Apache/2.0.54 (Fedora)
X-Powered-By: PHP/4.3.11
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

2169\r\n
\r\n
[...first 0x2169 bytes of response...]\r\n
20b2\r\n
[...next 0x20b2 bytes...]
[...repeat many times...]
20b2\r\n
[...the last 0x20b2 bytes...]
\r\n

The blank line at eof appears to be confusing httplib.py: it
bombs because 

  int('', 16)

raises ValueError.
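
The failure can be reproduced without a server. The sketch below is
a simplified stand-in for the loop in _read_chunked (not the real
httplib code; naive_read_chunked is a made-up name, Python 3 syntax)
fed a body that ends after the last data chunk with no terminating
zero-size chunk.

```python
import io


def naive_read_chunked(fp):
    """Decode chunks with no protection around the size parsing."""
    parts = []
    while True:
        line = fp.readline()
        chunk_left = int(line, 16)  # at EOF, line is empty and this raises
        if chunk_left == 0:
            break
        parts.append(fp.read(chunk_left))
        fp.read(2)  # skip the CRLF after the chunk data
    return b"".join(parts)


# Two data chunks, then EOF instead of the "0\r\n\r\n" terminator.
stream = io.BytesIO(b"5\r\nhello\r\n5\r\nworld\r\n")
try:
    naive_read_chunked(stream)
except ValueError as exc:
    print("ValueError:", exc)
```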

Observation #1: if this is indeed a protocol error (ie. the
server is in the wrong), httplib.py should turn the
ValueError into an HTTPException.  Perhaps it should define
a new exception class for low-level protocol errors (bad
chunking).  Maybe it should reuse IncompleteRead.

Observation #2: gee, my web browser doesn't barf on this
response, so why should httplib.py?  If there is an error
here, it's at EOF, so it's not that big a deal.

----------------------------------------------------------------------
