[issue8035] urllib.request.urlretrieve hangs
Charles-Francois Natali
report at bugs.python.org
Sun Apr 4 23:01:25 CEST 2010
Charles-Francois Natali <neologix at free.fr> added the comment:
Alright, what happens is the following:
- the file you're trying to retrieve is actually redirected, so the server sends an HTTP/1.X 302 Moved Temporarily
- in urllib, when we get a redirection, we call redirect_internal:
    def redirect_internal(self, url, fp, errcode, errmsg, headers, data):
        if 'location' in headers:
            newurl = headers['location']
        elif 'uri' in headers:
            newurl = headers['uri']
        else:
            return
        void = fp.read()
        fp.close()
        # In case the server sent a relative URL, join with original:
        newurl = basejoin(self.type + ":" + url, newurl)
        return self.open(newurl)
The fp.read() is there to wait for the remote end to close the connection.
The problem, in this case, is that with Python 3.1, httplib uses HTTP/1.1, whereas version 2.6 used HTTP/1.0, and with HTTP/1.1 the server doesn't close the connection after sending the redirect (as shown by tcpdump).
So, the process remains stuck on fp.read().
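To illustrate the hang without touching the network, here is a minimal sketch that assumes a local socket pair standing in for the server and the client. The helper name read_until_eof and the 0.5 second timeout are made up for the example and are not part of urllib. It only shows that a read-to-EOF cannot finish while the peer keeps the connection open:

```python
import socket


def read_until_eof(sock, timeout):
    """Hypothetical helper: drain sock until EOF.

    Returns True if EOF was reached, False if the read timed out first.
    """
    sock.settimeout(timeout)
    try:
        while sock.recv(4096):
            pass
        return True
    except socket.timeout:
        return False


# "server" plays the remote end, "client" plays urllib's side.
server, client = socket.socketpair()

# The server sends the redirect but, as with HTTP/1.1 keep-alive,
# does not close the connection afterwards.
server.sendall(b"HTTP/1.1 302 Moved Temporarily\r\nLocation: /new\r\n\r\n")

# Reading to EOF stalls: without the timeout this would block forever.
finished_while_open = read_until_eof(client, 0.5)

# Once the server closes, EOF arrives and the read completes.
server.close()
finished_after_close = read_until_eof(client, 0.5)
client.close()

print(finished_while_open, finished_after_close)
```

The timeout is only there so the demonstration terminates; the code quoted above has no such timeout, which is why the process stays stuck.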
Now, in version 3.1, if we simply change Lib/http/client.py:628
from
    class HTTPConnection:
        _http_vsn = 11
        _http_vsn_str = 'HTTP/1.1'
to
    class HTTPConnection:
        _http_vsn = 11
        _http_vsn_str = 'HTTP/1.0'
to use HTTP/1.0 instead, the retrieval works fine.
Obviously, this is not a good solution. Since the RFC doesn't seem to require the server to close the connection after sending a redirect, we'd probably better close the connection ourselves.
That's what the attached patch does: it simply removes the call to fp.read() before closing the connection. It also removes the call from http_error_default, since if an error occurs, we probably want to close the connection as soon as possible instead of waiting for the server to do so.
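The idea can be sketched as follows. This is not the attached patch itself, only a standalone approximation: the function name redirect_target is hypothetical, headers is assumed to be a plain mapping with lowercase keys, and urllib.parse.urljoin stands in for basejoin.

```python
import io
from urllib.parse import urljoin


def redirect_target(base_url, headers, fp):
    """Hypothetical stand-in for the redirect logic, without fp.read().

    Returns the absolute URL to follow, or None if the response
    carries no redirect target.
    """
    if 'location' in headers:
        newurl = headers['location']
    elif 'uri' in headers:
        newurl = headers['uri']
    else:
        return None
    # Close right away instead of waiting for the server to do it.
    fp.close()
    # In case the server sent a relative URL, join with the original.
    return urljoin(base_url, newurl)


# Usage with an in-memory file object in place of the HTTP response.
fp = io.BytesIO(b"ignored redirect body")
target = redirect_target("http://example.com/a/b", {"location": "/c"}, fp)
print(target, fp.closed)
```

Since the body of a redirect response is discarded anyway, skipping the read loses nothing here; the cost is that the connection cannot be reused afterwards.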
----------
keywords: +patch
nosy: +neologix
Added file: http://bugs.python.org/file16758/urllib_redirect.diff
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8035>
_______________________________________