[issue12576] urlib.request fails to open some sites

Mon Jul 18 11:00:39 CEST 2011

STINNER Victor <victor.stinner at haypocalc.com> added the comment:

h.close() (HTTPConnection.close) in the finally block of AbstractHTTPHandler.do_open() indirectly calls r.close() (HTTPResponse.close). The problem is that the content of the response can no longer be read once its close() method has been called.

The changelog of the fix (commit ad6bdfd7dd4b) is: "Issue #12133: AbstractHTTPHandler.do_open() of urllib.request closes the HTTP connection if its getresponse() method fails with a socket error. Patch written by Ezio Melotti."

However, the HTTP connection is not closed only when an error occurs: it is always closed, even when getresponse() succeeds.
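To illustrate the difference between "always close" and "close only on error", here is a minimal sketch that needs no network access. FakeResponse, FakeConnection, open_always_close() and open_close_on_error() are hypothetical names invented for this example; they are not the urllib.request code, and the second function is only one possible shape for a fix, not necessarily the patch that was applied. The toy response is assumed to return b"" once closed.

```python
class FakeResponse:
    """Hypothetical stand-in for HTTPResponse (assumption, not the real class)."""

    def __init__(self, body):
        self._body = body
        self.closed = False

    def read(self):
        # Modelling assumption: a closed response has nothing left to read.
        return b"" if self.closed else self._body

    def close(self):
        self.closed = True


class FakeConnection:
    """Hypothetical stand-in for HTTPConnection."""

    def __init__(self, body, fail=False):
        self._response = FakeResponse(body)
        self._fail = fail
        self.closed = False

    def getresponse(self):
        if self._fail:
            raise OSError("simulated socket error")
        return self._response

    def close(self):
        # Closing the connection also closes the response it handed out.
        self.closed = True
        self._response.close()


def open_always_close(h):
    # Pattern described in this comment: the finally block runs on success too.
    try:
        r = h.getresponse()
    finally:
        h.close()
    return r


def open_close_on_error(h):
    # Alternative pattern: close the connection only if getresponse() fails.
    try:
        r = h.getresponse()
    except OSError:
        h.close()
        raise
    return r


print(open_always_close(FakeConnection(b"<html>")).read())
print(open_close_on_error(FakeConnection(b"<html>")).read())
```

With the first pattern the caller gets a response that is already closed, so read() returns nothing; with the second, the body is still readable and the connection is closed only when getresponse() raises.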

This is a bug: the content of www.imdb.com can no longer be read, whereas it can be read without the commit. Test script:
---------------
import urllib.request, gc

print("python.org")
with urllib.request.urlopen("http://www.python.org/") as page:
    content = page.read()
    print("content: %s..." % content[:40])
gc.collect()

print("imdb.com")
with urllib.request.urlopen("http://www.imdb.com/") as page:
    content = page.read()
    print("content: %s..." % content[:40])
gc.collect()

print("exit")
---------------

----------
nosy: +haypo

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12576>
_______________________________________