Why such different HTTP response results between 2.5 and 3.0

Brian Allen Vanderburg II BrianVanderburg2 at aim.com
Mon Feb 2 01:37:13 EST 2009


an00na at gmail.com wrote:
> Below are two semantically equivalent snippets that request the same
> partial HTTP response, for Python 2.5 and Python 3.0 respectively.
> However, the 3.0 version returns a not-quite-right result (msg), a
> bytes object of length 239775, while the 2.5 version returns a good msg,
> a 239733-byte string that is the content of a proper zip file.
> I really can't figure out what's wrong, though I've found some
> "\r\n" segments in the 3.0 msg that are absent from the 2.5 msg.
> Could anyone give me some hints? Thanks in advance.
>
> Code:
>
> # Python 2.5
> import urllib2
> auth_handler = urllib2.HTTPBasicAuthHandler()
> auth_handler.add_password(realm="pluses and minuses",
>                           uri='http://www.pythonchallenge.com/pc/hex/unreal.jpg',
>                           user='butter',
>                           passwd='fly')
> opener = urllib2.build_opener(auth_handler)
>
> req = urllib2.Request('http://www.pythonchallenge.com/pc/hex/unreal.jpg')
> req.add_header('Range', 'bytes=1152983631-')
> res = opener.open(req)
> msg = res.read()
>
> # Python 3.0
> import urllib.request
> auth_handler = urllib.request.HTTPBasicAuthHandler()
> auth_handler.add_password(realm="pluses and minuses",
>                           uri='http://www.pythonchallenge.com/pc/hex/unreal.jpg',
>                           user='butter',
>                           passwd='fly')
> opener = urllib.request.build_opener(auth_handler)
>
> req = urllib.request.Request('http://www.pythonchallenge.com/pc/hex/unreal.jpg')
> req.add_header('Range', 'bytes=1152983631-')
> res = opener.open(req)
> msg = res.read()
>   
From what I can tell, Python 2.5 returns the response already decoded
as text, while Python 3.0 returns a bytes object and doesn't decode it
at all.  I ran a test with urlopen:

In 2.5, for http://google.com, I just get the regular HTML.
In 3.0, I get some extra bytes at the start and end:

    191d\r\n at the start
    \r\n0\r\n\r\n at the end

In 2.5, newlines are automatically decoded
In 3.0, the \r\n pairs are kept
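
As a rough illustration of the bytes/text difference described above (a
sketch only, not code from the original test): in Python 3, read() hands
back bytes, so getting text with plain newlines takes an explicit decode.
The sample payload and the UTF-8 charset below are assumptions; for
binary content such as a zip file, the bytes should be left undecoded.

```python
# Hypothetical payload standing in for the result of res.read() in Python 3.
raw = b"<html>\r\n<body>hello</body>\r\n</html>\r\n"

# bytes -> str. The charset is an assumption; when the server sends one,
# it should be taken from the Content-Type header instead.
text = raw.decode("utf-8")

# Normalize CRLF line endings to plain newlines.
normalized = text.replace("\r\n", "\n")

print(repr(normalized))
```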

I hope there is an easy way to decode it as in 2.x.
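
The extra bytes seen in 3.0 (a hexadecimal number followed by \r\n at the
start, and \r\n0\r\n\r\n at the end) look like HTTP/1.1 chunked
transfer-encoding framing that was left in the body.  Assuming that is
what happened, a minimal sketch of stripping the framing by hand could
look like this.  The function name decode_chunked and the sample data are
hypothetical, and trailer headers after the last chunk are ignored.

```python
def decode_chunked(raw: bytes) -> bytes:
    """Strip HTTP/1.1 chunked transfer framing from a raw body (sketch)."""
    body = bytearray()
    pos = 0
    while True:
        # Each chunk begins with a hexadecimal size line ending in CRLF.
        eol = raw.index(b"\r\n", pos)
        # Drop optional chunk extensions that may follow a ';'.
        size_field = raw[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        start = eol + 2
        body += raw[start:start + size]
        # Move past the chunk data and the CRLF that follows it.
        pos = start + size + 2
    return bytes(body)


# Hypothetical sample: two chunks ("hello" and ", world") plus the terminator.
sample = b"5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\n"
print(decode_chunked(sample))  # b'hello, world'
```

This only helps if the body really is chunk-framed; applied to an
ordinary body it will raise ValueError or return garbage, so it is worth
checking the Transfer-Encoding response header first.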

Brian Vanderburg II


