python3 urlopen(...).read() returns bytes

Glenn G. Chappell glenn.chappell at gmail.com
Mon Dec 22 16:41:56 EST 2008


I just ran 2to3 on a py2.5 script that does pattern matching on the
text of a web page. The resulting script crashed, because when I did

    f = urllib.request.urlopen(url)
    text = f.read()

then "text" is a bytes object, not a string, and so my string regexp
can't be matched against it.
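
For illustration, here is a minimal sketch of the failure. A literal
bytes value stands in for what f.read() returns; the sample HTML and
the pattern are made up, not taken from the real script.

```python
import re

# Stand-in for the bytes that f.read() returns; the HTML is made up.
text = b"<html><title>Example</title></html>"

# A str pattern applied to a bytes object raises TypeError in Python 3.
try:
    re.search(r"<title>(.*?)</title>", text)
    str_pattern_failed = False
except TypeError:
    str_pattern_failed = True

# A bytes pattern does work on bytes, and its groups are bytes too.
m = re.search(br"<title>(.*?)</title>", text)
title = m.group(1)
```

So a regexp on the raw data is possible, but only if the pattern is
itself written as bytes.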

Of course, this is easy to patch: just do "f.read().decode()", which
assumes the page is UTF-8. However, it strikes me as an obvious bug,
which ought to be fixed. That is, read() should return a string, as
it did in py2.5.
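
A slightly more careful patch might look like the sketch below. It
assumes the server declares its charset in the Content-Type header;
the helper names charset_from_content_type, decode_page and fetch_text
are hypothetical, and UTF-8 is only a fallback guess.

```python
import urllib.request
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, or default."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None.
    return msg.get_content_charset() or default


def decode_page(raw, content_type=None, default="utf-8"):
    """Decode raw bytes using the declared charset (hypothetical helper)."""
    charset = charset_from_content_type(content_type, default)
    # errors="replace" keeps going on bad bytes instead of raising.
    return raw.decode(charset, errors="replace")


def fetch_text(url):
    """Fetch url and return its body as str (needs network access)."""
    with urllib.request.urlopen(url) as f:
        return decode_page(f.read(), f.headers.get("Content-Type"))
```

With something like this, "text = fetch_text(url)" would give a string
again, and the existing string regexps could stay as they are.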

But apparently others disagree? This was mentioned in issue 3930
( http://bugs.python.org/issue3930 ) back in September '08, but that
issue is now closed, apparently because consistent behavior was
achieved. But I figure consistently bad behavior is still bad.

This change breaks pretty much every Python program that opens a
webpage, doesn't it? 2to3 doesn't catch it, and, in any case, why
should read() return bytes, not string? Am I missing something?

By the way, I'm running Ubuntu 8.10. Doing "python3 --version" prints
"Python 3.0rc1+".


