harmful str(bytes)
Hallvard B Furuseth
h.b.furuseth at usit.uio.no
Thu Oct 7 17:33:35 EDT 2010
I've been playing a bit with Python3.2a2, and frankly its charset
handling looks _less_ safe than in Python 2.
The offender is bytes.__str__: str(b'foo') == "b'foo'".
It's often not clear from looking at a piece of code whether
some data is treated as strings or bytes, particularly when
translating from old code. This means one cannot tell from
context whether str(s) or "%s" % s will produce garbage.
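A minimal illustration under Python 3 (the helper name `render` is hypothetical): the same formatting code gives readable text for a `str` and the bytes repr for a `bytes` object, and neither case raises an error.

```python
def render(value):
    # Hypothetical helper standing in for code that does not know
    # whether it receives str or bytes.
    return "name=%s" % value

text_result = render("foo")    # readable text
bytes_result = render(b"foo")  # the bytes repr, no exception raised

print(text_result)              # name=foo
print(bytes_result)             # name=b'foo'
print(str(b"foo") == "b'foo'")  # True
```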
With the Unicode <-> string conversions of late 2.x releases, the
equivalent operation did not silently produce garbage: it raised
UnicodeError instead. With old raw Python strings this was not a
problem in applications that did not need to convert between charsets;
with Python 3, such applications can break.
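In Python 3 a loud failure is still available, but only for explicit conversions: `bytes.decode` raises `UnicodeDecodeError`, a subclass of `UnicodeError`, when the data does not fit the requested charset, while implicit formatting of the same bytes succeeds quietly. A minimal sketch, assuming UTF-8 encoded input (the sample value is illustrative):

```python
data = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'

# Implicit conversion: no error, just the repr b'caf\xc3\xa9' as text.
implicit = "%s" % data

# Explicit conversion with the right charset yields the real text.
explicit = data.decode("utf-8")

# Explicit conversion with the wrong charset fails loudly.
try:
    data.decode("ascii")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("decode failed:", exc.reason)
```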
I really wish bytes.__str__ would fail, at least by default.
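CPython does offer an opt-in form of this: the `-b` command-line option warns when `str()` is called on a `bytes` object, and `-bb` turns that `BytesWarning` into an error. It is not enabled by default. A sketch that runs the same one-liner in a child interpreter with and without `-bb` (the helper name `run_snippet` is hypothetical):

```python
import subprocess
import sys

def run_snippet(*flags):
    # Run str(b'foo') in a fresh interpreter with the given flags.
    return subprocess.run(
        [sys.executable, *flags, "-c", "print(str(b'foo'))"],
        capture_output=True,
        text=True,
    )

default = run_snippet()     # silently prints b'foo'
strict = run_snippet("-bb")  # str() on bytes raises BytesWarning

print(default.returncode, default.stdout.strip())
print(strict.returncode, "BytesWarning" in strict.stderr)
```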
--
Hallvard