harmful str(bytes)

Hallvard B Furuseth h.b.furuseth at usit.uio.no
Thu Oct 7 17:33:35 EDT 2010


I've been playing a bit with Python 3.2a2, and frankly its charset
handling looks _less_ safe than in Python 2.

The offender is bytes.__str__: str(b'foo') == "b'foo'".
It's often not clear from looking at a piece of code whether
some data is treated as strings or bytes, particularly when
translating from old code, which means one cannot see from
context whether str(s) or "%s" % s will produce garbage.
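
To make the failure mode concrete, here is a minimal sketch for Python 3.
The function name greet and the sample values are hypothetical, chosen
only for illustration; the point is that nothing in the function body
tells the reader which type it will receive.

```python
def greet(name):
    # Hypothetical helper: formats whatever it is given. Nothing here
    # reveals whether name is str or bytes.
    return "Hello, %s" % name

text_result = greet("foo")     # str argument: the intended output
bytes_result = greet(b"foo")   # bytes argument: no exception is raised

print(text_result)              # Hello, foo
print(bytes_result)             # Hello, b'foo'
print(str(b"foo") == "b'foo'")  # True
```

The second call succeeds silently, so the stray b'...' only shows up
later in whatever output or storage the string ends up in.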

In late Python 2.x releases, the equivalent operation, an implicit
conversion between Unicode and byte strings, did not silently produce
garbage: it raised UnicodeError instead.  With old raw Python strings
that was not a problem for applications which did not need to convert
any charsets; with Python 3 they can break.

I really wish bytes.__str__ would fail, at least by default.
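
As a point of comparison, CPython has an opt-in check along these lines:
the -b command-line option issues a warning on str(bytes_instance), and
-bb turns that warning into an error.  The sketch below assumes a
CPython 3.7+ interpreter reachable through sys.executable and runs the
same one-liner with and without the flag; it is an illustration of the
opt-in behaviour, not the default this message asks for.

```python
import subprocess
import sys

# The one-liner under test: convert bytes to str and print the result.
code = "print(str(b'foo'))"

# Default interpreter settings: the conversion succeeds silently.
default_run = subprocess.run(
    [sys.executable, "-c", code], capture_output=True, text=True
)

# With -bb, str() on a bytes instance raises BytesWarning as an error.
strict_run = subprocess.run(
    [sys.executable, "-bb", "-c", code], capture_output=True, text=True
)

print(default_run.stdout.strip())           # b'foo'
print(strict_run.returncode != 0)           # True
print("BytesWarning" in strict_run.stderr)  # True
```

Running a test suite under -bb is one way to find these conversions
when porting old code, though it has to be requested explicitly.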

-- 
Hallvard


