Python 3.2 has some deadly infection

Robin Becker robin at reportlab.com
Mon Jun 2 07:10:48 EDT 2014


............
>
> I probably should have mentioned it, but in my case it's not even Python
> (Java). It's exactly the same principle - an assumption was made that has
> become entrenched due to the fear of breakage. If they'd been forced to
> think about encodings up-front, it shouldn't have been an issue, which was
> the point I was trying to make.
>
There seems to be an implicit assumption in Python land that encoded strings are
the norm. On virtually every computer I encounter, that assumption is wrong. The
vast majority of bytes on most computers are not something that can easily be
printed out for humans to read. I suppose some clever Pythonista could figure out
an encoding to read my .o / .so etc. files, but they are practically meaningless
to a Unicode program today. The same goes for most image formats and media files.
Browsers routinely encounter mis-encoded or unencoded pages.
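
A minimal sketch of the point about binary data, assuming a made-up byte
string that starts like an ELF object file (the sample bytes and the helper
name try_decode are hypothetical, not taken from any real file or library):

```python
# Hypothetical sample: an ELF-style magic number followed by arbitrary
# binary data. These bytes are invented for illustration.
blob = b"\x7fELF\x02\x01\x01\x00" + bytes([0xFF, 0xFE, 0x80, 0x00])


def try_decode(data, encoding="utf-8"):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# The blob is not valid UTF-8, so decoding fails...
as_text = try_decode(blob)

# ...but it is perfectly usable as bytes, e.g. for checking a magic number.
looks_like_elf = blob.startswith(b"\x7fELF")
```

Treating such data as bytes avoids having to pick an encoding that has no
meaning for it.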

> In Java, it's much worse. At least with Python you can perform string-like
> operations on bytes. In Java you have to convert it to characters before
> you can really do anything with it, so people just use the default encoding
> all the time - especially if they want the convenience of line-by-line
> reading using BufferedReader ...
..


In Python I would have preferred bytes to remain the default I/O mechanism;
at least that would allow me to decide whether I need any decoding.
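
As a rough sketch of what "bytes first, decode on demand" can look like in
Python 3 today: open the file in binary mode and decode only when the caller
names an encoding. The helper names read_raw and read_text and the sample
Latin-1 data are assumptions made for this illustration.

```python
import os
import tempfile


def read_raw(path):
    """Read a file as bytes; no encoding is assumed."""
    with open(path, "rb") as f:
        return f.read()


def read_text(path, encoding):
    """Decode only on request, with an encoding the caller names explicitly."""
    return read_raw(path).decode(encoding)


# Demo with a throwaway file containing Latin-1 bytes (hypothetical data).
fd, demo_path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"caf\xe9\n")
    raw = read_raw(demo_path)                # bytes, untouched
    text = read_text(demo_path, "latin-1")   # str, decoded by explicit choice
finally:
    os.remove(demo_path)
```

The decision about decoding stays with the caller; nothing depends on the
locale's default encoding.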

As the cat example

http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/

showed, these extra assumptions are sometimes really in the way.
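
For reference, a bytes-only cat can be sketched in Python 3 by going through
the binary buffers underneath the standard streams. This is an illustration
under that assumption, not the code from the linked article; the function
names cat and main and the chunk size are arbitrary choices.

```python
import sys


def cat(src, dst, chunk_size=64 * 1024):
    """Copy src to dst as raw bytes, never decoding anything."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)


def main():
    # sys.stdin.buffer and sys.stdout.buffer are the binary streams that sit
    # under the text-mode sys.stdin and sys.stdout.
    cat(sys.stdin.buffer, sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

Calling main() copies standard input to standard output unchanged, whatever
the input's encoding or lack of one.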
-- 
Robin Becker


