Python 3.2 has some deadly infection

Wolfgang Maier wolfgang.maier at biologie.uni-freiburg.de
Mon Jun 2 03:45:56 EDT 2014


Tim Delaney <timothy.c.delaney <at> gmail.com> writes:

> 
> I also should have been more clear that *in the particular situation I was
> talking about* iso-latin-1 as default would be the right thing to do, not in
> the general case. Quite often we won't know the correct encoding until we've
> executed a command via ssh - iso-latin-1 will allow us to extract the info
> we need (which will generally be 7-bit ASCII) without the possibility of an
> invalid encoding. Sure we may get mojibake, but that's better than the
> alternative when we don't yet know the correct encoding.
>
> Latin-1 is one of those legacy encodings which needs to die, not to be
> entrenched as the default. My terminal uses UTF-8 by default (as
> it should), and if I use the terminal to input "δжç", Python ought to see
> what I input, not Latin-1 moji-bake.
>
> For some purposes, there needs to be a way to treat an arbitrary stream of
> bytes as an arbitrary stream of 8-bit characters. iso-latin-1 is a
> convenient way to do that.
> 

For that purpose, Python 3 has the bytes type. Read the data as is, then
decode it to a string once you have figured out its encoding.
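
A minimal sketch of that approach, assuming the remote output announces its
encoding in a LANG=... line. The sample data and the detect_encoding helper
are hypothetical and only illustrate the idea; real output may need a
different way of finding the encoding.

```python
# Raw output as it might arrive from a remote command; nothing is decoded yet.
raw = b"LANG=en_US.UTF-8\n" + "δжç\n".encode("utf-8")


def detect_encoding(data, default="ascii"):
    """Hypothetical helper: find the charset in a LANG=... line, using bytes only."""
    for line in data.splitlines():
        if line.startswith(b"LANG=") and b"." in line:
            # The charset name itself is plain ASCII, so decoding it is safe.
            return line.rsplit(b".", 1)[1].decode("ascii")
    return default


# Search the bytes first, then decode everything once the encoding is known.
encoding = detect_encoding(raw)
text = raw.decode(encoding)
print(encoding)
print(text)
```

The bytes methods used here (splitlines, startswith, rsplit) are enough to
pull out the 7-bit ASCII information, so no guessed decoding is needed in
between.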

Wolfgang





