Re: Another å, ä, ö question

Thu Sep 22 11:32:48 EDT 2016

On Thu, Sep 22, 2016 at 10:27 PM, Peter Otten <__peter__ at web.de> wrote:
> When the encoding used for the file and the encoding used by the terminal
> differ, the output of non-ASCII characters gets messed up. Example script:
>
> # -*- coding: iso-8859-15 -*-
>
> print "first unicode:"
> print u"Schöön"
>
> print "then bytes:"
> print "Schöön"
>
> When I dump that in my UTF-8 terminal, all "ö"s are lost because it gets the
> invalid byte sequence b"\xf6" rather than the required b"\xc3\xb6":
>
> $ cat demo.py
> # -*- coding: iso-8859-15 -*-
>
> print "first unicode:"
> print u"Sch��n"
>
> print "then bytes:"
> print "Sch��n"
>
> But when I run the code:
>
> $ python demo.py
> first unicode:
> Schöön
> then bytes:
> Sch��n
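
A rough Python 3 sketch of the mismatch Peter describes, offered as an illustration rather than a reproduction of his script. It relies only on standard codec facts: "ö" is the single byte 0xF6 in ISO-8859-15 and the two bytes 0xC3 0xB6 in UTF-8. The variable names are made up for the example.

```python
# The same word encoded two ways.
word = "Schöön"

latin9_bytes = word.encode("iso-8859-15")  # what the ISO-8859-15 file stores
utf8_bytes = word.encode("utf-8")          # what a UTF-8 terminal expects

print(latin9_bytes)  # b'Sch\xf6\xf6n'
print(utf8_bytes)    # b'Sch\xc3\xb6\xc3\xb6n'

# A byte string literal goes out unchanged, so the UTF-8 terminal ends up
# interpreting Latin-9 bytes. 0xF6 is not a valid UTF-8 start byte, so each
# "ö" turns into U+FFFD, the replacement character (shown as "�").
as_seen_by_terminal = latin9_bytes.decode("utf-8", errors="replace")
print(as_seen_by_terminal == "Sch\ufffd\ufffdn")  # True

# A Unicode literal is decoded using the coding cookie first and re-encoded
# for the terminal on output, which yields the correct UTF-8 bytes.
round_trip = latin9_bytes.decode("iso-8859-15").encode("utf-8")
print(round_trip == utf8_bytes)  # True
```

This is why the u"..." line prints correctly while the plain "..." line does not: only the Unicode path has a decode/encode step in between.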

What this really means is that you (almost certainly) shouldn't be
storing non-ASCII text in byte strings. Most stuff will "just work" if
you're using a Unicode string (obviously cat doesn't acknowledge the
coding cookie, but Python itself does, as do a number of editors), and
of course, you can avoid all the u"..." prefixes by going to Py3.
Trying to use text in byte strings is extremely encoding-dependent,
and thus dangerous. Sure, it'll generally work for ASCII... but only
because you're highly likely to have your terminal set to an
ASCII-compatible encoding. If you pick something else... you're in
for a whole new world of fun. Acres of entertainment.
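A small Python 3 sketch of the point above, assuming nothing beyond the standard library. ASCII text happens to produce identical bytes in every ASCII-compatible encoding, but not in encodings such as UTF-16 or EBCDIC (cp500), and non-ASCII text differs even among ASCII-compatible ones. The file name "demo.txt" is purely illustrative. The pattern shown, which keeps text as str and names the encoding only where bytes enter or leave, is one common way to follow the advice, not the only one.

```python
import os
import tempfile

text_ascii = "Hello"
text_nonascii = "Schöön"

# ASCII text is byte-identical across ASCII-compatible encodings...
for enc in ("ascii", "iso-8859-15", "cp1252", "utf-8"):
    assert text_ascii.encode(enc) == b"Hello"

# ...but not in encodings that aren't ASCII-compatible.
assert text_ascii.encode("utf-16-le") == b"H\x00e\x00l\x00l\x00o\x00"
assert text_ascii.encode("cp500") != b"Hello"  # cp500 is an EBCDIC code page

# Non-ASCII text differs even between ASCII-compatible encodings.
print(text_nonascii.encode("iso-8859-15"))  # b'Sch\xf6\xf6n'
print(text_nonascii.encode("utf-8"))        # b'Sch\xc3\xb6\xc3\xb6n'

# Keep text as str and state the encoding explicitly at the I/O boundary.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name
    with open(path, "w", encoding="iso-8859-15") as f:
        f.write(text_nonascii)
    with open(path, "rb") as f:
        on_disk = f.read()   # the raw bytes actually stored in the file
    with open(path, encoding="iso-8859-15") as f:
        read_back = f.read() # decoded back to the original str
```

Because the encoding is named on both open() calls, the round trip does not depend on the terminal's or the locale's default encoding.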

ChrisA