Another å, ä, ö question

Thu Sep 22 08:27:10 EDT 2016

Martin Schöön wrote:

> Den 2016-09-20 skrev Peter Otten <__peter__ at web.de>:
>> Martin Schöön wrote:
>>
>>> Den 2016-09-19 skrev Christian Gollwitzer <auriocus at gmx.de>:
>>>> Am 19.09.16 um 22:21 schrieb Martin Schöön:
>>>>> I am studying some of these tutorials:
>>>>> https://pythonprogramming.net/matplotlib-intro-tutorial/
> <snip>
>>>>
>>>> Assuming that you use UTF-8 (you should check with an emacs expert, I
>>>> am not an emacs user), try putting the header
>>>>
>>>> #!/usr/bin/python
>>>> # -*- coding: utf-8 -*-
>>>>
>>>> on top of your files.
>>>>
>>> I already have this and since it doesn't work from command line
>>> either it can't be an emacs unique problem.
>>> 
>>
>> Are all non-ascii strings unicode? I. e.
>>
>> u"Schöön" rather than just "Schöön"
>>
>> ?
> 
> I am not sure I answer your questions since I am not quite sure I
> understand it but here goes: The complete file is UTF-8 encoded as
> that is how Geany is set up to create new files (I just checked).

When the encoding used for the file and the encoding used by the terminal 
differ, the output of non-ASCII characters gets messed up. Example script:

# -*- coding: iso-8859-15 -*-

print "first unicode:"
print u"Schöön"

print "then bytes:"
print "Schöön"

When I dump that file in my UTF-8 terminal, all "ö"s are lost because the 
terminal receives the invalid byte b"\xf6" for each one rather than the 
required b"\xc3\xb6":

$ cat demo.py
# -*- coding: iso-8859-15 -*-

print "first unicode:"
print u"Sch��n"

print "then bytes:"
print "Sch��n"

But when I run the code:

$ python demo.py
first unicode:
Schöön
then bytes:
Sch��n
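
The bytes behind both outputs can be checked directly. This is only a 
minimal sketch, written for Python 3 (where every str is unicode) rather 
than the Python 2 used in this thread; the variable names are made up for 
illustration:

```python
# -*- coding: utf-8 -*-
# Sketch for Python 3; all names here are illustrative assumptions.

text = u"Sch\xf6\xf6n"  # "Schöön" as unicode text

# The same text encoded two ways.
latin_bytes = text.encode("iso-8859-15")  # one byte per "ö": \xf6
utf8_bytes = text.encode("utf-8")         # two bytes per "ö": \xc3\xb6

print(repr(latin_bytes))
print(repr(utf8_bytes))

# The ISO-8859-15 bytes are not valid UTF-8, so strict decoding fails.
try:
    latin_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# Roughly what a UTF-8 terminal does with them: each bad byte is shown
# as the replacement character U+FFFD.
shown = latin_bytes.decode("utf-8", errors="replace")
print(repr(shown))
```

Encoding the text with the encoding the terminal expects, as print does 
for unicode strings, is what avoids the replacement characters.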

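The unicode string comes out right because Python decodes the literal using 
the declared source encoding and encodes it for the terminal when printing; 
the byte string is passed through unchanged. Unicode strings have other 
advantages, too: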

>>> print "Schöön".upper()
SCHööN
>>> print u"Schöön".upper()
SCHÖÖN
>>> print "Schöön"[:4]
Sch�
>>> print u"Schöön"[:4]
Schö
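
The session above can be reproduced without depending on the terminal by 
comparing text with its UTF-8 bytes explicitly. Again a hedged sketch for 
Python 3, with illustrative names, not the Python 2 code from the thread:

```python
# -*- coding: utf-8 -*-
# Sketch for Python 3; all names here are illustrative assumptions.

text = u"Sch\xf6\xf6n"      # "Schöön": 6 characters
raw = text.encode("utf-8")  # the same name as 8 UTF-8 bytes

# upper() on text knows about "ö"; on bytes it only touches ASCII letters.
upper_text = text.upper()
upper_raw = raw.upper()

# Slicing text counts characters; slicing bytes counts bytes and can cut
# a multi-byte character in half.
head_text = text[:4]
head_raw = raw[:4]

print(repr(upper_text), repr(upper_raw))
print(repr(head_text), repr(head_raw))

# The truncated byte string no longer decodes cleanly as UTF-8.
head_shown = head_raw.decode("utf-8", errors="replace")
print(repr(head_shown))
```

The byte slice ends in a lone b"\xc3", the first half of an "ö", which is 
why the terminal shows a replacement character there.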