Trying to understand this moji-bake

Cameron Simpson cs at zip.com.au
Sat Jan 25 00:08:23 EST 2014


On 25Jan2014 04:37, Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
> I have an unexpected display error when dealing with Unicode strings, and 
> I cannot understand where the error is occurring. I suspect it's not 
> actually a Python issue, but I thought I'd ask here to start.
> 
> Using Python 3.3, if I print a unicode string from the command line, it 
> displays correctly. I'm using the KDE 3.5 Konsole application, with the 
> encoding set to the default (which ought to be UTF-8, I believe, although 
> I'm not completely sure).

There are at least two layers: the encoding Python uses when writing
text to the terminal, and the decoding the terminal applies to that
byte stream to decide what to display.

The former can be printed with:

  import sys
  print(sys.stdout.encoding)

The latter depends on your desktop and KDE settings, I guess. I
would hope Konsole decides based on your environment settings.
Running the shell command:

  locale

will print the settings derived from that. Provided your environment
matches that which invoked the Konsole, that should be informative.

But I expect the Konsole is decoding using UTF-8 because so much
else works for you already.
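
To look at the Python side of those two layers in one go, something
like the following sketch may help. The function name
describe_output_encoding is made up for illustration, and the three
environment variables it reads are just the usual locale ones; it
reports what Python sees and says nothing about what Konsole itself
is doing.

```python
import locale
import os
import sys


def describe_output_encoding():
    """Collect the settings that influence how Python encodes text for stdout.

    Hypothetical helper, not part of any library.
    """
    return {
        # Encoding Python will use when writing text to stdout (may be None).
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Whether stdout is a terminal or a pipe/file.
        "stdout_isatty": sys.stdout.isatty(),
        # Encoding implied by the current locale settings.
        "preferred_encoding": locale.getpreferredencoding(False),
        # The environment variables that the "locale" command is derived from.
        "locale_env": dict(
            (name, os.environ.get(name)) for name in ("LC_ALL", "LC_CTYPE", "LANG")
        ),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_output_encoding().items()):
        print("%s: %r" % (key, value))
```

Running it once directly and once through a pipe, under each Python
version, should show whether the encodings differ between the cases.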

You could also debug with something like this:

  python2.7 ..... | od -c

which will print the output bytes. By printing to the terminal,
you're letting the terminal's decoding get in your way. It is fine
for seeing correct/incorrect results, but not so fine for seeing
the bytes causing them.
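
As a rough illustration of why the bytes matter, here is a sketch
that encodes the same test string as UTF-8 and then decodes those
bytes with the wrong codec. Latin-1 is only an assumed example of a
wrong decoder; this is not a diagnosis of what your terminal is
actually doing. The helper name hex_bytes is made up.

```python
# The test string, written with escapes so the source file encoding
# cannot interfere: n-tilde, o-slash, lambda, pi, short-i, zhe.
text = u"\xf1\xf8\u03bb\u03c0\u0439\u0436"


def hex_bytes(data):
    """Render a byte string as space-separated hex pairs (hypothetical helper)."""
    return " ".join("%02x" % b for b in bytearray(data))


# Layer 1: what Python would send to the terminal if it encodes as UTF-8.
utf8 = text.encode("utf-8")
print(hex_bytes(utf8))  # c3 b1 c3 b8 ce bb cf 80 d0 b9 d0 b6

# Layer 2: a decoder that wrongly assumes Latin-1 sees twelve
# characters instead of six, one per byte.
misread = utf8.decode("latin-1")
print(len(text), len(misread))  # 6 12

# Show the misread text as ASCII-only escapes so printing cannot fail.
print(misread.encode("unicode_escape").decode("ascii"))
```

Comparing hex output like this against what "od -c" shows makes it
easier to tell which layer mangled the text.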

> This displays correctly:
> [steve at ando ~]$ python3.3 -c "print(u'ñøλπйж')"
> ñøλπйж
> 
> 
> Likewise for Python 3.2:
> [steve at ando ~]$ python3.2 -c "print('ñøλπйж')"
> ñøλπйж
> 
> But using Python 2.7, I get a really bad case of moji-bake:
> [steve at ando ~]$ python2.7 -c "print u'ñøλπйж'"
> ñøλÏйж
> 
> However, interactively it works fine:
[...]

Debug by printing sys.stdout.encoding at this point.

I do recall getting different output encodings depending on how
Python was invoked; I forget the pattern, but I also remember writing
some ghastly hack to work around it, which I can't find at the
moment...

Also see "man python2.7", in particular the PYTHONIOENCODING
environment variable. That might let you exert more control.
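
A small sketch of the effect, assuming you are happy to test it from
Python itself: it starts a child interpreter with PYTHONIOENCODING
set and asks the child what sys.stdout.encoding it ended up with. The
function name child_stdout_encoding is made up for illustration, and
it uses whichever interpreter runs the script, not python2.7
specifically.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set; return its stdout encoding.

    Hypothetical helper, not part of any library.
    """
    # Copy the current environment and override just the one variable.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    code = "import sys; print(sys.stdout.encoding)"
    # The child's stdout is a pipe here, not a terminal.
    out = subprocess.check_output([sys.executable, "-c", code], env=env)
    return out.decode("ascii").strip()


if __name__ == "__main__":
    for name in ("utf-8", "latin-1"):
        print("%s -> %s" % (name, child_stdout_encoding(name)))
```

The reported name may be spelled differently from the one you set,
depending on the Python version, but it should name the same codec.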

Cheers,
-- 
Cameron Simpson <cs at zip.com.au>

ASCII  n s. [from the greek]  Those people who, at certain times of the year,
have no shadow at noon; such are the inhabitants of the torrid zone.
        - 1837 copy of Johnson's Dictionary


