Trying to understand this moji-bake

Peter Otten __peter__ at web.de
Sat Jan 25 03:56:09 EST 2014


Steven D'Aprano wrote:

> I have an unexpected display error when dealing with Unicode strings, and
> I cannot understand where the error is occurring. I suspect it's not
> actually a Python issue, but I thought I'd ask here to start.

I suppose it is a Python issue: where Python fails to guess an encoding, it 
usually falls back to ASCII.

> But using Python 2.7, I get a really bad case of moji-bake:
> 
> [steve at ando ~]$ python2.7 -c "print u'ñøλπйж'"
> Ã±Ã¸Î»ÏÐ¹Ð¶
> 
> 
> However, interactively it works fine:
> 
> [steve at ando ~]$ python2.7 -E
> Python 2.7.2 (default, May 18 2012, 18:25:10)
> [GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> print u'ñøλπйж'
> ñøλπйж

You can provoke it with exec:

>>> exec "print u'ñøλπйж'"
Ã±Ã¸Î»ÏÐ¹Ð¶
>>> exec u"print u'ñøλπйж'"
ñøλπйж
>>> exec "# -*- coding: utf-8 -*-\nprint u'ñøλπйж'"
ñøλπйж
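
The three results fit one explanation, which I can only sketch in Python 3, 
where the decoding step is explicit. The assumption (not verified against the 
C code) is that Python 2 reads the bytes of a unicode literal as Latin-1 when 
a byte-string source carries no coding declaration, and as UTF-8 when the 
declaration says so. The variable names are mine, for illustration only:

```python
# Python 3 sketch of what Python 2 appears to do implicitly.
text = 'ñøλπйж'

# A UTF-8 terminal hands the literal to "python -c" or exec as these bytes.
source_bytes = text.encode('utf-8')

# Assumption: no coding declaration, so each byte becomes one Latin-1 character.
without_declaration = source_bytes.decode('latin-1')

# With "# -*- coding: utf-8 -*-" the same bytes are decoded as UTF-8.
with_declaration = source_bytes.decode('utf-8')

print(len(text), len(without_declaration))  # 6 12
print(with_declaration == text)             # True
```

Six characters turn into twelve, one per UTF-8 byte, which is what the garbled 
output looks like.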

> This occurs on at least two different machines, one using Centos and the
> other Debian.
> 
> Anyone have any idea what's going on? I can replicate the display error
> using Python 3 like this:
> 
> py> s = 'ñøλπйж'
> py> print(s.encode('utf-8').decode('latin-1'))
> Ã±Ã¸Î»ÏÐ¹Ð¶
> 
> but I'm not sure why it's happening at the command line. Anyone have any
> ideas?
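
That replication can also be run backwards as a check: if the garbled text 
really is UTF-8 read as Latin-1, re-encoding it as Latin-1 and decoding as 
UTF-8 should give back the original. A minimal Python 3 sketch; 
undo_mojibake is a hypothetical helper name, not a library function:

```python
def undo_mojibake(garbled):
    """Reverse a 'UTF-8 bytes decoded as Latin-1' mix-up.

    Raises UnicodeEncodeError or UnicodeDecodeError if the input
    was not produced by that particular mix-up.
    """
    return garbled.encode('latin-1').decode('utf-8')


original = 'ñøλπйж'
garbled = original.encode('utf-8').decode('latin-1')  # the display error
repaired = undo_mojibake(garbled)

print(repaired == original)  # True
```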

It is probably buried in the C code -- after a few indirections I lost 
track :(



