Trying to understand this moji-bake

wxjmfauth at gmail.com wxjmfauth at gmail.com
Sat Jan 25 04:24:06 EST 2014


On Saturday, 25 January 2014 05:37:34 UTC+1, Steven D'Aprano wrote:
> I have an unexpected display error when dealing with Unicode strings, and
> I cannot understand where the error is occurring. I suspect it's not
> actually a Python issue, but I thought I'd ask here to start.
>
> Using Python 3.3, if I print a unicode string from the command line, it
> displays correctly. I'm using the KDE 3.5 Konsole application, with the
> encoding set to the default (which ought to be UTF-8, I believe, although
> I'm not completely sure). This displays correctly:
>
> [steve at ando ~]$ python3.3 -c "print(u'ñøλπйж')"
> ñøλπйж
>
> Likewise for Python 3.2:
>
> [steve at ando ~]$ python3.2 -c "print('ñøλπйж')"
> ñøλπйж
>
> But using Python 2.7, I get a really bad case of moji-bake:
>
> [steve at ando ~]$ python2.7 -c "print u'ñøλπйж'"
> ñøλÏйж
>
> However, interactively it works fine:
>
> [steve at ando ~]$ python2.7 -E
> Python 2.7.2 (default, May 18 2012, 18:25:10)
> [GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> print u'ñøλπйж'
> ñøλπйж
>
> This occurs on at least two different machines, one using Centos and the
> other Debian.
>
> Anyone have any idea what's going on? I can replicate the display error
> using Python 3 like this:
>
> py> s = 'ñøλπйж'
> py> print(s.encode('utf-8').decode('latin-1'))
> ñøλÏйж
>
> but I'm not sure why it's happening at the command line. Anyone have any
> ideas?

The basic problem is neither Python, nor the system (OS), nor
the terminal, nor the GUI console. The basic problem is that
all these elements [*] are not "speaking" the same language.
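
To make "not speaking the same language" concrete, here is a minimal
Python 3 sketch, not a diagnosis of the case above. It assumes one side
writes UTF-8 bytes and the other side reads them as Latin-1, the same pair
used in the quoted py> example. The helper names mojibake() and repair() are
made up for illustration, and the string is written with escapes so the
example does not depend on the encoding of the source file.

```python
def mojibake(text, written_as="utf-8", read_as="latin-1"):
    """Encode text the way the writer does, decode it the way the reader does."""
    return text.encode(written_as).decode(read_as)


def repair(garbled, written_as="utf-8", read_as="latin-1"):
    """Undo mojibake() by reversing the two steps."""
    return garbled.encode(read_as).decode(written_as)


# The six characters from the original post, spelled with escapes.
s = "\u00f1\u00f8\u03bb\u03c0\u0439\u0436"

garbled = mojibake(s)

# ascii() keeps the output printable whatever the terminal encoding is.
print(ascii(garbled))
print(len(s), len(garbled))
print(repair(garbled) == s)
```

Each of the six characters takes two bytes in UTF-8, so the misread string
has twelve characters. Because Latin-1 maps every byte to a character, the
damage is reversible here; that would not hold for every pair of encodings.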

The second problem lies in Python itself. Python attempts
to solve this problem by doing its own "cooking" based on the
elements I pointed to above [*], with the side effect that the
situation may become more confused and/or simply not work
properly (sys.std***.encoding, print, GUI/terminal, source
coding, ...).
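
One way to see what Python has "cooked" for a given process is to ask it
directly. The sketch below only reads values; the function name
encoding_report() is made up for illustration, and the results depend on the
interpreter version, the locale and on whether the streams are attached to a
terminal or a pipe.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python has chosen for this process."""
    return {
        # A stream may be missing or have no encoding, hence getattr().
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        "locale": locale.getpreferredencoding(False),
    }


for name, value in sorted(encoding_report().items()):
    print("%-10s %s" % (name, value))
```

Running it once in the terminal and once with the output piped through
another program, under each interpreter, shows whether the pieces agree.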

The third problem is more *x specific. In many cases,
the Python "distribution" is tweaked to make it work
on a specific *x version/distribution
(sys.getdefaultencoding(), site.py, sitecustomize.py),
and the end result is a Python that does not work properly.
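
A rough way to check whether such tweaks are in play on a given machine is
sketched below. The function name site_tweaks() is made up for illustration.
It only reports whether site processing is disabled, whether a sitecustomize
or usercustomize module was imported, and the default encoding; it cannot
tell what those modules actually changed.

```python
import sys


def site_tweaks():
    """Report signs that the installation customises itself at startup."""
    return {
        # True when the interpreter was started with -S.
        "site_disabled": bool(sys.flags.no_site),
        # These modules are imported by site.py when they exist.
        "sitecustomize_loaded": "sitecustomize" in sys.modules,
        "usercustomize_loaded": "usercustomize" in sys.modules,
        "default_encoding": sys.getdefaultencoding(),
    }


for name, value in sorted(site_tweaks().items()):
    print("%-22s %s" % (name, value))
```

Comparing the output of a normal run with a run started with -S gives a
first hint of whether the distribution's customisation is involved.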

Fourth problem. GUI applications that are supposed to mimic the
"real" terminal do so by adding their own "recipes".

Fifth problem. The user who has to understand all this
stuff.

n-th problem, ...
jmf

PS I already understood all this stuff ten years ago!



