cmd.exe on Windows - problem with displaying some Unicode characters

wxjmfauth at gmail.com
Mon Aug 4 04:47:18 EDT 2014


On Monday, 4 August 2014 02:17:59 UTC+2, Glenn Linderman wrote:
> On 8/3/2014 4:25 PM, Andrew Berg wrote:
>> On 2014.08.03 18:08, Chris Angelico wrote:
>>> The best way to do it is to use the Unicode codepage, but cmd.exe just
>>> plain has issues. There are underlying Windows APIs for displaying
>>> text that have problems with astral characters (I think that's what it
>>> is), so ultimately, you're largely stuck.
>>
>> That is not quite true. The terminal has these issues, not the shell. Using
>> cp65001 does make Unicode in a Windows terminal possible, but using a better
>> terminal[1] makes it almost perfect (my experience has been that input can be
>> difficult, but output works well). I personally have used an IRC bot written in
>> Python with logging output containing Unicode characters that display just fine
>> (both locally and over SSH).
>>
>> [1] I recommend ConEmu: https://code.google.com/p/conemu-maximus5/
>
> I will be reading more about ConEmu, thanks for the reference.
>
> http://bugs.python.org/issue1602 describes 7 years' worth of
> discussion of the problems with the console/terminal used by default
> by cmd.exe and other Windows command line programs, versus Python.
>
> The recent insights in the last couple of weeks have given me hope that
> Python might be able to be fixed to work properly with the default
> Windows console at long last... at least for non-astral characters
> (I'm not sure whether or not the Windows console supports non-BMP
> characters).
>
> For this OP problem, it is mostly a matter of finding a fixed-width
> font that supports the box drawing characters and the Polish
> characters that are desired. Lucida Console has a fair repertoire,
> and Consolas has a fair repertoire, in the fixed-width font arena.
> There may be others, documented on Polish language web sites that I
> wouldn't know about, and I don't know enough Polish to be sure those
> I mentioned suffice.
>
> And then, the workarounds mentioned in the above-referenced bug or
> on the GitHub or PyPI sites mentioned should provide any needed
> additional solutions... and hopefully something along this line
> will finally be integrated into Python so that it can finally be said that
> Python supports Unicode properly on Windows (or at least as properly
> as Windows allows... but it is pretty clear that Windows supports
> Unicode, even for the console, using different APIs than Python is
> presently using, and that mismatch between APIs is really the source
> of the problems with using Unicode in Python on Windows).
> 
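
As a rough illustration of the code page side of the quoted discussion (a sketch, not something posted in this thread): the helper below, with the hypothetical name unencodable, lists which characters of a string a given code page cannot represent. The sample string and the choice of cp852, cp1250 and utf-8 are assumptions picked for the box drawing plus Polish case. It only exercises Python's codecs and says nothing about what the console font can actually draw.

```python
def unencodable(text, codepage):
    """Return the characters of `text` that `codepage` cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Hypothetical sample: box drawing, Polish letters, and a euro sign.
sample = "┌─┐ zażółć gęślą jaźń €"

for cp in ("cp852", "cp1250", "utf-8"):
    # An empty list means every character survives encoding to that code page.
    print(cp, unencodable(sample, cp))
```

A legacy OEM or ANSI code page covers only part of the sample, which is one reason the Unicode code page (cp65001) keeps coming up in the quoted messages.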


1) A lot of confusion and imprecision.

2) Unicode will never work properly because its handling
is wrong by design.

3) From my interactive interpreter:
>>> me.centralwidget.shell.font().rawName()
'Consolas'
>>> me.centralwidget.shell.fontMetrics().width('—')
9
>>> me.centralwidget.shell.fontMetrics().width('㑖')
17
>>> # note:
>>> me.centralwidget.shell.fontMetrics().width('\t')
80
>>>
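
The roughly double width reported for '㑖' is consistent with how Unicode itself classifies that character. A minimal sketch, assuming only the standard unicodedata module; cell_width is a hypothetical helper that approximates terminal cells, and it is not what the font metrics call above computes:

```python
import unicodedata


def cell_width(ch):
    """Approximate number of terminal cells one character occupies."""
    if unicodedata.combining(ch):
        # Combining marks are drawn over the previous character.
        return 0
    # Wide (W) and Fullwidth (F) characters usually take two cells.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


for ch in ("—", "㑖", "a"):
    print(ascii(ch), unicodedata.east_asian_width(ch), cell_width(ch))
```

So a "fixed-width" font is fixed per cell, not per code point, and a console has to account for characters that span two cells.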

(To think that I will throw away all this work...)

4) Already posted:

Fun with win_unicode_console

NB: modulo the .notdef glyph in the font.

D:\>D:\conuni\build\exe.win32-3.2\jmtest.exe
3.2.5 (default, May 15 2013, 23:06:03) [MSC v.1500 32 bit (Intel)]
Quelques caractères: «abc需ßÜÆŸçñö»
Loop: empty string => quit
—>for ascii users: abc
Votre entrée était : for ascii users: abc  20 caractère(s)
—>abc需ß
Votre entrée était : abcéœ€ß  7 caractère(s)
—>*€*\u20ac*
Votre entrée était : *€*€*  5 caractère(s)
—>\\\
Votre entrée était : \\\  3 caractère(s)
—>\\u0066
Votre entrée était : \f  2 caractère(s)
—>aϕЯ①ǢṺijṏz
Votre entrée était : aϕЯ①ǢṺijṏz  9 caractère(s)
—>D:\jm\Москва\Zürich\Αθήνα\œdipe
Votre entrée était : D:\jm\Москва\Zürich\Αθήνα\œdipe  31 caractère(s)
—>a\u123z
Wahrsheinlich falsches \uxxxx
—>r'\'
Votre entrée était : r'\'  4 caractère(s)
—>é\u1234\u3456z
Votre entrée était : éሴ㑖z  4 caractère(s)
—>
Fin
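
The source of jmtest.exe is not shown here. The following is only a guess at the escape handling visible in the session above, written as a sketch: decode_u_escapes is a hypothetical name, and the rule "a backslash-u must be followed by exactly four hex digits, otherwise it is an error" is inferred from the transcript, not confirmed.

```python
import re

# A backslash-u, optionally followed by exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})?")


def decode_u_escapes(text):
    """Replace each backslash-u escape with the character it names.

    Raises ValueError if a backslash-u lacks four hex digits.
    """
    def _replace(match):
        if match.group(1) is None:
            raise ValueError(r"probably malformed \uxxxx escape")
        return chr(int(match.group(1), 16))

    return _U_ESCAPE.sub(_replace, text)


# Inputs taken from the session above (raw strings keep the backslashes).
for line in (r"*€*\u20ac*", r"\\u0066", r"a\u123z", r"é\u1234\u3456z"):
    try:
        result = decode_u_escapes(line)
        print(ascii(result), len(result), "character(s)")
    except ValueError as exc:
        print(exc)
```

Other backslashes are left alone, which matches the lines where `\\\` and `r'\'` come back unchanged.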


Addendum: there is a "but".

5) I'm aware of the discussions on the subject, see 1).

jmf




