unable to print Unicode characters in Python 3

Mon Jan 26 17:38:50 EST 2009

On Jan 27, 8:38 am, Jean-Paul Calderone <exar... at divmod.com> wrote:
> On Mon, 26 Jan 2009 13:26:56 -0800 (PST), jefm <jef.mangelsch... at gmail.com> wrote:
> >>As Benjamin Kaplin said, Windows terminals use the old cp1252 character
> >>set, which cannot display the euro sign. You'll either have to run it in
> >> something more modern like the cygwin rxvt terminal, or output some
> >>other way, such as through a GUI.
>
> >>With the standard console, I get the same.  But with IDLE, using the
> >>same Python build but through a different interface
>
> >>Scream at Microsoft or try to find or encourage a console
> >>replacement that Python could use.  In the meanwhile, use IDLE.  Not
> >>perfect for Unicode, but better.
>
> >So, if I understand it correctly, it should work as long as you run
> >your Python code on something that can actually print the Unicode
> >character.
> >Apparently, the Windows command line can not.
>
> >I mainly program command line tools to be used by Windows users. So I
> >guess I am screwed.
>
> >Other than converting my tools to have a graphic interface, is there
> >any other solution, other than give Bill Gates a call and bring his
> >command line up to the 21st century ?
>
> cp1252 can represent the euro sign (<http://en.wikipedia.org/wiki/Windows-1252>).  Apparently the chcp command can be used to change the code page
> active in the console (<http://technet.microsoft.com/en-us/library/bb490874.aspx>).  I've never tried this myself, though.
>

Short answer: it doesn't work.

Test [Windows XP SP3, Python 2.6.1]:

C:\junk>chcp
Active code page: 850

C:\junk>chcp 1252
Active code page: 1252

C:\junk>chcp
Active code page: 1252

C:\junk>\python26\python
Python 2.6.1 (r261:67517, Dec  4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys; sys.stdout.encoding; sys.stderr.encoding
'cp1252'
'cp1252'

# So far, so good

>>> import unicodedata as ucd
>>> for b in range(128, 256):
...    c = chr(b)
...    u = c.decode('cp1252', 'replace')
...    name = ucd.name(u)
...    print hex(b), c, repr(u), name
...
0x80 € u'\u20ac' EURO SIGN
0x81  u'\ufffd' REPLACEMENT CHARACTER
0x82 ‚ u'\u201a' SINGLE LOW-9 QUOTATION MARK
[snip]
0xfb û u'\xfb' LATIN SMALL LETTER U WITH CIRCUMFLEX
0xfc ü u'\xfc' LATIN SMALL LETTER U WITH DIAERESIS
0xfd ý u'\xfd' LATIN SMALL LETTER Y WITH ACUTE
[snip]

Ignore the second field of each line above; it could well look OK in
your mail reader. However, what I see in that field on the console,
for the six lines shown, is:
0x80: capital C with cedilla
0x81: small u with diaeresis (umlaut)
0x82: small e with acute
0xfb: superscript one
0xfc: superscript three
0xfd: superscript two [yes, out of order]

IOW, the bridge might think it's in cp1252 mode, but nobody told the
engine room, which is still churning out cp850.
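
The mismatch can be reproduced without a Windows console. What follows is a minimal sketch in Python 3, not part of the original test: it takes the six byte values shown above and decodes each one both as cp1252 (what Python believes it is writing) and as cp850 (what the console apparently still renders). The names SAMPLE_BYTES, describe and compare are made up for illustration. It prints Unicode names rather than the characters themselves, so the output stays ASCII and is safe on any console.

```python
import unicodedata

# The six byte values whose output lines are shown in the test above.
SAMPLE_BYTES = [0x80, 0x81, 0x82, 0xFB, 0xFC, 0xFD]


def describe(byte_value, codec):
    """Return (character, Unicode name) for one byte decoded with codec."""
    # 'replace' maps undefined bytes (e.g. 0x81 in cp1252) to U+FFFD.
    ch = bytes([byte_value]).decode(codec, "replace")
    return ch, unicodedata.name(ch, "<unnamed>")


def compare(byte_values=SAMPLE_BYTES):
    """Return rows of (hex byte, name under cp1252, name under cp850)."""
    rows = []
    for b in byte_values:
        _, intended = describe(b, "cp1252")  # what Python thinks it sends
        _, shown = describe(b, "cp850")      # what a cp850 renderer draws
        rows.append((hex(b), intended, shown))
    return rows


if __name__ == "__main__":
    for row in compare():
        print("%-5s cp1252: %-42s cp850: %s" % row)
```

If the cp850 column matches the list of characters reported above, that is consistent with the console still rendering bytes as cp850 after chcp 1252; it does not by itself say why.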