Unicode, stdout, and stderr

Steven D'Aprano steve at pearwood.info
Tue Jul 22 02:58:30 EDT 2014


On Tue, 22 Jul 2014 08:18:08 +0200, Frank Millman wrote:

> Hi all
> 
> This is not important, but I would appreciate it if someone could
> explain the following, run from cmd.exe on Windows Server 2003 -
> 
> C:\>python
> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> x = '\u2119'
> >>> x  # this uses stderr
> '\u2119'


What makes you think it uses stderr? To the best of my knowledge, it uses 
stdout.
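
For what it's worth, the documentation for sys.displayhook describes the 
interactive echo as writing repr(value) to sys.stdout and, if that raises 
UnicodeEncodeError, retrying with the 'backslashreplace' error handler. 
Here is a rough sketch of that behaviour. The function name 
echo_like_repl and the in-memory cp437 "console" are my own stand-ins for 
illustration, not the interpreter's actual code:

```python
import io

def echo_like_repl(value, stream):
    """Write repr(value) to stream, escaping what the stream cannot encode."""
    text = repr(value)
    try:
        stream.write(text)
    except UnicodeEncodeError:
        # Fall back to escapes such as \u2119 instead of failing.
        data = text.encode(stream.encoding, 'backslashreplace')
        stream.write(data.decode(stream.encoding, 'strict'))
    stream.write('\n')

# Hypothetical stand-in for a cp437 console: a strict text wrapper over bytes.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding='cp437', errors='strict', newline='\n')

echo_like_repl('\u2119', console)
console.flush()
echoed = raw.getvalue()   # the bytes the "console" received
```

If this matches what the interpreter does, it would explain why the bare 
expression shows an escape while still going through stdout.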


> >>> print(x)  # this uses stdout
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in
> position 0: character maps to <undefined>

I think your problem is that print tries to encode the string to your 
terminal's encoding, which appears to be CP-437 (the "MS-DOS" code page), 
and CP-437 has no mapping for U+2119. Can you convince cmd.exe to use 
UTF-8? That should fix the problem. (Although apparently Windows' handling 
of UTF-8 is buggy, so it will create many wonderful new problems, yay!)

http://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how

http://stackoverflow.com/questions/14109024/how-to-make-unicode-charset-in-cmd-exe-by-default

http://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8
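
The encoding step can be reproduced by hand, without a Windows console, 
by calling str.encode with the codec named in the traceback. This is only 
a sketch of the encoding step, not of what cmd.exe itself does; the 
variable names are mine:

```python
x = '\u2119'

# CP-437 has no slot for U+2119, so a strict encode fails,
# with the same exception type as in the traceback.
try:
    x.encode('cp437')
    cp437_ok = True
except UnicodeEncodeError:
    cp437_ok = False

# UTF-8 can represent any code point, so the same string encodes fine.
utf8_bytes = x.encode('utf-8')

# With the 'backslashreplace' handler, cp437 emits an ASCII escape instead.
escaped = x.encode('cp437', errors='backslashreplace')
```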



> It seems that there is a difference between writing to stdout and
> writing to stderr. 

I would be surprised if that were the case, but I don't have a Windows 
box to test it. Try this:


import sys
print(x, file=sys.stderr)  # I expect this will fail
print(repr(x), file=sys.stdout)  # I expect this will succeed
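
For anyone else without a Windows box, the same experiment can be 
approximated with in-memory streams. This is a sketch under assumptions: 
the helper try_print is hypothetical, the streams are io.TextIOWrapper 
objects set to cp437, and 'strict' and 'backslashreplace' are the error 
handlers that Python 3 documents for sys.stdout and sys.stderr 
respectively. A real console may behave differently.

```python
import io

def try_print(text, errors):
    """Print text to an in-memory cp437 stream.

    Returns the bytes written, or the UnicodeEncodeError if encoding fails.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding='cp437', errors=errors,
                              newline='\n')
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        return exc
    return raw.getvalue()

x = '\u2119'
stdout_like = try_print(x, 'strict')              # like a strict stdout
stderr_like = try_print(x, 'backslashreplace')    # like stderr's handler
repr_result = try_print(repr(x), 'strict')        # repr keeps the character
ascii_result = try_print(ascii(x), 'strict')      # ascii escapes it
```

If I have this right, under Python 3 repr(x) keeps the printable character 
itself, so on a strict stream it hits the same encoding error as print(x), 
while ascii(x) produces the escaped form and goes through.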



-- 
Steven


