Unicode, stdout, and stderr

Peter Otten __peter__ at web.de
Tue Jul 22 03:19:49 EDT 2014


Frank Millman wrote:

> Hi all
> 
> This is not important, but I would appreciate it if someone could explain
> the following, run from cmd.exe on Windows Server 2003 -
> 
> C:\>python
> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit
> (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> x = '\u2119'
> >>> x  # this uses stderr
> '\u2119'

No, both write to stdout. The difference is that a bare expression like

>>> x

is passed to the display hook of the interactive interpreter
(sys.displayhook). The hook applies repr() and then tries to print the
result. If that fails with a UnicodeEncodeError, it makes a second attempt,
roughly (the actual code is written in C):

sys.stdout.buffer.write(repr(x).encode(
    sys.stdout.encoding, "backslashreplace"))
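
To make the two-step behaviour easier to see, here is a minimal pure-Python
sketch of that fallback. The function name display() and the in-memory cp437
stream are my own stand-ins for illustration; the real hook is implemented in
C and also handles None and the "_" variable, which I leave out here.

```python
import io

def display(value, stream):
    """Write repr(value) to stream, escaping characters it cannot encode."""
    text = repr(value)
    try:
        # First attempt: a normal text write with the stream's own error handler.
        stream.write(text)
    except UnicodeEncodeError:
        # Second attempt: encode by hand with backslashreplace and
        # write the bytes to the underlying binary buffer.
        data = text.encode(stream.encoding, "backslashreplace")
        stream.flush()
        stream.buffer.write(data)
    stream.write("\n")
    stream.flush()

# Stand-in for a cp437 console: a strict text wrapper around a bytes buffer.
# newline="\n" keeps the output identical on every platform.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="cp437", newline="\n")
display("\u2119", out)
result = raw.getvalue()
```

print() has no such second attempt. It performs only the plain write, so the
UnicodeEncodeError reaches you.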


> >>> print(x)  # this uses stdout
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in
> position 0: character maps to <undefined>
> >>>
> 
> It seems that there is a difference between writing to stdout and writing
> to stderr. My questions are -
> 
> 1. What is the difference?
> 
> 2. Is there an easy way to get stdout to behave the same as stderr?

You could set the PYTHONIOENCODING environment variable to an encoding 
followed by an error handler, in the form encoding:errorhandler:

[simulating the behaviour you are seeing on a Linux/UTF-8 machine]

$ PYTHONIOENCODING=cp437 python3.4
Python 3.4.0rc1+ (default:16384988a526+, Mar 11 2014, 16:56:15) 
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "\u2112"
'\u2112'
>>> print("\u2112")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/encodings/cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2112' in 
position 0: character maps to <undefined>
>>> 

[the proposed fix]

$ PYTHONIOENCODING=cp437:backslashreplace python3.4
Python 3.4.0rc1+ (default:16384988a526+, Mar 11 2014, 16:56:15) 
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "\u2112"
'\u2112'
>>> print("\u2112")
\u2112
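
If you cannot change the environment, something similar can probably be done
from inside the program by rewrapping the byte buffer behind stdout. This is
only a sketch: the helper name with_backslashreplace() is hypothetical, and
the demo uses an in-memory cp437 stream in place of a real console.

```python
import io

def with_backslashreplace(stream):
    """Return a new text wrapper over stream's byte buffer that
    escapes unencodable characters instead of raising."""
    return io.TextIOWrapper(
        stream.buffer,
        encoding=stream.encoding,
        errors="backslashreplace",
        line_buffering=True,
    )

# Demo on a stand-in for a strict cp437 stdout.
raw = io.BytesIO()
strict = io.TextIOWrapper(raw, encoding="cp437", newline="\n")
lenient = with_backslashreplace(strict)
print("\u2112", file=lenient)
lenient.flush()
demo = raw.getvalue()  # begins with b"\\u2112"
```

For the real thing you would write sys.stdout = with_backslashreplace(sys.stdout)
early in the program. On Python 3.7 and later,
sys.stdout.reconfigure(errors="backslashreplace") does this without creating
a second wrapper.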




