Unicode, stdout, and stderr

Peter Otten __peter__ at web.de
Tue Jul 22 05:09:37 EDT 2014


Frank Millman wrote:

> 
> "Peter Otten" <__peter__ at web.de> wrote in message
> news:lql3am$2q7$1 at ger.gmane.org...
>> Frank Millman wrote:
>>
>>> Hi all
>>>
>>> This is not important, but I would appreciate it if someone could
>>> explain the following, run from cmd.exe on Windows Server 2003 -
>>>
>>> C:\>python
>>> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> x = '\u2119'
>>> >>> x  # this uses stderr
>>> '\u2119'
>>
>> No, both print to stdout, but only the bare expression
>>
>> >>> x
>>
>> is passed to the display hook of the interactive interpreter. The hook
>> applies repr() and then tries to print the result. If that fails, it
>> makes another attempt, roughly (the actual code is written in C):
>>
>> sys.stdout.buffer.write(repr(x).encode(
>>    sys.stdout.encoding, "backslashreplace"))
>>
>>
> 
> Thanks, Peter. Very interesting.
> 
> Out of interest, does the same thing happen when writing to sys.stderr?

If you are asking about the fallback mechanism, that is specific to 
sys.displayhook in the interactive interpreter. 
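
As a rough illustration of that fallback, here is a Python sketch modelled on
the pseudo-code in the sys.displayhook documentation. The function name
display() and the explicit stream parameter are assumptions added so the
logic can be tried against an in-memory stream; the real hook is written in
C, always uses sys.stdout, and also stores the value in builtins._.

```python
import io

def display(value, stream):
    """Approximate what sys.displayhook does with a value (sketch only)."""
    if value is None:
        return
    text = repr(value)
    try:
        stream.write(text)
    except UnicodeEncodeError:
        # Fallback: escape whatever the stream's encoding cannot represent
        # and write the resulting bytes to the underlying binary buffer.
        data = text.encode(stream.encoding, "backslashreplace")
        stream.buffer.write(data)
    stream.write("\n")

# Stand-in for a cp437 console that uses the strict error handler.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="cp437", errors="strict",
                               newline="\n")
display("\u2119", fake_stdout)
fake_stdout.flush()
shown = raw.getvalue()  # the escaped repr, followed by a newline
```

A plain print(x) never reaches this code path, which is why it raises
while the bare expression does not.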

But stdout and stderr do handle errors differently:

>>> import sys
>>> sys.stdout.errors
'strict'
>>> sys.stderr.errors
'backslashreplace'

So a code point written to stdout that cannot be encoded with
sys.stdout.encoding raises a UnicodeEncodeError, while one written to stderr
that cannot be encoded with sys.stderr.encoding is replaced by a backslash
escape.
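
The difference can be reproduced without touching the real streams. This is
a minimal sketch; the helper name encode_via() and the choice of cp437 as the
encoding are assumptions for the demonstration, with an in-memory buffer
standing in for the console.

```python
import io

def encode_via(errors, text="\u2119", encoding="cp437"):
    """Write text through a TextIOWrapper and return the bytes produced."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors,
                              newline="\n")
    stream.write(text)
    stream.flush()
    return raw.getvalue()

# Like sys.stderr: the unencodable character is escaped.
escaped = encode_via("backslashreplace")

# Like sys.stdout: the same write raises UnicodeEncodeError.
try:
    encode_via("strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```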

Another way to make stdout more forgiving is to reopen it with a different
error handler:

>>> import sys
>>> print("\u2119")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/encodings/cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in position 0: character maps to <undefined>
>>> sys.stdout = open(1, mode="w", errors="xmlcharrefreplace",
...                   encoding=sys.stdout.encoding, closefd=False)
>>> print("\u2119")
&#8473;
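
On newer Pythons (3.7 and later, so not the 3.4 used in this thread) the
error handler can be changed on the existing wrapper with
TextIOWrapper.reconfigure() instead of reopening file descriptor 1. The
sketch below assumes such a version and uses an in-memory cp437 stream in
place of the console; for the real stream the call would be
sys.stdout.reconfigure(errors="xmlcharrefreplace").

```python
import io

# In-memory stand-in for a cp437 console; starts out strict like sys.stdout.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp437", errors="strict",
                          newline="\n")

# Requires Python 3.7+: keep the encoding, swap only the error handler.
stream.reconfigure(errors="xmlcharrefreplace")

print("\u2119", file=stream)
stream.flush()
result = raw.getvalue()  # character reference for U+2119 plus a newline
```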

