Unicode, stdout, and stderr

Tue Jul 22 05:33:58 EDT 2014

On Tuesday, 22 July 2014 11:09:37 UTC+2, Peter Otten wrote:
> Frank Millman wrote:
>
> > "Peter Otten" <__peter__ at web.de> wrote in message
> > news:lql3am$2q7$1 at ger.gmane.org...
> >> Frank Millman wrote:
> >>
> >>> Hi all
> >>>
> >>> This is not important, but I would appreciate it if someone could
> >>> explain the following, run from cmd.exe on Windows Server 2003 -
> >>>
> >>> C:\>python
> >>> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
> >>> Type "help", "copyright", "credits" or "license" for more information.
> >>> >>> x = '\u2119'
> >>> >>> x  # this uses stderr
> >>> '\u2119'
> >>
> >> No, both print to stdout, but just
> >>
> >> >>> x
> >>
> >> is passed to the display hook of the interactive interpreter. This
> >> applies repr() and then tries to print the result. If this fails it makes
> >> another effort, roughly (the actual code is written in C)
> >>
> >> sys.stdout.buffer.write(repr(x).encode(
> >>    sys.stdout.encoding, "backslashreplace"))
> >>
> >
> > Thanks, Peter. Very interesting.
> >
> > Out of interest, does the same thing happen when writing to sys.stderr?
>
> If you are asking about the fallback mechanism, that is specific to
> sys.displayhook in the interactive interpreter.
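
For anyone who wants to poke at that fallback without a Windows console, here is a rough Python sketch of it. It follows the pseudo-code given in the documentation for sys.displayhook, not the C source. The `out` parameter and the `render` helper are my own additions so that it can run against an in-memory cp437 stream; the real hook always uses sys.stdout and also stores the value in `builtins._`, which I have left out.

```python
import io
import sys


def displayhook_sketch(value, out=None):
    """Approximate the interactive display hook (a sketch, not the C code)."""
    out = sys.stdout if out is None else out  # 'out' is an assumption for testing
    if value is None:
        return
    text = repr(value)
    try:
        out.write(text)
    except UnicodeEncodeError:
        # Second attempt: escape whatever the stream's encoding cannot represent.
        data = text.encode(out.encoding, "backslashreplace")
        if hasattr(out, "buffer"):
            out.buffer.write(data)
        else:
            out.write(data.decode(out.encoding, "strict"))
    out.write("\n")


def render(value, encoding="cp437"):
    """Hypothetical helper: run the hook against a strict in-memory stream."""
    raw = io.BytesIO()
    out = io.TextIOWrapper(raw, encoding=encoding, errors="strict", newline="\n")
    displayhook_sketch(value, out)
    out.flush()
    return raw.getvalue()


print(render("\u2119"))  # escaped repr, no exception
print(render("abc"))     # plain repr
```

The point is only that the escaping happens inside the hook; a plain print() on the same strict stream gets no such second chance.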
>
> But stdout and stderr do handle errors differently:
>
> >>> import sys
> >>> sys.stdout.errors
> 'strict'
> >>> sys.stderr.errors
> 'backslashreplace'
>
> So a codepoint written to stdout that cannot be encoded with stdout.encoding
> raises an error while a codepoint written to stderr that cannot be encoded
> with stderr.encoding is escaped.
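
The difference between the two error handlers can be reproduced on any platform with in-memory streams. This is only a sketch: `write_with` is a made-up helper, and cp437 is assumed as the encoding because that is what the console in this thread uses.

```python
import io


def write_with(errors, text="\u2119", encoding="cp437"):
    """Hypothetical helper: write text to an in-memory stream with the given error handler."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="\n")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# Behaves like sys.stderr: the unencodable code point is escaped.
print(write_with("backslashreplace"))

# Behaves like sys.stdout: the same write raises.
try:
    write_with("strict")
except UnicodeEncodeError as exc:
    print("strict raised:", exc.reason)
```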
>
> Another way to make stdout more forgiving:
>
> >>> import sys
> >>> print("\u2119")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python3.4/encodings/cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in position 0: character maps to <undefined>
> >>> sys.stdout = open(1, mode="w", errors="xmlcharrefreplace", encoding=sys.stdout.encoding, closefd=False)
> >>> print("\u2119")
> &#8473;
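
A side note that goes beyond the 3.4 session quoted above: from Python 3.7 on, text streams have a reconfigure() method, so the error handler can be changed in place instead of reopening file descriptor 1. With "xmlcharrefreplace" an unencodable character comes out as a numeric character reference. The sketch below tries it on an in-memory cp437 stream; `reconfigured_output` is a name I made up, and on a real console the equivalent call would be sys.stdout.reconfigure(errors="xmlcharrefreplace").

```python
import io


def reconfigured_output(text, encoding="cp437"):
    """Hypothetical helper: start strict, then switch the error handler in place."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors="strict", newline="\n")
    # Requires Python 3.7+; the encoding stays the same, only 'errors' changes.
    stream.reconfigure(errors="xmlcharrefreplace")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


print(reconfigured_output("\u2119"))
```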

=====

Or, in a similar way, escape the text with ascii() before writing it:

>>> print(ascii('abcéœ€\u2119'))
'abc\xe9\u0153\u20ac\u2119'
>>> sys.stdout.write(ascii('abcéœ€\u2119') + '\n')
'abc\xe9\u0153\u20ac\u2119'
>>> sys.stderr.write(ascii('abcéœ€\u2119') + '\n')
'abc\xe9\u0153\u20ac\u2119'
>>> 
>>> sys.stdout.write((ascii('abcéœ€\u2119').strip("'") + '\n'))
abc\xe9\u0153\u20ac\u2119
>>>
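
If only the escaped text is wanted, without the quotes that ascii() adds, encoding with the "backslashreplace" handler gives the same escapes for a string like this one. A small sketch; the variable names are just for illustration.

```python
s = "abc\u00e9\u0153\u20ac\u2119"

# ascii() is repr() with non-ASCII characters escaped, so it includes the quotes.
quoted = ascii(s)

# Encoding to ASCII with backslashreplace escapes the same characters, no quotes.
bare = s.encode("ascii", "backslashreplace").decode("ascii")

print(quoted)
print(bare)
```

The two are not interchangeable in general: ascii() also escapes backslashes and control characters the way repr() does, and strip("'") would remove apostrophes that really belong to the start or end of the text.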

jmf