[issue18713] Enable surrogateescape on stdin and stdout when appropriate

STINNER Victor report at bugs.python.org
Thu Aug 22 15:18:23 CEST 2013


STINNER Victor added the comment:

> The surrogateescape error handler is dangerous with utf-16/32. It can produce globally invalid output.

I don't understand, can you give an example? surrogateescape can generate an invalid encoded string with any encoding. Example with UTF-8:

>>> b"a\xffb".decode("utf-8", "surrogateescape")
'a\udcffb'

>>> 'a\udcffb'.encode("utf-8", "surrogateescape")
b'a\xffb'

>>> b'a\xffb'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte

So str.encode("utf-8", "surrogateescape") can produce an invalid UTF-8 sequence: the strict decoder rejects b'a\xffb'.
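
The same round trip can be sketched as a small script. This is only an illustration, assuming Python 3; the helper names roundtrip and is_strictly_valid are hypothetical, not part of any API. It repeats the check for ASCII as a second codec, since a byte 0xff is invalid there too.

```python
def roundtrip(raw, encoding):
    """Decode `raw` with surrogateescape, then encode it back the same way.

    Hypothetical helper for illustration; returns (text, encoded).
    """
    # Undecodable bytes 0x80..0xff become lone surrogates U+DC80..U+DCFF.
    text = raw.decode(encoding, "surrogateescape")
    # Encoding with the same handler turns those surrogates back into bytes.
    encoded = text.encode(encoding, "surrogateescape")
    return text, encoded


def is_strictly_valid(data, encoding):
    """Return True if `data` decodes with the default (strict) error handler."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


for encoding in ("utf-8", "ascii"):
    text, encoded = roundtrip(b"a\xffb", encoding)
    # ascii() keeps the lone surrogate printable on any terminal encoding.
    print(encoding, ascii(text), encoded, is_strictly_valid(encoded, encoding))
```

In both cases the bytes survive the round trip unchanged, and the re-encoded output is still rejected by the strict decoder.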

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18713>
_______________________________________

