[issue10417] unittest triggers UnicodeEncodeError with non-ASCII character in the docstring of the test function

STINNER Victor report at bugs.python.org
Tue Nov 16 15:10:07 CET 2010


STINNER Victor <victor.stinner at haypocalc.com> added the comment:

In Python 3, sys.stderr uses the 'backslashreplace' error handler. With C locale, sys.stderr uses the ASCII encoding and so the é unicode character is printed as \xe9.

In Python 2, sys.stderr.errors is strict by default.

It works if you specify the error handler:

$ ./python -c "import sys; sys.stderr.write(u'\xe9\n')"
é
$ PYTHONIOENCODING=ascii:backslashreplace ./python -c "import sys; sys.stderr.write(u'\xe9\n')"
\xe9

But with ASCII encoding, and the default error handler (strict), it fails:

$ PYTHONIOENCODING=ascii ./python -c "import sys; sys.stderr.write(u'\xe9\n')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
$ LANG= ./python -c "import sys; sys.stderr.write(u'\xe9\n')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

Change the default error handler in a minor release is not a good idea. But we can emulate the backslashreplace error handler. distutils.log does that in Python3:

class Log:

    def __init__(self, threshold=WARN):
        self.threshold = threshold

    def _log(self, level, msg, args):
        if level not in (DEBUG, INFO, WARN, ERROR, FATAL):
            raise ValueError('%s wrong log level' % str(level))

        if level >= self.threshold:
            if args:
                msg = msg % args
            if level in (WARN, ERROR, FATAL):
                stream = sys.stderr
            else:
                stream = sys.stdout
            if stream.errors == 'strict':
                # emulate backslashreplace error handler
                encoding = stream.encoding
                msg = msg.encode(encoding, "backslashreplace").decode(encoding)
            stream.write('%s\n' % msg)
            stream.flush()
    (...)

_WritelnDecorator() of unittest.runner should maybe use the same code.

----------
nosy: +haypo

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10417>
_______________________________________


More information about the Python-bugs-list mailing list