[issue15216] Support setting the encoding on a text stream after creation

STINNER Victor report at bugs.python.org
Tue Aug 7 04:01:37 CEST 2012


STINNER Victor added the comment:

Here is a Python implementation of TextIOWrapper.set_encoding().

The main limitation is that the encoding cannot be changed on a non-seekable stream after the first read, that is, once the read buffer is not empty (i.e. there are pending decoded characters).

+        # flush read buffer, may require to seek backward in the underlying
+        # file object
+        if self._decoded_chars:
+            if not self.seekable():
+                raise UnsupportedOperation(
+                    "It is not possible to set the encoding "
+                    "of a non seekable file after the first read")
+            assert self._snapshot is not None
+            dec_flags, next_input = self._snapshot
+            offset = self._decoded_chars_used - len(next_input)
+            if offset:
+                self.buffer.seek(offset, SEEK_CUR)
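
The reason a backward seek is needed can be illustrated outside the patch. The following is only a minimal sketch, not the patch itself: it assumes an in-memory io.BytesIO as the underlying buffer, and the helper name switch_encoding_after_read is hypothetical. It shows that a single readline() makes the wrapper read ahead in the underlying buffer, so the buffer has to be rewound before a new decoder can take over.

```python
import io


def switch_encoding_after_read(data, first_encoding, second_encoding):
    """Hypothetical helper: read one line, then re-wrap with another encoding.

    Only an illustration of the read-ahead problem; it rebuilds the wrapper
    instead of changing the encoding in place as the patch does.
    """
    raw = io.BytesIO(data)
    text = io.TextIOWrapper(raw, encoding=first_encoding, newline="\n")
    first_line = text.readline()

    # Bytes the caller has really consumed versus bytes already pulled
    # out of the underlying buffer by the wrapper.
    consumed = len(first_line.encode(first_encoding))
    read_ahead = raw.tell() - consumed

    # Drop the old wrapper without closing the buffer, then seek backward
    # so that the next decoder starts right after the consumed bytes.
    text.detach()
    raw.seek(consumed)
    text = io.TextIOWrapper(raw, encoding=second_encoding, newline="\n")
    second_line = text.readline()
    return first_line, read_ahead, second_line


data = "first\nsecond: \u00e9\n".encode("utf-8")
first, read_ahead, second = switch_encoding_after_read(data, "latin-1", "utf-8")
print(repr(first), read_ahead, repr(second))
```

On a non-seekable stream the raw.seek() step is not available, which is why the patch raises UnsupportedOperation in that case.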

--

I don't have a use case for setting the encoding of sys.stdout/stderr after startup, but I would like to be able to control the *error handler* after startup. My implementation allows changing the encoding, the error handler, or both.

For example, Lib/test/regrtest.py uses the following function to force the backslashreplace error handler on sys.stdout. It changes the error handler to avoid UnicodeEncodeError when displaying the result of tests.

def replace_stdout():
    """Set stdout encoder error handler to backslashreplace (as stderr error
    handler) to avoid UnicodeEncodeError when printing a traceback"""
    import atexit

    stdout = sys.stdout
    sys.stdout = open(stdout.fileno(), 'w',
        encoding=stdout.encoding,
        errors="backslashreplace",
        closefd=False,
        newline='\n')

    def restore_stdout():
        sys.stdout.close()
        sys.stdout = stdout
    atexit.register(restore_stdout)
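
The effect of the error handler chosen above can be checked without touching the real sys.stdout. This is a small sketch under stated assumptions: an io.BytesIO stands in for the file descriptor, and the helper name encode_through_wrapper is hypothetical.

```python
import io


def encode_through_wrapper(text, encoding, errors):
    """Hypothetical helper: write text through a TextIOWrapper and return
    the bytes that reached the underlying buffer."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding=encoding, errors=errors,
                               newline="\n")
    wrapper.write(text)
    wrapper.flush()
    return raw.getvalue()


# With backslashreplace, the character that ASCII cannot encode is escaped.
escaped = encode_through_wrapper("caf\u00e9\n", "ascii", "backslashreplace")
print(escaped)

# With the strict handler, the same write fails.
try:
    encode_through_wrapper("caf\u00e9\n", "ascii", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)
```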

The doctest module uses another trick to change the error handler:

        save_stdout = sys.stdout
        if out is None:
            encoding = save_stdout.encoding
            if encoding is None or encoding.lower() == 'utf-8':
                out = save_stdout.write
            else:
                # Use backslashreplace error handling on write
                def out(s):
                    s = str(s.encode(encoding, 'backslashreplace'), encoding)
                    save_stdout.write(s)
        sys.stdout = self._fakeout
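
The encode/decode round trip used by doctest can be tried in isolation. A minimal sketch, with the function name escape_unencodable chosen here only for illustration: characters that the target encoding cannot represent come back as backslash escapes, so the resulting string is always safe to write to a stream using that encoding.

```python
def escape_unencodable(s, encoding):
    """Hypothetical helper mirroring the doctest expression: encode with
    backslashreplace, then decode again with the same encoding."""
    return str(s.encode(encoding, "backslashreplace"), encoding)


ascii_safe = escape_unencodable("snow \u2603", "ascii")
latin1_safe = escape_unencodable("\u00e9 \u2603", "latin-1")
print(ascii_safe)
print(latin1_safe)
```

Unlike the regrtest approach, this leaves the stream object untouched and only rewrites the text before it is written.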

----------
keywords: +patch
nosy: +haypo
Added file: http://bugs.python.org/file26715/set_encoding.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue15216>
_______________________________________

