[issue15216] Support setting the encoding on a text stream after creation

INADA Naoki report at bugs.python.org
Wed Jan 11 05:35:16 EST 2017


INADA Naoki added the comment:

> Inada, I think you messed up the positioning of bits of the patch. E.g. there are now test methods declared > inside a helper function (rather than a test class).

I'm sorry.  `patch -p1` merged previous patch into wrong place, and test passed accidently.

> Since it seems other people are in favour of this API, I would like to expand it a bit to cover two uses  cases (see set_encoding-newline.patch):
> 
> * change the error handler without affecting the main character encoding
> * set the newline encoding (also suggested by Serhiy)

+1.  Since stdio is configured before running Python program, TextIOWrapper should be configurable after creation, as possible.

> Regarding Serhiy’s other suggestion about buffering parameters, perhaps TextIOWrapper.line_buffering could become a writable attribute instead, and the class could grow a similar write_through attribute. I don’t think these affect encoding or decoding, so they could be treated independently.

Could them go another new issue?
This issue is too long to read already.

> The algorithm for rewinding unread data is complicated and can fail. What is the advantage of using it? What is the use case for reading from a stream and then changing the encoding, without a guarantee that it will work?
>
> Even if it is enhanced to never “fail”, it will still have strange behaviour, such as data loss when a decoder is fed a single byte and produces multiple characters (e.g. CR newline, backslashreplace, UTF-7).

When I posted the set_encoding-7.patch, I hadn't read io module deeply.  I just solved conflict and ran test.
After that, I read the code and I feel same thing (see msg285111 and msg285112).
Let's drop support changing encoding while reading.
It's significant step that allowing changing stdin encoding only before reading anything from it.


> One step in the right direction IMO would be to only support calling set_encoding() when no extra read data has been buffered (or to explicitly say that any buffered data is silently dropped). So there is no support for changing the encoding halfway through a disk file, but it may be appropriate if you can regulate the bytes being read, e.g. from a terminal (user input), pipe, socket, etc.

Totally agree.


> But I would be happy enough without set_encoding(), and with something like my rewrap() function at the bottom of <https://github.com/vadmium/data/blob/master/data.py#L526>. It returns a fresh TextIOWrapper, but when you exit the context manager you can continue to reuse the old stream with the old settings.

I want one obvious way to control encoding and error handler from Python, (not from environment variable).
Rewrapping stream seems hacky way, rather than obvious way.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue15216>
_______________________________________


More information about the Python-bugs-list mailing list