[Python-ideas] changing sys.stdout encoding

MRAB python at mrabarnett.plus.com
Wed Jun 6 12:09:06 EDT 2012


On 06/06/2012 08:09, Rurpy wrote:
> On 06/05/2012 05:56 PM, MRAB wrote:
>>  On 06/06/2012 00:34, Victor Stinner wrote:
>>>  2012/6/5 Rurpy<rurpy-/E1597aS9LQAvxtiuMwx3w at public.gmane.org>:
>>>>   In my first foray into Python3 I've encountered this problem:
>>>>   I work in a multi-language environment.  I've written a number
>>>>   of tools, mostly command-line, that generate output on stdout.
>>>>   Because these tools and their output are used by various people
>>>>   in varying environments, the tools all have an --encoding option
>>>>   to provide output that meets the needs and preferences of the
>>>>   output's ultimate consumers.
>>>
>>>  What happens if the specified encoding is different than the encoding
>>>  of the console? Mojibake?
>>>
>>>  If the output is used as in the input of another program, does the
>>>  other program use the same encoding?
>>>
>>>  In my experience, using an encoding different than the locale encoding
>>>  for input/output (stdout, environment variables, command line
>>>  arguments, etc.) causes various issues. So I'm curious about your use
>>>  cases.
>>>
>>>>   In converting them to Python3, I found the best (if not very
>>>>   pleasant) way to do this in Python3 was to put something like
>>>>   this near the top of each tool[*1]:
>>>>
>>>>     import codecs
>>>>     sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)
>>>
>>>  In Python 3, you should use io.TextIOWrapper instead of
>>>  codecs.StreamWriter. It's more efficient and has fewer bugs.
>>>
>>>>   What I want to be able to put there instead is:
>>>>
>>>>     sys.stdout.set_encoding (opts.encoding)
>>>
>>>  I don't think that your use case merits a new method on
>>>  io.TextIOWrapper: replacing sys.stdout does work and should be used
>>>  instead. TextIOWrapper is generic and your use case is specific to
>>>  sys.std* streams.
>>>
>>>  It would be surprising to change the encoding of an arbitrary file
>>>  after it is opened. At least, I don't see the use case.
>>>
>>  [snip]
>>
>>  And if you _do_ want multiple encodings in a file, it's clearer to open
>>  the file as binary and then explicitly encode to bytes and write _that_
>>  to the file.
>
> But is it really?
>
> The following is very simple and the level of python
> expertise required is minimal.  It (would) work fine
> with redirection.  One could substitute any other ordinary
> open (for write) text file for sys.stdout.
>
>    [off the top of my head]
>    text = 'This is %s text: 世界へ、こんにちは!'
>    sys.stdout.set_encoding ('sjis')
>    print (text % 'sjis')
>    sys.stdout.set_encoding ('euc-jp')
>    print (text % 'euc-jp')
>    sys.stdout.set_encoding ('iso2022-jp')
>    print (text % 'iso2022-jp')
>
> As for your suggestion, how do I reopen sys.stdout in
> binary mode?  I don't need to do that often and don't
> know off the top of my head.  (And it's too late for
> me to look it up.)  And what happens to redirected output
> when I close and reopen the stream?  I can open a regular
> filename instead.  But remember to make the last two
> opens with "a" rather than "w".  And don't forget the
> "\n" at the end of the text line.
>
> Could you show me a code example of your suggestion
> for comparison?
>
> Disclaimer: As I said before, I am not particularly
> advocating for a set_encoding() method -- my
> primary suggestion is a programmatic way to change the
> sys.std* encodings prior to first use.  Here I am just
> questioning the claim that a set_encoding() method
> would not be clearer than existing alternatives.
>
This example accesses the underlying binary output stream:


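# -*- coding: utf-8 -*-

import sys

class Writer:
    """Text-like wrapper that encodes each write with a switchable encoding."""
    def __init__(self, output):
        self.output = output
        self.encoding = output.encoding
    def write(self, string):
        # Encode explicitly and send the bytes to the underlying binary stream.
        self.output.buffer.write(string.encode(self.encoding))
    def flush(self):
        # print(..., flush=True) and interpreter shutdown both call flush().
        self.output.buffer.flush()
    def set_encoding(self, encoding):
        # Push out anything written in the old encoding before switching.
        self.flush()
        self.encoding = encoding

# Flush text still pending in the text layer, then install the wrapper.
sys.stdout.flush()
sys.stdout = Writer(sys.stdout)

# Remember the encoding stdout started with so it can be restored.
initial_encoding = sys.stdout.encoding

text = 'This is %s text: 世界へ、こんにちは!'
sys.stdout.set_encoding('utf-8')
print (text % 'utf-8')
sys.stdout.set_encoding('sjis')
print (text % 'sjis')
sys.stdout.set_encoding('euc-jp')
print (text % 'euc-jp')
sys.stdout.set_encoding('iso2022-jp')
print (text % 'iso2022-jp')

sys.stdout.set_encoding(initial_encoding)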
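
For an ordinary file rather than sys.stdout, the "open as binary and
encode explicitly" approach might look roughly like the sketch below.
The write_lines() helper, the encodings list and the temporary file
name are only illustrative names made up for this example. A single
open in 'wb' mode covers all the encodings, so there is no need to
reopen with "a", and the "\n" is added in one place.

```python
import os
import tempfile

text = 'This is %s text: 世界へ、こんにちは!'
encodings = ['utf-8', 'sjis', 'euc-jp', 'iso2022-jp']

def write_lines(path, encodings):
    # Hypothetical helper: one binary open, one explicit encode per line.
    with open(path, 'wb') as f:
        for enc in encodings:
            line = (text % enc) + '\n'
            f.write(line.encode(enc))

# Illustrative target: a scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'mixed.txt')
write_lines(path, encodings)
```

Each line in the resulting file is encoded differently, so whatever
reads it has to know where one encoding stops and the next starts.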


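If the encoding only needs to be chosen once, before anything is
written, Victor's suggestion of replacing sys.stdout with an
io.TextIOWrapper could be sketched as follows. wrap_binary_stream() is
a name invented for this example, and line_buffering=True is just one
reasonable choice for a stream that might be a terminal.

```python
import io
import sys

def wrap_binary_stream(binary, encoding, errors='strict'):
    # Hypothetical helper: build a text stream that encodes to `binary`.
    return io.TextIOWrapper(binary, encoding=encoding, errors=errors,
                            line_buffering=True)

# Intended use near the top of a tool (assumes sys.stdout has a .buffer):
#     sys.stdout.flush()
#     sys.stdout = wrap_binary_stream(sys.stdout.buffer, opts.encoding)

# Self-contained demonstration against an in-memory binary stream.
demo_buffer = io.BytesIO()
demo_out = wrap_binary_stream(demo_buffer, 'euc-jp')
demo_out.write('世界へ、こんにちは!')
demo_out.flush()
```

The old sys.stdout is still reachable as sys.__stdout__, so the two
wrappers end up sharing one binary buffer. That is why the old one
should be flushed before the replacement is installed.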
