[Python-ideas] changing sys.stdout encoding

Rurpy rurpy at yahoo.com
Wed Jun 6 09:09:34 CEST 2012


On 06/05/2012 05:56 PM, MRAB wrote:
> On 06/06/2012 00:34, Victor Stinner wrote:
>> 2012/6/5 Rurpy<rurpy-/E1597aS9LQAvxtiuMwx3w at public.gmane.org>:
>>>  In my first foray into Python3 I've encountered this problem:
>>>  I work in a multi-language environment.  I've written a number
>>>  of tools, mostly command-line, that generate output on stdout.
>>>  Because these tools and their output are used by various people
>>>  in varying environments, the tools all have an --encoding option
>>>  to provide output that meets the needs and preferences of the
>>>  output's ultimate consumers.
>>
>> What happens if the specified encoding is different than the encoding
>> of the console? Mojibake?
>>
>> If the output is used as the input of another program, does the
>> other program use the same encoding?
>>
>> In my experience, using an encoding different than the locale encoding
>> for input/output (stdout, environment variables, command line
>> arguments, etc.) causes various issues. So I'm curious about your use
>> cases.
>>
>>>  In converting them to Python3, I found the best (if not very
>>>  pleasant) way to do this in Python3 was to put something like
>>>  this near the top of each tool[*1]:
>>>
>>>    import codecs
>>>    sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)
>>
>> In Python 3, you should use io.TextIOWrapper instead of
>> codecs.StreamWriter. It's more efficient and has fewer bugs.
>>
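
(For reference, I assume the io.TextIOWrapper replacement for my
codecs.getwriter() line would be roughly the following -- a sketch off
the top of my head, untested; the line_buffering=True part is just my
guess at keeping the usual console behaviour:)

  import io, sys
  sys.stdout = io.TextIOWrapper(sys.stdout.buffer,
                                encoding=opts.encoding,
                                line_buffering=True)
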
>>>  What I want to be able to put there instead is:
>>>
>>>    sys.stdout.set_encoding (opts.encoding)
>>
>> I don't think that your use case merits a new method on
>> io.TextIOWrapper: replacing sys.stdout does work and should be used
>> instead. TextIOWrapper is generic and your use case is specific to
>> sys.std* streams.
>>
>> It would be surprising to change the encoding of an arbitrary file
>> after it is opened. At least, I don't see the use case.
>>
> [snip]
> 
> And if you _do_ want multiple encodings in a file, it's clearer to open
> the file as binary and then explicitly encode to bytes and write _that_
> to the file.

But is it really?

The following is very simple and the level of Python
expertise required is minimal.  It would work fine
with redirection.  One could substitute any other ordinary
text file opened for writing for sys.stdout.

  [off the top of my head]
  import sys
  text = 'This is %s text: 世界へ、こんにちは!'
  sys.stdout.set_encoding('sjis')
  print(text % 'sjis')
  sys.stdout.set_encoding('euc-jp')
  print(text % 'euc-jp')
  sys.stdout.set_encoding('iso2022-jp')
  print(text % 'iso2022-jp')

As for your suggestion, how do I reopen sys.stdout in 
binary mode?  I don't need to do that often and don't 
know off the top of my head.  (And it's too late for 
me to look it up.)  And what happens to redirected output
when I close and reopen the stream?  I could open a regular
filename instead, but then I have to remember to make the
last two opens with "a" rather than "w", and not to forget
the "\n" at the end of each text line.

Could you show me a code example of your suggestion 
for comparison?
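
For what it's worth, here is my rough guess at the bytes-level version,
for comparison (a sketch, untested; it writes pre-encoded bytes straight
to sys.stdout.buffer rather than reopening anything):

  import sys
  text = 'This is %s text: 世界へ、こんにちは!'
  # (presumably sys.stdout itself would need flushing first
  # if it had already been written to)
  for enc in ('sjis', 'euc-jp', 'iso2022-jp'):
      sys.stdout.buffer.write((text % enc + '\n').encode(enc))
  sys.stdout.buffer.flush()

Is that roughly what you had in mind?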

Disclaimer: As I said before, I am not particularly 
advocating for a set_encoding() method -- my 
primary suggestion is a programmatic way to change the
sys.std* encodings prior to first use.  Here I am just
questioning the claim that a set_encoding() method 
would not be clearer than the existing alternatives.



