[Python-ideas] changing sys.stdout encoding

Wed Jun 6 01:34:00 CEST 2012

2012/6/5 Rurpy <rurpy at yahoo.com>:
> In my first foray into Python3 I've encountered this problem:
> I work in a multi-language environment.  I've written a number
> of tools, mostly command-line, that generate output on stdout.
> Because these tools and their output are used by various people
> in varying environments, the tools all have an --encoding option
> to provide output that meets the needs and preferences of the
> output's ultimate consumers.

What happens if the specified encoding differs from the encoding of
the console? Mojibake?

If the output is used as the input of another program, does the
other program use the same encoding?

In my experience, using an encoding different from the locale encoding
for input/output (stdout, environment variables, command line
arguments, etc.) causes various issues. So I'm curious about your use
cases.

> In converting them to Python3, I found the best (if not very
> pleasant) way to do this in Python3 was to put something like
> this near the top of each tool[*1]:
>
>  import codecs
>  sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)

In Python 3, you should use io.TextIOWrapper instead of
codecs.StreamWriter. It's more efficient and has fewer bugs.
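
A minimal sketch of that substitution, for illustration only. The
helper name make_text_writer is hypothetical, and the errors and
line_buffering arguments are assumptions, not something the quoted
snippet specifies:

```python
import io


def make_text_writer(binary_stream, encoding, errors="strict"):
    """Wrap a binary stream in a text layer that encodes to `encoding`.

    Hypothetical helper; it plays the role that
    codecs.getwriter(encoding)(binary_stream) plays in the quoted code.
    """
    return io.TextIOWrapper(
        binary_stream,
        encoding=encoding,
        errors=errors,
        line_buffering=True,  # assumption: flush whenever a newline is written
    )


# Possible use near the top of a tool (opts.encoding as in the quoted code):
#     import sys
#     sys.stdout = make_text_writer(sys.stdout.buffer, opts.encoding)
```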

> What I want to be able to put there instead is:
>
>  sys.stdout.set_encoding (opts.encoding)

I don't think that your use case merits a new method on
io.TextIOWrapper: replacing sys.stdout does work and should be used
instead. TextIOWrapper is generic and your use case is specific to
sys.std* streams.

It would be surprising to change the encoding of an arbitrary file
after it is opened. At least, I don't see the use case.

For example, tokenize.open() opens a Python source code file with the
right encoding. It starts by reading the file in binary mode to detect
the encoding, and then uses TextIOWrapper to get a text file without
having to reopen the file. It would be possible to start with a text
file and then change the encoding, but it would be less elegant.
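
The pattern can be sketched roughly as follows. This is a simplified
illustration, not the actual tokenize.open() source; the function name
open_source is hypothetical, while tokenize.detect_encoding() is the
documented helper that finds a BOM or coding cookie:

```python
import io
import tokenize


def open_source(path):
    """Open a Python source file as text, using its declared encoding.

    Simplified sketch; prefer tokenize.open() in real code.
    """
    raw = open(path, "rb")
    try:
        # Reads at most two lines, looking for a BOM or a coding cookie.
        encoding, _ = tokenize.detect_encoding(raw.readline)
        raw.seek(0)  # rewind so the text layer sees the whole file
        # Reuse the same binary file object instead of reopening the file.
        return io.TextIOWrapper(raw, encoding=encoding, line_buffering=True)
    except BaseException:
        raw.close()
        raise
```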

>  sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)

You should also flush sys.stdout (and maybe also sys.stdout.buffer)
before replacing it.
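
A sketch of that ordering, under the assumption that the stream being
replaced is a TextIOWrapper-like object with a .buffer attribute (true
for the default sys.stdout, not necessarily for a replaced one). The
name reencode_stream is hypothetical:

```python
import io


def reencode_stream(stream, encoding):
    """Flush `stream`, then return a new text layer over the same buffer."""
    stream.flush()         # push text still held by the old text layer
    binary = stream.buffer
    binary.flush()         # and bytes still held by the binary buffer
    return io.TextIOWrapper(
        binary,
        encoding=encoding,
        line_buffering=stream.line_buffering,
    )


# Possible use (opts.encoding as in the quoted code):
#     import sys
#     sys.stdout = reencode_stream(sys.stdout, opts.encoding)
# The old wrapper stays referenced by sys.__stdout__, so the shared
# buffer is not closed when sys.stdout is rebound.
```

Anything written before the call keeps its original encoding; only
later writes use the new one.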

> It requires the import of the codecs module in programs that other-
> wise don't need it [*2], and the reading of the codecs docs (not
> a shining example of clarity themselves) to understand it.

Maybe it's difficult to change the encoding of sys.stdout at runtime
because it is NOT a good idea :-)

> Needing to change the encoding of a sys.std* stream is not an
> uncommon need and a user should not have to go through the
> codecs dance above to do so IMO.

Replacing sys.std* works but has issues: for example, output written
before the replacement was encoded with a different encoding. The best
way is to change your locale encoding (using the LC_ALL, LC_CTYPE or
LANG environment variable on UNIX), or simply to set the
PYTHONIOENCODING environment variable.
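
To illustrate the effect of PYTHONIOENCODING, here is a small sketch
that runs a child interpreter with the variable set and captures the
raw bytes it writes. The helper name run_with_io_encoding is
hypothetical:

```python
import os
import subprocess
import sys


def run_with_io_encoding(encoding, code):
    """Run `code` in a child Python with PYTHONIOENCODING set.

    Returns the raw bytes the child wrote to its stdout.
    """
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

In a shell, the same effect comes from setting the variable on the
command line that starts the tool.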

> [*1] There are other ways to change stdout's encoding but they
>  all have problems AFAICT.  PYTHONIOENCODING can't easily be
>  changed dynamically within program.

Ah? Detect whether PYTHONIOENCODING is present (or whether
sys.stdout.encoding is already the requested encoding); if not,
restart the program with PYTHONIOENCODING=encoding.
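
A rough sketch of such a restart, with several assumptions: the
function names are hypothetical, the program was started as
"python script.py ...", so that sys.argv can be reused as-is, and
os.execve() behaves as it does on UNIX:

```python
import codecs
import os
import sys


def needs_restart(environ, current, wanted):
    """Decide whether a restart with PYTHONIOENCODING is required."""
    if "PYTHONIOENCODING" in environ:
        return False  # already set, by the user or by a previous restart
    if current is None:
        return True   # stream without a known encoding
    # Compare normalized codec names so "UTF8" and "utf-8" match.
    return codecs.lookup(current).name != codecs.lookup(wanted).name


def restart_with_encoding(wanted):
    """Re-execute the interpreter with PYTHONIOENCODING=wanted if needed."""
    if needs_restart(os.environ, sys.stdout.encoding, wanted):
        env = dict(os.environ, PYTHONIOENCODING=wanted)
        # Replaces the current process; does not return on success.
        os.execve(sys.executable, [sys.executable] + sys.argv, env)
```

Checking for the variable first keeps the restarted process from
looping.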

>  Overloading print() is obscure
>  because it requires reader to notice print was overloaded.

Why not write the output into a file, instead of stdout?
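
For instance, a sketch assuming the tool takes an output path next to
its --encoding option (write_report and its parameters are
hypothetical names):

```python
def write_report(path, lines, encoding):
    """Write `lines` to `path`, encoded with the requested encoding."""
    # newline="\n" disables platform-specific newline translation.
    with open(path, "w", encoding=encoding, newline="\n") as out:
        for line in lines:
            print(line, file=out)
```

The encoding is fixed when the file is opened, so nothing is ever
written with the wrong one.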

Victor