[Python-ideas] changing sys.stdout encoding

Sat Jun 9 06:07:07 CEST 2012

On 06/08/2012 05:11 AM, Stephen J. Turnbull wrote:
> Rurpy writes:
> 
>  > Python is inconsistent:
> 
> Yup, and I said there is support for dealing with that inconsistency.
> At least I'm +1 and Nick's +0.5.
> 
> So let's talk about what to do about it.  Nick has a pretty good
> channel on the BDFL, and since he doesn't seem to like an addition to
> the stdlib here, it may not go far.  But I don't see a reason to rule
> out stdlib changes yet.
> 
> As far as I'm concerned, there are three reasonable proposals:

Which were (summarizing, please correct if wrong)

1) A package on PyPI containing a function like
        import codecs

        def rewrap_stream_with_new_encoding(old_stream, encoding):
            # Wrap the stream's underlying binary buffer in a writer
            # that encodes text with the requested encoding.
            new_stream = codecs.getwriter(encoding)(old_stream.buffer)
            return new_stream
 (or maybe three functions, one for each of the std* streams,
 without the 'old_stream' parameter?)

2) Modify standard lib.  Add something like a 
 .reset_encoding() method to io.TextIOWrapper?  
 (Name and functionality to be bikeshedded to death.)

3) Modify the standard lib documentation (I assume 
 for sys.std* as described below)

Also 4?) Nathan Schneider suggested a hybrid of (1) and
 (2): put the function in the codecs module.
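
To make (1) and (2) a bit more concrete, here is a rough sketch
of the same helper built on io.TextIOWrapper rather than on a
codecs writer.  The name rewrap_stream, the errors parameter and
the in-memory demo are assumptions made for illustration; nothing
by that name exists in the stdlib:

```python
import io

def rewrap_stream(old_stream, encoding, errors="strict"):
    """Return a new text stream that writes to old_stream's binary buffer."""
    # Push out anything already buffered under the old encoding first.
    old_stream.flush()
    return io.TextIOWrapper(
        old_stream.buffer,
        encoding=encoding,
        errors=errors,
        line_buffering=old_stream.line_buffering,
    )

# Demo on an in-memory stream standing in for sys.stdout.
raw = io.BytesIO()
old = io.TextIOWrapper(raw, encoding="utf-8")
new = rewrap_stream(old, "latin-1")
new.write("caf\u00e9")
new.flush()
print(raw.getvalue())
```

With a real stream this would presumably be used as
sys.stdout = rewrap_stream(sys.stdout, "latin-1"), which only
works if sys.stdout has a .buffer attribute (a replaced stdout
may not).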

>  > > [S]ince a 3-line function can do the job, it might make just as
>  > > much sense to put up a package on PyPI.
> 
>  > I hardly think it is worth the effort, for either the producer 
>  > or consumers, of putting a 3-line function on PyPI.  Nor would 
>  > such a solution address the discoverability and ease-of-use 
>  > problems I am complaining about.
> 
> Agreed that it's pretty weak, but it's not clear that other solutions
> will be much better in practice.

If (and when) I had the problem of figuring out how 
to change sys.stdout's encoding, PyPI would be (and was)
the last place I'd look.  It is just not the kind of 
problem one looks to a package to solve.  Rather like 
looking in PyPI if you want to capitalize a string.

Where I would look is where I did: 
* The Python docs io module.
* Then the sys module docs for std*.  They say how to change
 the buffering and how to change to binary.  They also say 
 how the default encoding is determined.  For this reason,
 this is where I would put any note about changing the encoding.
* Finally the internet.
* Had I not found an answer there I would have posted to 
 c.l.p.  I don't think I'd have looked on PyPI unless something 
 explicitly pointed me there.

> Discoverability depends on
> documentation, which can be written and improved.

Documentation where?

> I think "ease of use" is way off-target.

I would think ease of use would always be a consideration 
in any API change users are exposed to.  Or are you saying
some APIs should be discouraged, and that making them hard to
use is better than a "not recommended" note in the documentation?
If so I suspect we'll just have to agree to disagree on that.

And in this case I don't even see any reason to disrecommend 
it -- writing to sys.stdout is the best answer in the circumstances
I've described.  

>  > I presume that would be a standard library change (in either the io
>  > or sys modules) and offered a .set_encoding() method as a
>  > placeholder for discussion.
> 
> Changing the stdlib is not a panacea.  In particular, it can't be
> applied to older Pythons.  I'm also not convinced (cf. Nick's post)
> that there's enough value-added and a good name for the restricted
> functionality we know we can provide.

Nothing is ever a panacea.  It seems like it could be
the cleanest, nicest (long term) solution, but it is
clearly also the most difficult.

>  > An inferior and bare minimum way to address this would be to at
>  > least add a note about how to change the encoding to the sys.std*
>  > documentation.  That encourages cargo-cult programming and doesn't
>  > address the WTF effect but it is at least better than the current
>  > state of affairs.
> 
> IMO, this may be the best, but again I doubt it can be added to older
> versions.

Does it need to be?  I'd have thought this would just
be a doc issue on the tracker (although perhaps getting 
agreement on the wording would be hard?).

> As for the "cargo cult" and "WTF" issues, I have little sympathy for
> either.  The real WTF problem is that multi-encoding environments are
> inherently complex and irregular (ie, a WTF waiting to happen), and
> Python can't fix that. 

But the WTF comes not from multi-encoding (in which 
case it would have occurred when the problem requirements 
were received) but from observing that doing the necessary 
output to a file is easy as pie, but doing the same to 
stdout (another file) isn't.  Python can avoid making a 
less than ideal situation (multi-encoding) worse by not 
making it harder than necessary to do what needs to be done.
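
A small sketch of the asymmetry, assuming the requirement is
UTF-8 output regardless of locale (the temp file location and
the sample text are placeholders):

```python
import codecs
import os
import sys
import tempfile

text = "caf\u00e9"

# To a file: the encoding is a single keyword argument to open().
path = os.path.join(tempfile.mkdtemp(), "out.txt")  # placeholder location
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# To stdout: there is no such argument, so the underlying binary
# buffer has to be rewrapped by hand.
if hasattr(sys.stdout, "buffer"):  # a replaced stdout may lack .buffer
    sys.stdout.flush()  # don't interleave with text already buffered
    out = codecs.getwriter("utf-8")(sys.stdout.buffer)
    out.write(text + "\n")
    out.flush()
```

Both halves end up writing the same bytes; the difference is
only in how much the programmer has to know to get there.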

> It's very unlikely that typical programmers
> will bother to understand what happens "under the hood" of a stdlib
> function/method, so that is no better than cargo-cult programming

The point though is that programmers don't need to look
under the hood -- the fact that something is in stdlib
means (at least ideally) it is documented as a black box.
What goes in, what comes out, the relationship between
the two and any side effects are all concisely, fully
and accurately described (again, in an ideal world). 
But with a code snippet and a comment that says "use this
to change the encoding of sys.stdout", the programmer has 
to figure out everything himself.  (Of course that's not 
totally bad -- I know a lot more about text IO streams 
than I did 3 days ago. :-)

Sure, you could document the code snippet as well as a
packaged function, but that's stretching our ideal world
well past the breaking point -- it doesn't happen. :-)

> (and
> cargo-cult at least has the advantage that what is being done is
> explicit, allowing programmers who understand textio but not encodings
> to figure out what's happening).

True, it's a double-edged sword, but I prefer to use code
packaged in stdlib.  If I didn't I would cut and paste
from there and I don't :-)

Also, there are programmers who understand encoding but
not textio (I'm one) but I'll concede we are probably a 
minority.