[Python-ideas] Fix default encodings on Windows

Steve Dower steve.dower at python.org
Mon Aug 15 09:23:44 EDT 2016


I guess I'm not sure what your question is then.

Using text internally is of course the best way to deal with it. But for those who insist on using bytes, this change at least makes Windows a feasible target without requiring manual encoding/decoding at every boundary.

Top-posted from my Windows Phone

-----Original Message-----
From: "Stephen J. Turnbull" <turnbull.stephen.fw at u.tsukuba.ac.jp>
Sent: 8/14/2016 22:06
To: "Steve Dower" <steve.dower at python.org>
Cc: "Victor Stinner" <victor.stinner at gmail.com>; "python-ideas" <python-ideas at python.org>; "Random832" <random832 at fastmail.com>
Subject: RE: [Python-ideas] Fix default encodings on Windows

Steve Dower writes:

 > I plan to use only Unicode to interact with the OS and then utf8
 > within Python if the caller wants bytes.

This doesn't answer Victor's questions, or mine.

This proposal requires identifying and transcoding bytes that
represent text in encodings other than UTF-8.

1.  How do you propose to identify "bytes that represent text (and
might be filenames)" if they did *not* originate in a filesystem or
console API?

2.  How do you propose to identify the non-UTF-8 encoding, if you have
forced all variables signifying bytes encodings to UTF-8?
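
To make question 2 concrete, here is a minimal sketch (not part of the
proposal; the sample byte and the list of candidate encodings are purely
illustrative). A bytes object carries no record of its encoding, so the
same byte decodes to different text, or fails to decode, depending on
which codec is guessed:

```python
# Illustrative only: one byte, three guesses at its encoding.
raw = b"\xe9"

def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# Each codec gives a different answer for the same input.
candidates = {enc: try_decode(raw, enc) for enc in ("latin-1", "cp1251", "utf-8")}

for enc, text in candidates.items():
    print(enc, repr(text))
```

Two of the three guesses succeed with different characters, and nothing
in the bytes themselves says which one was intended.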

Additional considerations:

As far as I can see, this is just a recipe for a different way to get
mojibake.  *The* way to avoid mojibake is to "let text be text"
*internally*.  Developers who insist on processing text as bytes are
going to get what they deserve *in edge cases*.  But mostly (i.e., in
the mono-encoding environments of most users) it just (barely ;-) works.
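
A small illustration of both halves of that claim (a sketch with made-up
sample data, not code from this thread): decoding UTF-8 bytes with the
wrong code page produces mojibake, and bytes methods only understand
ASCII, so bytes-based processing works until a non-ASCII character
shows up.

```python
name = "caf\u00e9"                 # "cafe" with an acute accent on the e
utf8_bytes = name.encode("utf-8")

# Wrong guess at the boundary: UTF-8 bytes read as cp1252 yield mojibake.
mojibake = utf8_bytes.decode("cp1252")

# Processing as bytes: upper() only touches ASCII letters, and len()
# counts bytes rather than characters.
bytes_upper = utf8_bytes.upper()
bytes_len = len(utf8_bytes)

# Processing as text: decode once, work on str, encode once on the way out.
text = utf8_bytes.decode("utf-8")
text_upper = text.upper()
text_len = len(text)

print(repr(mojibake), bytes_upper, bytes_len, repr(text_upper), text_len)
```

With a pure-ASCII name the bytes and text paths would agree, which is
the "mostly works" case.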

And there are many use cases where you *can* process bytes that happen
to encode text as "just bytes" (e.g., low-level networking code).  These
cases suffer a performance penalty if you impose on them the
bytes-text-bytes-text-bytes double round trip implied for *stream
content* (as opposed to the OS APIs you're concerned with, which
effectively round-trip text-bytes-text).
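
A rough sketch of the distinction, using hypothetical helper names
(`relay_bytes`, `relay_via_text`) and an invented payload: a relay that
treats stream content as opaque bytes does no codec work and accepts any
input, whereas one that round-trips through text pays for a decode and
an encode per chunk and rejects data that is not valid in the assumed
encoding.

```python
def relay_bytes(chunk: bytes) -> bytes:
    """Forward the chunk untouched; no codec work is done."""
    return chunk

def relay_via_text(chunk: bytes, encoding: str = "utf-8") -> bytes:
    """Decode to str and re-encode; strict decoding raises on invalid input."""
    return chunk.decode(encoding).encode(encoding)

valid = "na\u00efve".encode("utf-8")
invalid = b"header\xffbody"        # 0xFF never appears in valid UTF-8

# For valid input both paths produce the same bytes; the text path just
# does extra work to get there.
same_output = relay_bytes(valid) == relay_via_text(valid)

# For invalid input only the bytes path succeeds.
try:
    relay_via_text(invalid)
    round_trip_ok = True
except UnicodeDecodeError:
    round_trip_ok = False

print(same_output, relay_bytes(invalid) == invalid, round_trip_ok)
```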
