[Python-ideas] Fix default encodings on Windows

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Mon Aug 15 01:05:30 EDT 2016


Steve Dower writes:

 > I plan to use only Unicode to interact with the OS and then utf8
 > within Python if the caller wants bytes.

This doesn't answer Victor's questions, or mine.

This proposal requires identifying and transcoding bytes that
represent text in encodings other than UTF-8.

1.  How do you propose to identify "bytes that represent text (and
might be filenames)" if they did *not* originate in a filesystem or
console API?

2.  How do you propose to identify the non-UTF-8 encoding, if you have
forced all variables signifying bytes encodings to UTF-8?
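To make both questions concrete, here is a minimal sketch (my own illustration, not anything from Steve's proposal). The filename and the choice of cp1252 as the "other" encoding are assumptions, and decode_as_utf8 is a hypothetical helper:

```python
# Hypothetical filename; cp1252 stands in for "some non-UTF-8 encoding".
name = "caf\u00e9.txt"

as_utf8 = name.encode("utf-8")
as_cp1252 = name.encode("cp1252")

# Both results are plain bytes: nothing records which codec produced them.
assert type(as_utf8) is bytes and type(as_cp1252) is bytes


def decode_as_utf8(raw):
    """Return the decoded text, or None if raw is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


ok = decode_as_utf8(as_utf8)        # the original name
failed = decode_as_utf8(as_cp1252)  # None: 0xE9 followed by "." is invalid UTF-8
```

Here the failure is at least detectable.  Once every encoding variable says UTF-8, though, nothing is left to say that cp1252 was the right codec to try instead.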

Additional considerations:

As far as I can see, this is just a recipe for a different way to get
mojibake.  *The* way to avoid mojibake is to "let text be text"
*internally*.  Developers who insist on processing text as bytes are
going to get what they deserve *in edge cases*.  But mostly (i.e., in
the mono-encoding environments of most users) it just (barely ;-) works.
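A small sketch of what I mean by mojibake, assuming a writer that uses UTF-8 and a reader that uses the Windows-1252 code page (both choices, and the sample string, are only illustrative):

```python
text = "M\u00fcller"

# Written by code that assumes UTF-8 ...
raw = text.encode("utf-8")

# ... and read by code that assumes cp1252.  No error is raised;
# the two UTF-8 bytes for U+00FC simply come back as two characters.
garbled = raw.decode("cp1252")

# In a mono-encoding environment both sides agree, so the round trip works.
same = text.encode("cp1252").decode("cp1252")
```

The mismatch is silent, which is why it tends to show up only in the edge cases.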

And there are many use cases where you *can* process bytes that happen
to encode text as "just bytes" (e.g., low-level networking code).  These
cases suffer a performance penalty if the bytes-text-bytes-text-bytes
double round trip implied for *stream content* (vs. the OS APIs you're
concerned with, which effectively round-trip text-bytes-text) is
imposed on them.
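As a rough sketch of the difference (the two relay functions and the sample payloads are hypothetical, and I make no claim about how large the cost is in practice):

```python
def relay_bytes(chunk: bytes) -> bytes:
    """Pass the payload through untouched: no codec work at all."""
    return chunk


def relay_via_text(chunk: bytes, encoding: str = "utf-8") -> bytes:
    """Decode to str and re-encode: one extra round trip per hop."""
    return chunk.decode(encoding).encode(encoding)


payload = "GET /caf\u00e9 HTTP/1.1\r\n".encode("utf-8")

# For valid UTF-8 both relays return the same bytes; the second just did more work.
direct = relay_bytes(payload)
round_tripped = relay_via_text(payload)

# Bytes in a legacy encoding survive the direct relay but break the other one.
legacy = "caf\u00e9".encode("latin-1")
try:
    relay_via_text(legacy)
    legacy_survives = True
except UnicodeDecodeError:
    legacy_survives = False
```

Each hop that insists on text adds a decode and an encode, and it also turns "bytes I don't need to understand" into a potential error.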


