[Python-ideas] Fix default encodings on Windows
Stephen J. Turnbull
turnbull.stephen.fw at u.tsukuba.ac.jp
Mon Aug 15 01:05:30 EDT 2016
Steve Dower writes:
> I plan to use only Unicode to interact with the OS and then utf8
> within Python if the caller wants bytes.
This doesn't answer Victor's questions, or mine.
This proposal requires identifying and transcoding bytes that
represent text in encodings other than UTF-8.
1. How do you propose to identify "bytes that represent text (and
might be filenames)" if they did *not* originate in a filesystem or
console API?
2. How do you propose to identify the non-UTF-8 encoding, if you have
forced all variables signifying bytes encodings to UTF-8?
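To make question 2 concrete, here is a minimal Python sketch. It assumes, hypothetically, that some bytes came from a legacy application using the Windows code page cp1252 rather than from a filesystem or console API. The helper name decode_assuming_utf8 is illustrative only. A UTF-8 decode either fails outright or, worse, succeeds on bytes that were never UTF-8, so the bytes alone do not identify their encoding.

```python
from typing import Optional


def decode_assuming_utf8(data: bytes) -> Optional[str]:
    """Hypothetical helper: decode as UTF-8, or return None if invalid."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Assumption: text written by a legacy tool in cp1252, not via an OS API.
legacy = "café".encode("cp1252")                 # b'caf\xe9'
print(decode_assuming_utf8(legacy))              # None: b'\xe9' is not valid UTF-8 here

# Transcoding correctly needs the source encoding, which the bytes don't record.
print(legacy.decode("cp1252").encode("utf-8"))   # b'caf\xc3\xa9'

# Worse: some cp1252 text happens to be valid UTF-8, so "it decoded" proves nothing.
ambiguous = "Ã©".encode("cp1252")               # b'\xc3\xa9'
print(decode_assuming_utf8(ambiguous))           # é  (not the 'Ã©' that was written)
```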
Additional considerations:
As far as I can see, this is just a recipe for a different way to get
mojibake. *The* way to avoid mojibake is to "let text be text"
*internally*. Developers who insist on processing text as bytes are
going to get what they deserve *in edge cases*. But mostly (i.e., in
the mono-encoding environments of most users) it just (barely ;-) works.
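A minimal Python sketch of both halves of that paragraph, assuming one component writes UTF-8 while another reads with cp1252 (a mismatch chosen only for illustration). The function names and the source_encoding parameter are hypothetical. The second function shows "let text be text" internally: decode once at the input boundary with the encoding known for that source, work on str, and encode once on output.

```python
def mojibake_demo() -> str:
    # One component encodes as UTF-8...
    data = "café".encode("utf-8")        # b'caf\xc3\xa9'
    # ...and another decodes it assuming cp1252 (assumed mismatch).
    return data.decode("cp1252")


def let_text_be_text(raw: bytes, source_encoding: str) -> bytes:
    # Decode once at the input boundary; source_encoding is hypothetical and
    # must come from knowledge about where the bytes originated.
    text = raw.decode(source_encoding)
    # Internal processing operates on str only.
    text = text.upper()
    # Encode once at the output boundary.
    return text.encode("utf-8")


print(mojibake_demo())                           # cafÃ©
print(let_text_be_text(b"caf\xe9", "cp1252"))    # b'CAF\xc3\x89'
```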
And there are many use cases where you *can* process bytes that happen
to encode text as "just bytes" (e.g., low-level networking code). These
cases would pay a performance cost if the bytes-text-bytes-text-bytes
double round trip implied for *stream content* (as opposed to the OS APIs
you're concerned with, which effectively round-trip text-bytes-text) were
imposed on them.
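The cost can be sketched in Python, assuming stream content that a program only relays. relay_bytes passes the payload through untouched; relay_via_text models the bytes-text-bytes-text-bytes double round trip as two UTF-8 decode/encode legs. Python's surrogateescape error handler keeps each leg lossless even for non-UTF-8 bytes, but every byte is still transcoded twice. Function names and payload size are illustrative; timings vary by machine, so none are shown.

```python
import timeit


def relay_bytes(chunk: bytes) -> bytes:
    # Treat the payload as opaque bytes: no decoding, no encoding.
    return chunk


def _bytes_text_bytes(chunk: bytes) -> bytes:
    # One leg: bytes -> str -> bytes. surrogateescape maps undecodable bytes
    # to lone surrogates and back, so arbitrary bytes survive unchanged.
    text = chunk.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


def relay_via_text(chunk: bytes) -> bytes:
    # bytes -> text -> bytes -> text -> bytes: the double round trip.
    return _bytes_text_bytes(_bytes_text_bytes(chunk))


payload = bytes(range(256)) * 1024    # 256 KiB, including non-UTF-8 bytes
assert relay_via_text(payload) == relay_bytes(payload)  # same result...
print(timeit.timeit(lambda: relay_bytes(payload), number=10))
print(timeit.timeit(lambda: relay_via_text(payload), number=10))  # ...more work
```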