[Python-ideas] Fix default encodings on Windows

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Sat Aug 13 15:09:53 EDT 2016


Random832 writes:

 > And what's going to happen if you shovel those bytes into the
 > filesystem without conversion on Linux, or worse, OSX?

Off topic.  See Subject: field.

 > This proposal embodies an assumption that bytes from unknown sources
 > used as filenames are more likely to be UTF-8 than in the locale ACP

Then it's irrelevant: most bytes are not from "unknown sources",
they're from correspondents (or from yourself!) -- and for most users
most of the time, those correspondents share the locale encoding with
them.  At least where I live, they use that encoding frequently.
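
A minimal Python sketch of the point, for illustration only: the
filename and the choice of cp1252 as the shared locale encoding are
assumptions invented here, not details from the thread.  It shows that
bytes exchanged between correspondents who share a locale encoding
round-trip in that encoding, while treating them as UTF-8 fails.

```python
# Hypothetical filename and locale encoding, chosen only for illustration.
name = "résumé.txt"
locale_encoding = "cp1252"

# The bytes a correspondent using the same locale encoding would send.
acp_bytes = name.encode(locale_encoding)

# Decoding with the shared locale encoding recovers the original name.
round_trip = acp_bytes.decode(locale_encoding)

# Assuming UTF-8 instead fails: 0xE9 starts a multi-byte sequence
# that the following ASCII byte does not continue.
try:
    acp_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```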

 > the only solution is to require the application to make a
 > considered decision

That's not a solution.  Code is not written with every decision
considered, and it never will be.  The (long-run) solution is à la
Henry Ford: "you can encode text any way you want, as long as it's
UTF-8".  Then it won't matter whether people ever make considered
decisions about encoding!  But trying to enforce that instead of
letting it evolve naturally (as it is doing) will cause unnecessary
pain for Python programmers, and I believe quite a lot of it.

I used to be in the "make them speak UTF-8" camp.  But in the 15 years
since PEP 263, experience has shown me that mostly it doesn't matter,
and that when it does matter, you have to deal with the large variety
of encodings anyway, so assuming UTF-8 is not a win.  For use cases
that can be encoding-agnostic because all cooperating participants
share a locale encoding, making them explicitly specify the locale
encoding is just a matter of "misery loves company".  Please, let's
not do things for that reason.

 > I think the use case that the proposal has in mind is a
 > file-names-are-just-bytes program (or set of programs) that reads
 > from the filesystem, converts to bytes for a file/network, and then
 > eventually does the reverse - either end may be on windows.

You have misspoken somewhere.  The programs under discussion do not
"convert" input to bytes; they *receive* bytes, either from POSIX APIs
or from Windows *A APIs, and use them as is.  Unless I am greatly
mistaken, Steve simply wants that to work as well on Windows as on
POSIX platforms, so that POSIX programmers who do encoding-agnostic
programming have one less barrier to supporting their software on
Windows.  But you'll have to ask Steve to rule on that.
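
A rough sketch of what "receive bytes and use them as is" can look
like in Python, assuming the standard bytes-path behaviour of os and
open().  The helper name and the filename are hypothetical, and an
ASCII filename is used so the sketch runs the same way on any platform.

```python
import os
import tempfile


def list_raw_names(directory: bytes) -> list:
    """Return directory entries as the bytes the OS hands back."""
    # Passing a bytes path makes os.listdir return bytes names.
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    root = os.fsencode(tmp)  # work with a bytes path from here on

    # Create a file using a bytes filename (hypothetical name).
    with open(os.path.join(root, b"notes.txt"), "wb") as f:
        f.write(b"hello")

    # Read the names back as bytes and reuse them without decoding.
    names = list_raw_names(root)
    with open(os.path.join(root, names[0]), "rb") as f:
        data = f.read()
```

At no point does the program decode the filename, so it never has to
decide what encoding the bytes are in.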

Steve

