[Python-ideas] Fix default encodings on Windows

Sat Aug 13 18:00:46 EDT 2016

The last point is correct: if you get bytes from a file system API, you should be able to pass them back in without losing information. CP_ACP (a.k.a. the *A API) does not allow this, so I'm proposing using the *W API everywhere and encoding to utf-8 when the user wants/gives bytes.
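
A minimal sketch of the round-trip problem, in Python. It assumes cp1252 as a stand-in for the active code page (CP_ACP) and uses errors="replace" to approximate the lossy substitution the *A API performs. The file name and the roundtrip helper are hypothetical, chosen only for illustration.

```python
# Hypothetical file name containing characters outside cp1252.
name = "файл_日本語.txt"

def roundtrip(text, encoding):
    """Encode a name to bytes and decode it back, replacing what does not fit."""
    raw = text.encode(encoding, errors="replace")
    return raw.decode(encoding, errors="replace")

# Assumption: cp1252 plays the role of CP_ACP. Real *A calls may substitute
# differently, but the original characters cannot be recovered either way.
acp_result = roundtrip(name, "cp1252")
utf8_result = roundtrip(name, "utf-8")

print(acp_result == name)   # False: unencodable characters were replaced
print(utf8_result == name)  # True: UTF-8 can represent every character here
```

Any name that survives the code page unchanged would round-trip in both cases; the difference only shows up for names outside it.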

Top-posted from my Windows Phone

-----Original Message-----
From: "Stephen J. Turnbull" <turnbull.stephen.fw at u.tsukuba.ac.jp>
Sent: 8/13/2016 12:11
To: "Random832" <random832 at fastmail.com>
Cc: "python-ideas at python.org" <python-ideas at python.org>
Subject: Re: [Python-ideas] Fix default encodings on Windows

Random832 writes:

 > And what's going to happen if you shovel those bytes into the
 > filesystem without conversion on Linux, or worse, OSX?

Off topic.  See Subject: field.

 > This proposal embodies an assumption that bytes from unknown sources
 > used as filenames are more likely to be UTF-8 than in the locale ACP

Then it's irrelevant: most bytes are not from "unknown sources",
they're from correspondents (or from yourself!) -- and for most users
most of the time, those correspondents share the locale encoding with
them.  At least where I live, they use that encoding frequently.

 > the only solution is to require the application to make a
 > considered decision

That's not a solution.  Code is not written with every decision
considered, and it never will be.  The (long-run) solution is a la
Henry Ford: "you can encode text any way you want, as long as it's
UTF-8".  Then it won't matter if people ever make considered decisions
about encoding!  But trying to enforce that instead of letting it
evolve naturally (as it is doing) will cause unnecessary pain for
Python programmers, and I believe quite a lot of pain.

I used to be in the "make them speak UTF-8" camp.  But in the 15 years
since PEP 263, experience has shown me that mostly it doesn't matter,
and that when it does matter, you have to deal with the large variety
of encodings anyway -- assuming UTF-8 is not a win.  For use cases
that can be encoding-agnostic because all cooperating participants
share a locale encoding, making them explicitly specify the locale
encoding is just a matter of "misery loves company".  Please, let's
not do things for that reason.

 > I think the use case that the proposal has in mind is a
 > file-names-are-just-bytes program (or set of programs) that reads
 > from the filesystem, converts to bytes for a file/network, and then
 > eventually does the reverse - either end may be on windows.

You have misspoken somewhere.  The programs under discussion do not
"convert" input to bytes; they *receive* bytes, either from POSIX APIs
or from Windows *A APIs, and use them as is.  Unless I am greatly
mistaken, Steve simply wants that to work as well on Windows as on
POSIX platforms, so that POSIX programmers who do encoding-agnostic
programming have one less barrier to supporting their software on
Windows.  But you'll have to ask Steve to rule on that.
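
For concreteness, here is a rough sketch of the encoding-agnostic pattern described above: the program receives bytes names from the OS and hands them back unchanged, never deciding on an encoding itself. The helper name and the sample file are hypothetical, and how a bytes path is mapped onto the Windows API depends on the Python version, which is what this thread is about.

```python
import os
import tempfile

def read_all_by_bytes_name(directory):
    """List a directory as bytes and reopen each entry with the same bytes."""
    bdir = os.fsencode(directory)
    contents = {}
    for bname in os.listdir(bdir):  # bytes path in, bytes names out
        # The name is passed back exactly as received, with no decoding step.
        with open(os.path.join(bdir, bname), "rb") as f:
            contents[bname] = f.read()
    return contents

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file, created through a bytes path (valid UTF-8 here).
    path = os.path.join(os.fsencode(tmp), b"caf\xc3\xa9.txt")
    with open(path, "wb") as f:
        f.write(b"payload")
    result = read_all_by_bytes_name(tmp)

print(list(result.values()))
```

The sketch only checks that whatever os.listdir returns can be opened again. It does not assume the returned name is byte-for-byte what was written, since some file systems normalize names.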

Steve
_______________________________________________
Python-ideas mailing list
Python-ideas at python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/