[Python-Dev] File system path encoding on Windows

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Mon Aug 22 05:47:06 EDT 2016


Nick Coghlan writes:
 > On 21 August 2016 at 06:31, Steve Dower <steve.dower at python.org> wrote:

 > > My biggest concern is that it then falls onto users to know how
 > > to start Python with that flag.

The users I'm most worried about belong to organizations where
concerted effort has been made to "purify" the environment so that
they *can* use bytes-oriented code.  That is, getfilesystemencoding()
== getpreferredencoding() == what is actually used throughout the
system.  Such organizations will be able to choose the flag correctly,
and implement it organization-wide, I'm pretty sure.  I doubt that all
will choose UTF-8 at this point in time, though I wish they would.
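
As a rough sketch of how one might check that condition on a given
machine: the helper name encodings_agree is hypothetical, and only
sys.getfilesystemencoding and locale.getpreferredencoding come from
the standard library.  It says nothing about what other applications
on the system actually use.

```python
import codecs
import locale
import sys


def encodings_agree():
    """Return (agree, fs_encoding, preferred_encoding).

    Names are normalized through codecs.lookup so that spellings such
    as 'UTF-8' and 'utf8' compare equal.
    """
    fs = codecs.lookup(sys.getfilesystemencoding()).name
    pref = codecs.lookup(locale.getpreferredencoding(False)).name
    return fs == pref, fs, pref


if __name__ == "__main__":
    agree, fs, pref = encodings_agree()
    print("filesystem:", fs, "preferred:", pref, "agree:", agree)
```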

 > Not necessarily, as this is one of the areas where commercial
 > redistributors can earn their revenue stream - by deciding that
 > flipping the default behaviour is the right thing to do for *their*
 > user base (which is inevitably only a subset of the overall Python
 > user base).

This assumes that the Python applications are the mission-critical
ones for their clients.  What if they're not?  I think the commercial
redistributors will have to make their decisions on a client-by-client
basis, too.  They may be in a better position to do so, but why buy
trouble?  They'll be quite conservative, unless they're basically the
monopoly IT supplier to a whole organization, and even then they'll
still have to face potential problems from existing files on users'
storage, and perhaps from applications that they supply but don't "own".

I have real trouble seeing an attempt to force UTF-8 as a good idea until
the organization mandates UTF-8. :-(  This really is an organizational
decision, to be implemented with client resources.  We can't do it for
them, redistributors can't do it for them.  It needs to be an option
in Python.

Python itself is already ready for UTF-8, except that on Windows
getfilesystemencoding and getpreferredencoding can't honestly return
'utf-8', AIUI.  I understand that that is exactly what Steve wants to
change, but "honestly" is the rub.  What happens if Python 3.6 is only
part of a bytes-oriented system, receives a filename forced to UTF-8-
encoded bytes, and passes that over a pipe or in shared memory or in a
file to a non-Python-3.6 application that trusts the system defaults?
"Boom!", no?  Is there any experience anywhere in any implementation
language with systems used on Windows that use this approach of
pretending the Windows world is UTF-8?  If not, why is it a good idea
for Python to go first?
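
A minimal illustration of that "Boom!", under assumptions made up for
the example: the filename, the helper name receive, and a receiving
application whose system default is taken to be cp1252.

```python
def receive(sent, system_encoding="cp1252"):
    """Decode bytes the way a hypothetical non-Python application
    would if it trusted the system default encoding."""
    return sent.decode(system_encoding)


# Python is assumed to hand out the filename as UTF-8-encoded bytes.
name = "r\u00e9sum\u00e9.txt"          # "resume" with two e-acute
sent = name.encode("utf-8")

# The receiver decodes with its own default and gets a different name.
received = receive(sent)
print(received == name)                # False
```

Nothing raises in this case; the receiver simply ends up with a name
that differs from the one Python meant, so the mismatch only shows up
later when the file cannot be found.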

 > Making that possible doesn't mean redistributors will actually follow
 > through, but it's an option worth keeping in mind, as while it does
 > increase the ecosystem complexity in the near term (since default
 > behaviour may vary based on how you obtained your Python runtime), in
 > the longer term it can allow for better informed design decisions at
 > the reference interpreter level. (For business process wonks, it's
 > essentially like running through a deliberate divergence/convergence
 > cycle at the level of the entire language ecosystem:
 > http://theagilepirate.net/archives/1392 )

It's worse than "the entire language ecosystem" -- it's your whole
business.[1]  If the proposed change to getfilesystemencoding and file
system APIs creates issues at all, it matters because files on disk,
or other applications that receive bytes from Python, refer to
filenames encoded in the preferred encoding != UTF-8.  It's unlikely
in the extreme that all such files are exclusively used by Python,
which at best means individual users will need to manage encodings
file by file.  At worst, some of the filenames so encoded will be
shared with applications that expect the preferred encoding, and then
you've got a war on your hands.

 > > On the other hand, having code opt-in or out of the new handling
 > > requires changing code (which is presumably not going to happen,
 > > or we wouldn't consider keeping the old behaviour and/or letting
 > > the user control it),

I don't understand why this argument doesn't cut both ways equally.
If you believe that, you should also believe that the same people who
won't change code to opt in also won't use a Python containing fix #1,
and may not install it at all.  Doesn't that matter?

 > I think you'll want to escalate this to a PEP as well

+1 for the reasons Nick gives.  The conclusions of this discussion
need a canonical URL.


Footnotes: 
[1]  I'm assuming that readers are going to associate "language" <-->
"Python".  The blog post Nick refers to is about the whole business.



