[Python-Dev] File system path encoding on Windows
Steve Dower
steve.dower at python.org
Tue Aug 30 14:04:45 EDT 2016
On 30Aug2016 0806, Victor Stinner wrote:
> 2016-08-30 16:31 GMT+02:00 Steve Dower <steve.dower at python.org>:
>> It's the
>> random user on Windows who installed their library that has the problem.
>> They don't know the fix, and may not know how to apply it (e.g. if it's
>> their Jupyter notebook that won't find one of their files - no obvious
>> command line options here).
>
> There is already a DeprecationWarning. Sadly, it's hidden by default:
> you need a debug build of Python or, more simply, to pass the -Wd
> command line option.
It also only appears on Windows, so developers who do the right thing on
POSIX never find out about it. Your average user isn't going to see it
either; they'll just see an OSError when their file is not found because
the path was encoded lossily.
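To make the lossy encoding concrete, here is a minimal sketch. It uses Python's cp1252 codec with errors='replace' as a stand-in for the conversion a bytes path goes through on a Western-European Windows system. That is an assumption: the real ANSI APIs may use best-fit mappings rather than '?'. The helper name roundtrip_through_code_page is hypothetical.

```python
def roundtrip_through_code_page(name: str, code_page: str = 'cp1252') -> str:
    """Encode a file name to a legacy code page, replacing characters it
    cannot represent, then decode it back. This is a stand-in for the ANSI
    conversion (assumption: '?' replacement rather than best-fit mapping)."""
    return name.encode(code_page, errors='replace').decode(code_page)


if __name__ == '__main__':
    original = 'caf\u00e9 \u65e5\u672c.txt'    # 'e acute' fits cp1252, the CJK characters do not
    mangled = roundtrip_through_code_page(original)
    print(ascii(mangled))          # 'caf\xe9 ??.txt'
    print(mangled == original)     # False: opening `mangled` finds no such file
```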
> Maybe we should make this warning (Deprecation warning on bytes paths)
> visible by default, or add a new warning suggesting to enable -X utf8
> the first time a Python function gets a byte string (like a filename)?
In my opinion, the more important thing is to make the warning visible on
all platforms, regardless of whether bytes paths are suitable there or not.
But this will probably be seen as hostile by the majority of open-source
Python developers, which is why I'd rather just quietly fix the
incompatibility.
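The -Wd option corresponds to the 'default' action in the warnings module, so a test suite can opt in to these warnings from code instead of relying on a command-line flag. This is a minimal sketch, assuming you want to collect the warnings rather than print them. deprecation_warnings_from is a hypothetical helper, not a stdlib function.

```python
import os
import warnings


def deprecation_warnings_from(func, *args):
    """Call func(*args) with DeprecationWarning enabled (like -Wd) and
    return the messages of any DeprecationWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # 'default' shows the first occurrence per location, as -Wd does;
        # with record=True the warnings are collected instead of printed.
        warnings.simplefilter('default', DeprecationWarning)
        func(*args)
    return [str(w.message) for w in caught
            if issubclass(w.category, DeprecationWarning)]


if __name__ == '__main__':
    # Per this thread, only Windows builds emitted the bytes-path warning;
    # on POSIX the list printed here is expected to be empty.
    print(deprecation_warnings_from(os.listdir, b'.'))
```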
>> Any system that requires communication between two different versions of
>> Python must have install instructions (if it's public) or someone who
>> maintains it. It won't magically break without an upgrade, and it should not
>> get an upgrade without testing. The environment variable is available for
>> this kind of scenario, though I'd hope the testing occurs during beta and it
>> gets fixed by the time we release.
>
> I disagree that breaking backward compatibility is worth it. Most
> users don't care about Unicode since their application already "just
> works well" for their use case.
Again, the problem is libraries (code written by someone else that you
want to reuse), not applications (code written by you to solve your
business problem in your environment). Code that assumes the default
encodings are sufficient is already broken in the general case, and
libraries nearly always need to cover the general case while
applications do not. The stdlib needs to cover the general case, which
is why I keep using open(os.listdir(b'.')[-1]) as an example of
something that should never fail because of encoding issues.
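The open(os.listdir(b'.')[-1]) case can be sketched as a self-contained round trip. This sketch assumes a temporary directory instead of the current directory and uses a hypothetical file name with characters outside cp1252. On POSIX with a UTF-8 file system encoding it should return the file's contents. Per this thread, Windows builds that route bytes paths through the ANSI code page are where it can fail.

```python
import os
import tempfile


def reopen_last_entry(directory: bytes) -> bytes:
    """Same idea as open(os.listdir(b'.')[-1]), for an explicit directory."""
    names = os.listdir(directory)              # bytes argument -> bytes names
    path = os.path.join(directory, names[-1])
    with open(path, 'rb') as f:                # must round-trip without loss
        return f.read()


def reopen_last_entry_demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical name: 'e acute' fits cp1252, the CJK characters do not.
        name = 'caf\u00e9 \u65e5\u672c.txt'
        with open(os.path.join(tmp, name), 'wb') as f:
            f.write(b'hello')
        # Hand the directory to the bytes API, as a bytes-only library would.
        return reopen_last_entry(os.fsencode(tmp))


if __name__ == '__main__':
    print(reopen_last_entry_demo())
```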
In theory, we should encourage library developers to support Windows
properly by using str for paths, probably by disabling bytes paths
everywhere. Alternatively, we make it so that bytes paths work fine
everywhere and stop telling people that their code is wrong for a
platform they're already not hugely concerned about.
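For the first option, one common pattern is for a library to convert whatever the caller passes to str once, at the API boundary. This is a minimal sketch. normalize_path is a hypothetical name, and the sketch relies on os.fspath (Python 3.6+) and os.fsdecode.

```python
import os
from typing import Union

PathArg = Union[str, bytes, os.PathLike]


def normalize_path(path: PathArg) -> str:
    """Convert a str, bytes or path-like argument to str once, so the rest
    of the library only ever handles str paths."""
    # os.fspath() unwraps path-like objects (e.g. pathlib paths);
    # os.fsdecode() decodes bytes with the file system encoding.
    return os.fsdecode(os.fspath(path))
```

os.fsdecode uses the file system encoding and error handler, so on POSIX any undecodable bytes survive as surrogate escapes and can be turned back into the original bytes with os.fsencode.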
> Having to set an env var to "repair" their app to be able to upgrade
> Python is not really convenient.
Upgrading Python in an already running system isn't going to be really
convenient anyway. Going from x.y.z to x.y.z+1 should be convenient, but
from x.y to x.y+1 deserves testing and possibly code or environment
changes. I don't understand why changing Python at the same time we
change the version number is suddenly controversial.
Cheers,
Steve