[Python-ideas] Fix default encodings on Windows

Paul Moore p.f.moore at gmail.com
Tue Aug 16 06:53:12 EDT 2016


On 15 August 2016 at 19:26, Steve Dower <steve.dower at python.org> wrote:
> Passing path_as_bytes in that location has been deprecated since 3.3, so we
> are well within our rights (and probably overdue) to make it a TypeError in
> 3.6. While it's obviously an invalid assumption, for the purposes of
> changing the language we can assume that no existing code is passing bytes
> into any functions where it has been deprecated.
>
> As far as I'm concerned, there are currently no filesystem APIs on Windows
> that accept paths as bytes.

[...]

On 16 August 2016 at 03:00, Nick Coghlan <ncoghlan at gmail.com> wrote:
> The problem is that bytes-as-paths actually *does* work for Mac OS X
> and systemd based Linux distros properly configured to use UTF-8 for
> OS interactions. This means that a lot of backend network service code
> makes that assumption, especially when it was originally written for
> Python 2, and rather than making it work properly on Windows, folks
> just drop Windows support as part of migrating to Python 3.
>
> At an ecosystem level, that means we're faced with a choice between
> implicitly encouraging folks to make their code *nix only, and finding
> a way to provide a more *nix like experience when running on Windows
> (where UTF-8 encoded binary data just works, and either other
> encodings lead to mojibake or else you use chardet to figure things
> out).
>
> Steve is suggesting that the latter option is preferable, a view I
> agree with since it lowers barriers to entry for Windows based
> developers to contribute to primarily *nix focused projects.

So does this mean that you're recommending that we revert the
deprecation of bytes as paths and instead document that bytes paths
are acceptable, but must be encoded as UTF-8 rather than following
the current behaviour? If so, that raises some questions:

1. Is it OK to backtrack on a deprecation by changing the behaviour
like this? (I think it is, but others who rely on the current,
deprecated behaviour may not agree.)
2. Should we be making "always UTF-8" the behaviour on all platforms,
rather than just Windows (e.g., Unix systems which haven't got UTF-8
as their locale setting)? This doesn't seem to be a Windows-specific
question any more (I'm assuming that if bytes-as-paths are deprecated,
that's a cross-platform change, but see below).
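
To make question 2 concrete, here is a minimal sketch of the two
behaviours being compared. encode_path_utf8 and encode_path_current
are hypothetical helpers named only for this illustration; the second
one assumes that "current behaviour" means whatever os.fsencode() does
with the encoding reported by sys.getfilesystemencoding().

```python
import os
import sys


def encode_path_utf8(path):
    # Hypothetical helper (not a stdlib function): what an
    # "always UTF-8" rule would produce for a str path.
    return path.encode("utf-8")


def encode_path_current(path):
    # Assumed stand-in for "current behaviour": os.fsencode() uses the
    # filesystem encoding reported by sys.getfilesystemencoding().
    try:
        return os.fsencode(path)
    except UnicodeEncodeError:
        # The name cannot be represented in the current encoding.
        return None


name = "caf\u00e9.txt"
print("filesystem encoding:", sys.getfilesystemencoding())
print("always UTF-8:       ", encode_path_utf8(name))
print("current behaviour:  ", encode_path_current(name))
```

The two results should agree wherever the filesystem encoding is
already UTF-8 and can differ elsewhere, which is why this looks like a
cross-platform question to me.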

Having said all this, I can't find the documentation stating that
bytes paths are deprecated - the open() documentation for 3.5 says
"file is either a string or bytes object giving the pathname (absolute
or relative to the current working directory) of the file to be opened
or an integer file descriptor of the file to be wrapped" and there's
no mention of a deprecation. Steve - could you provide a reference?
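
For reference, the documented behaviour I'm quoting can be sketched as
below. This is only an illustration under my own assumptions: the file
name is made up, and the bytes path is produced with os.fsencode() so
that it matches whatever encoding the platform currently uses.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Illustrative file name; any name would do.
    str_path = os.path.join(tmp, "example.txt")
    # Same path as bytes, encoded with the current filesystem encoding.
    bytes_path = os.fsencode(str_path)

    # open() accepts the bytes form of the path...
    with open(bytes_path, "w") as f:
        f.write("hello")

    # ...and the str form refers to the same file.
    with open(str_path) as f:
        content = f.read()

print(content)
```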

Paul

