[Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8

Nick Coghlan ncoghlan at gmail.com
Sat Sep 3 12:27:44 EDT 2016


On 4 September 2016 at 00:49, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 2 September 2016 at 08:31, Steve Dower <steve.dower at python.org> wrote:
>> This proposal would remove all use of the *A APIs and only ever call the *W
>> APIs. When Windows returns paths to Python as str, they will be decoded from
>> utf-16-le and returned as text (in whatever the minimal representation is).
>> When Windows returns paths to Python as bytes, they will be decoded from
>> utf-16-le to utf-8 using surrogatepass (Windows does not validate surrogate
>> pairs, so it is possible to have invalid surrogates in filenames). Equally,
>> when paths are provided as bytes, they are decoded from utf-8 into utf-16-le
>> and passed to the *W APIs.
>
> The overall proposal looks good to me, there's just a terminology
> glitch here: utf-8 <-> utf-16-le should either be described as
> transcoding, or else as decoding and then re-encoding. As they're both
> text codecs, there's no "decoding" operation that switches between
> them.
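
To illustrate the terminology point, here is a minimal sketch of "decode,
then re-encode" between the two codecs. The helper names are hypothetical
and this is not the PEP's implementation; it only assumes that the
surrogatepass error handler is used in both directions, as the quoted text
describes.

```python
def utf16le_to_utf8(raw: bytes) -> bytes:
    """Transcode UTF-16-LE bytes to UTF-8 bytes (hypothetical helper)."""
    # Step 1: decode bytes -> str. surrogatepass lets lone surrogates through.
    text = raw.decode("utf-16-le", errors="surrogatepass")
    # Step 2: re-encode str -> bytes in the target codec.
    return text.encode("utf-8", errors="surrogatepass")


def utf8_to_utf16le(raw: bytes) -> bytes:
    """Transcode UTF-8 bytes back to UTF-16-LE bytes (hypothetical helper)."""
    text = raw.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16-le", errors="surrogatepass")


# A name containing a lone high surrogate (U+D800), which Windows permits.
lone = "a\ud800b".encode("utf-16-le", errors="surrogatepass")
as_utf8 = utf16le_to_utf8(lone)
round_tripped = utf8_to_utf16le(as_utf8)
```

The str object in the middle is the point: each direction is a decode
followed by an encode, which is why "transcoding" is the more accurate word.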

After also reading the Windows console encoding PEP, I realised
there are a couple of missing discussions here regarding the impacts on
sys.argv, os.environ, and os.environb.

The reason that's relevant is that "sys.getfilesystemencoding" is a
bit of a misnomer, as it's also used to determine the assumed encoding
of command line arguments and environment variables.
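
As a rough sketch of that shared role, os.fsencode() and os.fsdecode() use
the same encoding and error handler that sys.getfilesystemencoding() and
sys.getfilesystemencodeerrors() report (the latter needs Python 3.6 or
later). The helper names below are hypothetical, chosen for illustration
only.

```python
import os
import sys

# The codec and error handler that os.fsencode()/os.fsdecode() apply.
fs_encoding = sys.getfilesystemencoding()
fs_errors = sys.getfilesystemencodeerrors()


def argv_as_bytes():
    """Return the command line arguments as bytes (hypothetical helper)."""
    return [os.fsencode(arg) for arg in sys.argv]


def environ_value_as_bytes(name):
    """Return an environment variable as bytes, or None if it is unset."""
    value = os.environ.get(name)
    return None if value is None else os.fsencode(value)


sample = "example.txt"
encoded = os.fsencode(sample)
decoded = os.fsdecode(encoded)
print(fs_encoding, fs_errors, encoded, decoded)
```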

With the PEP currently stating that all use of the "*A" Windows APIs
will be removed, I'm guessing these will just start working as
expected, but it should be covered explicitly.

In addition, if the subprocess module is going to be excluded from
these changes, that should be called out explicitly (keeping in mind
that on *nix, the only subprocess pipe configurations that are
straightforward to set up in Python 3 are raw binary mode and
universal newlines mode, with the latter implicitly treating the pipes
as text in the locale's preferred encoding, typically UTF-8).
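
For reference, a small sketch of those two pipe configurations, assuming
Python 3.5 or later for subprocess.run(). The child command is just a
placeholder that prints one line; in universal newlines mode the output is
decoded using the locale's preferred encoding, which is commonly UTF-8 on
*nix.

```python
import subprocess
import sys

# Placeholder child process: the current interpreter printing one line.
cmd = [sys.executable, "-c", "print('hello')"]

# Raw binary mode: the pipe yields bytes and no decoding is applied.
raw = subprocess.run(cmd, stdout=subprocess.PIPE, check=True).stdout

# Universal newlines mode: the pipe yields str, with line endings
# normalised to "\n" and bytes decoded implicitly.
text = subprocess.run(
    cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True
).stdout

print(type(raw).__name__, type(text).__name__)
```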

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

