[Python-ideas] Fix default encodings on Windows

Steve Dower steve.dower at python.org
Thu Aug 18 11:54:26 EDT 2016


On 18Aug2016 0829, Chris Angelico wrote:
> The second call to glob doesn't have any Unicode characters at all,
> the way I see it - it's all bytes. Am I completely misunderstanding
> this?

You're not the only one - I think this has been the most common 
misunderstanding.

On Windows, the paths as stored in the filesystem are actually all text 
- more precisely, UTF-16-LE encoded bytes, represented as strings of 
16-bit characters.
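
As a rough illustration of what "16-bit characters" means here, the 
sketch below encodes a made-up file name (not one from this thread) as 
UTF-16-LE and splits the result into 16-bit units. It only uses Python's 
codecs, so it shows the representation rather than anything the 
filesystem itself does:

```python
# Hypothetical file name, chosen only for illustration.
name = "caf\u00e9.txt"  # "café.txt"

# UTF-16-LE: each character in this name becomes one 16-bit unit,
# stored low byte first.
raw = name.encode("utf-16-le")

# Regroup the bytes into 16-bit code units.
units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

print(len(name), len(raw))        # 8 characters, 16 bytes
print([hex(u) for u in units])    # 0xe9 is the single unit for "é"

# Decoding recovers the original text exactly.
assert raw.decode("utf-16-le") == name
```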

The conversion to an 8-bit character representation exists only for 
compatibility with code written for other platforms (either Linux, or 
much older versions of Windows). The operating system has one way to do 
the conversion to bytes, which Python currently uses, but since we 
control that transformation I'm proposing an alternative conversion that 
favours reliability over compatibility. (The compatibility at stake is 
with Windows 3.1; the change shouldn't affect code that properly handles 
multibyte encodings, which should include anything developed for Linux 
in the last decade or two.)
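
To make "more reliable" concrete, here is a minimal sketch of the 
difference between a lossy and a lossless conversion to bytes. It rests 
on two assumptions that are not stated in this message: cp1252 with 
errors="replace" stands in for the operating system's legacy conversion 
(the real one depends on the active code page), and UTF-8 stands in for 
the alternative. The file name is made up:

```python
# Hypothetical file name with characters outside cp1252.
name = "\u0444\u0430\u0439\u043b.txt"  # "файл.txt"

# Assumption: a single-byte legacy code page approximates the OS
# conversion. Characters it cannot represent are replaced with "?".
legacy = name.encode("cp1252", errors="replace")

# Assumption: UTF-8 as the alternative conversion. It can represent
# every character, so nothing is lost.
utf8 = name.encode("utf-8")

print(legacy)                      # the Cyrillic letters are gone
print(utf8.decode("utf-8") == name)

# The legacy bytes no longer identify the original file name...
assert legacy.decode("cp1252") != name
# ...while the UTF-8 bytes round-trip exactly.
assert utf8.decode("utf-8") == name
```

Once the name has been replaced with question marks, the bytes cannot be 
mapped back to the file they came from, which is the kind of 
unreliability a lossless encoding avoids.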

Does that help? I tried to keep the explanation short and focused :)

Cheers,
Steve

