[Python-ideas] Fix default encodings on Windows

eryk sun eryksun at gmail.com
Mon Aug 15 21:19:03 EDT 2016


On Mon, Aug 15, 2016 at 6:26 PM, Steve Dower <steve.dower at python.org> wrote:
>
> (Frankly I don't mind what encoding we use, and I'd be quite happy to force bytes
> paths to be UTF-16-LE encoded, which would also round-trip invalid surrogate
> pairs. But that would prevent basic manipulation which seems to be a higher
> priority.)

The CRT manually decodes and encodes using the private functions
__acrt_copy_path_to_wide_string and __acrt_copy_to_char. These use
either the ANSI or OEM codepage, depending on the value returned by
WinAPI AreFileApisANSI. CPython could follow suit. Doing its own
encoding and decoding would enable using filesystem functions that
will never get an [A]NSI version (e.g. GetFileInformationByHandleEx),
while still retaining backward compatibility.
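
As a rough illustration of the codepage choice described above, here is a
minimal Python sketch. The helper names (select_fs_codepage,
query_are_file_apis_ansi) are hypothetical and not part of CPython or the CRT,
and the non-Windows fallback is only an assumption so the snippet can run
anywhere. AreFileApisANSI itself is a real kernel32 function, and CP_ACP /
CP_OEMCP are the documented WinAPI constants.

```python
import sys

CP_ACP = 0    # WinAPI constant: system default ANSI code page
CP_OEMCP = 1  # WinAPI constant: system default OEM code page


def query_are_file_apis_ansi():
    """Ask Windows whether the file APIs use the ANSI code page.

    Hypothetical helper. On non-Windows platforms this simply returns
    True, which is an assumption made only to keep the sketch runnable.
    """
    if sys.platform != "win32":
        return True
    import ctypes
    # AreFileApisANSI takes no arguments and returns a BOOL.
    return bool(ctypes.windll.kernel32.AreFileApisANSI())


def select_fs_codepage(are_file_apis_ansi):
    """Pick the code page the way the CRT is described as doing above."""
    return CP_ACP if are_file_apis_ansi else CP_OEMCP


if __name__ == "__main__":
    print(select_fs_codepage(query_are_file_apis_ansi()))
```

Keeping the query separate from the selection means the selection logic can be
exercised without calling into Windows at all.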

Filesystem encoding could use WC_NO_BEST_FIT_CHARS and raise a warning
when lpUsedDefaultChar is true. Filesystem decoding could use
MB_ERR_INVALID_CHARS and, on ERROR_NO_UNICODE_TRANSLATION (e.g. an
invalid DBCS sequence), raise a warning and retry without this flag. This
could be implemented with a new "warning" handler for
PyUnicode_EncodeCodePage and PyUnicode_DecodeCodePageStateful. A new
'fsmbcs' encoding could be added that checks AreFileApisANSI to choose
between CP_ACP and CP_OEMCP.
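
The "warn, then retry leniently" behaviour could look roughly like the sketch
below. It is only an analogy built on Python's ordinary codec machinery: the
strict attempt stands in for MB_ERR_INVALID_CHARS / WC_NO_BEST_FIT_CHARS, and
the "replace" retry stands in for calling again without the flag. The function
names fs_decode and fs_encode are hypothetical, and this is not the proposed
"warning" handler for PyUnicode_EncodeCodePage itself.

```python
import warnings


def fs_decode(data, encoding):
    """Decode strictly; on failure, warn and retry with replacement."""
    try:
        return data.decode(encoding, "strict")
    except UnicodeDecodeError as exc:
        # Analogous to ERROR_NO_UNICODE_TRANSLATION from a strict decode.
        warnings.warn(f"invalid {encoding} sequence in path: {exc}",
                      UnicodeWarning, stacklevel=2)
        return data.decode(encoding, "replace")


def fs_encode(text, encoding):
    """Encode strictly; on failure, warn and retry with replacement."""
    try:
        return text.encode(encoding, "strict")
    except UnicodeEncodeError as exc:
        # Analogous to lpUsedDefaultChar being set after the conversion.
        warnings.warn(f"path not representable in {encoding}: {exc}",
                      UnicodeWarning, stacklevel=2)
        return text.encode(encoding, "replace")


if __name__ == "__main__":
    print(fs_encode("caf\u00e9", "cp1252"))  # representable, no warning
    print(fs_decode(b"\x81", "cp932"))       # lone DBCS lead byte, warns
```

The caller still gets a usable result in the failure case, which is the
backward-compatible part; the warning is what makes the lossy conversion
visible.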

