[issue45120] Windows cp encodings "UNDEFINED" entries update

Fri Sep 17 03:34:38 EDT 2021

Marc-Andre Lemburg <mal at egenix.com> added the comment:

Just to be clear: The Python code page encodings are (mostly) taken from the unicode.org set of mappings (ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/). This is our standards body for such mappings, where possible. In some cases, the Unicode consortium does not provide such mappings and we resort to other standards (ISO, commonly used mapping files in OSes, Wikipedia, etc).

Changes to the existing mapping codecs should only be done in case corrections are applied to the mappings under those names by the standard bodies.

If you want to add variants such as the best fit ones from MS, we'd have to add them under a different name, e.g. bestfit1252 (see ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/).

Otherwise, interop with other systems would no longer.

>From Eryk's description it sounds like we should always add WC_NO_BEST_FIT_CHARS as an option to MultiByteToWideChar() in order to make sure it doesn't use best fit variants unless explicitly requested.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45120>
_______________________________________