[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

Inada Naoki report at bugs.python.org
Thu Oct 8 04:34:22 EDT 2020


Inada Naoki <songofacandy at gmail.com> added the comment:

> I think that it is more correct to use the locale encoding. If error messages are translated for readability, we should not ruin this by outputting \xXX.

* PyUnicode_DecodeLocale() doesn't support the "backslashreplace" error handler.
* The error message is usually encoded in the locale encoding, but that is not guaranteed.
* The error message may contain a path, which may not be in the locale encoding either.
* In any case, \xXX is far better than a UnicodeDecodeError. We need to fix the UnicodeDecodeError first.
* A non-UTF-8 locale is rare. This code has been in use for a long time, but nobody reported this issue until now.
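
To illustrate the trade-off in the points above, here is a minimal Python-level sketch. It is only an analogue of the C decoding path, not the actual patch: the message text, the latin-1 encoding, and the decode_message() helper are all assumptions made up for the example.

```python
# Hypothetical error message as a non-UTF-8 locale might produce it.
# The text and the latin-1 encoding are assumptions for illustration only.
raw = "Datei nicht gefunden: gr\u00f6\u00dfe.txt".encode("latin-1")


def decode_message(data: bytes, errors: str) -> str:
    """Decode message bytes as UTF-8 with the given error handler."""
    return data.decode("utf-8", errors=errors)


# Strict decoding fails on the first byte that is not valid UTF-8.
try:
    decode_message(raw, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# backslashreplace never fails; undecodable bytes become \xXX escapes.
escaped = decode_message(raw, "backslashreplace")
print(strict_failed)
print(escaped)
```

The escaped form is less readable than a correctly decoded message, but the rest of the text still reaches the user instead of being replaced by an unrelated UnicodeDecodeError.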

I'm not against adding "backslashreplace" to PyUnicode_DecodeLocale(). But to backport the bugfix for the UnicodeDecodeError, the change should be minimal.

So the main question is: should we allow surrogateescape in the error message?
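
For context, a small Python sketch of what surrogateescape would put into a message. Again, this is an analogue of the C-level behavior with made-up input bytes, not the code under discussion.

```python
# Hypothetical path fragment in a non-UTF-8 encoding (latin-1 is an assumption).
raw = "gr\u00f6\u00dfe.txt".encode("latin-1")

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")

# The original bytes can be recovered with the same error handler.
roundtrip = text.encode("utf-8", errors="surrogateescape")

# A strict encoder rejects lone surrogates, for example when the
# message is later written to a UTF-8 stream.
try:
    text.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```

The decode itself never fails and the bytes round-trip, but the resulting str carries lone surrogates that can raise a UnicodeEncodeError later, when the message is encoded strictly.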

For the record, PyUnicode_DecodeLocale() uses mbstowcs(). I don't know how reliable that function is across platforms. That is why I suggested PyUnicode_DecodeFSDefault() at first.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41894>
_______________________________________

