[issue5604] imp.find_module() mixes UTF8 and MBCS

Ezio Melotti report at bugs.python.org
Thu Feb 18 14:23:18 CET 2010


Ezio Melotti <ezio.melotti at gmail.com> added the comment:

Also the test has a few problems:
1) the keys of known_locales are lowercase, but locale_encoding = locale.getpreferredencoding() can return uppercase encodings (e.g. UTF-8);
2) this masks another error: b'\xe4' is not a valid UTF-8 byte sequence, so it can't be decoded;
3) the test should be skipped properly if the preferred encoding is not among the ones of the known_locales dict;
4) the 'encoded_char' should be 'decoded_char'.
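
A rough sketch of how points 1-4 could be addressed together. The contents of known_locales and the helper pick_test_bytes() below are hypothetical stand-ins, not the actual code in the test; only the lowercase-keys detail and the b'\xe4' byte come from the report above.

```python
import locale
import unittest

# Hypothetical stand-in for the test's dict: lowercase keys, and for each
# encoding the bytes that really encode U+00E4 in that encoding.
known_locales = {
    'utf-8': b'\xc3\xa4',
    'cp1252': b'\xe4',
}


def pick_test_bytes(encoding_name):
    """Look up the bytes case-insensitively; None means 'unknown encoding'."""
    return known_locales.get(encoding_name.lower())


# Point 2: on its own, b'\xe4' is a truncated UTF-8 sequence.
try:
    b'\xe4'.decode('utf-8')
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc


class ExampleTest(unittest.TestCase):
    def test_special_char(self):
        locale_encoding = locale.getpreferredencoding()
        special = pick_test_bytes(locale_encoding)
        if special is None:
            # Point 3: report a skip instead of silently passing.
            self.skipTest('encoding %r not in known_locales' % locale_encoding)
        # Point 4: the result of decode() is a decoded character.
        decoded_char = special.decode(locale_encoding)
        self.assertEqual(decoded_char, '\xe4')
```

Lowercasing is the smallest change that matches the existing keys; codecs.lookup(name).name would be a stricter way to normalize aliases, if that is preferred.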

It seems that in the failure linked by Florent, find_module tries to encode the filename with the wrong encoding and with errors='replace', so the char at the end of 'test_imp_helper_' is converted to U+FFFD.
If the file is created with the correct name (e.g. 'test_imp_helper_\xc0') and find_module searches for the wrong name (i.e. 'test_imp_helper_\ufffd'), then the error is raised (but then cp1252, used by the terminal, can't encode that char and a second exception is raised).
Now the tests are run on C: and the filesystem encoding might be different, so it might no longer match the encoding returned by locale.getpreferredencoding().
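
The mismatch can be reproduced in pure Python, as a sketch only. It assumes the name bytes are cp1252 and that the wrong codec is UTF-8; it also uses decode() with errors='replace', since that is the direction that yields U+FFFD at the Python level. What find_module does internally may differ.

```python
import locale
import sys

# Assumption: the file name on disk is stored as cp1252 bytes.
name_bytes = b'test_imp_helper_\xc0'

# Correct codec: the last char is U+00C0.
right = name_bytes.decode('cp1252')

# Wrong codec with errors='replace': 0xC0 is never valid in UTF-8,
# so it is replaced by U+FFFD.
wrong = name_bytes.decode('utf-8', errors='replace')

# A cp1252 terminal cannot encode U+FFFD, hence the second exception.
try:
    wrong.encode('cp1252')
    terminal_error = None
except UnicodeEncodeError as exc:
    terminal_error = exc

# The two encodings the test implicitly assumes to be equal.
fs_encoding = sys.getfilesystemencoding()
locale_encoding = locale.getpreferredencoding()
encodings_match = fs_encoding.lower() == locale_encoding.lower()
```

Whether encodings_match is true depends on the platform and on where the tests run, which is the point of the last remark above.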

----------
resolution:  -> fixed
status: open -> closed

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5604>
_______________________________________

