[issue10952] Don't normalize module names to NFKC?

Thu Jan 20 23:23:22 CET 2011

STINNER Victor <victor.stinner at haypocalc.com> added the comment:

> A packaging mechanism that prepares code developed on a Latin-1
> filesystem for distribution, would have to NFKC-normalize 
> filenames before encoding them using UTF-8.

It causes portability issues: if you copy a non-ASCII module on a new
host, the program will work or not depending on the filesystem encoding.
Having to transform the filename when you copy a file, just to fix a
corner case, is a pain.

> One possible solution to this problem is to define a 'compat' error
> handler that would detect unencodable strings with encodable
> compatibility equivalents and produce encoding of an NFKC equivalent
> string instead of raising an error.

Only few people use non-ASCII module names and most operating systems
are able to store all Unicode characters, so I don't think that we need
to support U+00B5 in a module name with Latin1 filesystem at all. If you
use an old system using Latin1 filesystem, you have to limit your
expectation on Python unicode support :-)

os.fsencode() and os.fsdecode() already use a custom error handler:
surrogateescape. compat will conflict with surrogateescape. Loading a
module concatenates two parts: a path from sys.path (decoded from the
filesystem encoding and surrogateescape error handler) and a module
name. If custom is used to encode the filename, the module name will be
encoded correctly, but not the path.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10952>
_______________________________________