[Python-Dev] Unicode Imports
"Martin v. Löwis"
martin at v.loewis.de
Sat Sep 9 21:16:45 CEST 2006
David Hopwood wrote:
> On Windows, file system pathnames can contain arbitrary Unicode characters
> (well, almost). Despite the existence of "ANSI" filesystem APIs, and
> regardless of what 'sys.getfilesystemencoding()' returns, the underlying
> file system encoding for NTFS and FAT filesystems is UTF-16LE.
>
> Thus, either:
> - the fact that sys.getfilesystemencoding() returns a non-Unicode encoding
> on Windows is a bug, or
> - any program that relies on sys.getfilesystemencoding() being able to
> encode arbitrary Windows pathnames has a bug.
>
> We need to decide which of these is the case.
There is a third option:
- the operating system has a bug
It is actually this option that rules out the other two.
sys.getfilesystemencoding() returns "mbcs" on Windows, which means
CP_ACP. The file system encoding is an encoding that converts a
file name into a byte string. Unfortunately, on Windows, there are
file names which cannot be converted into a byte string in a standard
manner. This is an operating system bug (or mis-design; they should
have chosen UTF-8 as the byte encoding of file names, instead of
making it depend on the system locale, but they of course did so
for backwards compatibility with Windows 3.1 and 9x).
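
To illustrate the point about names that have no byte-string form, here is a minimal sketch. It assumes cp1252 as a stand-in for CP_ACP on a Western European system, because the real "mbcs" codec exists only on Windows. The helper can_encode and the sample file names are hypothetical, and the code uses modern Python 3 syntax rather than that of the 2.5 era.

```python
def can_encode(name, encoding):
    """Return True if `name` survives a round trip through `encoding`."""
    try:
        return name.encode(encoding).decode(encoding) == name
    except UnicodeError:
        # The code page has no byte sequence for at least one character.
        return False


# cp1252 is only an assumed stand-in for the ANSI code page (CP_ACP).
latin_name = "L\u00f6wis.txt"            # o-umlaut exists in cp1252
greek_name = "\u03b1\u03b2\u03b3.txt"    # Greek letters do not

for name in (latin_name, greek_name):
    for encoding in ("cp1252", "utf-8"):
        print(ascii(name), encoding, can_encode(name, encoding))
```

Under these assumptions, only the Greek name fails, and only for the locale-dependent code page. UTF-8 can represent both names, which is why it would have been the better choice of byte encoding.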
As a side note: every encoding in Python is a Unicode encoding;
so there aren't any "non-Unicode encodings".
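
A short sketch of what that means in practice: a codec always maps between Unicode text and bytes, whether it is a legacy code page or UTF-8. The sample string is arbitrary, and the code uses Python 3 syntax, where str is the Unicode type.

```python
text = "L\u00f6wis"                 # one Unicode string

as_cp1252 = text.encode("cp1252")   # a legacy single-byte code page
as_utf8 = text.encode("utf-8")      # a Unicode transformation format

# The byte strings differ, but both decode to the same Unicode text.
print(as_cp1252, as_utf8)
print(as_cp1252.decode("cp1252") == as_utf8.decode("utf-8"))
```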
Programs that rely on sys.getfilesystemencoding() being able to
represent arbitrary file names on Windows might have a bug;
programs that rely on sys.getfilesystemencoding() being able
to encode all elements of sys.path do not (at least not for
Python 2.5 and earlier).
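
As a rough way to check the second kind of reliance, the sketch below lists the entries of a path list that cannot be encoded with a given encoding. The function unencodable_entries is a hypothetical helper, not part of Python. It is written in Python 3 syntax and defaults to whatever sys.getfilesystemencoding() reports on the running system.

```python
import sys


def unencodable_entries(paths, encoding=None):
    """Return the text entries of `paths` that `encoding` cannot represent."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    bad = []
    for entry in paths:
        if not isinstance(entry, str):
            continue  # byte strings and other objects need no encoding
        try:
            entry.encode(encoding)
        except UnicodeError:
            bad.append(entry)
    return bad


# Check the running interpreter's own import path.
print(unencodable_entries(sys.path))
```

For example, unencodable_entries(["C:\\lib", "C:\\\u03b1"], "cp1252") reports only the second entry, since the Greek letter has no cp1252 representation.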
Regards,
Martin