[Python-Dev] Unicode and the Windows file system.

Neil Hodgson nhodgson@bigpond.net.au
Mon, 19 Mar 2001 23:06:40 +1100


Mark Hammond:

> To make Python work nicely with the file system, we really
> should handle Unicode characters somehow.  It is not too
> uncommon to find the "program files" or the "user" directory
> have Unicode characters in non-English versions of Win2k.

   The "program files" and "user" directories should still have names
representable in the user's normal locale, so users can open them by passing
a narrow character Python string in their standard encoding to the open
function.

> The way I see it, to fix this we have 2 basic choices when a Unicode
> object is passed as a filename:
> * we call the Unicode versions of the CRTL.

   This is by far the better approach IMO, as it is more general and will
work for people who switch locales or who want to access files created by
others using other locales. That said, you can always fall back on the
horrid mangled "*~1" names.

> * we auto-encode using the "mbcs" encoding, and still call the non-Unicode
> versions of the CRTL.

   This will improve things, but to a lesser extent than the above. It may
be the best possible on 95.
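
   The two options can be put side by side in a rough sketch. It assumes a
Python where text strings are Unicode (str) and narrow strings are bytes, and
it uses cp1252 as a stand-in for "mbcs" so that it runs on any platform. The
helper route_filename and its have_wide_api flag are made up for illustration;
_wfopen and fopen are only reported as the wide and narrow C runtime calls a
name would go to, and nothing is actually opened.

```python
def route_filename(filename, have_wide_api, codepage="cp1252"):
    """Return (crt_call, argument) for a filename.

    Hypothetical helper: it reports the choice and does not open anything.
    """
    if isinstance(filename, bytes):
        # Narrow string: already encoded, goes to the narrow call as-is.
        return ("fopen", filename)
    if have_wide_api:
        # First option: pass the Unicode name to the wide call untouched.
        return ("_wfopen", filename)
    # Second option: auto-encode, then use the narrow call. Strict errors
    # mean a character outside the code page raises UnicodeEncodeError.
    return ("fopen", filename.encode(codepage))


name = "\u00e4test\u00f6"  # a-umlaut, "test", o-umlaut
print(route_filename(name, have_wide_api=True))
print(route_filename(name, have_wide_api=False))
```

   With have_wide_api false, a name containing characters outside the code
page fails to encode, which is why the second option only helps to a lesser
extent.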

> The first option has a problem in that determining what Unicode support
> Windows 95/98 have may be more trouble than it is worth.

   None of the *W file calls are listed as supported on 95, although Unicode
file names can certainly be used on FAT partitions.

> * I can switch to a German locale, and create a file using the
> keystrokes "`atest`o".  The "`" is the dead-char so I get an
> umlaut over the first and last characters.

   It's more fun to play with a non-Roman locale for this sort of problem,
one that doesn't fit in the normal Windows code page. Russian is reasonably
readable for us English speakers.
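
   As a quick illustration of why Russian makes the better test, here is a
small check of which code page can hold which name. The two file names and the
helper fits are invented for the example; cp1252 and cp1251 are the Western
and Cyrillic Windows code pages.

```python
names = {
    "German": "\u00e4test\u00f6",           # a-umlaut, "test", o-umlaut
    "Russian": "\u0442\u0435\u0441\u0442",  # "test" in Cyrillic letters
}


def fits(name, codepage):
    """True if every character of name exists in the given code page."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


for label, name in names.items():
    for codepage in ("cp1252", "cp1251"):
        print(label, codepage, fits(name, codepage))
```

   The German name fits the Western code page only and the Russian name fits
the Cyrillic one only, so neither narrow encoding can hold both.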

M.-A. Lemburg:
> I don't know if this is an issue (can there
> be more than one encoding per process ?

   There is an input locale and keyboard layout per thread.

> is the encoding a user or system setting ?

   There are system defaults and a menu through which you can change the
locale whenever you want.
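
   For what it is worth, a current Python can report the default it would
use for the process. This is only a sketch of how to look: it shows the
process-wide default text encoding, not the per-thread input locale, and on
Windows the value normally reflects the ANSI code page.

```python
import locale

# Encoding Python would pick for text by default in this process.
# Passing False asks for the value without calling setlocale().
encoding = locale.getpreferredencoding(False)
print(encoding)
```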

> Also, what would os.listdir() return ? Unicode strings or 8-bit
> strings ?

   There is the Windows approach of having an os.listdirW() ;) .
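
   One alternative to a separate listdirW is to let the type of the argument
pick the type of the result, which is how os.listdir behaves in current
Python: a str path gives str names and a bytes path gives bytes names. The
sketch below uses a temporary directory and a made-up file name.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create one empty file so there is something to list.
    open(os.path.join(folder, "test.txt"), "w").close()

    as_text = os.listdir(folder)                # str in, str out
    as_bytes = os.listdir(os.fsencode(folder))  # bytes in, bytes out

print(as_text, as_bytes)
```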

   Neil