[Python-Dev] Unicode and the Windows file system.

Mark Hammond MarkH@ActiveState.com
Mon, 19 Mar 2001 20:40:24 +1100


I understand the issue of a "default Unicode encoding" is a loaded one.
However, I believe that for the Windows file system we may be able to use a
default.

Windows provides 2 versions of many functions that accept "strings" - one
that uses "char *" arguments, and another using "wchar_t *" for Unicode.
Interestingly, the "char *" versions of these functions almost always
support "mbcs" encoded strings.

To make Python work nicely with the file system, we really should handle
Unicode characters somehow.  It is not too uncommon to find that the
"program files" or the "user" directory has Unicode characters in
non-English versions of Win2k.

The way I see it, to fix this we have 2 basic choices when a Unicode object
is passed as a filename:
* we call the Unicode versions of the CRTL.
* we auto-encode using the "mbcs" encoding, and still call the non-Unicode
versions of the CRTL.
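
For illustration only, the second choice might look roughly like the
following in present-day Python.  This is a sketch under assumptions, not
the patch itself: encode_filename is a hypothetical helper, and because the
"mbcs" codec exists only on Windows, the sketch falls back to the
interpreter's file system encoding elsewhere.

```python
import sys


def encode_filename(name, encoding=None):
    """Return a byte string suitable for a "char *" file API.

    Hypothetical helper: byte strings pass through untouched, while
    Unicode strings are auto-encoded before the non-Unicode call.
    """
    if isinstance(name, bytes):
        # Already encoded; hand it to the "char *" function as-is.
        return name
    if encoding is None:
        # "mbcs" is only available on Windows (assumption: elsewhere the
        # file system encoding is an acceptable stand-in for the sketch).
        if sys.platform == "win32":
            encoding = "mbcs"
        else:
            encoding = sys.getfilesystemencoding()
    return name.encode(encoding)
```

Passing the encoding explicitly makes the behaviour easy to check on any
platform, e.g. encode_filename("caf\u00e9.txt", "cp1252") gives
b"caf\xe9.txt".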

The first option has a problem in that determining what Unicode support
Windows 95/98 has may be more trouble than it is worth.  Sticking to the
non-Unicode ("char *") versions of the functions means that the worst thing
that can happen is we get a regular file-system error if an mbcs encoded
string is passed on a non-Unicode platform.

Does anyone have any objections to this scheme or see any drawbacks in it?
If not, I'll knock up a patch...

Mark.