[Python-Dev] Unicode and the Windows file system.

Mark Hammond MarkH@ActiveState.com
Mon, 19 Mar 2001 21:53:01 +1100


Sorry, I notice I didn't answer your specific question:

> Also, what would os.listdir() return ? Unicode strings or 8-bit
> strings ?

This would not change.

This is what my testing shows:

* I can switch to a German locale, and create a file using the keystrokes
"`atest`o".  The "`" is a dead key, so I get a grave accent over the first
and last characters.

* os.listdir() returns '\xe0test\xf2' for this file.

* That same string can be passed to "open" etc to open the file.

* The only way to get that string to a Unicode object is to use the
encodings "Latin1" or "mbcs".  Of them, "mbcs" would have to be safer, as at
least it has a hope of handling non-latin characters :)
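
As a rough illustration of the points above, here is a small sketch. It is
written in Python 3 bytes notation, which is an assumption on my part, and the
variable names are made up for the example. It only shows how the string
reported by os.listdir() behaves under the codecs mentioned; "mbcs" exists only
on Windows and depends on the active code page, so it is guarded.

```python
import sys

# The 8-bit string reported by os.listdir() in the test above.
raw = b"\xe0test\xf2"

# Latin-1 maps each byte to the code point of the same value, so it never fails.
as_latin1 = raw.decode("latin-1")

# The default "ascii" codec rejects any byte above 0x7f.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# "mbcs" is Windows-only and follows the current ANSI code page, so the
# result (or even success) depends on the machine; treat it as best effort.
as_mbcs = None
if sys.platform == "win32":
    try:
        as_mbcs = raw.decode("mbcs")
    except (LookupError, UnicodeDecodeError):
        as_mbcs = None
```

On a Western European Windows code page, as_mbcs would be expected to match
as_latin1 for this particular name, but that is not guaranteed elsewhere.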

So - assume I am passed a Unicode object that represents this filename.  At
the moment we simply throw that exception if we pass that Unicode object to
open().  I am proposing that "mbcs" be used in this case instead of the
default "ascii".
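
A minimal sketch of the idea, not the actual patch: encode_filename is a
hypothetical helper, not an existing Python function, and Latin-1 stands in
for "mbcs" on non-Windows platforms purely so the example runs anywhere.

```python
import sys

def encode_filename(name, encoding=None):
    """Hypothetical helper: encode a Unicode filename for the 8-bit file APIs."""
    if encoding is None:
        # Proposed default: "mbcs" on Windows. Latin-1 is only a stand-in
        # elsewhere, since the "mbcs" codec does not exist off Windows.
        encoding = "mbcs" if sys.platform == "win32" else "latin-1"
    return name.encode(encoding)

name = "\u00e0test\u00f2"  # the filename from the test, as a Unicode object

# Behaviour with the default "ascii" codec: the conversion fails.
try:
    encode_filename(name, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# With a Latin-1 capable codec we get back the bytes os.listdir() reported.
latin1_bytes = encode_filename(name, "latin-1")
```

Calling encode_filename(name) with no explicit encoding would use "mbcs" on
Windows, which is the substitution being proposed for open() and friends.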

If nothing else, my idea could be considered a "short-term" solution.  If
ever it is found to be a problem, we can simply move to the unicode APIs,
and nothing would break - just possibly more things _would_ work :)

Mark.