[Python-Dev] Unicode and the Windows file system.

M.-A. Lemburg mal@lemburg.com
Mon, 19 Mar 2001 12:17:18 +0100


Mark Hammond wrote:
> 
> Sorry, I notice I didn't answer your specific question:
> 
> > Also, what would os.listdir() return ? Unicode strings or 8-bit
> > strings ?
> 
> This would not change.
> 
> This is what my testing shows:
> 
> * I can switch to a German locale, and create a file using the keystrokes
> "`atest`o".  The "`" is a dead key, so I get a grave accent over the first
> and last characters.
> 
> * os.listdir() returns '\xe0test\xf2' for this file.
> 
> * That same string can be passed to "open" etc to open the file.
> 
> * The only way to get that string into a Unicode object is to use the
> encodings "Latin1" or "mbcs".  Of the two, "mbcs" should be safer, as it at
> least has a hope of handling non-latin characters :)
> 
> So - assume I am passed a Unicode object that represents this filename.  At
> the moment, passing that Unicode object to open() simply raises an
> exception.  I am proposing that "mbcs" be used in this case instead of the
> default "ascii".
> 
> If nothing else, my idea could be considered a "short-term" solution.  If it
> is ever found to be a problem, we can simply move to the Unicode APIs,
> and nothing would break - just possibly more things _would_ work :)
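
A rough sketch of the proposal, written in modern Python 3 terms (where the 2001-era UnicodeError surfaces as its subclass UnicodeEncodeError). It assumes "cp1252" as a stand-in for "mbcs": "mbcs" exists only on Windows and encodes with the current ANSI code page, which on a German-locale system is typically cp1252. The helper encode_filename() is hypothetical, not part of any Python API.

```python
# Hypothetical sketch: encode a Unicode filename for an 8-bit file API
# such as open(). "cp1252" stands in for "mbcs" on a German-locale
# Windows box (assumption: "mbcs" itself is only available on Windows).

def encode_filename(name, encoding):
    """Hypothetical helper: return the byte string an 8-bit API would get."""
    return name.encode(encoding)

name = "\xe0test\xf2"  # the 'atest' file with accented first/last letters

# Current behaviour: the default "ascii" codec rejects the accented letters.
try:
    encode_filename(name, "ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)

# Proposed behaviour: the ANSI code page reproduces the bytes that
# os.listdir() returned in the test above.
print(encode_filename(name, "cp1252"))
```

For these two characters Latin-1 and cp1252 agree byte for byte, which is consistent with both "Latin1" and "mbcs" decoding the listdir() result in the test.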

Sounds like a good idea. We'd only have to ensure that whatever
os.listdir() returns can actually be used to open the file, but that
seems to be the case... at least for Latin-1 chars (I wonder how
well this behaves with Japanese chars).
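
One way to phrase the round-trip requirement as a check, again as a Python 3 sketch under stated assumptions: "cp1252" and "cp932" stand in for what "mbcs" would use on German and Japanese Windows respectively (mbcs follows the system ANSI code page and is Windows-only). The helpers round_trips() and encodable() and the Japanese filename are hypothetical examples.

```python
# Hypothetical sketch: does an 8-bit filename survive bytes -> Unicode -> bytes,
# and can a given Unicode filename be encoded at all with a given code page?
# "cp1252"/"cp932" are stand-ins for "mbcs" on German/Japanese Windows.

def round_trips(raw, encoding):
    """Hypothetical helper: True if decoding and re-encoding is lossless."""
    try:
        return raw.decode(encoding).encode(encoding) == raw
    except UnicodeError:  # covers both decode and encode failures
        return False

def encodable(name, encoding):
    """Hypothetical helper: True if name can be passed to an 8-bit API."""
    try:
        name.encode(encoding)
        return True
    except UnicodeError:
        return False

latin_name = b"\xe0test\xf2"        # bytes os.listdir() returned in the test
japanese_name = "\u65e5\u672c.txt"  # hypothetical Japanese filename

print(round_trips(latin_name, "cp1252"))   # True
print(encodable(japanese_name, "cp1252"))  # False: no Japanese in cp1252
print(encodable(japanese_name, "cp932"))   # True: Japanese ANSI code page
```

Under this reading, "mbcs" handles Japanese names only when the system code page itself covers them, so the answer depends on the machine's locale rather than on Python.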

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Pages:                           http://www.lemburg.com/python/