[Python-Dev] Unicode and the Windows file system.

Tim Peters tim_one@email.msn.com
Tue, 20 Mar 2001 00:57:23 -0500


[Mark Hammond]
> * os.listdir() returns '\xe0test\xf2' for this file.

[Guido]
> I don't understand.  This is a Latin-1 string.  Can you explain again
> how the MBCS encoding encodes characters outside the Latin-1 range?

I expect this is a coincidence.  MBCS is a generic term for a large number of
distinct variable-length encoding schemes, one or more specific to each
language.  Latin-1 is a subset of some MBCS schemes, but not of others; Mark
was using a German locale, right?  Across MS's set of MBCS schemes, there's
little consistency:  a one-byte encoding in one of them may well be a "lead
byte" (== the first byte of a two-byte encoding) in another.
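Here is a small Python sketch of that point. It assumes cp1252 as the code page a German Windows box uses for its "ANSI" MBCS, and cp932 (Microsoft's Shift-JIS variant) as an example scheme where 0xE0 is a lead byte. Both codecs ship with Python. The Windows-only "mbcs" codec is avoided so the sketch runs anywhere. The byte string is the one from Mark's report.

```python
# Same raw bytes, read under two different Windows code pages.
raw = b"\xe0test\xf2"   # the bytes os.listdir() returned in Mark's example

# cp1252: one byte per character; 0xA0-0xFF agree with Latin-1,
# which is why the result looks like a Latin-1 string.
western = raw.decode("cp1252")
print(western)                           # àtestò
print(western == raw.decode("latin-1"))  # True

# cp932: 0xE0 is a lead byte, so together with the next byte
# it forms a single two-byte character.
pair = b"\xe0\x40".decode("cp932")
print(len(pair))                         # 1

# On its own, the lead byte is an incomplete sequence and won't decode.
try:
    b"\xe0".decode("cp932")
    lone_lead_fails = False
except UnicodeDecodeError:
    lone_lead_fails = True
print(lone_lead_fails)                   # True
```

So the Latin-1 look of the result says more about the German code page than about MBCS in general. The same bytes under a Japanese code page mean something else entirely.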

All this stuff is hidden under layers of macros so general that, if you code
it right, you can switch between compiling MBCS code on Win95 and Unicode
code on NT via setting one compiler #define.  Or that's what they advertise.
The multi-lingual Windows app developers at my previous employer were all
bald despite being no older than 23 <wink>.

ascii-boy-ly y'rs  - tim