[Python-Dev] RE: Unicode and the Windows file system.

Mark Hammond MarkH@ActiveState.com
Tue, 20 Mar 2001 10:57:21 +1100


OK - it appears everyone agrees we should go the "Unicode API" route.  I
actually thought my scheme did not preclude moving to this later.

This is a much bigger can of worms than I have bandwidth to take on at the
moment.  As Martin mentions, what will os.listdir() return on Win9x vs
Win2k?  What does passing a Unicode object to a non-Unicode Win32 platform
mean?  How do Win95/98/ME differ in their Unicode support?  Do the various
service packs for each of these change the basic support?  And so on.

So unfortunately this simply means the status quo remains until someone
_does_ have the time and inclination.  That may well be me in the future,
but is not now.  It also means that until then, Python programmers will
struggle with this and determine that they can make it work simply by
encoding the Unicode as an "mbcs" string.  Or worse, they will note that
"latin1 seems to work" and use that even though it will work "less often"
than mbcs.  I was simply hoping to automate that encoding using a scheme
that works "most often".
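
To make the "latin1 seems to work" trap concrete, here is a minimal sketch,
not part of the proposed scheme.  It assumes present-day Python 3 string
semantics, and because the "mbcs" codec only exists on Windows it uses
cp1252 and cp932 as stand-ins for whatever the active ANSI code page might
be.  The helper encode_filename() is hypothetical.

```python
# Illustrative only: "mbcs" is Windows-specific, so cp1252 (Western European)
# and cp932 (Japanese) stand in for the machine's active ANSI code page.

def encode_filename(name, codepage):
    """Hypothetical helper: encode a Unicode filename, or return None on failure."""
    try:
        return name.encode(codepage)
    except UnicodeEncodeError:
        return None

western = "caf\u00e9.txt"       # e-acute: present in both latin-1 and cp1252
euro = "\u20ac.txt"             # euro sign: present in cp1252, absent from latin-1
japanese = "\u65e5\u672c.txt"   # two kanji: absent from latin-1, present in cp932

# latin-1 "seems to work" for the first name...
print(encode_filename(western, "latin-1"))
# ...but fails for names the code page itself can represent.
print(encode_filename(euro, "latin-1"))       # None
print(encode_filename(euro, "cp1252"))
print(encode_filename(japanese, "latin-1"))   # None
print(encode_filename(japanese, "cp932"))
```

The code-page codec succeeds in strictly more cases than latin-1 here, which
is the sense in which latin-1 works "less often".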

The biggest drawback is that by doing nothing we are _encouraging_ the user
to write broken code.  The way things stand at the moment, users will
_never_ pass Unicode objects to these APIs (as they don't work) and will
therefore manually encode a string.  To my mind this is _worse_ than what my
scheme proposes - at least my scheme allows Unicode objects to be passed to
the Python functions, and Python may choose to change the way it handles
these in the future.  But by forcing the user to encode a string we have lost
_all_ meaningful information about the Unicode object and can only hope they
got the encoding right.
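
The information loss can be sketched as follows.  This is an illustration
only, again assuming present-day Python 3 and using cp1252 and cp1251 as
example code pages; the filename is made up.

```python
# Once the caller has encoded the name, only bytes remain, and the bytes
# carry no record of which encoding produced them.
name = "r\u00e9sum\u00e9.txt"     # made-up filename containing e-acute
raw = name.encode("cp1252")       # what a user on a Western European box might pass

# The same bytes are equally "valid" under a different code page,
# but they decode to a different name.
as_western = raw.decode("cp1252")
as_cyrillic = raw.decode("cp1251")

print(as_western == name)    # True
print(as_cyrillic == name)   # False
```

A function that receives raw cannot tell which of the two readings the
caller intended; one that receives the Unicode object never has to guess.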

If anyone else decides to take this on, please let me know.  However, I fear
that in a couple of years we may still be waiting and in the meantime people
will be coding hacks that will _not_ work in the new scheme.

c'est-la-vie-ly,

Mark.