[Python-Dev] RE: Unicode and the Windows file system.

M.-A. Lemburg mal@lemburg.com
Wed, 21 Mar 2001 11:46:01 +0100


Mark Hammond wrote:
> 
> OK - it appears everyone agrees we should go the "Unicode API" route.  I
> actually thought my scheme did not preclude moving to this later.
> 
> This is a much bigger can of worms than I have bandwidth to take on at the
> moment.  As Martin mentions, what will os.listdir() return on Win9x vs
> Win2k?  What does passing a Unicode object to a non-Unicode Win32 platform
> mean? etc.  How do Win95/98/ME differ in their Unicode support?  Do the
> various service packs for each of these change the basic support?
> 
> So unfortunately this simply means the status quo remains until someone
> _does_ have the time and inclination.  That may well be me in the future,
> but is not now.  It also means that until then, Python programmers will
> struggle with this and determine that they can make it work simply by
> encoding the Unicode as an "mbcs" string.  Or worse, they will note that
> "latin1 seems to work" and use that even though it will work "less often"
> than mbcs.  I was simply hoping to automate that encoding using a scheme
> that works "most often".
> 
> The biggest drawback is that by doing nothing we are _encouraging_ the user
> to write broken code.  The way things stand at the moment, the users will
> _never_ pass Unicode objects to these APIs (as they don't work) and will
> therefore manually encode a string.  To my mind this is _worse_ than what my
> scheme proposes - at least my scheme allows Unicode objects to be passed to
> the Python functions - Python may choose to change the way it handles these
> in the future.  But by forcing the user to encode a string we have lost
> _all_ meaningful information about the Unicode object and can only hope they
> got the encoding right.
> 
> If anyone else decides to take this on, please let me know.  However, I fear
> that in a couple of years we may still be waiting and in the meantime people
> will be coding hacks that will _not_ work in the new scheme.

Ehm, AFAIR, the Windows CRT APIs can take MBCS character input,
so why don't we go that route first and then later switch over
to full Unicode support?

After all, I added the "es#" parser markers because you bugged me about
wanting to use them for Windows in the MBCS context -- you even
wrote up the MBCS codec... all this code has to be good for 
something ;-)

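To illustrate the "latin1 seems to work" trap versus encoding with "mbcs", here is a minimal sketch in present-day Python 3 syntax (not the API as it stood when this was written). The helper names `encode_filename` and `latin1_can_encode` are hypothetical, and the UTF-8 fallback on non-Windows platforms is an assumption made only so the sketch runs everywhere; the "mbcs" codec itself exists only on Windows.

```python
import sys


def encode_filename(name, fallback="utf-8"):
    """Encode a text filename for a byte-oriented file API.

    Hypothetical helper: uses the Windows-only "mbcs" codec (the
    current ANSI code page) on Windows, and an assumed fallback
    encoding everywhere else.
    """
    encoding = "mbcs" if sys.platform == "win32" else fallback
    return name.encode(encoding)


def latin1_can_encode(name):
    """Return True if every character of name fits in Latin-1."""
    try:
        name.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Western European names survive Latin-1, so the hack "seems to work" ...
western = "caf\u00e9.txt"
# ... but a name outside Latin-1 cannot be encoded that way at all.
japanese = "\u65e5\u672c\u8a9e.txt"

print(latin1_can_encode(western))
print(latin1_can_encode(japanese))
print(encode_filename("abc.txt"))
```

Once the caller has encoded by hand, the receiving function sees only bytes and can no longer tell which encoding was intended, which is the information loss described above.
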
-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Pages:                           http://www.lemburg.com/python/