[Python-Dev] Unicode strings as filenames

Neil Hodgson nhodgson@bigpond.net.au
Tue, 8 Jan 2002 18:46:01 +1100


Martin:

> I'd be all in favour of bringing ntmodule back into life, especially
> if that is to become a module that does not need to work on
> Win9x. Perhaps it can be compiled twice, once into w9x.pyd and once
> into nt.pyd, or the common code can be shared by means of #include.

   I reversed course again: posixmodule now detects Unicode arguments and
handles them in UCS-2 rather than converting to UTF-8 and back again. This
now looks like the right way to me. The total code bloat is about 8K on a
150K file, which doesn't seem excessive to me.

   A check is made to see whether the platform supports Unicode file names;
if it does not, the old conversion to Py_FileSystemDefaultEncoding is done.
This means that Windows 9x should work the same as it currently does. The
check is exposed as os.unicodefilenames() so that client code can decide
whether to use Unicode.
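
   As a rough sketch of how client code might consult the check: the name
os.unicodefilenames() is the one proposed here and may not exist in a given
build, so the code below looks it up defensively. The fallback to
os.path.supports_unicode_filenames (an attribute found in later Python
releases) and the helper name are assumptions made for illustration.

```python
import os


def platform_supports_unicode_filenames():
    """Return True if the platform reports native Unicode file name support."""
    # os.unicodefilenames is the name proposed in this post; it may be absent.
    check = getattr(os, "unicodefilenames", None)
    if callable(check):
        return bool(check())
    # Assumed fallback: attribute available in later Python releases.
    return bool(getattr(os.path, "supports_unicode_filenames", False))


if platform_supports_unicode_filenames():
    print("passing Unicode file names straight through")
else:
    print("falling back to the file system default encoding")
```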

   For other OSs that can support Unicode file names, additional cases can
be added to posixmodule. Some platforms (OS X, for example) may provide
functions that take UTF-8 arguments rather than UCS-2 ones. They should
still work similarly to the NT code but encode into UTF-8 before making
system calls.
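
   To make the per-platform split concrete, here is a small Python sketch of
the dispatch. It is not the C code in the patch; the function name and the
"wide" / "utf8" / "legacy" labels are hypothetical, chosen only to mirror
the three cases described above.

```python
import sys


def prepare_path_argument(name, api_style):
    """Return the value that would be handed to the system call.

    api_style is a hypothetical label:
      "wide"   - platform takes wide-character names (the NT case)
      "utf8"   - platform takes UTF-8 encoded names (for example OS X)
      "legacy" - no Unicode support; use the file system default encoding
    """
    if isinstance(name, bytes):
        # Byte strings are already in the form the caller wants passed on.
        return name
    if api_style == "wide":
        return name
    if api_style == "utf8":
        return name.encode("utf-8")
    if api_style == "legacy":
        return name.encode(sys.getfilesystemencoding())
    raise ValueError("unknown api_style: %r" % (api_style,))
```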

   The basic idea is that if you use a Unicode string for a file or path
name in a call then returned information is in Unicode strings.
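
   The same "type in, type out" rule can be seen in later Python releases,
where str plays the role of the Unicode string and bytes the role of the
8-bit string. The sketch below only illustrates the principle with
os.listdir; it is not code from the patch, and the helper name is made up.

```python
import os
import tempfile


def listing_types(directory):
    """List a directory twice: once with a text path, once with a byte path."""
    text_names = os.listdir(directory)               # text in, text out
    byte_names = os.listdir(os.fsencode(directory))  # bytes in, bytes out
    return text_names, byte_names


with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the listing has something to return.
    with open(os.path.join(tmp, "example.txt"), "w"):
        pass
    text_names, byte_names = listing_types(tmp)
```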

> >    I'm feeling more like making f_name be wide now but I'd expect some
> > opposition now from backwards compatibility advocates.

   This is now done.

> I think the major problem is that performing repr on a file should
> work. If that turns out to use the repr of the string (can't check
> right now), instead of raising UnicodeErrors, my opposition to putting
> Unicode objects into file names is not that strong anymore.

   I changed the repr to display Unicode names using escapes, so it no
longer raises errors.
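
   A minimal Python sketch of the idea, assuming a stand-in class rather
than the real file object: the name is escaped before it goes into the
repr, so the result is always plain ASCII. NamedFile and the exact repr
layout are hypothetical; ascii() is the builtin from later Python releases.

```python
class NamedFile:
    """Hypothetical stand-in for a file object that stores a Unicode name."""

    def __init__(self, name, mode="r"):
        self.name = name
        self.mode = mode

    def __repr__(self):
        # ascii() replaces non-ASCII characters with backslash escapes,
        # so building the repr cannot fail on an unencodable name.
        return "<open file %s, mode %s>" % (ascii(self.name), ascii(self.mode))


print(repr(NamedFile("caf\u00e9.txt")))
```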

   _getfullpathname, which is available from nt and is used in ntpath, now
accepts a Unicode argument and returns a Unicode path in that case. I haven't
checked ntpath to see whether it works with Unicode.

   New code at
http://scintilla.sourceforge.net/winunichanges.zip

   After waiting a while for comments, I'll package this up as a patch.

   Neil