[Python-Dev] Unicode strings as filenames

Martin v. Loewis martin@v.loewis.de
Tue, 8 Jan 2002 21:52:27 +0100


>    I reversed again, posixmodule now detects Unicode arguments and handles
> them in UCS-2 rather than converting to UTF-8 and back again. This now looks
> like the right way to me. The total amount of code bloat is about 8K over a
> 150K file and this doesn't appear to be too much for me.

I agree. We should still keep "mbcs", so extension modules that don't
want to go to the trouble of special-casing Windows will be able
to get it right most of the time.

>    A check is made to see if the platform supports Unicode file names and if
> it does not then the old conversion to Py_FileSystemDefaultEncoding is done.
> This means that Windows 9x should work the same as it currently does. This
> check is exposed as os.unicodefilenames() so that client code can decide
> whether to use Unicode.

The semantics of that are unclear to me. It sounds like "if true, you
can pass Unicode strings to open etc." However, then it should return 1
on all systems, since you always can: the default encoding may apply
and restrict file names to ASCII. Or, it may mean "if true, you can
pass all Unicode strings to open". This is not true, either, because
there are always reserved characters (such as the path delimiter).
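
To make the two readings concrete, here is a minimal sketch in
present-day Python syntax. The helper accepts_filename and its
"reserved" default are hypothetical, invented only for illustration;
they are not part of the patch or of the os module. It shows that
whether a given Unicode string is usable depends on both the file
system encoding and the reserved characters, so a single boolean
cannot express it.

```python
def accepts_filename(name, fs_encoding, reserved="/\0"):
    """Hypothetical check: could `name` be used as a single file name
    on a system whose file system encoding is `fs_encoding`?"""
    # Reserved characters (here the path delimiter and NUL, as an
    # assumption) are rejected regardless of the encoding.
    if any(ch in reserved for ch in name):
        return False
    # Otherwise the name is usable only if the encoding can represent it.
    try:
        name.encode(fs_encoding)
    except UnicodeEncodeError:
        return False
    return True
```

With an ASCII default encoding, "readme.txt" is accepted and a name
containing U+00E9 is not; with UTF-8 both are accepted, but "a/b" is
rejected in either case.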

>    For other OSs that can support Unicode file names, additional cases can
> be added into posixmodule. The other platforms (OS X for example) may not
> provide these functions as taking UCS-2 arguments but instead UTF-8
> arguments. They should still work similarly to the NT code but encode into
> UTF-8 before making system calls.

I think this is not needed. Instead, setting the file system
encoding to UTF-8 should be sufficient.
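
As a rough sketch of what that amounts to (present-day Python syntax;
the helper name to_syscall_bytes is hypothetical and only stands in
for the existing conversion step, it is not the actual posixmodule
code): a Unicode argument is encoded with the file system encoding
before the system call, and byte strings pass through unchanged.

```python
def to_syscall_bytes(name, fs_encoding="utf-8"):
    """Hypothetical stand-in for the conversion done before a system call."""
    # Byte strings are assumed to be already encoded; pass them through.
    if isinstance(name, bytes):
        return name
    # Unicode strings are encoded with the file system encoding.
    return name.encode(fs_encoding)
```

With the encoding set to UTF-8, no platform-specific branch is needed:
the same conversion path serves every system call.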

> After waiting a while for comments, I'll package this up as a patch.

Very good. Would you also write the PEP? If not, I will, but that may
take some time.

Regards,
Martin