[Python-Dev] Unicode strings as filenames

Martin v. Loewis martin@v.loewis.de
Fri, 4 Jan 2002 00:16:29 +0100


>    I want to be able to open all files on my English W2K install and can
> with many applications even if some have Chinese names and some have
> Russian. The big advance W2K made over NT was to only have one real version
> of the OS instead of multiple language versions. 

I understand all that, but I can't agree with all your conclusions.

>    Locales are a really poor choice for people who need to operate in
> multiple languages and much software is moving to allowing concurrent use of
> multiple languages through the use of Unicode. 

On Windows, locales and Unicode don't contradict each other. You can
create files through the locale's code page, and they still end up on
disk correctly. This is a much better situation than you have on Unix.

In any case, there is no alternative. Whether locales are good or bad,
you must follow system conventions if you want to write usable software.

> > To my knowledge, VFAT32 doesn't - only NTFS does (which is not
> > available on W9x).
> 
>    I have a file called u"C:\\z\u0439\u0446.html" on my W2K FAT partition
> which displays correctly in the explorer and can be opened in, for example,
> notepad.

Oops, you are right - the long file name is in Unicode. It is only
when you do not have a long file name that the short one is
interpreted in OEM encoding.
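
To illustrate why it matters which code page a short name is read in,
here is a minimal sketch. It assumes cp866 and cp1251 as the OEM and
ANSI code pages of a Russian system; that pairing is my assumption for
the example, and other locales use other pairs.

```python
# Illustration only: cp866 (OEM) and cp1251 (ANSI) are an assumed pair
# for a Russian system; the constant names are made up for this sketch.
OEM_CODE_PAGE = "cp866"
ANSI_CODE_PAGE = "cp1251"

# CYRILLIC SMALL LETTER SHORT I, one of the letters in the quoted file name.
letter = u"\u0439"

oem_bytes = letter.encode(OEM_CODE_PAGE)
ansi_bytes = letter.encode(ANSI_CODE_PAGE)

# The same character is stored as a different byte in each code page.
same_byte = (oem_bytes == ansi_bytes)

# Reading the OEM byte with the ANSI code page yields a different character.
misread = oem_bytes.decode(ANSI_CODE_PAGE)
```

Here same_byte is false and misread is not the original letter, so a
name stored in one code page cannot simply be read back in the other.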

> >>> import glob
> >>> glob.glob("C:\\*.html")
> ['C:\\l2.html', 'C:\\list.html', 'C:\\m4.html', 'C:\\x.html',
> 'C:\\z??.html']
> >>> for i in glob.glob("C:\\*.html"):
> ...    f = open(i)
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 2, in ?
> IOError: [Errno 22] Invalid argument: 'C:\\z??.html'

I agree this is unfortunate; patches are welcome. Please note that
the strategy of using the wchar_t API on Windows has been explicitly
considered and rejected because of the complexity of the code changes
involved. So anybody proposing a patch would need to make it both
useful and easy to maintain. Within these constraints, the current
implementation is the best thing Mark could come up with.
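
The 'z??.html' in the quoted glob output is what one would expect when a
name passes through a code page that lacks its characters. A rough
sketch, assuming the English install uses cp1252 as its ANSI code page;
name_in_code_page is a hypothetical helper, not an existing function,
and it only approximates what the byte-oriented Windows API does.

```python
def name_in_code_page(name, code_page):
    """Return `name` as it looks after a round trip through `code_page`.

    Hypothetical helper: characters missing from the code page are
    replaced with '?', roughly as in a lossy conversion to bytes.
    """
    return name.encode(code_page, "replace").decode(code_page)


# The file name from the quoted message, without the directory part.
name = u"z\u0439\u0446.html"

# Assumed Western European code page: the Cyrillic letters are lost.
western = name_in_code_page(name, "cp1252")   # 'z??.html'

# Assumed Russian code page: the name survives unchanged.
cyrillic = name_in_code_page(name, "cp1251")
```

Since '?' is not allowed in a Windows file name, passing the lossy name
back to open() cannot find the file, which would explain the Errno 22.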

Software always has limitations, which are removed only if somebody is
bothered enough to change the software.

Regards,
Martin