[Python-Dev] Re: PEP 277: Unicode file name support for Windows NT, was PEP-time ? ...

Martin v. Loewis martin@v.loewis.de
Thu, 17 Jan 2002 12:42:21 +0100


> Sounds like the run-time error solution would at least "solve"
> the issue in terms of making it depend on the used file name
> and underlying OS or file system.

Such a solution is impossible to implement in some cases. E.g. on
Windows, if you use the ANSI (*A) APIs to list the directory contents,
Windows will *silently* (AFAIK) give you incorrect file names, i.e. it
will replace unrepresentable characters with the replacement char
(QUESTION MARK).
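
To illustrate why this is a problem, here is a rough sketch that
emulates the lossy conversion with Python's own codecs. It does not
call any Windows API; the helper name to_ansi_lossy and the choice of
"cp1252" as the ANSI code page are assumptions made for the example.

```python
# Illustration only: emulate a lossy "ANSI" conversion with Python codecs.
# "cp1252" is an assumed stand-in for the system ANSI code page; the real
# conversion happens inside Windows, not in Python.
def to_ansi_lossy(name, codepage="cp1252"):
    # errors="replace" substitutes "?" for every unrepresentable character
    return name.encode(codepage, errors="replace")


first = to_ansi_lossy("report_\u65e5\u672c.txt")
second = to_ansi_lossy("report_\u4e2d\u6587.txt")

print(first)            # b'report_??.txt'
print(first == second)  # True: two distinct names collapse into one
```

Once the names have collapsed like this, the caller has no way to
tell them apart or to detect that a replacement happened.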

OTOH, on Unix, there is a better approach for listdir and
unconvertible names: just return the byte strings to the user.
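
A minimal sketch of that fallback, assuming the raw names arrive as
byte strings and that the file system encoding is known (UTF-8 here).
The function name decode_names is hypothetical, not an existing API.

```python
# Sketch: decode each raw file name, but hand back the original bytes
# when the name is not valid in the assumed encoding.
def decode_names(raw_names, encoding="utf-8"):
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Unconvertible name: return the byte string unchanged
            result.append(raw)
    return result


# b"caf\xe9.txt" is Latin-1, not valid UTF-8, so it stays a byte string
names = decode_names([b"ok.txt", b"caf\xe9.txt"])
print(names)
```

The caller gets a mixed list, but every entry can still be passed
back to the OS to open the file, so nothing is lost.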

> I'd say: let the different file name based APIs try hard enough
> and then have them bail out if they can't handle the particular
> case.

That is a good idea. However, in case of the WinNT replacement
strategy, the application may still want to know.

Passing *in* Unicode objects is no issue at all: If they cannot be
converted to a reasonable file name, you clearly get an exception.
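
As a sketch of that behaviour, assuming a strict conversion to the
file system encoding (the helper encode_filename and the "ascii"
default are made up for the example):

```python
# Sketch: strict conversion of a Unicode name to the assumed file system
# encoding. An unrepresentable character raises instead of being replaced.
def encode_filename(name, encoding="ascii"):
    return name.encode(encoding, errors="strict")


print(encode_filename("plain.txt"))

try:
    encode_filename("caf\u00e9.txt")
    failed = False
except UnicodeEncodeError as exc:
    # The application learns exactly which name could not be converted
    failed = True
    print("cannot convert:", exc.object)
```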

> > It turns out that only OS X really got it right: For each file, there
> > is both a byte string name, and a Unicode name.
> 
> I suppose this is due to the fact that Mac file systems store
> extended attributes (much like what OS/2 does too) along with the
> file -- that's a really nice way of being able to extend file
> system semantics on a per-file basis; much better than the Windows
> Registry or the MIME guess-by-extension mechanisms.

I'd assume it is different: They just *define* that all local file
systems they have control over use UTF-8 on disk, at least for BSD ufs;
for HFS, it might be that they 'just know' what encoding is used on an
HFS partition. I doubt they use extended attributes for this, as they
reportedly return UTF-8 even for file systems they've never seen
before; this may be either due to static knowledge (e.g. that VFAT is
UCS-2LE), or through guessing.

It may be that there are also limitations and restrictions, but
at least they remove the burden from the application.

Regards,
Martin