Filename type (Was: Re: finding file size)

Martin v. Loewis martin at v.loewis.de
Sun Jan 4 10:19:48 EST 2004


Gerrit Holl wrote:

>>Are unicode filenames something we should care about?
> 
> 
> That's a difficult issue. I don't know how to solve that. 

It depends on the platform. There are three kinds of platforms:

1. platforms on which Unicode is the natural string type
    for filenames, and byte strings are obtained only by
    conversion. On these platforms, every filename can be
    represented as a Unicode string, but some filenames cannot
    be represented as a byte string.
    Windows NT and its successors form this class of systems.
2. platforms on which Unicode and byte-string filenames
    work equally well; they can be converted back and forth
    without any loss of accuracy or expressiveness.
    OS X is one such example; Plan 9 probably is as well.
3. platforms on which byte strings are the natural string
    type for filenames. They often have only a weak notion
    of filename encoding, with the consequences that
    a) not every Unicode string is available as a filename;
    b) not every byte-string filename can be converted to
       Unicode;
    c) the conversion may depend on user settings, so for
       the same file, Unicode conversion may give different
       results for different users.
    POSIX systems fall into this category.

So if filenames were a datatype, I think they should be
able to use both Unicode strings and byte strings as their
own internal representation, and declare one of the two
as "accurate". Conversion of filenames to both Unicode
strings and byte strings should be supported, but may
fail at runtime (except for conversion into the "accurate"
type, which always succeeds).
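A minimal sketch of such a datatype in present-day Python 3 might look like the following, assuming each object stores just the one representation it was created with and treats that as "accurate". The class name Filename, the methods to_unicode/to_bytes, the accurate_type property and the default encoding "utf-8" are all hypothetical names invented for this illustration, not an existing API.

```python
from typing import Union


class Filename:
    """Hypothetical filename type.

    Stores either a str or a bytes value and treats whichever it was
    given as the "accurate" representation. Converting to that type
    always succeeds; converting to the other type may raise at runtime.
    """

    def __init__(self, value: Union[str, bytes], encoding: str = "utf-8"):
        if not isinstance(value, (str, bytes)):
            raise TypeError("filename must be str or bytes")
        self._value = value
        # Assumed filename encoding; on POSIX it might come from the locale.
        self._encoding = encoding

    @property
    def accurate_type(self) -> type:
        """The type of the stored, lossless representation (str or bytes)."""
        return type(self._value)

    def to_unicode(self) -> str:
        if isinstance(self._value, str):
            return self._value  # accurate type: cannot fail
        # May raise UnicodeDecodeError if the bytes are invalid in the encoding.
        return self._value.decode(self._encoding)

    def to_bytes(self) -> bytes:
        if isinstance(self._value, bytes):
            return self._value  # accurate type: cannot fail
        # May raise UnicodeEncodeError if the encoding cannot represent the name.
        return self._value.encode(self._encoding)


# Usage: a POSIX-style name whose accurate form is bytes.
posix_name = Filename(b"caf\xe9.txt")
try:
    posix_name.to_unicode()  # not valid UTF-8
except UnicodeDecodeError:
    print("conversion to Unicode failed; the bytes remain usable")
```

Keeping the value in whatever form the caller supplied preserves the lossless representation; choosing the right encoding per platform is left to the caller here, which a real implementation would have to decide itself.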

Regards,
Martin




More information about the Python-list mailing list