Filename type (Was: Re: finding file size)
Martin v. Loewis
martin at v.loewis.de
Sun Jan 4 10:19:48 EST 2004
Gerrit Holl wrote:
>>Are unicode filenames something we should care about?
>
>
> That's a difficult issue. I don't know how to solve that.
It depends on the platform. There are:
1. platforms on which Unicode is the natural string type
for file names, with byte strings obtained by conversion
only. On these platforms, all filenames can be represented
by a Unicode string, but some file names cannot
be represented by a byte string.
Windows NT+ is the class of such systems.
2. platforms on which Unicode and byte string filenames
work equally well; they can be converted forth and
back without any loss of accuracy or expressiveness.
OS X is one such example; probably Plan 9 as well.
3. platforms on which byte strings are the natural string
type for filenames. They often have only a weak notion
of file name encoding, causing
a) not all Unicode strings being available as filenames
b) not all byte string filenames being convertible to
Unicode
c) the conversion may depend on user settings, so for
the same file, Unicode conversion may give different
results for different users.
POSIX systems fall in this category.
So if filenames where a datatype, I think they should be
able to use both Unicode strings and byte strings as their
own internal representation, and declare one of the two
as "accurate". Conversion of filenames to both Unicode
strings and byte strings should be supported, but may
fail at runtime (unless conversion into the "accurate"
type is attempted).
Regards,
Martin
More information about the Python-list
mailing list