unicode filenames

Neil Hodgson nhodgson at bigpond.net.au
Tue Feb 4 07:02:31 EST 2003


Andrew Dalke:

> Okay, so it seems like no one knows how to handle unicode filenames
> under Unix.  Perhaps the following is the proper behaviour?
>
> ...
>   2) there is a registration system which is used to define encodings
>       used for different mount locations.  If a filename/dirname is
>       not covered, use the default filesystem encoding

   The encoding registry uses byte strings.

   I'd hope there would be an attempt to discover file system encodings
automatically, for example by reading /etc/fstab to find the utf8 flag
mentioned. Some Unix-like systems (Mac OS X, Red Hat 8.0) seem to be moving
towards making UTF-8 the only exposed file system encoding.
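
   A rough sketch of what reading the mount table might look like, written
for a current Python. It assumes the usual six-field fstab layout and treats
a "utf8" or "iocharset=utf8" mount option as the signal; which options really
indicate UTF-8 differs between file systems, so that test is an assumption.
The function name and the sample table are made up for illustration.

```python
def utf8_mount_points(fstab_text):
    """Return mount points whose option field carries a utf8 flag."""
    mounts = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Expect at least: device, mount point, fs type, options.
        if len(fields) < 4:
            continue
        mount_point = fields[1]
        options = fields[3].split(",")
        # Assumption: either spelling marks a UTF-8 mount.
        if "utf8" in options or "iocharset=utf8" in options:
            mounts.append(mount_point)
    return mounts


# Invented sample table, not taken from any real machine.
SAMPLE = """\
# device    mountpoint  fstype   options            dump pass
/dev/hda1   /           ext3     defaults           1 1
/dev/hda5   /mnt/win    vfat     rw,utf8            0 0
/dev/cdrom  /mnt/cdrom  iso9660  ro,iocharset=utf8  0 0
"""

utf8_mounts = utf8_mount_points(SAMPLE)
```

   Mounts without such a flag would still have to fall back to a registered
or default encoding.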

> def listdir(dirname):
>    if not isinstance(dirname, unicode):
>      return os.listdir(dirname)
>    encoding = filesystem_encodings.lookup(os.path.abspath(dirname))

   How does os.path.abspath deal with a Unicode string?

> If this makes sense, should it be added to Python's core?

   There are quite a few calls that need to change - from the file
constructor to stat ...

   To be robust it needs to deal with multiple encodings in a path.
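
   As an illustration of the multiple-encodings problem, here is a minimal
sketch in a current Python. It assumes absolute paths and a hypothetical
registry keyed by byte-string mount points. Each path component is decoded
with the encoding of the directory that contains it, since the name of a
mount point is stored in the parent file system. REGISTRY, encoding_for and
decode_path are all invented names, not an existing API.

```python
# Hypothetical registry: byte-string mount points -> encoding names.
REGISTRY = {
    b"/": "latin-1",
    b"/h\xe9/win": "utf-8",
}


def encoding_for(directory, registry, default):
    """Pick the encoding of the longest registered mount containing directory."""
    best_mount, best_encoding = None, default
    for mount, encoding in registry.items():
        inside = directory == mount or directory.startswith(
            mount.rstrip(b"/") + b"/"
        )
        if inside and (best_mount is None or len(mount) > len(best_mount)):
            best_mount, best_encoding = mount, encoding
    return best_encoding


def decode_path(raw, registry=REGISTRY, default="utf-8"):
    """Decode an absolute byte path one component at a time."""
    parts = raw.split(b"/")
    decoded = []
    for i, part in enumerate(parts):
        # The directory holding this component decides its encoding.
        parent = b"/".join(parts[:i]) or b"/"
        decoded.append(part.decode(encoding_for(parent, registry, default)))
    return "/".join(decoded)


# "h\xe9" is Latin-1 on the root file system; "\xc3\xa9" is UTF-8 below it.
mixed = decode_path(b"/h\xe9/win/\xc3\xa9")
```

   Decoding the whole path with a single encoding would either fail or
garble one of the two accented names here.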

   Neil





