unicode filenames
Neil Hodgson
nhodgson at bigpond.net.au
Tue Feb 4 07:02:31 EST 2003
Andrew Dalke:
> Okay, so it seems like no one knows how to handle unicode filenames
> under Unix. Perhaps the following is the proper behaviour?
>
> ...
> 2) there is a registration system which is used to define encodings
> used for different mount locations. If a filename/dirname is
> not covered, use the default filesystem encoding
The encoding registry uses byte strings.
I'd hope there would be an attempt to discover file system encodings
automatically, such as by reading /etc/fstab to find the utf8 flag mentioned.
Some Unix distributions (MacOS X, Red Hat 8.0) seem to be moving towards
making UTF-8 be the only exposed file system encoding.
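As a rough illustration of the automatic discovery idea, here is a minimal sketch that scans fstab-style text for mount options declaring an encoding. It is written for present-day Python 3, and the function name encodings_from_fstab, the SAMPLE text, and the choice of options to recognise (a bare utf8 flag and iocharset=NAME) are assumptions made for the example; real systems may use other option names, and most entries declare nothing at all.

```python
def encodings_from_fstab(fstab_text):
    """Return {mount_point: encoding} for entries that declare an encoding.

    Only two option spellings are recognised here (an assumption):
    a bare "utf8" flag and "iocharset=NAME".
    """
    found = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Expected layout: device, mount point, fs type, options, ...
        if len(fields) < 4:
            continue
        mount_point = fields[1]
        for option in fields[3].split(","):
            if option == "utf8":
                found[mount_point] = "utf-8"
            elif option.startswith("iocharset="):
                found[mount_point] = option.split("=", 1)[1]
    return found


# Hypothetical fstab contents, for illustration only.
SAMPLE = """\
# device    mountpoint  type     options                      dump pass
/dev/hda1   /           ext3     defaults                     1 1
/dev/hdb1   /mnt/win    vfat     utf8,uid=500                 0 0
/dev/cdrom  /mnt/cdrom  iso9660  noauto,iocharset=iso8859-1   0 0
"""

if __name__ == "__main__":
    for mount, encoding in sorted(encodings_from_fstab(SAMPLE).items()):
        print(mount, encoding)
```

Mount points with no declared encoding are simply left out, so a caller would still need the default filesystem encoding as a fallback.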
> def listdir(dirname):
>     if not isinstance(dirname, unicode):
>         return os.listdir(dirname)
>     encoding = filesystem_encodings.lookup(os.path.abspath(dirname))
How does os.path.abspath deal with a Unicode string?
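To make the quoted proposal concrete, here is one way it might be fleshed out. This is only a sketch under stated assumptions: it targets present-day Python 3, where str and bytes play the roles that unicode and byte strings play in the quoted code, and the FilesystemEncodings class with its register method is hypothetical (only the lookup name comes from the quoted snippet). The registry picks the longest registered mount point that contains the path.

```python
import os


class FilesystemEncodings:
    """Hypothetical registry mapping mount points to encoding names."""

    def __init__(self, default="utf-8"):
        self.default = default
        self._by_mount = {}

    def register(self, mount_point, encoding):
        self._by_mount[os.path.normpath(mount_point)] = encoding

    def lookup(self, path):
        """Return the encoding of the longest mount point containing path."""
        path = os.path.normpath(path)
        best, best_len = self.default, -1
        for mount, encoding in self._by_mount.items():
            # Compare on whole components so /mnt/win does not match /mnt/windows.
            prefix = mount if mount.endswith(os.sep) else mount + os.sep
            if path == mount or path.startswith(prefix):
                if len(mount) > best_len:
                    best, best_len = encoding, len(mount)
        return best


def listdir(dirname, registry):
    """List a directory, returning text names for a text argument."""
    if isinstance(dirname, bytes):
        # Byte-string argument: pass straight through, as in the quoted code.
        return os.listdir(dirname)
    encoding = registry.lookup(os.path.abspath(dirname))
    raw_names = os.listdir(dirname.encode(encoding))
    # Undecodable names are replaced rather than raising; a design choice.
    return [name.decode(encoding, "replace") for name in raw_names]
```

Note that encoding the whole directory name with a single encoding is exactly the simplification that breaks down when a path crosses file systems with different encodings.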
> If this makes sense, should it be added to Python's core?
There are quite a few calls that need to change - from the file
constructor to stat ...
To be robust it needs to deal with multiple encodings in a path.
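A minimal sketch of what handling multiple encodings in one path could look like, again in present-day Python 3. The function name decode_path and the dictionary of mount points are assumptions for illustration. The idea is that each component's name is stored in its parent directory, so it is decoded with the encoding in force for the parent, and the encoding switches only after a registered mount point has been entered.

```python
def decode_path(raw_path, encodings, default="utf-8"):
    """Decode an absolute POSIX-style bytes path component by component.

    encodings maps mount points (as bytes) to encoding names. A component
    is decoded with the encoding of the file system that holds its parent.
    """
    parts = [part for part in raw_path.split(b"/") if part]
    current = b"/"
    encoding = encodings.get(current, default)
    decoded = []
    for part in parts:
        decoded.append(part.decode(encoding, "replace"))
        current = current.rstrip(b"/") + b"/" + part
        # Entering a registered mount point changes the encoding for
        # everything below it; otherwise the parent's encoding carries on.
        encoding = encodings.get(current, encoding)
    return "/" + "/".join(decoded)


if __name__ == "__main__":
    # Hypothetical layout: a UTF-8 root with a Latin-1 file system
    # mounted on a directory whose own name is stored as UTF-8.
    mount = "/donn\u00e9es".encode("utf-8")
    raw = mount + b"/" + "caf\u00e9".encode("latin-1")
    print(decode_path(raw, {b"/": "utf-8", mount: "latin-1"}))
```

The reverse direction, encoding a text path back to bytes, would need the same walk, since the mount point keys themselves are only known in byte form.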
Neil