unicode filenames

Neil Hodgson nhodgson at bigpond.net.au
Mon Feb 3 06:00:19 EST 2003


Andrew Dalke:

> And what happens when a remote file is mounted, say, from a MS
> Windows OS?  Are they represented as UTF-8?  Something else?
> Is that standardized or is it a property of the mount mechanism
> and can change accordingly?

   The default mount options I have seen turn the Unicode file names into
'?'s. However, with a VFAT file system that has some Unicode file names
on my machine, mounting the partition from Linux with the utf8 option in
fstab:
/dev/hda5 /eff vfat auto,shortname=winnt,utf8,owner 0 0
   leads to UTF-8 strings being returned to user programs. Since Red Hat 8.0
defaults to UTF-8 locales, many programs such as Nautilus and the standard
GTK+ file open dialog display these file names correctly, although some
characters are still not shown because the default UI fonts do not have all
the required glyphs. European, Cyrillic, and Greek characters were OK, while
Asian characters often displayed as boxes with codes inside.

>    if os.path.supports_unicode_filenames:
>      cwd = os.getcwdu()
>    else:
>      encoding = .. get default filesystem encoding ... or 'latin-1'
>      cwd = unicode(os.getcwd(), encoding)
>
> Ugly .. quite ugly.  And suggestions on the proper way to
> handle this is not documented as far as I can find.

   Yes, it is ugly, but I don't know how to handle this well on Unix. In my
example above there is one partition mounted in UTF-8 mode, but other
partitions could be using other encodings. I imagine there is some way to
reach the mount options for a given directory...
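
   As a rough sketch of that idea (not something I have tested against real
mount tables): on Linux the kernel lists mounts in /proc/mounts, so one
could pick the entry whose mount point is the longest prefix of the
directory and look at its options. The function names below are made up for
illustration, the code is written for a current Python rather than 2.3, and
treating the vfat "utf8" and "iocharset=" options as the file name encoding
is an assumption that will not hold for every file system type.

```python
import posixpath


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fstype, options)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # /proc/mounts escapes spaces in mount points as \040
        mount_point = fields[1].replace("\\040", " ")
        entries.append((mount_point, fields[2], fields[3].split(",")))
    return entries


def mount_entry_for(path, entries):
    """Return the entry whose mount point is the longest prefix of path.

    path is expected to be absolute; returns None if nothing matches.
    """
    path = posixpath.normpath(path)
    best = None
    for entry in entries:
        mount_point = entry[0]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = entry
    return best


def guess_filename_encoding(path, entries, fallback="latin-1"):
    """Guess the file name encoding from mount options (an assumption)."""
    entry = mount_entry_for(path, entries)
    if entry is None:
        return fallback
    for option in entry[2]:
        if option == "utf8":
            return "utf-8"
        if option.startswith("iocharset="):
            # A kernel NLS name; it may not be a valid Python codec name.
            return option.split("=", 1)[1]
    return fallback
```

   On a Linux machine the entries could come from
parse_mounts(open("/proc/mounts").read()). Other Unix systems expose the
mount table differently, so this does not solve the general problem.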

   Neil





