unicode filenames

Andrew Dalke adalke at mindspring.com
Thu Feb 6 03:30:01 EST 2003


Neil Hodgson wrote:
>    Red Hat defaults the locale to <something>.UTF-8 which on my machine is
> en_US.UTF-8.

My default LANG is "en_US" which is, I believe, a Latin-1 encoding.
I'm running RH 7.2.  You say you use 8.0...

Yep, the release notes at
http://www.redhat.com/docs/manuals/linux/RHL-8.0-Manual/release-notes/x86/
confirm that this was an 8.0 change:

] Red Hat Linux now installs using UTF-8 (Unicode) locales by default in
]  languages other than Chinese, Japanese, or Korean.
]
] This has been known to cause various issues:
]
] · Line drawing characters in applications such as make menuconfig
]  do not always appear correctly in certain locales.
]
] · On the console, the latarcyrheb-sun16 font is used for best Unicode
]  coverage. Due to the use of this font, bold colors are not available.
]
] · Certain third party applications, such as the Adobe® Acrobat
]  Reader®, may not function correctly (or crash upon startup) because
]  they lack support for Unicode locales. Until third party developers
]  provide such support in their products, you may work around this
]  issue by setting the LANG environment variable at the shell prompt to
]  C prior to typing the application name. For example:
]
] env LANG=C acroread
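For comparison, here is a minimal Python sketch of the same workaround. It assumes Python 3.7+, which postdates this thread. It copies the current environment, overrides LANG, and launches the program. `run_with_c_locale` is a hypothetical helper. The demo runs the interpreter itself instead of acroread so it works on any machine. Note that LC_ALL, if it is set, takes precedence over LANG.

```python
import os
import subprocess
import sys


def run_with_c_locale(argv):
    """Run argv with LANG=C, like `env LANG=C acroread` (hypothetical helper)."""
    # Copy the parent environment and override only LANG
    env = dict(os.environ, LANG="C")
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


# Demo: ask a child Python process which LANG it was started with
result = run_with_c_locale([sys.executable, "-c", "import os; print(os.environ['LANG'])"])
print(result.stdout.strip())  # C
```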

Hence my difficulties stem partly from running an older RH release.
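A small Python 3 sketch (not from the thread) shows why the locale matters. The same filename bytes read very differently under a Latin-1 locale such as "en_US" and under a UTF-8 locale such as "en_US.UTF-8". The reported locale encoding depends on the machine, so only the decoding lines have fixed results.

```python
import locale

# Encoding implied by the current locale settings (LANG, LC_ALL, ...); machine-dependent
print("locale encoding:", locale.getpreferredencoding(False))

raw = "café".encode("utf-8")       # b'caf\xc3\xa9', as written under a UTF-8 locale
as_latin1 = raw.decode("latin-1")  # how a Latin-1 (en_US) program interprets those bytes
as_utf8 = raw.decode("utf-8")      # how a UTF-8 (en_US.UTF-8) program interprets them
print(as_latin1)  # cafÃ©
print(as_utf8)    # café
```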


>    A screenshot from Red Hat Linux 8.0 with, on the left, Nautilus showing a
> directory on VFAT where the Windows script was run (displaying the ASCII,
> European, and Cyrillic well, the Greek with one problem on an accented
> character, the Hebrew invisibly, and the Japanese and Chinese as code
> blocks), Nautilus showing a directory on ext3 where the Linux script was run
> (similar to VFAT case), an ls in a console (Cyrillic and European are
> displayed well). On the right hand side are two editors, gedit is a GNOME 2
> application so works similarly to Nautilus; SciTE is a GTK+ 1.x application
> with some Unicode fontset support.
>    http://scintilla.sourceforge.net/linuxss.png

What does 'os.listdir()' return for that directory?  I assume it returns
byte strings. That means I need to do the UTF-8 conversion myself, and
so dealing with unicode filenames on non-MS Windows machines is still
complicated in Python.

At the very least, it's more confusing than I prefer dealing with.
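Current Python 3 postdates this message. In it, passing a bytes path to os.listdir() still yields raw byte strings, so the conversion stays explicit, as assumed above. If you pass a str path instead, Python decodes names using sys.getfilesystemencoding(), with the surrogateescape handler on POSIX. The sketch below assumes a POSIX filesystem such as ext3 that stores names as uninterpreted bytes. `list_utf8_names` is a hypothetical helper.

```python
import os
import tempfile


def list_utf8_names(directory):
    """Return the entries of a bytes directory path, decoded as UTF-8 (hypothetical helper)."""
    names = []
    for raw in os.listdir(directory):  # bytes in -> bytes out, no decoding done
        # errors="replace" turns bytes that are not valid UTF-8 (for example a
        # name created under a Latin-1 locale) into U+FFFD instead of raising
        names.append(raw.decode("utf-8", errors="replace"))
    return sorted(names)


with tempfile.TemporaryDirectory() as tmp:
    base = os.fsencode(tmp)
    # One name stored as UTF-8 bytes, one stored as Latin-1 bytes
    for raw_name in ("café.txt".encode("utf-8"), "naïve.txt".encode("latin-1")):
        open(os.path.join(base, raw_name), "wb").close()
    result = list_utf8_names(base)

# ascii() keeps the output printable on any terminal encoding
print(ascii(result))  # ['caf\xe9.txt', 'na\ufffdve.txt']
```

The Latin-1 name comes back with a replacement character. This is the mixed-encoding problem that a UTF-8 default locale introduces on systems that already have Latin-1 names.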

					Andrew
					dalke at dalkescientific.com




