unicode filenames
Piet van Oostrum
piet at cs.uu.nl
Sun Feb 16 13:24:54 EST 2003
>>>>> David Eppstein <eppstein at ics.uci.edu> (DE) wrote:
DE> Under Mac OS X, the shell displays text (e.g. from cat, or from ls
DE> without the -q option) as utf-8 by default, and the Finder (gui file
DE> browser) uses utf-8 for accented characters in file names. So I infer
DE> that the correct interpretation of filenames under my OS is utf-8.
DE> But other unixes may differ...
On Mac OS X, it is a bit more complicated. First, cat will indeed show the
unicode (utf-8) contents of a file, but ls won't display filenames with
non-ASCII characters correctly, at least not in 10.1.5. Maybe 10.2 does it
better. For example, if my filename is "€200", ls will display "???200".
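The three question marks line up with the three bytes that utf-8 uses for
"€". A small Python sketch of that, assuming ls simply prints "?" for every
byte outside printable ASCII (show_like_ls is a made-up helper for
illustration, not something ls or Python provides):

```python
def show_like_ls(name: str) -> str:
    """Mimic an ls that prints '?' for each non-ASCII byte (an assumption)."""
    raw = name.encode("utf-8")
    # Keep printable ASCII bytes, replace everything else with '?'.
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw)

euro_name = "\u20ac200"            # the filename "€200"
print(euro_name.encode("utf-8"))   # the euro sign occupies three bytes
print(show_like_ls(euro_name))     # one '?' per non-ASCII byte
```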
Secondly, the filesystem requires the unicode characters to be normalized
(decomposed), which means that accented characters like "é" will be broken
up into "e" followed by a combining acute accent "´". So if the Finder has a
file with name "é200", the filename will start with the bytes 0x65 followed
by 0xCC 0x81 (the utf-8 encoding of unicode character 0x301). ls will print
this as "e??200".
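The decomposition can be reproduced with Python's unicodedata module. This is
only a sketch: it assumes the filesystem's decomposition agrees with standard
NFD for this character, and the "?" substitution is again a guess at what ls
does with non-ASCII bytes.

```python
import unicodedata

composed = "\u00e9200"                                # "é200", precomposed é
decomposed = unicodedata.normalize("NFD", composed)   # "e" + U+0301 + "200"
raw = decomposed.encode("utf-8")

print([hex(b) for b in raw[:3]])                      # bytes for "e" and the accent
# Assumed ls behaviour: every non-ASCII byte is shown as '?'.
shown = "".join(chr(b) if b < 0x80 else "?" for b in raw)
print(shown)
```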
And in the shell I can't even type a € sign or é. That, however, is a
problem of the Terminal application, as I can do it in emacs.
Although ... after I tried it out and wanted to send this article, my
emacs crashed (fortunately after saving it).
--
Piet van Oostrum <piet at cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: P.van.Oostrum at hccnet.nl
More information about the Python-list mailing list