unicode filenames

Ganesan R rganesan at myrealbox.com
Mon Feb 3 06:45:21 EST 2003


>>>>> "Alex" == Alex Martelli <aleax at aleax.it> writes:

> Neil Hodgson wrote:
>> Alex Martelli:
>> 
>>> Similar considerations apply for any other multibyte encoding
>>> (such as, UTF-8) that is NOT specifically and carefully
>>> designed to avoid ever needing a byte of value 47 (0x2F) in
>>> order to represent ANY character except a slash.  I am not
>>> aware of any such multi-byte encoding -- there may be some,
>>> but, even if one can be found, using it would still fall WELL
>>> short of "any other encoding whatsoever" as you claimed.
>> 
>> UTF-8 is a superset of ASCII. A slash has the same representation in
>> UTF-8 as ASCII. No multi-byte UTF-8 character may contain a byte < 128.

> Ah!  Wonderful, thanks -- and clearly this was one crucial
> point I was missing: UTF-8 *IS* "specifically and carefully 
> designed to avoid ever needing a byte of value 47 (0x2F) in 
> order to represent ANY character except a slash" (among
> other things;-), and therefore _IS_ usable as the encoding
> of Unicode names on a non-Unicode-aware Unix system.

Indeed. UTF-8 had its origin in Plan 9 (if I remember correctly) as
a "File System Safe" Unicode transformation format. You can find a
document titled FSS-UTF on the net.
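
For anyone who wants to convince themselves of Neil's point, here is a
rough sketch that checks it by brute force. It assumes a Python with
Unicode str and a separate bytes type (Python 3), and the function
names are made up for illustration; they are not from any library.

```python
# Hypothetical helpers for illustration only; assumes Python 3.

def utf8_multibyte_is_slash_free(limit=0x110000):
    """Return True if no non-ASCII code point below `limit` has a
    UTF-8 encoding containing a byte below 0x80 (so never 0x2F)."""
    for cp in range(0x80, limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        # Every byte of a multi-byte sequence should have the high bit set.
        if min(chr(cp).encode("utf-8")) < 0x80:
            return False
    return True


def split_utf8_path(raw):
    """Split a UTF-8 encoded path on the slash byte, the way a
    byte-oriented filesystem would, then decode each component."""
    return [part.decode("utf-8") for part in raw.split(b"/")]


if __name__ == "__main__":
    print(utf8_multibyte_is_slash_free())
    print(split_utf8_path("/home/日本語/file".encode("utf-8")))
    # Contrast: UTF-16 is not safe in this sense. U+012F encoded as
    # little-endian UTF-16 is the two bytes 0x2F 0x01.
    print(b"/" in "\u012f".encode("utf-16-le"))
```

The last line is the kind of case Alex was worried about: in UTF-16 a
byte of value 0x2F can turn up inside a character that is not a slash,
so a filesystem that splits on that byte would cut the name in the
wrong place. With UTF-8 that cannot happen.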

Ganesan

-- 
Ganesan R




