unicode filenames
Ganesan R
rganesan at myrealbox.com
Mon Feb 3 06:45:21 EST 2003
>>>>> "Alex" == Alex Martelli <aleax at aleax.it> writes:
> Neil Hodgson wrote:
>> Alex Martelli:
>>
>>> Similar considerations apply for any other multibyte encoding
>>> (such as, UTF-8) that is NOT specifically and carefully
>>> designed to avoid ever needing a byte of value 47 (0x2F) in
>>> order to represent ANY character except a slash. I am not
>>> aware of any such multi-byte encoding -- there may be some,
>>> but, even if one can be found, using it would still fall WELL
>>> short of "any other encoding whatsoever" as you claimed.
>>
>> UTF-8 is a superset of ASCII. A slash has the same representation in
>> UTF-8 as ASCII. No multi-byte UTF-8 character may contain a byte < 128.
> Ah! Wonderful, thanks -- and clearly this was one crucial
> point I was missing: UTF-8 *IS* "specifically and carefully
> designed to avoid ever needing a byte of value 47 (0x2F) in
> order to represent ANY character except a slash" (among
> other things;-), and therefore _IS_ usable as the encoding
> of Unicode names on a non-Unicode-aware Unix system.
Indeed. UTF-8 had its origin in Plan 9 (if I remember correctly) as
a "File System Safe" Unicode transformation format. You can find a
document titled FSS-UTF on the net.
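
For anyone who wants to check the property Neil describes, here is a minimal
sketch in present-day Python 3. The function name and the sample file name are
made up for illustration. It walks every encodable code point above ASCII and
looks for a byte below 128 in its UTF-8 form, then contrasts that with
UTF-16, where a 0x2F byte can turn up inside an unrelated character.

```python
import sys


def first_offender():
    """Return the first non-ASCII code point whose UTF-8 encoding
    contains a byte below 0x80, or None if there is no such code point."""
    for cp in range(0x80, sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        if any(b < 0x80 for b in chr(cp).encode("utf-8")):
            return cp
    return None


# Hypothetical path: one real slash, several multi-byte characters.
name = "naïve/日本語.txt"
utf8_slashes = name.encode("utf-8").count(b"/")

# Counter-example for an encoding that is not file system safe:
# U+2F00 in little-endian UTF-16 is the byte pair 0x00 0x2F.
utf16_has_stray_slash = b"/" in chr(0x2F00).encode("utf-16-le")

if __name__ == "__main__":
    print("offending code point:", first_offender())
    print("slash bytes in UTF-8 name:", utf8_slashes)
    print("stray slash byte in UTF-16:", utf16_has_stray_slash)
```

The exhaustive loop covers a little over a million code points, so it takes a
moment to run. A byte-oriented kernel that splits paths on 0x2F would handle
the UTF-8 name correctly and would mis-split the UTF-16 one.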
Ganesan
--
Ganesan R