[Python-Dev] PEP 277 (unicode filenames): please review

M.-A. Lemburg mal@lemburg.com
Tue, 13 Aug 2002 18:54:49 +0200


Guido van Rossum wrote:
>>Guido van Rossum wrote:
>>
>>>But if you pass the normalized string (or the Latin-1 string) to
>>>open(), will it find the file?  
>>
>>I tried opening a file using both "o\xcc\x88" and "\xc3\xb6". Both
>>result in the same file being opened.
>>
>>
>>>I.e. if the filesystem has the
>>>unnormalized name stored in its directory, will filesystem requests
>>>normalize filenames before comparing them?
>>
>>It could be that Apple is decomposing the filenames before comparing
>>them. Either way works.

The recommended way of doing normalization is Normalization Form C
(NFC): canonical decomposition, followed by canonical composition.

See http://www.unicode.org/unicode/reports/tr15/#Specification
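
As a rough illustration of the two forms, here is a sketch that assumes
a Python whose standard unicodedata module provides normalize(); the
variable names are made up for this example. The two byte strings from
the test quoted above are the UTF-8 encodings of the decomposed and the
precomposed spelling of the same character:

```python
import unicodedata

decomposed = "o\u0308"   # 'o' followed by U+0308 COMBINING DIAERESIS
precomposed = "\u00f6"   # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS

# These are the two byte sequences tried in the quoted test.
assert decomposed.encode("utf-8") == b"o\xcc\x88"
assert precomposed.encode("utf-8") == b"\xc3\xb6"

# NFC composes where it can; NFD decomposes fully.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
```

Here nfc equals the precomposed string and nfd equals the decomposed
one, although the two inputs compare unequal as plain strings.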

Note that for proper collation support, Unicode strings must first be
normalized. See http://www.unicode.org/unicode/reports/tr10/#Main_Algorithm
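
A small sketch of why this matters for sorting. It is not the collation
algorithm from the report, only plain code point ordering on normalized
keys; canonical_key is a hypothetical helper, and the choice of NFD for
the key is an assumption made for this example:

```python
import unicodedata

def canonical_key(s):
    # Hypothetical helper: compare strings by their NFD form so that
    # canonically equivalent spellings get identical sort keys.
    return unicodedata.normalize("NFD", s)

names = ["\u00f6l", "zebra", "o\u0308l", "apple"]

# Without normalization the precomposed spelling sorts after "zebra",
# because U+00F6 is a higher code point than "z".
naive = sorted(names)

# With normalized keys both spellings end up next to each other.
ordered = sorted(names, key=canonical_key)
```

With the normalized key, the two spellings of the same name have equal
keys and are no longer split apart by unrelated entries.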

> Hm, that sucks (either way) -- because you get unnormalized Unicode
> out of directory listings, which is harder to turn into local
> encodings.

You can easily normalize it again (provided you have a normalization
lib at hand).
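
For instance, assuming the standard unicodedata module as the
normalization lib, a sketch could look like this (to_local and the
sample filename are made up for illustration, and Latin-1 is just an
example target encoding):

```python
import unicodedata

def to_local(name, encoding="latin-1"):
    # Hypothetical helper: compose the name (NFC) first, then encode
    # it to the local encoding.
    return unicodedata.normalize("NFC", name).encode(encoding)

# A decomposed name, as a directory listing might return it.
listed = "o\u0308.txt"

try:
    listed.encode("latin-1")      # U+0308 has no Latin-1 byte
    direct_ok = True
except UnicodeEncodeError:
    direct_ok = False

local = to_local(listed)
```

Encoding the decomposed name directly fails, while the composed form
fits into a single Latin-1 byte per character.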

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/