[issue5200] unicode.normalize gives wrong result for some characters

Martin v. Löwis report at bugs.python.org
Tue Feb 10 19:59:23 CET 2009


Martin v. Löwis <martin at v.loewis.de> added the comment:

It is not true that normalize produces "aaoAAO". Instead, it produces

u'a\u030aa\u0308o\u0308A\u030aA\u0308O\u0308'

This is the correct result according to the Unicode specification. Under
Unicode Normalization Form D (NFD, canonical decomposition), it would be
incorrect to leave these characters unchanged; for example, the
decomposition of 'LATIN SMALL LETTER A WITH RING ABOVE' is 'LATIN SMALL
LETTER A' + 'COMBINING RING ABOVE'.
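
To check this locally, something like the following should work. It is a
minimal sketch written for Python 3 (where every str is Unicode, so the
u'' prefix is not needed), and the input string is an assumption: it is
the sequence of precomposed letters implied by the output quoted above,
not text taken from the original report.

```python
import unicodedata

# Assumed input: precomposed a-ring, a-diaeresis, o-diaeresis in lower
# and upper case, written as escapes to avoid source-encoding issues.
composed = "\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6"

# NFD replaces each precomposed letter with its base letter followed
# by the corresponding combining mark.
decomposed = unicodedata.normalize("NFD", composed)
print(ascii(decomposed))

# The decomposition of a single character, shown by character name.
names = [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", "\u00e5")]
print(names)

# NFC recomposes the sequence, so no information has been lost.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)
```

The combining marks are still present in the result; a terminal that
does not render them may display the string as if it were plain ASCII
letters.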

The Wikipedia article is irrelevant; refer to the Unicode specification
for a normative reference.

Closing as invalid.

----------
nosy: +loewis
resolution:  -> invalid
status: open -> closed

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5200>
_______________________________________

