[issue10254] unicodedata.normalize('NFC', s) regression
Martin v. Löwis
report at bugs.python.org
Sat Oct 30 23:01:20 CEST 2010
Martin v. Löwis <martin at v.loewis.de> added the comment:
The change from issue1054943 is indeed bogus. As written, the code will happily run over starters, even though a blocked starter means that subsequent characters can't possibly be combinable. That way, the code manages to combine, in 'Li\u030dt-s\u1e73\u0301', the final U+0301 with the i - even though there are several starters in between.
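To check whether a given interpreter is affected, a minimal sketch along these lines may help. It assumes that the string is already in NFC, which should hold because there is no precomposed character for i + U+030D or for U+1E73 + U+0301, so an unaffected build is expected to return it unchanged:

```python
import unicodedata

# The string from the report: "Li" + U+030D, "t-s", U+1E73 + U+0301.
s = "Li\u030dt-s\u1e73\u0301"

result = unicodedata.normalize("NFC", s)

# Show the code points so a wrongly composed character is easy to spot.
print(ascii(result))

# Assumption: an unaffected build leaves the string as it is; an affected
# build would have moved the final U+0301 onto the "i".
is_affected = result != s
print("affected by the regression:", is_affected)
```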
I think the code should work like this:
  if comb!=0 and comb1==0:
      # starter after character with higher class:
      # not combinable, and all subsequent characters will be blocked
      # as well
      break
  if comb!=0 and comb1==comb:
      # blocked combining character, continue searching
      i1++
      continue
  # candidate pair, check whether *i and *i1 are combinable
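For illustration only, the checks above can be tried out in a small pure-Python model of the scan loop. This is a sketch, not the code in unicodedata.c: the function name compose, the COMPOSE table and the skipped set are hypothetical names introduced here, the table covers just two pairs, and the input is assumed to be canonically ordered (NFD) already.

```python
import unicodedata

# Hypothetical, hand-written composition table for illustration only;
# a real implementation derives it from the Unicode Character Database.
COMPOSE = {
    ("i", "\u0301"): "\u00ed",  # i + COMBINING ACUTE ACCENT
    ("u", "\u0324"): "\u1e73",  # u + COMBINING DIAERESIS BELOW
}

def compose(text):
    """Compose an NFD string using the two checks proposed above."""
    chars = list(text)
    out = []
    skipped = set()  # indices already merged into an earlier character
    for i in range(len(chars)):
        if i in skipped:
            continue
        base = chars[i]
        comb = 0  # class of the last character that failed to combine
        i1 = i + 1
        while i1 < len(chars):
            comb1 = unicodedata.combining(chars[i1])
            if comb != 0 and comb1 == 0:
                # starter after a character with higher class: stop here
                break
            if comb != 0 and comb1 == comb:
                # blocked combining character, continue searching
                i1 += 1
                continue
            # candidate pair, check whether base and chars[i1] combine
            merged = COMPOSE.get((base, chars[i1]))
            if merged is None:
                if comb1 == 0:
                    break  # non-combinable starter ends the search
                comb = comb1
            else:
                base = merged
                skipped.add(i1)
            i1 += 1
        out.append(base)
    return "".join(out)

s = "Li\u030dt-s\u1e73\u0301"
print(ascii(compose(unicodedata.normalize("NFD", s))))
```

With the first check in place, the scan that starts at the i stops at the t, so the final U+0301 is never considered as a partner for it.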
It's unfortunate that the patch had been backported to 2.6.6; we can't fix it there anymore.
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10254>
_______________________________________