[issue10254] unicodedata.normalize('NFC', s) regression

Martin v. Löwis report at bugs.python.org
Sat Oct 30 23:01:20 CEST 2010


Martin v. Löwis <martin at v.loewis.de> added the comment:

The change from issue1054943 is indeed bogus. As written, the code will happily run over starters, even though a blocked starter means that subsequent characters can't possibly be combinable. That way, the code manages to combine, in 'Li\u030dt-s\u1e73\u0301', the final U+0301 with the i, even though there are several starters in between.
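
For anyone who wants to check an interpreter against this, here is a minimal sketch of a regression check. It assumes only the standard unicodedata module; the variable names are illustrative. The example string from above has no composable pairs, so a correct NFC implementation should return it unchanged, whereas the buggy code composes the trailing U+0301 with the i.

```python
import unicodedata

# The example string from this report.
s = 'Li\u030dt-s\u1e73\u0301'

# Canonical combining class of each code point; 0 marks a starter.
classes = [(f'U+{ord(c):04X}', unicodedata.combining(c)) for c in s]
for code_point, combining_class in classes:
    print(code_point, combining_class)

# With a correct implementation the string should come back unchanged.
result = unicodedata.normalize('NFC', s)
unchanged = (result == s)
print('NFC leaves the string unchanged:', unchanged)
```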

I think the code should work like this:

if comb!=0 and comb1==0:
  #starter after character with higher class:
  # not combinable, and all subsequent characters will be blocked
  # as well
  break
if comb!=0 and comb1==comb:
  # blocked combining character, continue searching
  i1++
  continue
# candidate pair, check whether *i and *i1 are combinable
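
The blocking rule behind this loop can be sketched in Python. This is only an illustration of the definition in UAX #15 (a character is blocked from a starter if some character between them is a starter or has the same or a higher combining class), not the C code in unicodedata. The helper name is_blocked is hypothetical, and the sketch assumes the text is already canonically ordered.

```python
import unicodedata

def is_blocked(text, starter_index, mark_index):
    """Return True if text[mark_index] is blocked from text[starter_index].

    Hypothetical helper: assumes text is canonically ordered and that
    starter_index < mark_index.
    """
    mark_class = unicodedata.combining(text[mark_index])
    for between in text[starter_index + 1:mark_index]:
        between_class = unicodedata.combining(between)
        # An intervening starter, or a mark of the same or a higher
        # combining class, blocks the later character.
        if between_class == 0 or between_class >= mark_class:
            return True
    return False

s = 'Li\u030dt-s\u1e73\u0301'
# U+030D directly follows the i, so nothing can block it.
first_mark_blocked = is_blocked(s, 1, 2)
# The final U+0301 is separated from the i by several starters.
last_mark_blocked = is_blocked(s, 1, 7)
print(first_mark_blocked, last_mark_blocked)
```

Once a character is found to be blocked by an intervening starter, every later character is blocked from the same starter too, which is why the first branch above can break out of the loop instead of continuing.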

It's unfortunate that the patch had been backported to 2.6.6; we can't fix it there anymore.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10254>
_______________________________________

