Python Unicode handling wins again -- mostly

Serhiy Storchaka storchaka at gmail.com
Sun Dec 1 13:00:21 EST 2013


30.11.13 02:44, Steven D'Aprano wrote:
> (2) If you reverse that string, does it give "lëon"? The implication of
> this question is that strings should operate on grapheme clusters rather
> than code points. Python fails this test:
>
> py> print("noe\u0308l"[::-1])
> l̈eon

 >>> import unicodedata
 >>> print(unicodedata.normalize('NFC', "noe\u0308l")[::-1])
lëon

> (3) What are the first three characters? The author suggests that the
> answer should be "noë", in which case Python fails again:
>
> py> print("noe\u0308l"[:3])
> noe

 >>> print(unicodedata.normalize('NFC', "noe\u0308l")[:3])
noë

> (4) Likewise, what is the length of the decomposed string? The author
> expects 4, but Python gives 5:
>
> py> len("noe\u0308l")
> 5

 >>> print(len(unicodedata.normalize('NFC', "noe\u0308l")))
4
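
For reference, the three checks above can be collected into one small script. This is only a sketch: the helper name nfc is made up here for illustration, and the approach relies on "ë" having a precomposed form (U+00EB). NFC does not help for combining sequences that have no precomposed code point.

```python
import unicodedata


def nfc(text):
    """Return text in Normalization Form C (hypothetical helper name)."""
    return unicodedata.normalize("NFC", text)


decomposed = "noe\u0308l"   # 'e' followed by U+0308 COMBINING DIAERESIS
composed = nfc(decomposed)  # 'e' + U+0308 is composed into U+00EB

reversed_text = composed[::-1]  # reverse by code point
first_three = composed[:3]      # first three code points
length = len(composed)          # number of code points

print(reversed_text)  # lëon
print(first_three)    # noë
print(length)         # 4
```

Slicing and len() still count code points here; normalization only makes each code point coincide with one user-visible character for this particular string.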




