Python Unicode handling wins again -- mostly
wxjmfauth at gmail.com
Mon Dec 2 07:39:26 EST 2013
On Sunday, 1 December 2013 21:54:48 UTC+1, Tim Delaney wrote:
> On 2 December 2013 07:15, <wxjm... at gmail.com> wrote:
>
> 0.11.13 02:44, Steven D'Aprano wrote:
>
> > (2) If you reverse that string, does it give "lëon"? The implication of
> > this question is that strings should operate on grapheme clusters rather
> > than code points. ...
>
> BTW, a grapheme cluster *is* a code points cluster.
>
> Anyone with a decent level of reading comprehension would have understood that Steven knows that. The implied word is "individual" i.e. "... rather than [individual] code points".
>
> Why am I responding to a troll? Probably because out of all his baseless complaints about the FSR, he *did* have one valid point about performance that has now been fixed.
>
> Tim Delaney
My English is far from perfect, but I think I understood
it correctly.
The point is not in the words "grapheme" or "code point",
nor in "individual", ;-), the point is in "rather".
If one wishes to work on a set of graphemes, one can
only work with the set of the corresponding code points.
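A rough sketch of that point, using the "lëon" example quoted above. The helper simple_clusters is hypothetical and only approximates grapheme clusters by attaching combining marks to the preceding base character; it is not full Unicode text segmentation. It assumes the string is in decomposed (NFD) form, so that "ë" is two code points.

```python
import unicodedata


def simple_clusters(text):
    """Group each base code point with the combining marks that follow it.

    Hypothetical helper: a simplified approximation of grapheme clusters,
    not the full segmentation rules of the Unicode standard.
    """
    clusters = []
    for ch in text:
        # combining() returns a non-zero class for combining marks
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# Decompose so that 'ë' becomes 'e' followed by U+0308 (combining diaeresis)
decomposed = unicodedata.normalize('NFD', 'noël')

# Reversing individual code points moves the diaeresis onto the 'l'
by_code_point = decomposed[::-1]

# Reversing clusters keeps the diaeresis with its 'e'
by_cluster = ''.join(reversed(simple_clusters(decomposed)))
```

Either way, the clusters are built from, and handled as, sequences of code points.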
To complete Serhiy Storchaka's example:
>>> import unicodedata
>>> len(unicodedata.normalize('NFKD', '\ufdfa')) == 18
True
is correct.
jmf
PS: I did not even mention the FSR.