Python Unicode handling wins again -- mostly

Mon Dec 2 16:14:13 EST 2013

On 12/2/13 3:38 PM, Ethan Furman wrote:
> On 11/29/2013 04:44 PM, Steven D'Aprano wrote:
>>
>> Out of the nine tests, Python 3.3 passes six, with three tests being
>> failures or dubious. If you believe that the native string type should
>> operate on code-points, then you'll think that Python does the right
>> thing.
>
> I think Python is doing it correctly.  If I want to operate on
> "clusters" I'll normalize the string first.
>
> Thanks for this excellent post.
>
> --
> ~Ethan~

This is where my knowledge of Unicode gets fuzzy.  Isn't it the case
that some grapheme clusters (if that's the right term) can't be
normalized down to a single code point?  A base character can carry
several combining accents, for example, and not every combination has a
precomposed form.  In that case, you can't always normalize and then
rely on the existing string methods; you would need more specialized
code.
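
A quick sketch of what I mean, using only the standard unicodedata
module.  The sample strings are my own picks, not anything from
Steven's tests: "e" plus a combining acute does compose under NFC, but
as far as I know "q" plus a combining tilde has no precomposed code
point, so it stays two code points long.

```python
import unicodedata

# e + COMBINING ACUTE ACCENT: NFC has a precomposed form (U+00E9).
composable = "e\u0301"
# q + COMBINING TILDE: illustrative pick, no precomposed form exists.
stubborn = "q\u0303"

nfc_composable = unicodedata.normalize("NFC", composable)
nfc_stubborn = unicodedata.normalize("NFC", stubborn)

print(len(nfc_composable))  # 1
print(len(nfc_stubborn))    # 2

# Code-point reversal moves the tilde in front of the "q", so the
# cluster is broken even though the string was normalized first.
print(nfc_stubborn[::-1] == stubborn)  # False
```

So for the second string, slicing, reversing, or len() still work on
code points after normalizing, not on what a reader would call one
character.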

--Ned.