Grapheme clusters, a.k.a.real characters

Chris Angelico rosuav at gmail.com
Fri Jul 14 08:33:23 EDT 2017


On Fri, Jul 14, 2017 at 10:05 PM, Marko Rauhamaa <marko at pacujo.net> wrote:
> Marko Rauhamaa <marko at pacujo.net>:
>
>> Chris Angelico <rosuav at gmail.com>:
>>> If you're trying to use strings as identifiers in any way (say, file
>>> names, or document lookup references), using the NFC/NFD normalized
>>> form of the string should be sufficient.
>>
>> Show me ten Python3 database applications, and I'll show you ten Python3
>> database applications that don't normalize their primary keys.
>
> Besides the normal forms don't help you do text processing (no regular
> expression matching, no simple way to get a real character).

What do you mean about regular expressions? You can use REs with
normalized strings. And if you have any valid definition of "real
character", you can use it equally on an NFC-normalized or
NFD-normalized string than any other. They're just strings, you know.

ChrisA



More information about the Python-list mailing list