Grapheme clusters, a.k.a. real characters

Gregory Ewing greg.ewing at canterbury.ac.nz
Wed Jul 19 01:51:49 EDT 2017


Chris Angelico wrote:
> Once you NFC or NFD normalize both strings, identical strings will
> generally have identical codepoints... You should then be able to
> use normal regular expressions to match correctly.

Except that if you want to match a set of characters,
you can't reliably use a character class [...]. You
would have to write them out as alternatives, in case
some of them take up more than one code point even
after normalization.
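
A minimal sketch of the problem, assuming Python 3's standard re
and unicodedata modules. The sample strings and variable names are
made up for illustration. NFD is used so that "é" is guaranteed to
occupy two code points.

```python
import re
import unicodedata

# Illustrative data: under NFD, "é" becomes 'e' followed by U+0301.
e_acute = unicodedata.normalize("NFD", "\u00e9")
text = unicodedata.normalize("NFD", "caf\u00e9")

# A character class treats every code point as a separate member,
# so this is really the set {a, e, U+0301}, not {a, é}.
char_class = re.compile("[a" + e_acute + "]")

# Writing the members out as alternatives keeps each cluster whole.
# Longer members go first so that a bare "e" cannot shadow "é".
members = sorted(["a", e_acute], key=len, reverse=True)
alternation = re.compile("(?:" + "|".join(re.escape(m) for m in members) + ")")

class_matches = char_class.findall(text)
alt_matches = alternation.findall(text)

print(class_matches)  # the accent is matched separately from its base
print(alt_matches)    # 'e' plus accent is matched as one unit
```

The class also matches a plain unaccented "e", which was never
meant to be in the set; the alternation does not.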

-- 
Greg


