Python Unicode handling wins again -- mostly

Chris Angelico rosuav at gmail.com
Mon Dec 2 16:23:02 EST 2013


On Tue, Dec 3, 2013 at 8:14 AM, Ned Batchelder <ned at nedbatchelder.com> wrote:
> This is where my knowledge about Unicode gets fuzzy.  Isn't it the case that
> some grapheme clusters (or whatever the right word is) can't be normalized
> down to a single code point?  Characters can accept many accents, for
> example.

You can't normalize everything down to a single code point, but you
can normalize the other way, decomposing everything that can be
decomposed into a base character plus combining marks.

>>> import unicodedata
>>> print(ascii(unicodedata.normalize("NFKC", "ä")))
'\xe4'
>>> print(ascii(unicodedata.normalize("NFKD", "ä")))
'a\u0308'
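A small sketch of both halves of that answer. It uses the canonical forms NFC/NFD rather than the NFKC/NFKD shown above. The K forms also apply compatibility mappings (for example, the "ﬁ" ligature becomes "fi"), but for "ä" they give the same result. "n" plus COMBINING DIAERESIS has no precomposed code point, so even the composing form leaves two code points. Two different spellings of "ä" compare unequal as raw strings but equal once both are decomposed. The `code_points` helper is a hypothetical convenience written for this example, not part of `unicodedata`.

```python
import unicodedata

def code_points(s):
    """Hypothetical helper: list the code points of s as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in s]

# "n" + COMBINING DIAERESIS: Unicode has no precomposed "n with diaeresis",
# so NFC (the composing form) still yields two code points.
n_umlaut = "n\u0308"
print(code_points(unicodedata.normalize("NFC", n_umlaut)))  # ['U+006E', 'U+0308']

# The same visible character spelled two ways.
precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # "a" + COMBINING DIAERESIS

print(precomposed == decomposed)  # False: different code point sequences
# Fully decomposing both sides gives one canonical spelling to compare.
print(unicodedata.normalize("NFD", precomposed)
      == unicodedata.normalize("NFD", decomposed))  # True
```

Decomposing is the direction that always works: every string has an NFD form, whereas composition only succeeds where a precomposed code point happens to exist.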

ChrisA


