Python Unicode handling wins again -- mostly
Chris Angelico
rosuav at gmail.com
Mon Dec 2 16:23:02 EST 2013
On Tue, Dec 3, 2013 at 8:14 AM, Ned Batchelder <ned at nedbatchelder.com> wrote:
> This is where my knowledge about Unicode gets fuzzy. Isn't it the case that
> some grapheme clusters (or whatever the right word is) can't be normalized
> down to a single code point? Characters can accept many accents, for
> example.
You can't normalize everything down to a single code point, but you
can normalize the other way, breaking out everything that can be
broken out into a base character plus combining marks.
>>> import unicodedata
>>> print(ascii(unicodedata.normalize("NFKC", "ä")))
'\xe4'
>>> print(ascii(unicodedata.normalize("NFKD", "ä")))
'a\u0308'
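Here is a short sketch of the same point using only the standard `unicodedata` module. The sample strings are illustrative choices that do not come from the thread, and `code_points` and `same_text` are hypothetical helper names. It shows three things. NFC composes "a" + U+0308 back into U+00E4. A second acute accent stacked on "e" has no precomposed form, so it stays a separate code point even after NFC. NFKC/NFKD also apply compatibility mappings (the "ﬁ" ligature becomes "fi") that NFC/NFD leave alone.

```python
import unicodedata

def code_points(s):
    """Hypothetical helper: list the code points of s as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in s]

# Precomposed "ä" (U+00E4) versus "a" followed by COMBINING DIAERESIS (U+0308).
composed = "\u00e4"
decomposed = "a\u0308"

print(composed == decomposed)                                # False: different code points
print(unicodedata.normalize("NFC", decomposed) == composed)  # True: NFC composes
print(unicodedata.normalize("NFD", composed) == decomposed)  # True: NFD decomposes

# "e" with two COMBINING ACUTE ACCENTs: NFC folds only the first accent
# into U+00E9, because no single code point carries both accents.
stacked = "e\u0301\u0301"
print(code_points(unicodedata.normalize("NFC", stacked)))  # ['U+00E9', 'U+0301']
print(code_points(unicodedata.normalize("NFD", stacked)))  # ['U+0065', 'U+0301', 'U+0301']

# Compatibility forms go further: the "fi" ligature U+FB01 is unchanged
# by NFC but becomes the two letters "fi" under NFKC.
print(unicodedata.normalize("NFC", "\ufb01") == "\ufb01")  # True
print(unicodedata.normalize("NFKC", "\ufb01"))             # fi

def same_text(a, b):
    """Hypothetical helper: compare strings up to canonical equivalence."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(same_text(composed, decomposed))  # True
```

Normalizing both sides to the same form before comparing is what makes canonically equivalent strings compare equal. Picking NFD here rather than NFC is just a choice for the sketch; either works, as long as both strings get the same form.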
ChrisA