Python Unicode handling wins again -- mostly

Mon Dec 2 16:27:08 EST 2013

On 12/02/2013 01:23 PM, Chris Angelico wrote:
> On Tue, Dec 3, 2013 at 8:14 AM, Ned Batchelder <ned at nedbatchelder.com> wrote:
>> This is where my knowledge about Unicode gets fuzzy.  Isn't it the case that
>> some grapheme clusters (or whatever the right word is) can't be normalized
>> down to a single code point?  Characters can accept many accents, for
>> example.
>
> You can't normalize everything down to a single code point, but you
> can normalize the other way by breaking out everything that can be
> broken out.
>
>>>> print(ascii(unicodedata.normalize("NFKC", "ä")))
> '\xe4'
>>>> print(ascii(unicodedata.normalize("NFKD", "ä")))
> 'a\u0308'

Well, Stephen was right then!  There's room for a library to handle this situation.  Or is there one already?
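
For the archive, here is a minimal sketch of the point in the quoted exchange, using only the standard-library unicodedata module. The helper name code_points and the "q" + U+0308 sample are my own illustrative choices, not something from the thread. As far as I know there is no precomposed "q with diaeresis", so composition leaves that cluster as two code points, while decomposition always breaks "ä" apart.

```python
import unicodedata


def code_points(text, form):
    """Normalize `text` to `form` and list the resulting code points.

    Hypothetical helper for illustration only.
    """
    normalized = unicodedata.normalize(form, text)
    return ["U+%04X" % ord(ch) for ch in normalized]


precomposed = "\u00e4"       # "ä" as a single code point
no_single_form = "q\u0308"   # "q" + COMBINING DIAERESIS (assumed to have no precomposed form)

print(code_points(precomposed, "NFC"))     # ['U+00E4']
print(code_points(precomposed, "NFD"))     # ['U+0061', 'U+0308']
print(code_points(no_single_form, "NFC"))  # ['U+0071', 'U+0308']
```

So len() on a normalized string still counts code points, not what a reader would call characters. A library for "this situation" would presumably need to group a base character with its combining marks, which unicodedata does not do by itself.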

--
~Ethan~