[Python-Dev] Re: String module

Fredrik Lundh fredrik@pythonware.com
Thu, 30 May 2002 11:23:35 +0200


François Pinard wrote:
> This reminds me that I often miss, in the standard `ctype.h' and related,
> a function that would un-combine a character into its base character and
> its diacritic, and the complementary re-combining function.

import unicodedata

def uncombine(char):
    chars = unicodedata.decomposition(unichr(ord(char))).split()
    if not chars:
        return [char]
    return [unichr(int(x, 16)) for x in chars if x[0] != "<"]

for char in "François":
    print uncombine(char)

['F']
['r']
['a']
['n']
[u'c', u'\u0327']
['o']
['i']
['s']

(to go the other way, store all uncombinations longer than one
character in a dictionary)
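
A rough sketch of that dictionary approach might look like the following.
It is written for Python 3 (chr and str instead of unichr and byte strings),
unlike the code above. The name `recombine` and the decision to skip
compatibility mappings (the ones tagged "<...>") are assumptions of this
sketch, made so that each sequence maps back to a single character.

```python
import sys
import unicodedata

def uncombine(char):
    # Python 3 variant of the function above; returns a tuple so the
    # result can be used as a dictionary key.
    parts = unicodedata.decomposition(char).split()
    if not parts:
        return (char,)
    return tuple(chr(int(x, 16)) for x in parts if x[0] != "<")

# Reverse table: uncombined sequence -> combined character.
recombine = {}
for code in range(sys.maxunicode + 1):
    char = chr(code)
    decomp = unicodedata.decomposition(char)
    # Assumption: ignore characters with no decomposition and those with
    # a tagged (compatibility) mapping, which would not round-trip.
    if not decomp or decomp.startswith("<"):
        continue
    seq = uncombine(char)
    if len(seq) > 1:
        # Keep the first (lowest) code point if a sequence repeats.
        recombine.setdefault(seq, char)

print(recombine[("c", "\u0327")])
```

Looking up the tuple for "c" plus the combining cedilla should give back
the single character U+00E7.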

</F>