PEP 3131: Supporting Non-ASCII Identifiers

Sat May 19 03:33:36 EDT 2007

> Providing a method that would translate an arbitrary string into a
> valid Python identifier would be helpful. It would be even more
> helpful if it could provide a way of converting untranslatable
> characters. However, I suspect that the translate (normalize?) routine
> in the unicode module will do.

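Not at all. Unicode normalization only unifies different "spellings"
of the same character, such as a precomposed "ü" (U+00FC) versus "u"
followed by a combining diaeresis (U+0308). It does not map
characters to ASCII.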
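
A minimal sketch of what normalization does and does not do, assuming
Python 3 string literals; the example strings and variable names are
illustrative only:

```python
import unicodedata

composed = "Schl\u00fcssel"      # precomposed u with diaeresis (U+00FC)
decomposed = "Schlu\u0308ssel"   # "u" followed by combining diaeresis (U+0308)

# The two spellings are different code point sequences...
differ_before = composed != decomposed

# ...and NFC normalization unifies them into a single spelling.
nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)
same_after_nfc = nfc_a == nfc_b

# The normalized string still contains a non-ASCII character.
still_non_ascii = any(ord(c) > 127 for c in nfc_a)

print(differ_before, same_after_nfc, still_non_ascii)
```

Normalization makes the two spellings compare equal, but the result is
still not an ASCII string.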

For transliteration, no simple algorithm exists, as the right result
generally depends on the language (German, for instance, conventionally
writes "ü" as "ue"). However, if you just want any kind of ASCII string,
you can use the Unicode error handlers (PEP 293). For example, the
program

import unicodedata, codecs

def namereplace(exc):
    if isinstance(exc,
           (UnicodeEncodeError, UnicodeTranslateError)):
        s = u""
        for c in exc.object[exc.start:exc.end]:
            s += "N_"+unicode(unicodedata.name(c).replace(" ","_"))+"_"
        return (s, exc.end)
    else:
        raise TypeError("can't handle %s" % exc.__class__.__name__)

codecs.register_error("namereplace", namereplace)

print u"Schl\xfcssel".encode("ascii", "namereplace")

prints SchlN_LATIN_SMALL_LETTER_U_WITH_DIAERESIS_ssel.
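
For reference, a sketch of the same idea in Python 3 syntax. The handler
name "name_underscore" and the code point fallback for unnamed
characters are assumptions made for this illustration, not part of the
program above:

```python
import codecs
import unicodedata

def name_underscore(exc):
    """Replace each unencodable character with N_<UNICODE_NAME>_."""
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        parts = []
        for c in exc.object[exc.start:exc.end]:
            # Assumption: characters without a Unicode name fall back
            # to their hexadecimal code point.
            name = unicodedata.name(c, "U%04X" % ord(c))
            parts.append("N_" + name.replace(" ", "_") + "_")
        # Return the replacement text and the position to resume at.
        return ("".join(parts), exc.end)
    raise TypeError("can't handle %s" % type(exc).__name__)

# Hypothetical handler name, chosen to avoid clashing with built-in ones.
codecs.register_error("name_underscore", name_underscore)

result = "Schl\xfcssel".encode("ascii", "name_underscore").decode("ascii")
print(result)
```

Note that some Unicode character names contain hyphens (HYPHEN-MINUS,
for example), so the output is an ASCII string but not necessarily a
valid Python identifier; str.isidentifier() can be used to check.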

HTH,
Martin