Unicode error handler
Walter Dörwald
walter at livinglogic.de
Tue Jan 30 11:28:52 EST 2007
Rares Vernica wrote:
> Hi,
>
> Does anyone know of any Unicode encode/decode error handler that does a
> better replace job than the default replace error handler?
>
> For example I have an iso-8859-1 string that has an 'e' with an accent
> (you know, the French 'e's). When I use s.encode('ascii', 'replace') the
> 'e' will be replaced with '?'. I would prefer it to be replaced with an 'e'
> even if I know it is not 100% correct.
>
> If only this letter would be the problem I would do it manually, but
> there is an entire set of letters that need to be replaced with their
> closest ascii letter.
>
> Is there an encode/decode error handler that can replace all the
> not-ascii letters from iso-8859-1 with their closest ascii letter?
You might try the following:
# -*- coding: iso-8859-1 -*-
import unicodedata, codecs

def transliterate(exc):
    # Only encoding errors can be handled here.
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("don't know how to handle %r" % exc)
    # Decompose the offending character (NFD) and keep only its base
    # character, then resume encoding right after it.
    return (unicodedata.normalize("NFD", exc.object[exc.start])[:1],
            exc.start+1)

codecs.register_error("transliterate", transliterate)

print u"Frédéric Chopin".encode("ascii", "transliterate")
Running this script gives you:
$ python transliterate.py
Frederic Chopin
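
Note that characters without a canonical decomposition (for example 'ß' or 'ø') come back from the NFD step unchanged and are therefore still not ASCII. The following is only a sketch of how the same idea might look on Python 3. It processes the whole unencodable range reported by the exception instead of one character at a time, and the '?' fallback for characters that have no ASCII base character is my own assumption, not part of the script above; it only makes sense when the target encoding is ASCII.

```python
import codecs
import unicodedata


def transliterate(exc):
    """Replace unencodable characters with the base character of their NFD form."""
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("don't know how to handle %r" % exc)
    replacement = []
    # exc.start:exc.end covers the run of characters that could not be encoded.
    for char in exc.object[exc.start:exc.end]:
        base = unicodedata.normalize("NFD", char)[:1]
        # Assumed fallback: use "?" when the base character is still not ASCII.
        replacement.append(base if ord(base) < 128 else "?")
    # Resume encoding right after the handled range.
    return "".join(replacement), exc.end


codecs.register_error("transliterate", transliterate)

print("Frédéric Chopin".encode("ascii", "transliterate").decode("ascii"))
```

With this variant a string such as "Straße" still encodes without raising, but the 'ß' becomes '?', so a lookup table is needed for letters like that.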
Hope that helps.
Servus,
Walter