Replace accented chars with unaccented ones
Jeff Epler
jepler at unpythonic.net
Mon Mar 15 18:55:18 EST 2004
You have two options. First, convert the string to Unicode and use code
like the following:
    replacements = [(u'\xe9', 'e'), ...]
    def remove_accents(u):
        for a, b in replacements:
            u = u.replace(a, b)
        return u
    >>> remove_accents(u'\xe9')
    u'e'
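For readers on Python 3, where every str is already Unicode and the u'' prefix is optional, the same loop might look like the sketch below. The REPLACEMENTS table is a short illustrative sample I made up, not a complete list of accented characters.

```python
# Python 3 sketch of the replace-loop approach.
# REPLACEMENTS is an illustrative sample only; extend it for real use.
REPLACEMENTS = [
    ('\xe9', 'e'),  # e with acute accent
    ('\xe8', 'e'),  # e with grave accent
    ('\xe0', 'a'),  # a with grave accent
    ('\xf1', 'n'),  # n with tilde
]

def remove_accents(text):
    """Return text with each accented character in REPLACEMENTS replaced."""
    for accented, plain in REPLACEMENTS:
        text = text.replace(accented, plain)
    return text

print(remove_accents('caf\xe9'))
```

Characters that are not in the table pass through unchanged, so the result is only as complete as the table.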
Second, if you are using a single-byte encoding (iso8859-1, for
instance), then work with byte strings:
    import string
    replacement_map = string.maketrans('\xe9...', 'e...')
    def remove_accents(s):
        return s.translate(replacement_map)
    >>> remove_accents('\xe9')
    'e'
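In Python 3 string.maketrans no longer exists; bytes.maketrans does the same job for byte strings. A minimal sketch, assuming the input really is ISO 8859-1 and using a small made-up sample of characters (the names ACCENTED, PLAIN and remove_accents_bytes are mine):

```python
# Python 3 sketch of the translation-table approach on byte strings.
# Assumes ISO 8859-1 input; the tables are an illustrative sample only.
ACCENTED = b'\xe9\xe8\xe0\xf1'  # e-acute, e-grave, a-grave, n-tilde
PLAIN = b'eean'                 # must be the same length as ACCENTED

TABLE = bytes.maketrans(ACCENTED, PLAIN)

def remove_accents_bytes(data):
    """Return data with each byte in ACCENTED mapped to its PLAIN byte."""
    return data.translate(TABLE)

print(remove_accents_bytes(b'caf\xe9'))
```

This only works when one byte is one character. On UTF-8 data the accented characters take two bytes each, so a byte-level table like this one would not match them.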
If you want to have strings like u'é' in your programs, you have to
include a line at the top of the source file that tells Python the
encoding, like the following line does:
# -*- coding: utf-8 -*-
(except that you have to name the encoding your editor actually uses, if
it's not utf-8). See http://python.org/peps/pep-0263.html
Once you've done that, you can write
    replacements = [(u'é', 'e'), ...]
instead of using the \xXX escape for it.
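To see why the declared encoding has to match what the editor saves, here is a small Python 3 sketch that decodes the same bytes two ways. It only illustrates the mismatch; it is not what the interpreter does internally, and the variable names are mine.

```python
# The bytes a UTF-8 editor writes to disk for the single character e-acute.
raw = '\xe9'.encode('utf-8')

# Decoded with the matching encoding: one character, as intended.
right = raw.decode('utf-8')

# Decoded as ISO 8859-1 instead: each byte becomes its own character.
wrong = raw.decode('iso8859-1')

print(len(raw), len(right), len(wrong))
```

With the wrong declaration the literal silently turns into two characters, and a replacements entry built from it will never match real input.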
Jeff