Is there a string function to trim all non-ascii characters out of a string

Duncan Booth duncan.booth at invalid.invalid
Mon Dec 31 05:24:33 EST 2007


"silverburgh.meryl at gmail.com" <silverburgh.meryl at gmail.com> wrote:

> Hi,
> 
> Is there a string function to trim all non-ascii characters out of a
> string?
> Let's say I have a string in python (which is utf8 encoded), is there a
> python function with which I can convert it to a string composed of
> only ascii characters?
> 
> Thank you.

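Yes, just decode it to unicode (which should be the first thing you do with 
any encoded string) and then encode it back to ascii with the error handler 
you want ('replace' substitutes a '?', 'ignore' drops the character, and 
'xmlcharrefreplace' emits a numeric character reference):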

>>> s = '\xc2\xa342'
>>> s.decode('utf8').encode('ascii', 'replace')
'?42'
>>> s.decode('utf8').encode('ascii', 'ignore')
'42'
>>> s.decode('utf8').encode('ascii', 'xmlcharrefreplace')
'&#163;42'
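
The session above is Python 2, where str holds bytes. As a rough sketch of the 
same decode-then-encode idea on Python 3, assuming the input arrives as 
UTF-8 encoded bytes (the helper name to_ascii is made up for illustration, 
it is not a built-in):

```python
def to_ascii(data, errors="ignore"):
    """Decode UTF-8 bytes, then re-encode to ASCII with the given error handler.

    to_ascii is a hypothetical helper name, not part of the standard library.
    """
    text = data.decode("utf-8")            # bytes -> str (unicode)
    ascii_bytes = text.encode("ascii", errors)  # str -> ASCII-only bytes
    return ascii_bytes.decode("ascii")     # back to a str for convenience


s = b"\xc2\xa342"  # UTF-8 bytes for a pound sign followed by "42"
print(to_ascii(s, "replace"))
print(to_ascii(s, "ignore"))
print(to_ascii(s, "xmlcharrefreplace"))
```

If you already have a str rather than bytes on Python 3, the decode step is 
not needed and only the encode call applies.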


