way to remove all non-ascii characters from a file?

Peter Hansen peter at engcorp.com
Tue Feb 17 14:45:07 EST 2004


Gerhard Häring wrote:
> 
> omission9 wrote:
> > I have a text file which contains the occasional non-ASCII character.
> > What is the best way to remove all of these in Python?
> 
> Here's a simple example that does what you want:
> 
>  >>> orig = "Häring"
>  >>> "".join([x for x in orig if ord(x) < 128])
> 'Hring'


Or, if performance is critical, something like this may well be faster
(a regex might be better still, since it avoids the redundant identity
translation step):

>>> from string import maketrans, translate
>>> table = maketrans('', '')
>>> translate(orig, table, table[128:])
'Hring'
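
For what it's worth, here is a rough, unbenchmarked sketch of the regex
idea, plus what I would assume is the closest Python 3 analogue of the
translate() call above (string.maketrans and string.translate are Python 2
only).  The names NON_ASCII, HIGH_BYTES, strip_non_ascii_regex and
strip_non_ascii_bytes are made up for illustration:

```python
import re

# Matches any run of characters outside the 7-bit ASCII range.
NON_ASCII = re.compile(r'[^\x00-\x7f]+')

def strip_non_ascii_regex(text):
    """Remove non-ASCII characters with a single regex substitution."""
    return NON_ASCII.sub('', text)

# Assumed Python 3 counterpart of translate(orig, table, table[128:]):
# with table=None there is no identity table to build, and every byte
# listed in the second argument is simply deleted.
HIGH_BYTES = bytes(range(128, 256))

def strip_non_ascii_bytes(data):
    """Remove every byte with a value of 128 or above from a bytes object."""
    return data.translate(None, HIGH_BYTES)

print(strip_non_ascii_regex("Häring"))
print(strip_non_ascii_bytes("Häring".encode("latin-1")))
```

For a whole file, the bytes version could be applied to the result of
open(name, 'rb').read(); whether either variant actually beats the list
comprehension would need timing on real data.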


-Peter


