way to remove all non-ascii characters from a file?
Peter Hansen
peter at engcorp.com
Tue Feb 17 14:45:07 EST 2004
Gerhard Häring wrote:
>
> omission9 wrote:
> > I have a text file which contains the occasional non-ASCII character.
> > What is the best way to remove all of these in Python?
>
> Here's a simple example that does what you want:
>
> >>> orig = "Häring"
> >>> "".join([x for x in orig if ord(x) < 128])
> 'Hring'
Or, if performance is critical, something like this might be faster
(a regex might be even better, since it would avoid the redundant
identity transformation step):
>>> from string import maketrans, translate
>>> table = maketrans('', '')
>>> translate(orig, table, table[128:])
'Hring'
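
For anyone reading this on Python 3, where string.maketrans and
string.translate no longer exist, here is a rough sketch of the regex idea
and of a bytes-based equivalent of the translate() trick. The function names
strip_regex and strip_bytes are made up for illustration, and no timings are
claimed; measure on your own data before picking one.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
_NON_ASCII = re.compile(r'[^\x00-\x7f]')

def strip_regex(text):
    """Regex variant (hypothetical helper): works on decoded str objects."""
    return _NON_ASCII.sub('', text)

# Every byte value from 128 to 255, i.e. everything that is not ASCII.
_HIGH_BYTES = bytes(range(128, 256))

def strip_bytes(data):
    """translate() variant (hypothetical helper): works on raw bytes.

    Passing None as the table skips the identity mapping entirely and
    only deletes the listed byte values.
    """
    return data.translate(None, _HIGH_BYTES)

if __name__ == '__main__':
    print(strip_regex('H\u00e4ring'))   # Hring
    print(strip_bytes(b'H\xe4ring'))    # b'Hring'
```

To clean a whole file this way, one option is to open it in binary mode
('rb'), run the contents through strip_bytes, and write the result back out.
With UTF-8 input that removes each multi-byte character completely, since
all of its bytes are 128 or above.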
-Peter