way to remove all non-ascii characters from a file?

Peter Otten __peter__ at web.de
Fri Feb 13 16:45:40 EST 2004


omission9 wrote:

> I have a text file which contains the occasional non-ascii character.
> What is the best way to remove all of these in python?

Read it in chunks, then remove the non-ascii characters like so:

>>> t = "".join(map(chr, range(256)))
>>> d = "".join(map(chr, range(128,256)))
>>> "Törichte Logik böser Kobold".translate(t,d)
'Trichte Logik bser Kobold'

and finally write the maimed chunks to a file. However, it's not clear to
me how removing characters could be a good idea in the first place.
Replacing them at least gives a minimal hint that something is missing:

>>> t = "".join(map(chr, range(128))) + "?" * 128
>>> "Törichte Logik böser Kobold".translate(t)
'T?richte Logik b?ser Kobold'
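
The sessions above rely on Python 2 byte strings. As a rough sketch of the
whole read/clean/write loop in Python 3, something like the following should
work on bytes objects. The names clean_chunk and clean_stream, the
chunk size, and the Latin-1 sample are my own assumptions for illustration,
not anything from the original question. Note that with a multi-byte
encoding such as UTF-8, each non-ascii character occupies several bytes, so
the replace variant would emit several "?" per character.

```python
import io

# Bytes 128..255 are the non-ASCII ones in any single-byte encoding.
NON_ASCII = bytes(range(128, 256))
# Translation table: ASCII bytes map to themselves, all others to "?".
REPLACE_TABLE = bytes(range(128)) + b"?" * 128


def clean_chunk(chunk, replace=False):
    """Delete non-ASCII bytes from chunk, or replace each one with b'?'."""
    if replace:
        return chunk.translate(REPLACE_TABLE)
    return chunk.translate(None, NON_ASCII)


def clean_stream(src, dst, replace=False, chunk_size=64 * 1024):
    """Copy binary stream src to dst chunk by chunk, cleaning each chunk.

    Hypothetical helper: src and dst are file objects opened in binary mode.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(clean_chunk(chunk, replace))


# Sample text encoded as Latin-1, so every character is exactly one byte.
sample = "Törichte Logik böser Kobold".encode("latin-1")
print(clean_chunk(sample))                # b'Trichte Logik bser Kobold'
print(clean_chunk(sample, replace=True))  # b'T?richte Logik b?ser Kobold'
```

Because the filtering happens byte by byte, it does not matter where the
chunk boundaries fall. For real files, open both in binary mode ("rb" and
"wb") and pass them to clean_stream.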

Peter


