How do I automate the removal of all non-ascii characters from my code?

Vlastimil Brom vlastimil.brom at gmail.com
Tue Sep 13 09:33:00 EDT 2011


2011/9/13 ron <vacorama at gmail.com>:
>
> Depending on the load, you can do something like:
>
> "".join([x for x in string if ord(x) < 128])
>
> It's worked great for me in cleaning input on webapps where there's a
> lot of copy/paste from varied sources.
> --
> http://mail.python.org/mailman/listinfo/python-list
>
Well, for this kind of dirty "data cleaning" you may as well use the
"ignore" error handler of the ASCII codec, e.g.:

>>> u"äteöxt ÛÜÝ wiÉÊËÌthÞßà áânoûüýþn ASɔɕɖCɗɘəɚɛIɗɘəɚɛIεζ iηθιn жзbetийклweeჟრსn .ტუ..ფ".encode("ascii", "ignore").decode("ascii")
u'text  with non ASCII in between ...'
>>>
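
For comparison, here is a minimal sketch that wraps both approaches from
this thread as functions and checks that they agree. It assumes the input
is already a unicode text string (not raw bytes); the function names and
the sample string are only illustrative, not part of any library.

```python
def strip_non_ascii_codec(text):
    # Encode to ASCII, silently dropping every character the codec
    # cannot represent, then decode back to a text string.
    return text.encode("ascii", "ignore").decode("ascii")


def strip_non_ascii_filter(text):
    # Keep only the characters whose code point is in the ASCII range.
    return "".join(ch for ch in text if ord(ch) < 128)


# Hypothetical sample: "naive cafe" with accents, plus an en dash.
sample = u"na\u00efve caf\u00e9 \u2013 ok"

# Both approaches should produce the same result.
assert strip_non_ascii_codec(sample) == strip_non_ascii_filter(sample)
print(strip_non_ascii_codec(sample))
```

Note that both variants drop accented letters entirely rather than
replacing them with their unaccented base letters, and that whitespace
around a removed character is left as it was.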

vbr


