How do I automate the removal of all non-ascii characters from my code?

ron vacorama at gmail.com
Tue Sep 13 08:31:59 EDT 2011


On Sep 12, 4:49 am, Steven D'Aprano <steve+comp.lang.pyt... at pearwood.info> wrote:
> On Mon, 12 Sep 2011 06:43 pm Stefan Behnel wrote:
>
> > I'm not sure what you are trying to say with the above code, but if it's
> > the code that fails for you with the exception you posted, I would guess
> > that the problem is in the "[more stuff here]" part, which likely contains
> > a non-ASCII character. Note that you didn't declare the source file
> > encoding above. Do as Gary told you.
>
> Even with a source code encoding, you will probably have problems with
> source files including \xe2 and other "bad" chars. Unless they happen to
> fall inside a quoted string literal, I would expect to get a SyntaxError.
>
> I have come across this myself. While I haven't really investigated in great
> detail, it appears to happen when copying and pasting code from a document
> (usually HTML) which uses non-breaking spaces instead of \x20 space
> characters. All it takes is just one to screw things up.
>
> --
> Steven

Depending on how much text you have to process, you can do something like:

"".join([x for x in string if ord(x) < 128])

It's worked great for me in cleaning input on webapps where there's a
lot of copy/paste from varied sources.
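
For anyone who wants to try this on a source file, here is a rough sketch
of the same filter wrapped in a function, plus a helper that reports where
the offending characters are. It assumes Python 3 and text that has already
been decoded to str. The names strip_non_ascii, find_non_ascii and the
nbsp_to_space flag are made up for illustration. Turning a non-breaking
space into a plain space first is my own assumption about what is usually
wanted: deleting it outright could glue two tokens together.

```python
NBSP = "\u00a0"  # non-breaking space, often picked up when pasting from HTML


def strip_non_ascii(text, nbsp_to_space=True):
    """Return text with every character above code point 127 removed."""
    if nbsp_to_space:
        # Assumption: a pasted non-breaking space was meant as a plain space,
        # so swap it rather than delete it and risk joining two tokens.
        text = text.replace(NBSP, " ")
    # Same filter as the one-liner above: keep only 7-bit ASCII characters.
    return "".join(ch for ch in text if ord(ch) < 128)


def find_non_ascii(text):
    """Yield (line_number, column, character) for each non-ASCII character."""
    # Split on "\n" only, so unusual Unicode line separators are reported
    # as offending characters rather than silently treated as line breaks.
    for line_number, line in enumerate(text.split("\n"), start=1):
        for column, ch in enumerate(line, start=1):
            if ord(ch) >= 128:
                yield line_number, column, ch
```

Running find_non_ascii over a file's contents before stripping shows what
would be lost, which matters if any of the characters sit inside string
literals or comments that are supposed to contain them.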


