[Tutor] Removing control characters

Kent Johnson kent37 at tds.net
Thu Feb 19 17:03:55 CET 2009


On Thu, Feb 19, 2009 at 10:14 AM, Dinesh B Vadhia
<dineshbvadhia at hotmail.com> wrote:
> I want a regex to remove control characters (< chr(32) and > chr(126)) from
> strings ie.
>
> # replace all chars NOT A-Z, a-z, 0-9, [-';.] with " "
> line = re.sub(r"[^a-z0-9-';.]", " ", line)
>
> 1.  What is the best way to include all the required chars rather than list
> them all within the r"" ?

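You have to list either the chars you want, as you have done, or the
ones you don't want. To match everything outside printable ASCII you
could use either
r'[\x00-\x1f\x7f-\xff]' (control chars plus everything from \x7f up) or
r'[^\x20-\x7e]' (anything that is not in the range space through tilde)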
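
As a rough sketch of the second pattern in use (written for Python 3;
the names NON_PRINTABLE and strip_control are made up for illustration,
and replacing with a space follows the example in the question):

```python
import re

# Matches any single character that is NOT in the printable ASCII
# range, i.e. not between space (\x20) and tilde (\x7e).
NON_PRINTABLE = re.compile(r'[^\x20-\x7e]')

def strip_control(line, replacement=" "):
    """Replace every character outside printable ASCII."""
    return NON_PRINTABLE.sub(replacement, line)

print(repr(strip_control("abc\tdef\x00ghi\x7f")))
```

Note that on Python 3 str objects the first pattern stops at \xff, so
it leaves code points above U+00FF alone, while the negated pattern
catches those too.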

> 2.  How do you handle the inclusion of the quotation mark " ?

Use \", that works even in a raw string. The backslash stays in the
pattern, and the regex engine reads \" as a literal quotation mark.
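
For instance, here is a small sketch that adds the quotation mark to
the character class from the question (ALLOWED and clean are
hypothetical names, and the hyphen is escaped only to make the intent
obvious):

```python
import re

# Inside a raw string, \" does not end the string, and the backslash
# is kept. The regex engine then treats \" as a literal double quote.
ALLOWED = re.compile(r"[^a-z0-9\-';.\"]")

def clean(line):
    """Replace every character not in the allowed set with a space."""
    return ALLOWED.sub(" ", line)

print(clean('say "hi" now!'))
```

You can check that the backslash really is part of the pattern:
len(r"\"") is 2, not 1.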

By the way, string.translate() is likely to be faster for this purpose
than re.sub(). This recipe might help:
http://code.activestate.com/recipes/303342/
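
A minimal sketch of the translate() idea, assuming Python 3, where it
is a method on str and takes a dict keyed by code point (this is not
the code from the recipe; CONTROL_TABLE and remove_control are
illustrative names):

```python
# Map each ASCII control code point (0-31 and 127) to None.
# str.translate() deletes any character whose mapping is None.
CONTROL_TABLE = dict.fromkeys(list(range(32)) + [127])

def remove_control(line):
    """Delete ASCII control characters from line."""
    return line.translate(CONTROL_TABLE)

print(repr(remove_control("a\x00b\tc\x7fd")))
```

This version deletes the characters and only covers the ASCII control
range. To replace them with a space instead, build the table with
dict.fromkeys(list(range(32)) + [127], " ").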

Kent

