Checking strings for "bad" characters

Peter Hansen peter at engcorp.com
Tue Aug 27 23:15:14 EDT 2002


Harvey Thomas wrote:
> 
> I've got some very long Unicode strings which I wish to test for the presence of ASCII characters 0-8 and 14-31. My first thought was to use regular expressions, e.g.:
> 
> import re
> r = re.compile(u'[%s%s]' % (''.join([unichr(x) for x in range(0, 9)]) , ''.join([unichr(x) for x in range(14, 32)])))
> amatch = r.search(text)  # text: the Unicode string to test
> if amatch:
>     print "Bad characters"
> else:
>     print "OK"
> 
> but is there a better or faster method?

If you could use string.maketrans and .translate() to convert all bad characters
that might be present into a single code (e.g. \x00), and then do a simple
.find() for that character, you might get the benefits of simplicity and extreme
speed.
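
A minimal sketch of that idea, assuming Python 3, where str is already
Unicode and str.translate() accepts a dict keyed by code point (so no
string.maketrans table is needed). The names BAD_CODES, SENTINEL,
TO_SENTINEL and has_bad_chars are made up for illustration and are not
from the original posts:

```python
# Code points to reject: ASCII control characters 0-8 and 14-31.
BAD_CODES = list(range(0, 9)) + list(range(14, 32))

# Map every bad code point to one sentinel character.
# "\x00" is itself in the bad range, so it simply maps to itself.
SENTINEL = "\x00"
TO_SENTINEL = dict.fromkeys(BAD_CODES, SENTINEL)


def has_bad_chars(text):
    """Return True if text contains any code point listed in BAD_CODES."""
    # translate() collapses all bad characters to the sentinel,
    # so a single find() is enough to detect any of them.
    return text.translate(TO_SENTINEL).find(SENTINEL) != -1


print(has_bad_chars("plain text\twith tab\n"))  # tab (9) and newline (10) are allowed
print(has_bad_chars("bell\x07inside"))          # 7 is in the rejected range
```

Whether this actually beats the compiled regular expression would need
to be timed on real data, since translate() builds a full copy of the
string before find() runs.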

-Peter


