Checking strings for "bad" characters
Harvey Thomas
hst at empolis.co.uk
Wed Aug 28 04:41:31 EDT 2002
Peter Hansen wrote:
> Harvey Thomas wrote:
> >
> > I've got some very long Unicode strings which I wish to test for the
> > presence of ASCII characters 0-8 and 14-31. My first thought was to use
> > regular expressions, e.g.:
> >
> > import re
> > r = re.compile(u'[%s%s]' % (''.join([unichr(x) for x in range(0, 9)]),
> >                             ''.join([unichr(x) for x in range(14, 32)])))
> > amatch = r.search(s)  # s is a placeholder name for the Unicode string being tested
> > if amatch:
> >     print "Bad characters"
> > else:
> >     print "OK"
> >
> > but is there a better or faster method?
>
> If you could use string.maketrans and .translate() to convert all bad
> characters that might be present into a single code (e.g. \x00), and then
> do a simple .find() for that character, you might get the benefits of
> simplicity and extreme speed.
>
> -Peter
Thanks for the suggestion, Peter, but it's much slower than the RE. I guess that's because translate() creates a new string, which then has to be scanned. I suppose the only thing faster than the RE is a C extension, but I don't think the effort needed for that is worthwhile.
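For anyone wanting to compare the two approaches discussed above, here is a minimal sketch of both. It assumes Python 3 (chr and a dict-based str.translate table rather than unichr and string.maketrans), which goes beyond the Python 2 code in the thread, and the names BAD_RE, BAD_TABLE, has_bad_chars_re and has_bad_chars_translate are purely illustrative.

```python
import re

# Code points 0-8 and 14-31, i.e. the control characters from the question
# (tab, newline, vertical tab, form feed and carriage return are allowed).
BAD_CODES = list(range(0, 9)) + list(range(14, 32))

# Approach 1: a compiled character class covering the two ranges.
BAD_RE = re.compile(r'[\x00-\x08\x0e-\x1f]')

# Approach 2: map every bad code point to \x00, then look for \x00.
BAD_TABLE = dict.fromkeys(BAD_CODES, '\x00')


def has_bad_chars_re(text):
    """Return True if the regex finds any bad character in text."""
    return BAD_RE.search(text) is not None


def has_bad_chars_translate(text):
    """Return True if text contains \\x00 after translating bad characters."""
    return text.translate(BAD_TABLE).find('\x00') != -1


if __name__ == '__main__':
    for sample in ('plain text\n', 'bell\x07inside'):
        print(repr(sample), has_bad_chars_re(sample),
              has_bad_chars_translate(sample))
```

The translate version has to build a complete copy of the string before find() can run, whereas the regex can stop at the first offending character. That may account for the difference described above, though actual timings will depend on the data and the Python version.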
Harvey