[Tutor] Checking for Control Characters

Erik Price erikprice@mac.com
Mon, 8 Apr 2002 07:32:11 -0400


On Monday, April 8, 2002, at 01:13  AM, Sheila King wrote:

> In Python, regex is not as fast as in Perl (I believe). If you have
> some fairly complicated searches to do, or a lot of searches, then it
> may be worth the overhead, and regex may be more efficient. But for
> something as simple as what I am doing, I believe the simple string
> methods will be quicker.
>
> I am not doing persistent cgi, nor do I expect the number of taboo
> characters to increase, nor will I be checking multiple passwords.
> Otherwise, I would look into the regex, as you suggest.

I'm not sure about Python's regex implementation, but in PHP a regex is 
definitely not as fast as a simple string replacement -- when possible I 
use the string replacement functions, but if the search is even remotely 
complicated then regexes are the only way.
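
For comparison, here is a minimal sketch of both approaches in Python. 
It assumes the "taboo" characters are the ASCII control characters 
(codes 0-31 and 127); the actual set in Sheila's script isn't shown in 
this thread, and the names TABOO, has_taboo_plain and has_taboo_regex 
are made up for illustration.

```python
import re

# Assumption: "taboo" means the ASCII control characters (0-31 and 127).
TABOO = "".join(chr(c) for c in list(range(32)) + [127])

# The same set expressed as a regex character class, compiled once.
TABOO_RE = re.compile(r"[\x00-\x1f\x7f]")


def has_taboo_plain(text):
    """Return True if any taboo character occurs, using plain 'in' tests."""
    return any(ch in TABOO for ch in text)


def has_taboo_regex(text):
    """Return True if any taboo character occurs, using the compiled regex."""
    return TABOO_RE.search(text) is not None
```

Both functions should give the same answer for any string; which one is 
quicker depends on the input and the Python version, so it is worth 
timing them on real data rather than guessing.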

One thing this has taught me is that it is worth learning how to write 
an efficient regex.  As with Perl itself (even though regexes predate 
Perl, IIRC), there's more than one way to write most of them, and some 
ways are better than others.  Another thing which may not be 
immediately obvious is that different regex engines work in different 
ways, sometimes even with different syntax.  "Mastering Regular 
Expressions" is the definitive guide for learning more about optimizing 
a regex, as well as which form to use for which language (it covers 
Perl, Python, Emacs, Tcl/Tk, Yacc, and a host of other tools).

Oh, one other thing -- if the flavor of regular expressions happens to 
be Perl-style (NFA), then you are more likely to be able to "optimize" 
it, whereas if the flavor happens to be "egrep"-style (DFA) then the 
regex is more likely already working as fast as it can -- this really 
depends on the situation, though, and isn't a hard and fast rule.  
Optimizing regexes can be esoteric but it may be worth a few hours 
studying them if your application will make heavy use of them.
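
As a small illustration of "more than one way to write it", the sketch 
below writes the same match two ways in Python: as an alternation and 
as a character class.  The patterns, the SAMPLE string and the helper 
names are made up for this example, and no timing result is claimed 
here; the point is only that two equivalent regexes can be measured 
against each other with the standard timeit module.

```python
import re
import timeit

# Two hypothetical patterns that match the same four characters.
ALTERNATION = re.compile(r"(?:\t|\n|\r|\x7f)")
CHAR_CLASS = re.compile(r"[\t\n\r\x7f]")

# A sample input where the only match is at the very end.
SAMPLE = "x" * 200 + "\x7f"


def find_all(pattern, text):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


def time_search(pattern, text, repeats=1000):
    """Return the total seconds taken by `repeats` searches of text."""
    return timeit.timeit(lambda: pattern.search(text), number=repeats)


if __name__ == "__main__":
    print("alternation:", time_search(ALTERNATION, SAMPLE))
    print("char class: ", time_search(CHAR_CLASS, SAMPLE))
```

Checking that find_all() returns the same offsets for both patterns 
before comparing the timings guards against "optimizing" a regex into 
one that matches something different.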


Erik