[Tutor] Checking for Control Characters
Erik Price
erikprice@mac.com
Mon, 8 Apr 2002 07:32:11 -0400
On Monday, April 8, 2002, at 01:13 AM, Sheila King wrote:
> In Python, regex is not as fast as in Perl (I believe). If you have
> some fairly complicated searches to do, or a lot of searches, then it
> may be worth the overhead, and regex may be more efficient. But for
> something as simple as what I am doing, I believe the simple string
> methods will be quicker.
>
> I am not doing persistent cgi, nor do I expect the number of taboo
> characters to increase, nor will I be checking multiple passwords.
> Otherwise, I would look into the regex, as you suggest.
I'm not sure about Python's regex implementation, but in PHP a regex is
definitely not as fast as a simple string replacement -- when possible I
use the string replacement functions, but if the search is even remotely
complicated then regexes are the only practical way.
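To make the comparison concrete, here is a minimal sketch of both approaches in Python. The taboo set (ASCII control characters 0-31 plus 127) and the names TABOO, TABOO_RE, has_taboo_str and has_taboo_re are my own assumptions for illustration, not Sheila's actual code.

```python
import re

# Assumed taboo set: ASCII control characters (0-31) plus DEL (127).
TABOO = "".join(chr(c) for c in range(32)) + chr(127)

# The same set expressed as a regex character class, compiled once.
TABOO_RE = re.compile(r"[\x00-\x1f\x7f]")


def has_taboo_str(text):
    """Return True if text contains a taboo character (string operations only)."""
    for ch in text:
        if ch in TABOO:
            return True
    return False


def has_taboo_re(text):
    """Return True if text contains a taboo character (compiled regex)."""
    return TABOO_RE.search(text) is not None
```

Both functions should give the same answer for any input; which one is quicker depends on the input and the Python version, so it is worth timing them rather than guessing.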
One thing this has taught me is that it is worth learning how to write
an efficient regex -- as with Perl itself (even though regexes predate
Perl, IIRC), there's more than one way to write many of them, and some
ways are better than others. Another thing which may not be immediately
obvious is that different regex engines work in different ways,
sometimes even with different syntax.
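As a small illustration of "more than one way to write it", the two patterns below match exactly the same thing (a single vowel), one written as an alternation and one as a character class. The names ALTERNATION, CHAR_CLASS, count_matches and time_pattern are made up for this sketch, and I am deliberately not claiming which one wins; the timing helper is there so you can measure it on your own machine.

```python
import re
import timeit

# Two ways to write "a single vowel".
ALTERNATION = re.compile(r"(?:a|e|i|o|u)")
CHAR_CLASS = re.compile(r"[aeiou]")


def count_matches(pattern, text):
    """Count non-overlapping matches of a compiled pattern in text."""
    return len(pattern.findall(text))


def time_pattern(pattern, text, repeats=1000):
    """Return the seconds needed to scan text `repeats` times (machine-dependent)."""
    return timeit.timeit(lambda: pattern.findall(text), number=repeats)
```

For example, time_pattern(ALTERNATION, sample) and time_pattern(CHAR_CLASS, sample) can be compared on a long sample string of your choosing.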
"Mastering Regular Expressions" is the definitive guide for learning more
about optimizing a regex, as well as which form to use for which tool
(covers Perl, Python, Emacs, Tcl/Tk, Yacc, and a host of other tools).
Oh, one other thing -- if the flavor of regular expressions happens to
be Perl-style (NFA), then you are more likely to be able to "optimize"
it, whereas if the flavor happens to be "egrep"-style (DFA) then the
regex is more likely already working as fast as it can -- this really
depends on the situation, though, and isn't a hard and fast rule.
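Here is a rough sketch of the kind of rewrite I mean, using Python's re module, which is a backtracking engine. Both patterns accept one or more "a" followed by "b", but the nested quantifier gives the engine many more ways to split up a run of "a"s before it can give up on a string that doesn't match. The names NESTED, FLAT and same_verdict are hypothetical, chosen just for this example.

```python
import re

# Both patterns accept one or more "a" followed by a single "b".
NESTED = re.compile(r"(?:a+)+b")  # nested quantifiers: many ways to backtrack
FLAT = re.compile(r"a+b")         # same set of strings, a single quantifier


def same_verdict(text):
    """Return True if both patterns agree on whether the whole of text matches."""
    nested_ok = NESTED.fullmatch(text) is not None
    flat_ok = FLAT.fullmatch(text) is not None
    return nested_ok == flat_ok
```

Keep the inputs short when trying the nested form on strings that fail to match (a long run of "a" with no "b"), since the work a backtracking engine does can grow very quickly with the length.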
Optimizing regexes can be esoteric, but it may be worth spending a few
hours studying it if your application will make heavy use of them.
Erik