Regular Expressions - Python vs Perl

Roy Smith roy at panix.com
Fri Apr 22 08:37:51 EDT 2005


iny+news at iki.fi (Ilpo Nyyssönen) wrote:
> Of course it caches those when running. The point is that it needs to
> recompile every time you have restarted the program. With short lived
> command line programs this really can be a problem.

Are you speculating that it might be a problem, or saying that you have 
seen it be a problem in a real-life program?

I just generated a bunch of moderately simple regexes from a dictionary 
wordlist.  Looks something like:

Roy-Smiths-Computer:play$ head exps
a.*a[0-9]{34}
a.*ah[0-9]{34}
a.*ahed[0-9]{34}
a.*ahing[0-9]{34}
a.*ahs[0-9]{34}
a.*al[0-9]{34}
a.*alii[0-9]{34}
a.*aliis[0-9]{34}
a.*als[0-9]{34}
a.*ardvark[0-9]{34}

Then I ran them through a little script that does:

import re, sys
for exp in sys.stdin:
    regex = re.compile(exp.rstrip("\n"))  # drop the trailing newline

and timed it for various numbers of lines.  On my G4 Powerbook (1 GHz 
PowerPC), it compiles about 1000 regexes per second:

Roy-Smiths-Computer:play$ time head -5000 < exps | ./regex.py

real    0m5.208s
user    0m4.690s
sys     0m0.090s
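
The same kind of measurement can be sketched in-process, which leaves out 
interpreter startup.  This is only an illustration: the function name 
time_compiles is made up here, and the ten patterns are just the ones 
shown in the head output above, not the full wordlist.  re.purge() is 
called first so that the module's own cache does not hide the compile cost.

```python
import re
import time

def time_compiles(patterns):
    """Compile each pattern once and return the elapsed wall-clock seconds."""
    re.purge()  # empty re's internal cache so each pattern is really compiled
    start = time.perf_counter()
    for pattern in patterns:
        re.compile(pattern)
    return time.perf_counter() - start

# The ten sample expressions from the listing above (a stand-in for "exps").
patterns = [
    "a.*a[0-9]{34}", "a.*ah[0-9]{34}", "a.*ahed[0-9]{34}",
    "a.*ahing[0-9]{34}", "a.*ahs[0-9]{34}", "a.*al[0-9]{34}",
    "a.*alii[0-9]{34}", "a.*aliis[0-9]{34}", "a.*als[0-9]{34}",
    "a.*ardvark[0-9]{34}",
]

elapsed = time_compiles(patterns)
print("%d patterns compiled in %.4f s" % (len(patterns), elapsed))
```

Absolute numbers will of course differ from machine to machine; the point 
is only to see whether the total is large enough to matter at startup.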

So, my guess is that unless you're compiling hundreds of regexes each time 
you start up, the one-time compilation costs are probably not significant.

> And yes, I have read the source of sre.py and I have made an ugly
> module that digs the compiled data and pickles it to a file and then
> in next startup it reads that file and puts the stuff back to the
> cache.

That's exactly what I would have done if I really needed to improve startup 
speed.  In fact, I did something like that many moons ago, in a previous 
life.  See R. Smith, "A finite state machine algorithm for finding 
restriction sites and other pattern matching applications", CABIOS, Vol 4, 
no. 4, 1988.  In that case, I had about 1200 patterns I was searching for 
(and doing it on hardware running about 1% of the speed of my current 
laptop).

BTW, why did you have to dig out the compiled data before pickling it?  
Could you not have just pickled whatever re.compile() returned?
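
A quick way to check what pickling a compiled pattern actually stores is 
a round trip like the one below.  This is a sketch, assuming CPython's re 
module, where (as far as I know) pattern objects are registered with the 
pickling machinery as "pattern string plus flags" and are recompiled on 
load.  If that holds, pickling the return value of re.compile() would not 
by itself save any compilation time.

```python
import pickle
import re

compiled = re.compile(r"a.*ardvark[0-9]{34}")

payload = pickle.dumps(compiled)
restored = pickle.loads(payload)

# The pickle contains the pattern's source text ...
print(b"ardvark" in payload)
# ... and the restored object behaves like the original.
print(restored.pattern == compiled.pattern)
print(restored.match("aardvark" + "1" * 34) is not None)
```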


