re.search much slower than grep on some regular expressions

Sat Jul 5 01:58:14 EDT 2008

John Nagle wrote:

> Henning_Thornblad wrote:
>> What can be the cause of the large difference between re.search and
>> grep?
>> 
>> This script takes about 5 min to run on my computer:
>> #!/usr/bin/env python
>> import re
>> 
>> row=""
>> for a in range(156000):
>>     row+="a"
>> print re.search('[^ "=]*/',row)
>> 
>> 
>> While doing a simple grep:
>> grep '[^ "=]*/' input                  (input contains 156.000 a in
>> one row)
>> doesn't even take a second.
>> 
>> Is this a bug in python?
>> 
>> Thanks...
>> Henning Thornblad
> 
>     You're recompiling the regular expression on each use.
> Use "re.compile" before the loop to do it once.

Now that's premature optimization :-)

Apart from the fact that re.search() is executed only once in the above
script, the re library caches compiled patterns, so even if the re.search()
call were in a loop, the overhead would be only a few microseconds for the
cache lookup.
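
A rough way to check the caching claim is to time the module-level
re.search() against a precompiled pattern. This is only a sketch under a few
assumptions: the helper names via_module and via_compiled are made up for
illustration, the input is a short string ending in "/" so that each search
succeeds quickly (unlike the 156,000-character row above), and the identity
check relies on the cache behaviour of CPython's re module, which is an
implementation detail rather than a documented guarantee.

```python
import re
import timeit

pattern = '[^ "=]*/'
# Short input that matches immediately, so we measure call overhead
# rather than the slow search from the original script.
text = "a" * 100 + "/"


def via_module():
    # Module-level call: re looks the pattern string up in its internal cache.
    return re.search(pattern, text)


compiled = re.compile(pattern)


def via_compiled():
    # Precompiled pattern object: no cache lookup needed.
    return compiled.search(text)


# In CPython, compiling the same pattern string again normally returns the
# cached object (implementation detail, not a documented guarantee).
same_object = re.compile(pattern) is compiled

n = 10000
t_module = timeit.timeit(via_module, number=n)
t_compiled = timeit.timeit(via_compiled, number=n)

print("same cached object: %s" % same_object)
print("re.search():       %.2f us per call" % (t_module / n * 1e6))
print("compiled.search(): %.2f us per call" % (t_compiled / n * 1e6))
```

The exact timings depend on the machine and the Python version, but the
difference between the two per-call figures is the cost of the cache lookup.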

Peter