re.search much slower then grep on some regular expressions

Sat Jul 5 04:12:05 EDT 2008

Paddy <paddy3118 at googlemail.com>:

> On Jul 4, 1:36 pm, Peter Otten <__pete... at web.de> wrote:
>> Henning_Thornblad wrote:
>> > What can be the cause of the large difference between re.search and
>> > grep?
>>
>> grep uses a smarter algorithm ;)
>>
>>
>>
>> > This script takes about 5 min to run on my computer:
>> > #!/usr/bin/env python
>> > import re
>>
>> > row=""
>> > for a in range(156000):
>> > row+="a"
>> > print re.search('[^ "=]*/',row)
>>
>> > While doing a simple grep:
>> > grep '[^ "=]*/' input                  (input contains 156.000 a in
>> > one row)
>> > doesn't even take a second.
>>
>> > Is this a bug in python?
>>
>> You could call this a performance bug, but it's not common enough in real
>> code to get the necessary brain cycles from the core developers.
>> So you can either write a patch yourself or use a workaround.
>>
>> re.search('[^ "=]*/', row) if "/" in row else None
>>
>> might be good enough.
>>
>> Peter
> 
> It is not a smarter algorithm that is used in grep. Python RE's have
> more capabilities than grep RE's which need a slower, more complex
> algorithm.

FWIW, grep itself can confirm this statement.  The following command roughly
takes as long as Python's re.search:

# grep -P '[^ "=]*/' input

-P tells grep to use real perl-compatible regular expressions. 

-- 
Freedom is always the freedom of dissenters.
                                      (Rosa Luxemburg)