re.search much slower than grep on some regular expressions

Fri Jul 4 16:43:11 EDT 2008

On Fri, Jul 4, 2008 at 8:36 AM, Peter Otten <__peter__ at web.de> wrote:
> Henning_Thornblad wrote:
>
>> What can be the cause of the large difference between re.search and
>> grep?
>
> grep uses a smarter algorithm ;)
>
>> This script takes about 5 min to run on my computer:
>> #!/usr/bin/env python
>> import re
>>
>> row=""
>> for a in range(156000):
>>     row+="a"
>> print re.search('[^ "=]*/',row)
>>
>>
>> While doing a simple grep:
>> grep '[^ "=]*/' input                  (input contains 156.000 a in
>> one row)
>> doesn't even take a second.
>>
>> Is this a bug in python?
>
> You could call this a performance bug, but it's not common enough in real
> code to get the necessary brain cycles from the core developers.
> So you can either write a patch yourself or use a workaround.
>
> re.search('[^ "=]*/', row) if "/" in row else None
>
> might be good enough.
>
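
For what it's worth, the workaround above can be wrapped in a small
helper. This is only a sketch in Python 3 syntax (the original script is
Python 2); the name guarded_search is made up here and is not part of
the re module:

```python
import re

# Same pattern as in the original script.
PATTERN = re.compile(r'[^ "=]*/')

def guarded_search(text):
    """Hypothetical helper: skip the regex entirely when no '/' exists.

    The substring test is a single linear scan, so a row without any '/'
    never reaches the slow path inside re.search.
    """
    if "/" not in text:
        return None
    return PATTERN.search(text)

no_slash = guarded_search("a" * 156000)       # returns without running the regex
with_slash = guarded_search('href="foo/bar"')  # falls through to re.search
print(no_slash, with_slash.group())
```

Note that the guard only helps when the row contains no '/' at all. A
long run of "a" followed by a space and then a '/' would pass the check
and, as far as I can tell, still hit the slow path.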

Wow... I'm rather surprised at how slow this is. Using re.match
yields much quicker results, but of course it's not the same thing as
re.search: match only tries the pattern at the start of the string.
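
A rough way to see the difference, assuming Python 3 syntax and a much
shorter row than the original 156000 characters so that it finishes
quickly. The helper time_call is made up for this sketch, and the
timings will vary by machine and Python version:

```python
import re
import time

PATTERN = re.compile(r'[^ "=]*/')

def time_call(func, text):
    """Return (result, elapsed seconds) for a single call of func(text)."""
    start = time.perf_counter()
    result = func(text)
    return result, time.perf_counter() - start

# Deliberately short; the original row had 156000 characters.
row = "a" * 5000

# match attempts the pattern at offset 0 only.
match_result, match_time = time_call(PATTERN.match, row)

# search may retry the pattern at every offset in the row.
search_result, search_time = time_call(PATTERN.search, row)

print("match :", match_result, "%.6f s" % match_time)
print("search:", search_result, "%.6f s" % search_time)
```

Both calls return None here since the row has no '/'. My understanding
is that each search attempt consumes the rest of the row and then backs
off looking for a '/', which would explain why the total work grows
roughly with the square of the row length.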

Incidentally, if you add the '/' to "row" at the end of the string,
re.search returns instantly with a match object.
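
A small sketch of that case, again assuming Python 3 syntax. With the
'/' appended, the very first attempt at offset 0 succeeds, so the search
never has to restart from a later offset:

```python
import re

PATTERN = re.compile(r'[^ "=]*/')

# Same row as in the original script, with a single '/' appended.
row = "a" * 156000 + "/"

m = PATTERN.search(row)

# The match begins at offset 0 and covers the whole row, '/' included.
print(m.start(), m.end())
```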

@ Peter
I'm not versed enough in regex to tell if this is a bug or not
(although I suspect it is), but why would you say this particular
regex isn't common enough in real code?

filipe