Why is regex so slow?

Tue Jun 18 23:16:39 EDT 2013

On Tue, 18 Jun 2013 22:11:01 -0400, Dave Angel wrote:

> On 06/18/2013 09:51 PM, Steven D'Aprano wrote:
> 
>     <SNIP>
>>
>> Even if the regex engine is just as efficient at doing simple character
>> matching as `in`, and it probably isn't, your regex tries to match all
>> eleven characters of "ENQUEUEING" while the `in` test only has to match
>> three, "ENQ".
>>
>>
> The rest of your post was valid, and useful, but there's a misconception
> in this paragraph;  I hope you don't mind me pointing it out.

Of course not, I'm always happy to learn if I'm mistaken.

> In general, for simple substring searches, you can search for a large
> string faster than you can search for a smaller one.  I'd expect
> 
> if "ENQUEUING" in bigbuffer
> 
> to be faster than
> 
> if "ENQ"  in bigbuffer

And so it is:

steve@runes:~$ python2.7 -m timeit -s "sub = 'ENQ'" \
> -s "s = 'blah '*10000 + 'ENQUIRING blah blah blah'" \
> "sub in s"
10000 loops, best of 3: 38.3 usec per loop
steve@runes:~$ python2.7 -m timeit -s "sub = 'ENQUIRING'" \
> -s "s = 'blah '*10000 + 'ENQUIRING blah blah blah'" \
> "sub in s"
100000 loops, best of 3: 15.4 usec per loop
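Why the longer needle wins: searches in the Boyer-Moore family compare from the end of the needle, and on a mismatch they can jump ahead by up to the needle's full length. CPython's `in` for strings uses a Boyer-Moore/Horspool-style search internally (with extra tricks on top). The following is only a sketch, in pure Python, of plain Horspool, not CPython's actual code; `horspool_find` is a hypothetical helper that also counts character comparisons, run against the same haystack as the timings above.

```python
from typing import Tuple


def horspool_find(needle: str, haystack: str) -> Tuple[int, int]:
    """Boyer-Moore-Horspool search (illustrative sketch, not CPython's code).

    Returns (index of first match or -1, number of character comparisons).
    """
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0, 0
    # Bad-character table: for each char in needle[:-1], the distance from its
    # last occurrence to the end of the needle. Unlisted chars allow a shift of m.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    comparisons = 0
    pos = 0
    while pos <= n - m:
        # Compare the needle against the window right to left.
        j = m - 1
        while j >= 0:
            comparisons += 1
            if haystack[pos + j] != needle[j]:
                break
            j -= 1
        if j < 0:
            return pos, comparisons
        # Shift by looking at the haystack char under the needle's last position.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1, comparisons


if __name__ == "__main__":
    s = 'blah ' * 10000 + 'ENQUIRING blah blah blah'
    for sub in ('ENQ', 'ENQUIRING'):
        index, comparisons = horspool_find(sub, s)
        print(sub, index, comparisons)
```

In the 'blah ' padding no character occurs in either needle, so every mismatch shifts by the whole needle length: 3 for 'ENQ' and 9 for 'ENQUIRING'. The longer needle therefore needs roughly a third as many comparisons, the same direction as the timeit results.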

Thank you.

-- 
Steven