Why is regex so slow?

Mark Lawrence breamoreboy at yahoo.co.uk
Tue Jun 18 13:34:01 EDT 2013


On 18/06/2013 18:08, Roy Smith wrote:
>
> On Jun 18, 2013, at 1:01 PM, Skip Montanaro wrote:
>
>>> I don't understand why the first way is so much slower.
>>
>> I have no obvious answers, but a couple suggestions:
>>
>> 1. Can you anchor the pattern at the beginning of the line?  (use
>> match() instead of search())
>
> That's one of the things we tried.  Didn't make any difference.
>
>> 2. Does it get faster if you eliminate the "(.*)" part of the pattern?
>
> Just tried that, it also didn't make any difference.
>
>> It seems that if you find a line matching the first part of the
>> pattern, you could just as easily split the line yourself instead of
>> creating a group.
>
>
> At this point, I'm not so much interested in making this faster as understanding why it's so slow.  I'm tempted to open this up as a performance bug against the regex module (which I assume will be rejected, at least for the 2.x series).
>
> ---
> Roy Smith
> roy at panix.com
>

Out of curiosity, have you tried the new regex module from PyPI rather
than the stdlib version?  A heck of a lot of work has gone into it; see
http://bugs.python.org/issue2636
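
If it helps, here is a rough sketch of how the two could be timed side
by side.  The pattern and the sample lines are placeholders I made up,
not the ones from your slow case, so substitute your own.  It assumes
the PyPI module is installed under the name "regex" and skips that
half of the comparison if it isn't.

```python
import re
import timeit

try:
    import regex  # third-party module from PyPI (pip install regex)
except ImportError:
    regex = None

# Hypothetical pattern and data, for illustration only.
PATTERN = r"ERROR: (.*)"
LINES = ["an ordinary line that does not match"] * 1000 + ["ERROR: disk full"]


def count_matches(module, pattern, lines):
    """Count the lines that `pattern` matches, using `module` (re or regex)."""
    compiled = module.compile(pattern)
    return sum(1 for line in lines if compiled.search(line))


def time_module(module, number=20):
    """Return the total seconds taken by `number` passes over LINES."""
    return timeit.timeit(
        lambda: count_matches(module, PATTERN, LINES), number=number
    )


if __name__ == "__main__":
    print("re:    %.4f s" % time_module(re))
    if regex is not None:
        print("regex: %.4f s" % time_module(regex))
    else:
        print("regex: not installed, skipped")
```

The numbers will only mean something with your real pattern and real
input, but a large gap between the two would at least point at the
stdlib engine rather than at your code.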

-- 
"Steve is going for the pink ball - and for those of you who are 
watching in black and white, the pink is next to the green." Snooker 
commentator 'Whispering' Ted Lowe.

Mark Lawrence



