re.search much slower then grep on some regular expressions
Sebastian "lunar" Wiesner
basti.wiesner at gmx.net
Sat Jul 5 04:12:05 EDT 2008
Paddy <paddy3118 at googlemail.com>:
> On Jul 4, 1:36 pm, Peter Otten <__pete... at web.de> wrote:
>> Henning_Thornblad wrote:
>> > What can be the cause of the large difference between re.search and
>> > grep?
>>
>> grep uses a smarter algorithm ;)
>>
>>
>>
>> > This script takes about 5 min to run on my computer:
>> > #!/usr/bin/env python
>> > import re
>>
>> > row=""
>> > for a in range(156000):
>> > row+="a"
>> > print re.search('[^ "=]*/',row)
>>
>> > While doing a simple grep:
>> > grep '[^ "=]*/' input (input contains 156.000 a in
>> > one row)
>> > doesn't even take a second.
>>
>> > Is this a bug in python?
>>
>> You could call this a performance bug, but it's not common enough in real
>> code to get the necessary brain cycles from the core developers.
>> So you can either write a patch yourself or use a workaround.
>>
>> re.search('[^ "=]*/', row) if "/" in row else None
>>
>> might be good enough.
>>
>> Peter
>
> It is not a smarter algorithm that is used in grep. Python RE's have
> more capabilities than grep RE's which need a slower, more complex
> algorithm.
FWIW, grep itself can confirm this statement. The following command roughly
takes as long as Python's re.search:
# grep -P '[^ "=]*/' input
-P tells grep to use real perl-compatible regular expressions.
--
Freedom is always the freedom of dissenters.
(Rosa Luxemburg)
More information about the Python-list
mailing list