re.search much slower then grep on some regular expressions

Sat Jul 5 04:21:41 EDT 2008

On Jul 5, 4:12 am, "Sebastian \"lunar\" Wiesner"
<basti.wies... at gmx.net> wrote:
> Paddy <paddy3... at googlemail.com>:
>
>
>
> > On Jul 4, 1:36 pm, Peter Otten <__pete... at web.de> wrote:
> >> Henning_Thornblad wrote:
> >> > What can be the cause of the large difference between re.search and
> >> > grep?
>
> >> grep uses a smarter algorithm ;)
>
> >> > This script takes about 5 min to run on my computer:
> >> > #!/usr/bin/env python
> >> > import re
>
> >> > row=""
> >> > for a in range(156000):
> >> > row+="a"
> >> > print re.search('[^ "=]*/',row)
>
> >> > While doing a simple grep:
> >> > grep '[^ "=]*/' input                  (input contains 156.000 a in
> >> > one row)
> >> > doesn't even take a second.
>
> >> > Is this a bug in python?
>
> >> You could call this a performance bug, but it's not common enough in real
> >> code to get the necessary brain cycles from the core developers.
> >> So you can either write a patch yourself or use a workaround.
>
> >> re.search('[^ "=]*/', row) if "/" in row else None
>
> >> might be good enough.
>
> >> Peter
>
> > It is not a smarter algorithm that is used in grep. Python RE's have
> > more capabilities than grep RE's which need a slower, more complex
> > algorithm.
>
> FWIW, grep itself can confirm this statement.  The following command roughly
> takes as long as Python's re.search:
>
> # grep -P '[^ "=]*/' input
>
> -P tells grep to use real perl-compatible regular expressions.

This confirms that a particular engine might not be optimized for it,
but it's not necessarily a reflection that the engine is more complex.

I'm not sure of the specifics and maybe that is the case, but it could
also be a case of a different codebase which is optimized differently.

Carl Banks