Why TERRIBLE performance of regular expressions in re module ?

rturpin at my-deja.com rturpin at my-deja.com
Tue Oct 5 09:50:33 EDT 1999


In article <000201bf0eef$532e9e80$f92d153f at tim>,
  "Tim Peters" <tim_one at email.msn.com> wrote:
> .. Perl can do a fast Boyer-Moore search for a literal
> embedded substring that needs to match, skipping -- at
> high speed -- over vast stretches of input that can't
> possibly match overall due to not containing a match
> for the embedded literal.

I was wondering if re's were optimized against embedded
literal substrings.  I am surprised (1) that Perl's package
is substantially better than Python's, and (2) that Perl's
package beats the Boyer-Moore algorithm for substring search.
I had always thought the latter was pretty close to optimal.
Where is this faster algorithm described?  Is it
algorithmically faster, or merely a more efficient
implementation with same O(m)/O(n) behavior?

Regards,
Russell


Sent via Deja.com http://www.deja.com/
Before you buy.




More information about the Python-list mailing list