Why TERRIBLE performance of regular expressions in re module ?

Wed Oct 6 01:24:03 EDT 1999

[Tim]
> .. Perl can do a fast Boyer-Moore search for a literal
> embedded substring that needs to match, skipping -- at
> high speed -- over vast stretches of input that can't
> possibly match overall due to not containing a match
> for the embedded literal.

[rturpin at my-deja.com]
> I was wondering if re's were optimized against embedded
> literal substrings.  I am surprised (1) that Perl's package
> is substantially better than Python's,

It's years older, and has the benefit of many more person-years of work.
Alas, everyone who has tried to disentangle it from Perl's guts has given
up; so only Perl gets to use it.  OTOH, regexps are central to Perl's
operation; in Python, they're just another gimmick in just another module.

> and (2) that Perl's package beats the Boyer-Moore algorithm for
> substring search.

Don't know what this one's about -- I said Perl can use Boyer-Moore, not
that it beats it.

> I had always thought the latter was pretty close to optimal.

Well, there are many dozens of exact string-search algorithms.  Which is
"best" depends crucially on the distribution of pattern and target strings.
In the absence of a characterization, "optimal" has no meaning.  B-M is
certainly *good* <wink>.

searchingly y'rs  - tim