Compare regular expressions

Karthik Gurusamy kar1107 at gmail.com
Mon Apr 16 13:57:00 EDT 2007


On Apr 16, 2:50 am, Thomas Dybdahl Ahle <lob... at gmail.com> wrote:
> Hi, I'm writing a program with a large data stream to which modules can
> connect using regular expressions.
>
> Now I'd like to not have to test all expressions every time I get a line,
> as most of the time, one of them having a match means none of the others
> can have so.
>
> But ofcource there are also cases where a regular expression can
> "contain" another expression, like in:
> "^strange line (\w+) and (\w+)$" and "^strange line (\w+) (?:.*?)$" in
> which case I'd like to first test the seccond and only if it mathces test
> the seccond.
>
> Do anybody know if such a test is possible?
> if exp0.contains(exp1): ...


What you want is finding if R2 is a superset of R1 for two given
regular languages R1 and R2. I know of some methods for finding
intersection of two regular languages; and I think the time/space
complexity is big.

So the simple answer is it is not feasible to provide such support for
two generic r.e.s without a large time/space usage. You may consult
any of the math/theory groups for more insights.

If you know already R2 >= R1 (that is you precompute and remember),
then it's a trivial to skip checking for R1 if R2 turned up negative.
You can even arrange all the Rs in a binary tree like fashion and skip
checking a whole subtree if the sub-tree's root node gave negative for
r.e. match.

Karthik




More information about the Python-list mailing list