Compare regular expressions

Charles Sanders C.delete_this.Sanders at BoM.GOv.AU
Thu Apr 19 23:31:36 EDT 2007


Thomas Dybdahl Ahle wrote:
> Hi, I'm writing a program with a large data stream to which modules can 
> connect using regular expressions.
> 
> Now I'd like to not have to test all expressions every time I get a line, 
> as most of the time, one of them having a match means none of the others 
> can have so.

Not what you want, and would be a LOT of work, but if I
remember correctly (from university more than 20 years ago)
there is an algorithm that could be adapted to return a list
of all the regular expressions that match a string.

I thing the base algorithms were documented in Aho and Ullman
("The Dragon book" if I remember correctly). There was one
for converting a regular expression into a Non-deterministic
Finite-state Automaton, and another for converting the NFA
to a Deterministic Finite-state Automaton. The book mentioned
that this also handles multiple regular expressions which can
be treated as the sub-expressions combined using the or operator.
Other, newer books on compiler design would probably contain
similar information in their sections on lexical analysis.

The algorithm given (if my memory is correct) only handled the
case where a true/false match/no-match result was required, but
mentioned how to generalise it to return which of several
expressions separated by or operators was matched. I think
an additional generalisation to return a set of matches rather
than one would be relatively straight-forward.

I believe these algorithms are the basis of lex/flex and
similar utilities, as well as the regular expression handling
of many languages.

Of course, I would like to emphasise that all this is based on
memory of university courses more than 20 years ago, so the
details could be wrong.

Charles





More information about the Python-list mailing list