Regex Speed

John Machin sjmachin at lexicon.net
Fri Feb 23 20:26:23 EST 2007


On Feb 24, 11:51 am, greg <g... at cosc.canterbury.ac.nz> wrote:
> garri... at gmail.com wrote:
> > the author of this citation states that
> > any regex can be expressed as a  DFA machine. However ...
>
>  > I appear to have found one example of a regex
>
> > which breaks this assumption.
>
> > "ab+c|abd"
>
> > Am I correct?
>
> No. Any NFA can be converted to an equivalent DFA.

Correct. However ...

> This is how scanner generators like Lex work -- they
> first construct an NFA from the regex, and then
> convert it to a DFA. Going directly from the regex
> to a DFA, like you're trying to do, would be a lot
> harder, and I'd be surprised if anyone ever does
> it that way.

>From "Compilers; Principles, Techniques, and Tools" aka "the dragon
book" by Aho, Sethi and Ullman, 1986, page 134: "The first algorithm
is suitable for inclusion in a Lex compiler because it constructs a
DFA directly from a regular expression, without constructing an
intermediate NFA along the way."

>
> There's a description of the NFA-to-DFA algorithm
> here:
>
> http://www.gamedev.net/reference/articles/article2170.asp

which is on a really serious site (pop-up flashing whizz-bangs
inciting one to get an iPod now!) and which uses the (a|b)*abb example
from the dragon book (and the diagram of its Thompson-constructed NFA)
without any credit or mention of the book,  in fact no references or
attributions at all.





More information about the Python-list mailing list