How to get the "longest possible" match with Python's RE module?

Tim Peters tim.peters at gmail.com
Tue Sep 12 03:12:51 EDT 2006


[Licheng Fang]
>> Oh, please do have a look at the second link I've posted. There's a
>> table comparing the regexp engines. The engines you've tested probably
>> all use an NFA implementation.

[Bryan Olson]
> Unfortunately, the stuff about NFA's is wrong. Friedl's awful
> book

Strongly disagree:  it's an excellent book about the /pragmatics/ of
using "regular expressions" as most widely implemented.  It's not at
all about "regular expressions" in the  CompSci sense of the term,
which appears to be your complaint.

> was the first time I saw this confusion about what NFA is;
> I don't know if he originated the mess or just propagated it.

As far as I could tell at the time, he originated it.  I'm not happy
about that either.

> "Nondeterministic finite automata" is well defined in theory
> of computation. The set of languages accepted by NFA's is
> exactly the same as the set accepted by DFA's.

And which has very little to do with "regular expressions" as most
widely implemented -- gimmicks like backreferences are wholly outside
the DFA formalism.

> What Python uses is search-and-backtrack. Unfortunately such
> engines don't have much theory behind them, and it's hard to
> reason generally about what they do.

Yup X 3, and the last is precisely why Friedl's book is valuable for
people using "NFA" implementations:  Friedl does a good job of
explaining when and why you're likely to trigger atrocious runtime
performance, and gives practical general techniques for avoiding those
problems.  None of that has anything to do with CompSci regexps
either, but is vital knowledge for people hoping to make happy
non-trivial use of Python/Perl/etc regexps.



More information about the Python-list mailing list