search (match) backward

Tim Peters tim_one at email.msn.com
Fri Oct 29 02:31:07 EDT 1999


[Tim, sketches the usual NFA-based proof that the regular languages are
closed
 under reversal]

[John Farrell]
> In theory, yes, but I don't believe that REs can necessarily be
> converted to state machines, particularly not DFAs.

Definitions:  I was using RE, NFA and DFA in their long-established "comp
sci" senses, where it's been equally long established <wink> that the three
are interchangeable.

> For example:
>
>> (?P=name)
>> Matches whatever text was matched by the earlier group named name.
>> This implies that the value is stored somewhere, and DFAs cannot do
>> that.

Right, backreferences are too powerful for REs to capture; for that matter,
they're too powerful for full-blown context-free grammars!  That makes them
a very strange addition to the "regular expression" repertoire.

> \number is similar.

Well, identical <wink>:  named groups are light syntactic sugar for numbered
backreferences; Python's search engine doesn't even know named groups exist
(they're converted to numbered backreferences in a preprocessing step).

> I am not sure about these either:
> [lookahead assertions]

Those don't add real power to REs, although they're *convenient*.  One of
the strangest uses for positive lookahead assertions is the idiom

   (?=regexp1)regexp2

as a way to write a regexp that succeeds only if a string matches both
regexp1 and regexp2.  In  other words, it's a bizarre way to express
language intersection.  "comp sci" REs are provably closed under
intersection without tricks like that.

> I don't know whether there are other types of state machines that
> have the memory to implement the RE specification properly, and are
> still susceptible to Tim's argument.

I don't think there are, taking "RE" in the Perl/Python sense.  For example,
offhand I don't think it's possible to write a Perl/Python (ir)regular
expression that matches all and only the reversal of strings matched by this
Perl/Python IRE:

    ^(.+)(.+)(\1|\2)+$

Backreferences destroy many nice RE properties.  A practical one is that a
DFA can always be constructed for any (comp sci) RE that says yea or nay on
any string in worst-case linear (in the length of the string) time.  Add
backreferences, and matching suddenly becomes an NP-Complete problem (see,
e.g.,

    http://www.plover.com/~mjd/perl/NPC/

).

> However I believe for the subset of REs which do not use the constructs
> mentioned above, Tim's argument is correct.

Oh yes!  But that argument is even older than I am <wink>.

ire-is-well-named-ly y'rs  - tim






More information about the Python-list mailing list