search (match) backward

Magnus L. Hetland mlh at idi.ntnu.no
Wed Oct 27 09:14:39 EDT 1999


Greg Ewing <greg.ewing at compaq.com> writes:

> mehta at mama.indstate.edu wrote:
[...]
> This raises the interesting question of whether there is an
> algorithm for reversing an RE (i.e. find an RE which matches
> the reverse of the strings that the original RE matches).
> Can it even be done in general? My intuition says yes, but
> I can't think of a proof off the top of my head.

I want to try! :)

OK... A regular expression expresses a regular language, right? And
they are built from union, concatenation, and closure... (I'm assuming
that all other stuff in the re module follows from this... Though with
lookahead etc. this is probably not the case...)

So...A regular set is a singleton set, the union of two regular sets,
the concatenation of two regular sets, or the closure of a regular
sets. I guess I just have to show how to reverse all of these
constructs...

A singleton set consists of only one symbol, so it doesn't have to be
reversed. A union doesn't express order, so that doesn't have to be
reversed either (but of course both of the sets being used must be).
Concatenation must be reversed - and van, quite simply. Just
concatenate the sets in the opposite order (after they themselves have
been reversed). Finally - closure does not have to be reversed, since
patterns from the same set are repeated, and the order is
insignificant.

So - as far as I can see, it can be done, and only the concatenation
actually needs rearranging... So - while I'm at it...

There's no parser here, but I think it should be trivial to construct
(with this minimal regexp grammar).

----- rereverse.py -----

class RegExp:
    def __init__(self, left, right):
        self.left = left
        self.right = right

class Singleton(RegExp):
    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        return visitor.visitSingleton(self)

class Union(RegExp):
    def accept(self, visitor):
        return visitor.visitUnion(self)

class Concatenation(RegExp):
    def accept(self, visitor):
        return visitor.visitConcatenation(self)

class Closure(RegExp):
    def __init__(self, child):
        self.child = child
        
    def accept(self, visitor):
        return visitor.visitClosure(self)

class RegexpReverser:
    def visitSingleton(self, singleton):
        return singleton.value

    def visitUnion(self, union):
        l = union.left.accept(self)
        r = union.right.accept(self)
        return "("+l+"|"+r+")"

    def visitConcatenation(self, concat):
        l = concat.left.accept(self)
        r = concat.right.accept(self)
        return r+l

    def visitClosure(self, closure):
        c = closure.child.accept(self)
        return "("+c+")*"


if __name__=="__main__":
    r = Concatenation(Union(Singleton("a"),
                            Singleton("b")),
                      Closure(Singleton("c")))
    rr = RegexpReverser()
    result = r.accept(rr)
    print result

------------------------

> 
> Greg

--

  Magnus          Echelon jamming noise:
  Lie             FBI CIA NSA Handgun Assault Bomb Drug Terrorism
  Hetland         Special Forces Delta Force AK47 Hillary Clinton 




More information about the Python-list mailing list