regular expression reverse match?

Diez B. Roggisch nospam-deets at web.de
Wed Oct 29 05:32:45 EST 2003


Hi,

> As it is, the program checks the input buffer after each keystroke and
> determines if the buffer is acceptable against a pattern as the string
> is being built.   A reverse match.  It allows new keystrokes to be
> added to the buffer as long as it's within the pattern.

I don't think its possible with regular expressions, or at least the way you
are using them right now, that is to come from a "in the end, I want to
look it like this, but it shall accept everything that prefixes a valid
result".

That won't work - the reason is simply that regular expressions are
equivalent to finite state automata. I don't know if you are familiar with
these, but the consist of states, which are nodes, and transitions between
these, which are edges/arrows between the states.

Now ususally there is one special starting state S, and there should be at
least on reachable end-state. As the names suggest, recoginizing a
specified String starts at S, and then every read character advances the
internal state until one of the end-states is reached _and_ there is no
more input.

Now lets see at a simple example:

"abc"

Here every character becomes a state, and the automata looks like this:

S-a->(a)-b->(b)-c->[c]

The -*-> means that this transition is taken when * is matched by the
current input. So -a-> is taken when an "a" is input.

[c] is an end-state.

Now what you want is, that _all_ states that are legal are also end-states.
That is very easy to accomplish when you implement the automata directly
like this:

S-a->[a]-b->[b]-c->[c]

However, its way more complicated to create this as a rex:

"a(b(c)?)?"

This looks straightforward, so you might be successful to create rexes like
this using a preprocessing step - but the more complicated your rex gets,
this approach will be hard to follow. Actually, I have currently no idea.
Which doesn't mean its not doable, I just don't have enough time to think
about it right now :)

It looks to me, that if you need this feature not on a per-case base where
you can think about your rexes more thoroughly, you have to sort of roll
out your own rex-implmenetation. Which isn't too hard, but definitely
something thats more than just a couple of lines.

Regards,

Diez




More information about the Python-list mailing list