Match First Sequence in Regular Expression?
Alex Martelli
aleax at mail.comcast.net
Thu Jan 26 11:06:47 EST 2006
Tim Chase <python.list at tim.thechases.com> wrote:
> > Sorry for the confusion. The correct pattern should reject
> > all strings except those in which the first sequence of the
> > letter 'a' that is followed by the letter 'b' has a length of
> > exactly three.
>
> Ah...a little more clear.
>
> r = re.compile("[^a]*a{3}b+(a+b*)*")
> matches = [s for s in listOfStringsToTest if r.match(s)]
Unfortunately, the OP's spec is even more complex than this, if we are
to take to the letter what you just quoted; e.g.
aazaaab
SHOULD match, because the sequence 'aaz' (being 'a' NOT followed by the
letter 'b') should not invalidate the match that follows. I don't think
he means the strings contain only a's and b's.
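To make that concrete, here is a quick check of the quoted pattern against the example string. It is only a sketch; the names `rejected` and `accepted` are mine, purely for illustration.

```python
import re

# Pattern quoted above from the earlier reply.
r = re.compile("[^a]*a{3}b+(a+b*)*")

# r.match anchors at the start of the string: [^a]* cannot consume the
# leading 'aa', and a{3} then fails when it reaches the 'z'.
rejected = r.match("aazaaab") is None

# A string whose only run of a's is the wanted one is still accepted.
accepted = r.match("zaaab") is not None

print(rejected, accepted)
```

So the pattern turns down a string that, read to the letter, the spec should accept.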
Locating 'the first sequence of a followed by b' is easy, and it is
reasonably easy to check that the sequence is exactly of length 3 (e.g.
with a negative lookbehind) -- but I don't know how to tell a RE to *stop*
searching for more if the check fails.
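A small illustration of the "won't stop" problem, assuming one plausible form of the lookbehind check (the pattern and the names below are my own guess at it, not something from the thread):

```python
import re

# Hypothetical length check: exactly three a's, not preceded by another
# 'a', and immediately followed by 'b'.
exactly_three = re.compile(r"(?<!a)a{3}b")

# The first a-run followed by b in 'aabaaab' is 'aa' (length 2), so the
# spec says reject; but search simply moves on and finds the later 'aaab'.
found_later_run = exactly_three.search("aabaaab") is not None

# A run of four a's is correctly refused by the lookbehind.
found_in_long_run = exactly_three.search("aaaab") is not None

print(found_later_run, found_in_long_run)
```

The length check itself behaves, but nothing prevents the engine from skipping the first (failing) run and succeeding on a later one.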
If a little more than just REs and matching were allowed, it would be
reasonably easy, but I don't know how to fashion a RE r such that
r.match(s) will succeed if and only if s meets those very precise and
complicated specs. That doesn't mean it just can't be done, just that I
can't do it so far. Perhaps the OP can tell us what constrains him to
use r.match ONLY, rather than a little bit of logic around it, so we can
see if we're trying to work in an artificially overconstrained domain?
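For what it's worth, here is a minimal sketch of the "little bit of logic around it" I have in mind. The function name is hypothetical, and I'm assuming that a string with no 'a'-run followed by 'b' at all should be rejected, which the spec as quoted doesn't actually say.

```python
import re

# One or more a's immediately followed by 'b'. search() returns the
# leftmost match, and a+ is greedy, so the group is the whole first run
# of a's that is followed by 'b'.
first_ab_run = re.compile(r"(a+)b")

def first_run_is_three(s):
    """True if the first run of a's followed by 'b' has length exactly 3."""
    m = first_ab_run.search(s)
    # Assumption: no such run at all counts as a rejection.
    return m is not None and len(m.group(1)) == 3

for s in ["aazaaab", "aabaaab", "aaaab", "xyz"]:
    print(s, first_run_is_three(s))
```

The RE only locates the run; the length test is done in plain Python afterwards, which sidesteps the problem of telling the RE to stop.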
Alex