regex question
Alex Ross
alex.j.ross at gmail.com
Thu Aug 3 21:05:15 EDT 2006
This might be more flexible:
import re

pat = re.compile(r"^(a*(?=b)b*(?=[ac])c*(?=[abd])d*)+$")
tests = [('aabbbaabbcccbbbcccddd', True),
         ('aabcabcd', True),
         ('abcd', True),
         ('aabbccaabbccabcabababbbccccdddd', True),
         ('aabbccaabbccabcabababbbccccddddabcd', True),
         ('aaaaabbbbbccccaaaaadddd', False),
         ('aaaaaaaaaaabbbbbccc', False),
         ('aabbccaabbccacabababbbccccdddd', False),
         ('aabbccaabbccabcdddcabababbbccccdddd', False)]
for test, expected in tests:
    matched = pat.match(test) is not None
    print("%s: %s" % ("PASSED" if matched == expected else "FAILED", test))
It works with all of the test cases you've given, and will also work in
the case where 'd' is followed by 'a' or 'b'. (A 'd' followed directly by
'c' is still rejected, which is what the last test case expects.)
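For anyone reading along, here is a sketch of the same pattern written with re.VERBOSE so each lookahead can carry a comment. It has one change, marked in the code and not part of the one-line version: a (?=a) lookahead right after ^. Because a* may match nothing, the one-line pattern also accepts strings that start with 'b', such as 'bcd', which the "begins with a" rule excludes. The names ANNOTATED and accepts() are just for illustration.

```python
import re

# The lookahead pattern spread out with re.VERBOSE so each piece can be commented.
# One addition (not in the one-line version): (?=a) after ^, so input must start with 'a'.
ANNOTATED = re.compile(r"""
    ^ (?=a)           # added: the first symbol must be 'a'
    (                 # one pass through the sequence; the outer + allows restarts
      a*              # a's (none at all when a pass restarts at 'b')
      (?=b) b*        # a 'b' must come next, so b* takes every following b
      (?=[ac]) c*     # after the b's: restart at 'a', or continue to the c's
      (?=[abd]) d*    # after the c's: restart at 'a' or 'b', or continue to the d's
    )+
    $                 # lookaheads fail at end of input, so the last pass ends on d's
""", re.VERBOSE)


def accepts(s):
    """True if the whole string matches the annotated pattern."""
    return ANNOTATED.match(s) is not None


for s in ["abcd", "abcdabcd", "abcdbcd", "abcdcd", "bcd"]:
    print(s, accepts(s))
```

With the added lookahead, 'abcd', 'abcdabcd' and 'abcdbcd' are accepted, while 'abcdcd' (every pass after a restart must go through a 'b') and 'bcd' are rejected.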
Tim Chase wrote:
> Gabriel Murray wrote:
> > Hello, I'm looking for a regular expression which will match strings as
> > follows: if there are symbols a, b, c and d, then any pattern is valid if it
> > begins with a and ends with d and proceeds in order through the symbols.
> > However, at any point the pattern may reset to an earlier position in the
> > sequence and begin again from there.
> > For example, these would be valid patterns:
> > aabbbaabbcccbbbcccddd
> > aabcabcd
> > abcd
> >
> > But these would not:
> > aaaaabbbbbccccaaaaadddd (goes straight from a to d)
> > aaaaaaaaaaabbbbbccc (does not reach d)
> >
> > Can anyone think of a concise way of writing this regex? The ones I can
> > think of are very long and awkward.
> > Gabriel
> >
> >
>
> It's a bit ugly, but
>
> import re
>
> tests = [
>     ('aabbbaabbcccbbbcccddd', True),
>     ('aabcabcd', True),
>     ('abcd', True),
>     ('aaaaabbbbbccccaaaaadddd', False),
>     ('aaaaaaaaaaabbbbbccc', False),
>     ]
>
> regex = r'^(a+b+)+(c+(a*b+)*)+d+$'
> r = re.compile(regex)
> for test, expected in tests:
>     matched = (r.match(test) is not None)
>     if matched == expected:
>         print("PASSED: %s with %s" % (test, expected))
>     else:
>         print("FAILED: %s with %s" % (test, expected))
>
>
> passes all the tests you suggested.
>
> One test that stands out to me as an undefined case would be
>
> abcdcd
>
> (where, after reaching d, the pattern backtracks again).
>
> It currently assumes nothing but "d"s follow "d"s.
>
> -tkc
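If it helps to cross-check either regex, here is a small non-regex checker. It is only a sketch of one reading of the original rule (an assumption, not something either post states): the string uses only a-d, starts with 'a', ends with 'd', and every symbol is a repeat, the next symbol in order, or a jump back to any earlier symbol. The names follows_rule(), TIM_PAT and LOOKAHEAD_PAT are hypothetical.

```python
import re


def follows_rule(s):
    """Hypothetical reference check for one reading of the a-b-c-d rule."""
    if not s or s[0] != "a" or s[-1] != "d":
        return False
    if any(ch not in "abcd" for ch in s):
        return False
    for prev, cur in zip(s, s[1:]):
        # gap 0 = repeat, gap 1 = next symbol, negative gap = jump back;
        # a gap above 1 means a symbol was skipped going forward (e.g. a -> c)
        if ord(cur) - ord(prev) > 1:
            return False
    return True


TIM_PAT = re.compile(r"^(a+b+)+(c+(a*b+)*)+d+$")
LOOKAHEAD_PAT = re.compile(r"^(a*(?=b)b*(?=[ac])c*(?=[abd])d*)+$")

for s in ["aabbbaabbcccbbbcccddd", "aabcabcd", "abcd",
          "aaaaabbbbbccccaaaaadddd", "aaaaaaaaaaabbbbbccc",
          "abcdabcd", "abcdcd", "bcd"]:
    # columns: string, reference check, Tim's regex, lookahead regex
    print(s, follows_rule(s),
          TIM_PAT.match(s) is not None,
          LOOKAHEAD_PAT.match(s) is not None)
```

Under that reading, 'abcdabcd' is accepted by the lookahead pattern but not by Tim's, 'abcdcd' (the case Tim flags as undefined) is rejected by both regexes, and 'bcd' is accepted only by the lookahead pattern. Whether a 'd' may jump back to 'c' is exactly the open question, so the checker's answer there is a choice, not a given.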