regex question
Tim Chase
python.list at tim.thechases.com
Thu Aug 3 17:34:18 EDT 2006
Gabriel Murray wrote:
> Hello, I'm looking for a regular expression which will match strings as
> follows: if there are symbols a, b, c and d, then any pattern is valid if it
> begins with a and ends with d and proceeds in order through the symbols.
> However, at any point the pattern may reset to an earlier position in the
> sequence and begin again from there.
> For example, these would be valid patterns:
> aabbbaabbcccbbbcccddd
> aabcabcd
> abcd
>
> But these would not:
> aaaaabbbbbccccaaaaadddd (goes straight from a to d)
> aaaaaaaaaaabbbbbccc (does not reach d)
>
> Can anyone think of a concise way of writing this regex? The ones I can
> think of are very long and awkward.
> Gabriel
>
>
It's a bit ugly, but
import re

# (input string, whether it should match)
tests = [
    ('aabbbaabbcccbbbcccddd', True),
    ('aabcabcd', True),
    ('abcd', True),
    ('aaaaabbbbbccccaaaaadddd', False),
    ('aaaaaaaaaaabbbbbccc', False),
    ]

regex = r'^(a+b+)+(c+(a*b+)*)+d+$'
r = re.compile(regex)

for test, expected in tests:
    matched = (r.match(test) is not None)
    # Parenthesized single-argument print works as a statement or a function
    if matched == expected:
        print("PASSED: %s with %s" % (test, expected))
    else:
        print("FAILED: %s with %s" % (test, expected))
passes all the tests you suggested.
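
If the one-liner is hard to read, the same pattern can be spelled out with re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is only a sketch for readability: the name "pattern" and the comments are mine, and the regex itself is unchanged.

```python
import re

# Same regex as above, annotated. "pattern" is just an illustrative name.
pattern = re.compile(r'''
    ^
    (a+ b+)+        # one or more runs of a's followed by b's
    (               # then one or more "c sections":
        c+          #   a run of c's
        (a* b+)*    #   optionally reset to a's and/or b's, ending on b's
    )+
    d+              # finish with nothing but d's
    $
''', re.VERBOSE)

for s in ('aabbbaabbcccbbbcccddd', 'aabcabcd', 'abcd',
          'aaaaabbbbbccccaaaaadddd', 'aaaaaaaaaaabbbbbccc'):
    print("%s -> %s" % (s, pattern.match(s) is not None))
```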
One test that stands out to me as an undefined case would be
abcdcd
(where, after reaching "d", the pattern resets to an earlier symbol again).
The regex currently assumes that nothing but "d"s follow "d"s, so it rejects this string.
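
For cross-checking edge cases like this, here is a rough sketch of a non-regex checker. It assumes my reading of the rules: each step either stays on the same symbol, moves forward by exactly one symbol, or resets to an earlier one. The names FORWARD, is_valid and allow_reset_after_d are made up for illustration. As far as I can tell, under this reading "abcbd" should be rejected (it jumps from b straight to d), yet the regex above accepts it, so that is another case worth pinning down.

```python
import re

# Hypothetical helper, not from the original post; it encodes one
# interpretation of the rules described in the question.
FORWARD = {('a', 'b'), ('b', 'c'), ('c', 'd')}

def is_valid(s, allow_reset_after_d=False):
    """Check s pair by pair: stay, step forward by one, or reset backwards."""
    if not s or s[0] != 'a' or s[-1] != 'd':
        return False
    if any(ch not in 'abcd' for ch in s):
        return False
    for cur, nxt in zip(s, s[1:]):
        if nxt == cur or (cur, nxt) in FORWARD:
            continue                      # same run, or next symbol in order
        if nxt < cur and (cur != 'd' or allow_reset_after_d):
            continue                      # reset to an earlier symbol
        return False                      # skipped a symbol, or reset after d
    return True

r = re.compile(r'^(a+b+)+(c+(a*b+)*)+d+$')
for s in ('abcd', 'abcdcd', 'abcbd'):
    print("%s: regex=%s checker=%s checker(reset after d)=%s" % (
        s, r.match(s) is not None, is_valid(s), is_valid(s, True)))
```

The allow_reset_after_d flag just makes the undefined "abcdcd" case an explicit choice rather than something baked into the pattern.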
-tkc