regex question

Tim Chase python.list at tim.thechases.com
Thu Aug 3 17:34:18 EDT 2006


Gabriel Murray wrote:
> Hello, I'm looking for a regular expression which will match strings as
> follows: if there are symbols a, b, c and d, then any pattern is valid if it
> begins with a and ends with d and proceeds in order through the symbols.
> However, at any point the pattern may reset to an earlier position in the
> sequence and begin again from there.
> For example, these would be valid patterns:
> aabbbaabbcccbbbcccddd
> aabcabcd
> abcd
> 
> But these would not:
> aaaaabbbbbccccaaaaadddd   (goes straight from a to d)
> aaaaaaaaaaabbbbbccc (does not reach d)
> 
> Can anyone think of a concise way of writing this regex? The ones I can
> think of are very long and awkward.
> Gabriel
> 
> 

It's a bit ugly, but

import re

# (test string, whether it should match)
tests = [
    ('aabbbaabbcccbbbcccddd', True),
    ('aabcabcd', True),
    ('abcd', True),
    ('aaaaabbbbbccccaaaaadddd', False),  # goes straight from a to d
    ('aaaaaaaaaaabbbbbccc', False),      # does not reach d
    ]

regex = r'^(a+b+)+(c+(a*b+)*)+d+$'
r = re.compile(regex)
for test, expected in tests:
    matched = (r.match(test) is not None)
    if matched == expected:
        # print(...) with a single argument runs on both Python 2 and 3
        print("PASSED: %s with %s" % (test, expected))
    else:
        print("FAILED: %s with %s" % (test, expected))


passes all the tests you suggested.
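
One case the five tests do not cover: because the trailing (a*b+)* sits right before d+, the pattern also accepts strings such as "abcbd", which step from b straight to d. If that should be rejected (my reading of "proceeds in order", which is an assumption), a possible tightening is sketched below. The names original, stricter and check are only for illustration.

```python
import re

# The pattern from the post, kept for comparison.
original = re.compile(r'^(a+b+)+(c+(a*b+)*)+d+$')

# Hypothetical stricter variant: any detour back to a or b after a run
# of "c"s must pass through "c" again before the final run of "d"s.
stricter = re.compile(r'^(a+b+)+c+((a*b+)+c+)*d+$')

def check(pattern, s):
    """Return True if the whole string s matches pattern."""
    return pattern.match(s) is not None

for s in ('abcd', 'aabcabcd', 'abcbd', 'abcabd'):
    print("%-10s original=%s stricter=%s"
          % (s, check(original, s), check(stricter, s)))
```

Both patterns agree on the five tests above; they differ only on strings where the last symbol before the "d"s is a "b".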

One test that stands out to me as an undefined case would be

	abcdcd

(where, after reaching "d", the pattern backtracks again).

The regex currently assumes that nothing but "d"s follow "d"s, so it rejects this string.
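
If the rule for "d" needs to be switchable, a plain loop may be easier to adjust than the regex. The following is only a sketch of one reading of the rules: each step may stay on the same symbol, advance by exactly one, or reset to an earlier one. The function name is_valid and the flag allow_reset_from_d are made up for this example.

```python
ORDER = 'abcd'

def is_valid(s, allow_reset_from_d=False):
    """Check s against the a-b-c-d ordering rule.

    Assumed rule: s starts with 'a', ends with 'd', and each symbol
    either repeats the previous one, is the next one in ORDER, or is
    an earlier one (a reset). Resets out of 'd' are only accepted when
    allow_reset_from_d is True.
    """
    if not s or s[0] != 'a' or s[-1] != 'd':
        return False
    prev = 0  # index of the previous symbol in ORDER
    for ch in s[1:]:
        cur = ORDER.find(ch)
        if cur < 0:
            return False  # symbol outside a-d
        if cur > prev + 1:
            return False  # skipped a symbol, e.g. a -> d
        if cur < prev and prev == len(ORDER) - 1 and not allow_reset_from_d:
            return False  # reset after reaching d
        prev = cur
    return True

print(is_valid('abcdcd'))                           # reset from d refused
print(is_valid('abcdcd', allow_reset_from_d=True))  # reset from d allowed
```

With the flag left at its default this follows the same assumption as the regex (only "d"s after a "d"); setting it to True accepts "abcdcd".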

-tkc






