regex question

Alex Ross alex.j.ross at gmail.com
Thu Aug 3 21:05:15 EDT 2006


This might be more flexible:

import re

pat = re.compile(r"^(a*(?=b)b*(?=[ac])c*(?=[abd])d*)+$")
tests = [('aabbbaabbcccbbbcccddd', True),
         ('aabcabcd', True),
         ('abcd', True),
         ('aabbccaabbccabcabababbbccccdddd', True),
         ('aabbccaabbccabcabababbbccccddddabcd', True),
         ('aaaaabbbbbccccaaaaadddd', False),
         ('aaaaaaaaaaabbbbbccc', False),
         ('aabbccaabbccacabababbbccccdddd', False),
         ('aabbccaabbccabcdddcabababbbccccdddd', False)]

It works with all of the test cases you've given, and will also work in the
case where 'd' is followed by 'a' or 'b'. (A 'c' directly after a 'd' is
still rejected, as in the last test case.)
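The lookaheads do most of the work here, so it may help to see the same pattern laid out with `re.VERBOSE`. The following is a minimal Python 3 sketch. The names `verbose` and `compact` are only illustrative, and the comments are one reading of what each piece does. It also checks that both spellings agree on the test list above:

```python
import re

# The pattern above, re-laid out with re.VERBOSE so each piece can carry a comment.
verbose = re.compile(r"""
    ^
    (               # one pass through the sequence; the whole group repeats
        a*          # any a's at the start of this pass
        (?=b)       # a b must come next, so a pass can never skip b
        b*          # the b's; the lookahead below forces it to take the whole run
        (?=[ac])    # after the b's: reset to a, or advance to c
        c*          # the c's (empty when the b's were followed by an a)
        (?=[abd])   # after the c's: reset to a or b, or advance to d
        d*          # the d's (empty unless this pass reached d)
    )+
    $
""", re.VERBOSE)

compact = re.compile(r"^(a*(?=b)b*(?=[ac])c*(?=[abd])d*)+$")

tests = [('aabbbaabbcccbbbcccddd', True),
         ('aabcabcd', True),
         ('abcd', True),
         ('aabbccaabbccabcabababbbccccdddd', True),
         ('aabbccaabbccabcabababbbccccddddabcd', True),
         ('aaaaabbbbbccccaaaaadddd', False),
         ('aaaaaaaaaaabbbbbccc', False),
         ('aabbccaabbccacabababbbccccdddd', False),
         ('aabbccaabbccabcdddcabababbbccccdddd', False)]

for s, expected in tests:
    # Both spellings should agree with each other and with the expected result.
    assert (verbose.match(s) is not None) == expected
    assert (compact.match(s) is not None) == expected
print("all tests passed")
```

Because the pattern is anchored with `^` and `$`, `match` behaves almost like `fullmatch` (Python 3.4+). The one difference is that `$` also matches just before a trailing newline, so `fullmatch` is the stricter choice if that matters.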


Tim Chase wrote:
> Gabriel Murray wrote:
> > Hello, I'm looking for a regular expression which will match strings as
> > follows: if there are symbols a, b, c and d, then any pattern is valid if it
> > begins with a and ends with d and proceeds in order through the symbols.
> > However, at any point the pattern may reset to an earlier position in the
> > sequence and begin again from there.
> > For example, these would be valid patterns:
> > aabbbaabbcccbbbcccddd
> > aabcabcd
> > abcd
> >
> > But these would not:
> > aaaaabbbbbccccaaaaadddd   (goes straight from a to d)
> > aaaaaaaaaaabbbbbccc (does not reach d)
> >
> > Can anyone think of a concise way of writing this regex? The ones I can
> > think of are very long and awkward.
> > Gabriel
> >
> >
>
> It's a bit ugly, but
>
> import re
>
> tests = [
>      ('aabbbaabbcccbbbcccddd', True),
>      ('aabcabcd', True),
>      ('abcd', True),
>      ('aaaaabbbbbccccaaaaadddd', False),
>      ('aaaaaaaaaaabbbbbccc', False),
>      ]
>
> regex = r'^(a+b+)+(c+(a*b+)*)+d+$'
> r = re.compile(regex)
> for test, expected in tests:
>      matched = (r.match(test) is not None)
>      if matched == expected:
>          print("PASSED: %s with %s" % (test, expected))
>      else:
>          print("FAILED: %s with %s" % (test, expected))
>
>
> passes all the tests you suggested.
>
> One test that stands out to me as an undefined case would be
>
> 	abcdcd
>
> (where, after reaching d, the pattern resets again).
> 
> It currently assumes that nothing but "d"s can follow a "d".
> 
> -tkc
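Gabriel's rule can also be checked without a regex, which gives an independent oracle for comparing the patterns in this thread. The following is a minimal Python 3 sketch, not anyone's posted solution. It assumes the alphabet is exactly a, b, c, d. It also reads "reset to an earlier position" as allowing a jump back to any earlier letter, even after reaching d (the abcdcd case Tim calls undefined). The name `follows_rule` is hypothetical.

```python
import re

ORDER = "abcd"

def follows_rule(s: str) -> bool:
    """Return True if s starts with 'a', ends with 'd', and never skips a letter.

    Assumption: each next letter may repeat the current one, advance by exactly
    one position, or reset to any earlier position (even after reaching 'd').
    """
    if not s or s[0] != "a" or s[-1] != "d":
        return False
    if any(ch not in ORDER for ch in s):
        return False
    # Moving forward by more than one position (e.g. "a" -> "c") is the only
    # forbidden transition; staying put or moving backwards is always allowed.
    return all(ORDER.index(nxt) <= ORDER.index(cur) + 1
               for cur, nxt in zip(s, s[1:]))

# Tim's pattern, for comparison.
tim = re.compile(r"^(a+b+)+(c+(a*b+)*)+d+$")

gabriel_tests = [
    ("aabbbaabbcccbbbcccddd", True),
    ("aabcabcd", True),
    ("abcd", True),
    ("aaaaabbbbbccccaaaaadddd", False),
    ("aaaaaaaaaaabbbbbccc", False),
]

for s, expected in gabriel_tests:
    # The transition check and Tim's regex agree on Gabriel's examples.
    assert follows_rule(s) == expected
    assert (tim.match(s) is not None) == expected

# The undefined case: the literal reading accepts it, Tim's pattern does not.
print(follows_rule("abcdcd"), tim.match("abcdcd") is not None)  # True False
```

Swapping in Alex's pattern from the top of this message for `tim` is a quick way to see where each regex and this literal reading disagree.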




More information about the Python-list mailing list