regex question
taleinat
taleinat at gmail.com
Fri Aug 4 06:11:27 EDT 2006
Gabriel Murray <gabriel.murray <at> gmail.com> writes:
>
> Hello, I'm looking for a regular expression which will match strings as
> follows: if there are symbols a, b, c and d, then any pattern is valid if it
> begins with a and ends with d and proceeds in order through the symbols.
> However, at any point the pattern may reset to an earlier position in the
> sequence and begin again from there.
>
> For example, these would be valid patterns:
>
> aabbbaabbcccbbbcccddd
> aabcabcd
> abcd
>
> But these would not:
>
> aaaaabbbbbccccaaaaadddd (goes straight from a to d)
> aaaaaaaaaaabbbbbccc (does not reach d)
>
> Can anyone think of a concise way of writing this regex? The ones I can
> think of are very long and awkward.
>
> Gabriel
>
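Taken literally, the rule says that each symbol may repeat, advance exactly one position, or fall back to any earlier position. Below is a minimal sketch of that reading in plain Python, without a regex. It assumes the alphabet is exactly a, b, c, d. The names ORDER and follows_rule are illustrative and do not come from the thread.

```python
ORDER = "abcd"  # assumed alphabet, in the required order

def follows_rule(s):
    """Return True if s starts with 'a', ends with 'd', and every step
    repeats a symbol, advances by one, or resets to an earlier symbol."""
    if not s or s[0] != "a" or s[-1] != "d":
        return False
    if any(ch not in ORDER for ch in s):
        return False
    positions = [ORDER.index(ch) for ch in s]
    # A forward jump of more than one position (e.g. a -> c) is not allowed.
    return all(nxt <= cur + 1 for cur, nxt in zip(positions, positions[1:]))

for s in ["aabbbaabbcccbbbcccddd", "aabcabcd", "abcd",
          "aaaaabbbbbccccaaaaadddd", "aaaaaaaaaaabbbbbccc"]:
    print(s, follows_rule(s))
```

Read this way, the first invalid example fails at its single a-to-d step, and the second fails because it never ends in d.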
Your criteria could be stated more simply as follows:
* must start with an 'a' and end with a 'd'
* an 'a' must not be followed by 'c' or 'd'
* a 'b' must not be followed by 'd'
Therefore the regexp can be written more simply as:
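import re

regexp = re.compile(r'''
    a(?!c|d)        # start with 'a'; like every 'a', it may not precede 'c' or 'd'
    (
        a(?!c|d) |
        b(?!d) |
        c |
        d
    )*
    d\Z             # end with 'd' and nothing after it (re.match anchors only the start)
    ''',
    re.VERBOSE)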
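re.match anchors only at the start of the string. A pattern that does not also anchor its end will report a match for 'abcda' because the prefix 'abcd' matches. On Python 3.4 and later, re.fullmatch requires the entire string to match. The sketch below applies the same criteria that way. It puts the lookahead on the leading 'a' too, so that 'ad' is rejected. The names PATTERN and is_valid are illustrative, and the code assumes Python 3.4+.

```python
import re

# Same three criteria; the leading 'a' gets the same lookahead as later ones,
# and fullmatch (Python 3.4+) makes the pattern cover the whole string.
PATTERN = re.compile(r"""
    a(?![cd])          # start with 'a', not followed by 'c' or 'd'
    (?: a(?![cd])      # an 'a' must not be followed by 'c' or 'd'
      | b(?!d)         # a 'b' must not be followed by 'd'
      | c
      | d
    )*
    d                  # end with 'd'
    """, re.VERBOSE)

def is_valid(s):
    return PATTERN.fullmatch(s) is not None

print(is_valid("abcd"))                     # True
print(is_valid("ad"))                       # False: skips b and c
print(is_valid("abcda"))                    # False: does not end with 'd'
print(is_valid("aaaaabbbbbccccaaaaadddd"))  # False: an 'a' is followed by 'd'
```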
Test code:
tests = [
    ('abcd', True),
    ('aaaaaaaaaaabbbbbccc', False),
    ('aabbccaabbccabcdddcabababbbccccdddd', True),
    ('aabbccaabbccabcabababbbccccddddabcd', True),
    ('aaaaabbbbbccccaaaaadddd', False),
    ('aabbccaabbccacabababbbccccdddd', False),
    ('abccccdaaaabbbbccccd', True),
    ('abcdcd', True),
    ('aabbbaabbcccbbbcccddd', True),
    ('aabbccaabbccabcabababbbccccdddd', True),
    ('abccccdccccd', True),
    ('aabcabcd', True)
]
def checkit(regexp, tests=tests):
    # Report, for each test string, whether the match result equals the expected one
    for test, expected in tests:
        matched = regexp.match(test) is not None
        if matched == expected:
            print("PASSED: %s with %s" % (test, expected))
        else:
            print("FAILED: %s with %s" % (test, expected))
>>> checkit(regexp, tests)
PASSED: abcd with True
PASSED: aaaaaaaaaaabbbbbccc with False
PASSED: aabbccaabbccabcdddcabababbbccccdddd with True
PASSED: aabbccaabbccabcabababbbccccddddabcd with True
PASSED: aaaaabbbbbccccaaaaadddd with False
PASSED: aabbccaabbccacabababbbccccdddd with False
PASSED: abccccdaaaabbbbccccd with True
PASSED: abcdcd with True
PASSED: aabbbaabbcccbbbcccddd with True
PASSED: aabbccaabbccabcabababbbccccdddd with True
PASSED: abccccdccccd with True
PASSED: aabcabcd with True
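The twelve hand-picked cases can be complemented by a brute-force comparison. The sketch below checks the three criteria directly: the string starts with 'a', ends with 'd', and contains none of the pairs 'ac', 'ad' or 'bd'. It compares that result with a regex form of the same criteria over every string of 1 to 8 symbols drawn from "abcd". It assumes Python 3.4+ for fullmatch. The names REGEX, by_criteria and mismatches are illustrative.

```python
import itertools
import re

# Regex form of the three criteria; fullmatch (Python 3.4+) anchors both ends.
REGEX = re.compile(r"a(?!c|d)(?:a(?!c|d)|b(?!d)|c|d)*d")

def by_criteria(s):
    """Check the criteria directly: starts with 'a', ends with 'd', and
    contains none of the forbidden pairs 'ac', 'ad' or 'bd'."""
    if not s or s[0] != "a" or s[-1] != "d":
        return False
    return not any(x + y in ("ac", "ad", "bd") for x, y in zip(s, s[1:]))

# Compare both checks on every string of 1 to 8 symbols drawn from "abcd".
mismatches = []
for n in range(1, 9):
    for chars in itertools.product("abcd", repeat=n):
        s = "".join(chars)
        if (REGEX.fullmatch(s) is not None) != by_criteria(s):
            mismatches.append(s)

print(len(mismatches), "mismatches")  # 0 mismatches
```

Raising the upper bound of range makes the comparison more thorough. The number of strings grows by a factor of four per extra symbol, so run time rises quickly.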
- Tal
More information about the Python-list mailing list