regex question

taleinat taleinat at gmail.com
Fri Aug 4 06:11:27 EDT 2006


Gabriel Murray <gabriel.murray <at> gmail.com> writes:

> Hello, I'm looking for a regular expression which will match strings as
> follows: if there are symbols a, b, c and d, then any pattern is valid if it
> begins with a and ends with d and proceeds in order through the symbols.
> However, at any point the pattern may reset to an earlier position in the
> sequence and begin again from there.
>
> For example, these would be valid patterns:
>
>     aabbbaabbcccbbbcccddd
>     aabcabcd
>     abcd
>
> But these would not:
>
>     aaaaabbbbbccccaaaaadddd   (goes straight from a to d)
>     aaaaaaaaaaabbbbbccc       (does not reach d)
>
> Can anyone think of a concise way of writing this regex? The ones I can
> think of are very long and awkward.
>
> Gabriel

Your criteria could be defined more simply as the following:
* must start with an 'a' and end with a 'd'
* an 'a' must not be followed by 'c' or 'd'
* a 'b' must not be followed by 'd'
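
As a rough illustration, the three rules can also be checked directly, without
a regular expression. This is only a sketch: the name follows_rules is made up
for this example, and rejecting any symbol other than a, b, c and d is an
assumption that the rules above do not state explicitly.

```python
def follows_rules(s):
    """Check the three rules above with plain string operations."""
    # Rule 1: must start with an 'a' and end with a 'd'.
    if not (s.startswith('a') and s.endswith('d')):
        return False
    # Assumption: only the four symbols a, b, c, d may appear.
    if set(s) - set('abcd'):
        return False
    # Rules 2 and 3: 'a' may not be followed by 'c' or 'd',
    # and 'b' may not be followed by 'd'.
    forbidden_pairs = ('ac', 'ad', 'bd')
    return not any(pair in s for pair in forbidden_pairs)
```

Every forbidden step is just a two-character substring, so a substring search
is enough here; the regexp expresses the same idea with negative lookaheads.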

Therefore the regexp can more simply be written as:
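import re

regexp = re.compile(r'''a(?!c|d)        # first symbol; it must obey the a rule too
                        (
                        a(?!c|d) |      # a must not be followed by c or d
                        b(?!d) |        # b must not be followed by d
                        c |
                        d
                        )*
                        d\Z             # last symbol, anchored at the end of the string''',
                    re.VERBOSE)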

Test code:

tests = [
    ('abcd', True),
    ('aaaaaaaaaaabbbbbccc', False),
    ('aabbccaabbccabcdddcabababbbccccdddd', True),
    ('aabbccaabbccabcabababbbccccddddabcd', True),
    ('aaaaabbbbbccccaaaaadddd', False),
    ('aabbccaabbccacabababbbccccdddd', False),
    ('abccccdaaaabbbbccccd', True),
    ('abcdcd', True),
    ('aabbbaabbcccbbbcccddd', True),
    ('aabbccaabbccabcabababbbccccdddd', True),
    ('abccccdccccd', True),
    ('aabcabcd', True)
    ]

def checkit(regexp, tests=tests):
    """Report whether regexp accepts or rejects each test string as expected."""
    for test, expected in tests:
        matched = regexp.match(test) is not None
        if matched == expected:
            print("PASSED: %s with %s" % (test, expected))
        else:
            print("FAILED: %s with %s" % (test, expected))

>>> checkit(regexp, tests)
PASSED: abcd with True
PASSED: aaaaaaaaaaabbbbbccc with False
PASSED: aabbccaabbccabcdddcabababbbccccdddd with True
PASSED: aabbccaabbccabcabababbbccccddddabcd with True
PASSED: aaaaabbbbbccccaaaaadddd with False
PASSED: aabbccaabbccacabababbbccccdddd with False
PASSED: abccccdaaaabbbbccccd with True
PASSED: abcdcd with True
PASSED: aabbbaabbcccbbbcccddd with True
PASSED: aabbccaabbccabcabababbbccccdddd with True
PASSED: abccccdccccd with True
PASSED: aabcabcd with True
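
Beyond hand-picked cases, one way to gain confidence is to compare a regexp
against a literal reading of the original question on every short string. The
sketch below is only an illustration under assumptions: walks_in_order is one
interpretation of "proceeds in order, but may reset" (from one symbol you may
stay, advance by one, or jump back), the pattern named anchored is a compact
end-anchored variant written for this example, and cross_check and max_len are
made-up names.

```python
import re
from itertools import product

ORDER = 'abcd'

def walks_in_order(s):
    """State-machine reading of the question (an interpretation, see above)."""
    if not s or s[0] != 'a' or s[-1] != 'd':
        return False
    pos = 0  # index in ORDER of the previous symbol
    for ch in s[1:]:
        nxt = ORDER.index(ch)
        if nxt > pos + 1:  # skipping ahead over a symbol is not allowed
            return False
        pos = nxt          # staying, advancing by one, or resetting is fine
    return True

# Hypothetical compact variant: must start with 'a', obey the lookahead
# rules throughout, and end with 'd' at the very end of the string.
anchored = re.compile(r'(?=a)(?:a(?![cd])|b(?!d)|[cd])*d\Z')

def cross_check(max_len=8):
    """Return every string over ORDER, up to max_len, where the two disagree."""
    mismatches = []
    for n in range(1, max_len + 1):
        for chars in product(ORDER, repeat=n):
            s = ''.join(chars)
            if (anchored.match(s) is not None) != walks_in_order(s):
                mismatches.append(s)
    return mismatches
```

An empty list from cross_check() means the two definitions agree on every
string up to the chosen length. That is evidence rather than a proof, but it
catches mistakes such as a missing end anchor or an unchecked first symbol.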


- Tal



