re question - finiding matching ()
Jeff Epler
jepler at unpythonic.net
Sun Jan 18 13:07:39 EST 2004
Regular Expressions cannot perform the simple task of matching an
arbitrary number of parentheses. You could write an expression that
will work as long as the nesting isn't more than N levels, but the
expression quickly becomes very painful.
Instead, you can use some method to split the string into parts: one
part for "(", one for ")" and one for any other sequence of characters:
tokenize_re = re.compile("\(|\)|[^()]*")
Now, use this expression to tokenize your input string:
>>> tokens = tokenize_re.findall("add((2 * 2), 100)")
>>> print tokens
['add', '(', '(', '2 * 2', ')', ', 100', ')']
To match your language, the first token must be 'add':
if tokens[0] != 'add': # no match
The second token must be '(':
if tokens[1] != '(': # no match
Now, start scanning and counting parens, until you get back to 0
nesting_level = 0
for t in tokens[1:]:
if t == ')':
nesting_level -= 1
if nesting_level == 0:
# End of desired string
else if t == '(':
nesting_level += 1
# if you end the loop without reaching nesting_level 0
# then there were not enough close-parens
# no match
You could also implement this part recursively (as you likely would if
you were actually compiling the string into bytecode).
Jeff
More information about the Python-list
mailing list