re question - finiding matching ()

Jeff Epler jepler at unpythonic.net
Sun Jan 18 13:07:39 EST 2004


Regular Expressions cannot perform the simple task of matching an
arbitrary number of parentheses.  You could write an expression that
will work as long as the nesting isn't more than N levels, but the
expression quickly becomes very painful.

Instead, you can use some method to split the string into parts: one
part for "(", one for ")" and one for any other sequence of characters:

tokenize_re = re.compile("\(|\)|[^()]*")

Now, use this expression to tokenize your input string:

>>> tokens = tokenize_re.findall("add((2 * 2), 100)")
>>> print tokens
['add', '(', '(', '2 * 2', ')', ', 100', ')']
To match your language, the first token must be 'add':
	if tokens[0] != 'add': # no match
The second token must be '(':
	if tokens[1] != '(': # no match
Now, start scanning and counting parens, until you get back to 0
	nesting_level = 0
	for t in tokens[1:]:
		if t == ')':
			nesting_level -= 1
			if nesting_level == 0:
				# End of desired string
		else if t == '(':
			nesting_level += 1
	# if you end the loop without reaching nesting_level 0
	# then there were not enough close-parens
	# no match
You could also implement this part recursively (as you likely would if
you were actually compiling the string into bytecode).

Jeff




More information about the Python-list mailing list