regular expression for nested parentheses

Noah Hoffman noah.hoffman at gmail.com
Sun Dec 9 16:13:28 EST 2007


I have been trying to write a regular expression that identifies a
block of text enclosed by (potentially nested) parentheses. I've found
solutions using other regular expression engines (for example, my text
editor, BBEdit, which uses the PCRE library), but have not been able
to replicate it using python's re module.

Here's a version that works using the PCRE syntax, along with the
python error message. I'm hoping for this to identify the string '(foo
(bar) (baz))'

% python -V
Python 2.5.1
% python
py> import re
py> text = 'buh (foo (bar) (baz)) blee'
py> no_ws = lambda s: ''.join(s.split())
py> rexp = r"""(?P<parens>
...     \(
...         (?>
...             (?> [^()]+ ) |
...             (?P>parens)
...         )*
...     \)
... )"""
py> print re.findall(no_ws(rexp), text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/
python2.5/re.py", line 167, in findall
    return _compile(pattern, flags).findall(string)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/
python2.5/re.py", line 233, in _compile
    raise error, v # invalid expression
sre_constants.error: unexpected end of pattern

>From what I understand of the PCRE syntax, the (?>) construct is a non-
capturing subpattern, and (?P>parens) is a recursive call to the
enclosing (named) pattern. So my best guess at a python equivalent is
this:

py> rexp2 = r"""(?P<parens>
...     \(
...         (?=
...             (?= [^()]+ ) |
...             (?P=parens)
...         )*
...     \)
... )"""
py> print re.findall(no_ws(rexp2), text)
[]

...which results in no match. I've played around quite a bit with
variations on this theme, but haven't been able to come up with one
that works.

Can anyone help me understand how to construct a regular expression
that does the job in python?

Thanks -




More information about the Python-list mailing list