regex recursive matching (regex 2015.07.19)

Terry Reedy tjreedy at udel.edu
Tue Aug 18 11:27:43 EDT 2015


On 8/18/2015 10:25 AM, Neal Becker wrote:
> Trying regex 2015.07.19
>
> I'd like to match recursive parenthesized expressions, with groups such that
> '(a(b)c)'

Extended regular expressions can only match strings in extended regular
languages.  Arbitrarily nested expressions fall outside that class.  You
need a context-free parser.  You can find them on PyPI or write your
own, which in this case is quite simple.
---
from xploro.test import ftest  # my personal function test function

io_pairs = (
    ('abc', []),
    ('(a)', [(0, '(a)')]),
    ('a(b)c', [(1, '(b)')]),
    ('(a(b)c)', [(0, '(a(b)c)'), (2, '(b)')]),
    ('a(b(cd(e))(f))g', [(1, '(b(cd(e))(f))'), (3, '(cd(e))'),
                         (6, '(e)'), (10, '(f)')]),
)

def parens(text):
    '''Return sorted list of paren tuples for text.

    A paren tuple is the start index (for sorting) and the substring.
    '''
    opens = []     # stack of indexes of '(' not yet closed
    found = set()  # completed (start, substring) tuples
    for i, char in enumerate(text):
        if char == '(':
            opens.append(i)
        elif char == ')':
            start = opens.pop()  # innermost open paren pairs with this ')'
            found.add((start, text[start:i+1]))
    return sorted(found)

ftest(parens, io_pairs)
---
all pass
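
For anyone without a test helper like ftest, the same check can be sketched with plain asserts. This is only a minimal stdlib-only sketch: the name paren_spans and the ValueError raised on unbalanced input are assumptions added for illustration, not part of the code above, which would fail with IndexError from opens.pop() on a stray ')'.

```python
def paren_spans(text):
    """Return sorted (start, substring) pairs, one per parenthesized group.

    Hypothetical variant of parens(): unbalanced input raises ValueError.
    """
    opens = []  # stack of indexes of '(' not yet closed
    spans = []  # completed (start, substring) pairs
    for i, char in enumerate(text):
        if char == '(':
            opens.append(i)
        elif char == ')':
            if not opens:
                raise ValueError(f"unmatched ')' at index {i}")
            start = opens.pop()
            spans.append((start, text[start:i + 1]))
    if opens:
        raise ValueError(f"unmatched '(' at index {opens[-1]}")
    return sorted(spans)


io_pairs = (
    ('abc', []),
    ('(a)', [(0, '(a)')]),
    ('a(b)c', [(1, '(b)')]),
    ('(a(b)c)', [(0, '(a(b)c)'), (2, '(b)')]),
    ('a(b(cd(e))(f))g', [(1, '(b(cd(e))(f))'), (3, '(cd(e))'),
                         (6, '(e)'), (10, '(f)')]),
)

# Plain-assert stand-in for the ftest call.
for text, expected in io_pairs:
    assert paren_spans(text) == expected, (text, paren_spans(text))
print('all pass')
```

Because each '(' index is pushed once and popped once, every start index is unique, so a list is enough here and sorting orders the groups by where they open.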


> would give
> group(0) -> '(a(b)c)'
> group(1) -> '(b)'
>
> but that's not what I get
>
> import regex
>
> #r = r'\((?>[^()]|(?R))*\)'
> r = r'\(([^()]|(?R))*\)'
> #r = r'\((?:[^()]|(?R))*\)'
> m = regex.match (r, '(a(b)c)')
>
>   m.groups()
> Out[28]: ('c',)
>
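
The ('c',) result is consistent with how a capture group inside a repetition normally behaves: groups() reports only the text from the group's last iteration. A minimal sketch with the stdlib re module, using a non-recursive stand-in pattern (an assumption chosen for illustration, since re has no (?R)):

```python
import re

# Group 1 sits inside a '*' repetition, like group 1 in the quoted pattern.
# Each iteration overwrites the previous capture, so only the last survives.
m = re.match(r'\((\w)*\)', '(abc)')
print(m.group(0))   # the whole match
print(m.groups())   # only the final iteration of group 1
```

The regex module additionally provides a captures() method on match objects that returns every capture of a group, which may be worth trying on the recursive pattern.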


-- 
Terry Jan Reedy



