parenthesis
Michele Simionato
mis6 at pitt.edu
Tue Nov 5 13:10:34 EST 2002
> Wondering why I didn't just write:
>
> >>> import re
> >>> rx = re.compile(r'([()]|[^()]+)')
> >>> class Addelim:
> ...     def __init__(self, delim):
> ...         self.parens=0; self.delim=delim
> ...     def __call__(self, m):
> ...         s = m.group(1)
> ...         if s=='(': self.parens+=1
> ...         if self.parens==1 and s==')':
> ...             self.parens=0
> ...             return s+self.delim
> ...         if s==')': self.parens -=1
> ...         return s
> ...
> >>> exp = '(a*(b+c*(2-x))+d)+f(s1)'
>
> It was natural to be able to specify the delimiter. And the + is probably
> better than the * on the non-paren "[^()]+" part of the pattern.
Not really. My benchmark gives essentially the same timings for "[^()]+" and
"[^()]*"; there is no appreciable difference.
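A comparison of this kind could be timed roughly as follows. This is only a sketch in present-day Python 3, not the benchmark actually run for this message: the helper name time_pattern, the constant names and the repetition count are all assumptions, and the Addelim class is restated from the quoted session so the block runs on its own.

```python
import re
import timeit


class Addelim:
    """Callable for re.sub: appends delim after each top-level ')'."""

    def __init__(self, delim):
        self.parens = 0
        self.delim = delim

    def __call__(self, m):
        s = m.group(1)
        if s == '(':
            self.parens += 1
        if self.parens == 1 and s == ')':
            # Closing paren of a top-level group: emit the delimiter.
            self.parens = 0
            return s + self.delim
        if s == ')':
            self.parens -= 1
        return s


EXP = '(a*(b+c*(2-x))+d)+f(s1)'
RX_PLUS = re.compile(r'([()]|[^()]+)')  # pattern from the quoted session
RX_STAR = re.compile(r'([()]|[^()]*)')  # the '*' variant under discussion


def time_pattern(rx, number=10000):
    # Hypothetical helper: total seconds for `number` substitutions.
    return timeit.timeit(lambda: rx.sub(Addelim('\n'), EXP), number=number)


if __name__ == '__main__':
    print('+ :', time_pattern(RX_PLUS))
    print('* :', time_pattern(RX_STAR))
```

Absolute numbers will depend on the machine and the Python version, so only the ratio between the two lines is meaningful.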
> Then using \n as delimiter to break into lines one can just print it.
>
> >>> print rx.sub(Addelim('\n'),exp)
> (a*(b+c*(2-x))+d)
> +f(s1)
>
> Which you could also use like:
>
> >>> print rx.sub(Addelim('\n'),exp).splitlines()
> ['(a*(b+c*(2-x))+d)', '+f(s1)']
>
> Or to get back to your original requirement,
>
> >>> print rx.sub(Addelim('\n'),exp).splitlines()[0]
> (a*(b+c*(2-x))+d)
>
> But I suspect it would run faster to let a regex split the string and then use
> a loop like yours on the pieces, which would be '(' or ')' or some other string
> that you don't need to look at character by character. E.g.,
>
> >>> rx = re.compile(r'([()])')
> >>> ss = rx.split(exp)
> >>> ss
> ['', '(', 'a*', '(', 'b+c*', '(', '2-x', ')', '', ')', '+d', ')', '+f', '(', 's1', ')', '']
>
> Notice that the splitter matches wind up at the odd indices. I think that's generally true
> when you put parens around the splitting expression, to return the matches as part of the list,
> but I'm not 100% certain. Anyway, you could make use of that, something like:
>
> >>>
> >>> parens = 0
> >>> endix = []
> >>> for i in range(1,len(ss),2):
> ...     if parens==1 and ss[i]==')':
> ...         parens=0; endix.append(i+1)
> ...     elif ss[i]=='(': parens += 1
> ...     else: parens -= 1
> ...
> >>> endix
> [12, 16]
>
> You could break the loop like you did if you just want the first expression,
> or you could grab it by
>
> >>> print ''.join(ss[:endix[0]])
> (a*(b+c*(2-x))+d)
>
> or list the bunch,
>
> >>> lo=0
> >>> for hi in endix:
> ...     print ''.join(ss[lo:hi])
> ...     lo = hi
> ...
> (a*(b+c*(2-x))+d)
> +f(s1)
>
> or whatever. Which is not as slick, but probably faster if you had to do a bi-ig bunch of them.
>
> I think when the fenceposts are simple, but you are mainly interested in the data between, splitting
> on a fencepost regex and processing the resulting list can be simpler and faster than trying to
> do it all with a complex regex.
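The split-then-loop idea quoted above can be gathered into one function. The following is a sketch in present-day Python 3, not code from the quoted session; the name split_top_level is an assumption. It relies on the documented behaviour of re.split: when the pattern has exactly one capturing group, the result alternates text, separator, text, and so on, so the separators do sit at the odd indices.

```python
import re


def split_top_level(text):
    """Cut text after each top-level ')' using the pieces from re.split.

    Hypothetical helper. Assumes the parentheses in text are balanced;
    any text after the last top-level ')' is not returned.
    """
    pieces = re.split(r'([()])', text)
    # One capturing group: every odd-indexed piece is a paren.
    assert all(p in ('(', ')') for p in pieces[1::2])

    depth = 0
    lo = 0
    chunks = []
    for i in range(1, len(pieces), 2):
        if pieces[i] == '(':
            depth += 1
        else:
            depth -= 1
            if depth == 0:
                # pieces[lo:i + 1] spans one complete top-level group.
                chunks.append(''.join(pieces[lo:i + 1]))
                lo = i + 1
    return chunks


if __name__ == '__main__':
    print(split_top_level('(a*(b+c*(2-x))+d)+f(s1)'))
```

Only the odd-indexed pieces are inspected, so the text between parentheses is never scanned character by character in Python code.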
>
> Regards,
> Bengt Richter
I strongly suspect that in this simple problem the simple approach is by far
the fastest.
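The original loop is not quoted in this message, so the following is only a sketch of what such a simple approach might look like: a plain scan that counts parenthesis depth and stops at the first point where it returns to zero. The function name first_balanced is an assumption, and the code is present-day Python 3.

```python
def first_balanced(text):
    """Return text up to and including the ')' closing its first top-level group.

    Hypothetical helper. Raises ValueError if the parentheses are unbalanced
    or no group is ever closed.
    """
    depth = 0
    for i, ch in enumerate(text):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                raise ValueError('unbalanced closing parenthesis')
            if depth == 0:
                return text[:i + 1]
    raise ValueError('no balanced parenthesized expression found')


if __name__ == '__main__':
    print(first_balanced('(a*(b+c*(2-x))+d)+f(s1)'))
```

The scan stops as soon as the first group closes, so the rest of the string is never examined.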
Bye,
Michele