parenthesis

Michele Simionato mis6 at pitt.edu
Tue Nov 5 12:56:54 EST 2002


bokr at oz.net (Bengt Richter) wrote in message news:<aq6qun$ib4$0 at 216.39.172.122>...
> Well, they don't count, so if you want to count you have to throw in
> something extra. E.g., you could do this, to insert a delimiter after
> a closing right paren, and then split on the delimiter. Probably not
> wonderfully efficient, and I am just duplicating what you did, except
> the regex separates the chunks for me.
> 
>  >>> import re
>  >>> rx = re.compile(r'([()]|[^()]*)')
>  >>> class Addelim:
>  ...     def __init__(self): self.parens=0
>  ...     def __call__(self, m):
>  ...         s = m.group(1)
>  ...         if s=='(': self.parens+=1
>  ...         if self.parens==1 and s==')':
>  ...             self.parens=0
>  ...             return s+'\x00'
>  ...         if s==')': self.parens -=1
>  ...         return s
>  ...
>  >>> for e in rx.sub(Addelim(),exp).split('\x00'): print e
>  ...
>  (a*(b+c*(2-x))+d)
>  +f(s1)
> 
> Where exp was
>  >>> exp
>  '(a*(b+c*(2-x))+d)+f(s1)'
> 
> Regards,
> Bengt Richter

Very interesting approach. It is even more interesting to compare its
speed with the simple-minded approach. I expected your algorithm to be
the fastest, since you do not walk the initial string character by
character in Python, but let the regular expression do the job.
However, a simple benchmark (not subtracting the overhead times) gives:

parenthesized_group:  130-140 microseconds
Addelim:              620-640 microseconds

The simple-minded approach is four to five times faster!
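
A minimal sketch of how such a comparison might be set up with the
standard timeit module. The body of parenthesized_group is not shown in
this message, so the version here is a hypothetical reconstruction of a
character-by-character scanner. Addelim is ported to current Python
syntax, and the helpers addelim_first_group and per_call_microseconds
are names invented for this illustration. Absolute timings depend on
the machine and interpreter, so none are claimed.

```python
import re
import timeit

EXP = '(a*(b+c*(2-x))+d)+f(s1)'

def parenthesized_group(s):
    """Hypothetical reconstruction: return the first balanced
    parenthesized group in s, or None if there is none."""
    depth = 0
    start = None
    for i, c in enumerate(s):
        if c == '(':
            if depth == 0:
                start = i          # remember where the group opens
            depth += 1
        elif c == ')' and depth > 0:
            depth -= 1
            if depth == 0:
                return s[start:i + 1]   # stop at the first closed group
    return None

RX = re.compile(r'([()]|[^()]*)')

class Addelim:
    """Port of the quoted class: appends a NUL after each top-level ')'."""
    def __init__(self):
        self.parens = 0
    def __call__(self, m):
        s = m.group(1)
        if s == '(':
            self.parens += 1
        if self.parens == 1 and s == ')':
            self.parens = 0
            return s + '\x00'
        if s == ')':
            self.parens -= 1
        return s

def addelim_first_group(s):
    # The regex pass always processes the whole string before splitting.
    return RX.sub(Addelim(), s).split('\x00')[0]

def per_call_microseconds(func, arg, number=1000):
    """Average wall-clock time per call, in microseconds."""
    total = timeit.timeit(lambda: func(arg), number=number)
    return total / number * 1e6

if __name__ == '__main__':
    for func in (parenthesized_group, addelim_first_group):
        print(func.__name__, per_call_microseconds(func, EXP))
```

Both functions return the same first group for this input, so the
timing compares like with like.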

I think this is rather instructive. Addelim is particularly inefficient
for long expressions, since it analyzes the full expression, whereas
parenthesized_group stops at the end of the first parenthesized group.
For fun I ran Addelim on exp*100 (i.e. 100 times the original string):
it takes more than 50000 microseconds, whereas parenthesized_group
still takes about 140 microseconds.
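
The early-stop effect can be illustrated by counting how many
characters a scanner has to look at. The function below,
first_group_with_cost, is a hypothetical stand-in written for this
illustration (it is not the original parenthesized_group); it returns
the first balanced group together with the number of characters
examined.

```python
def first_group_with_cost(s):
    """Return (first balanced group, characters examined).

    Illustrative sketch only: scans left to right and returns as soon
    as the first top-level group closes.
    """
    depth = 0
    start = None
    for i, c in enumerate(s):
        if c == '(':
            if depth == 0:
                start = i
            depth += 1
        elif c == ')' and depth > 0:
            depth -= 1
            if depth == 0:
                return s[start:i + 1], i + 1
    return None, len(s)   # no balanced group: the whole string was scanned

exp = '(a*(b+c*(2-x))+d)+f(s1)'
short_result = first_group_with_cost(exp)
long_result = first_group_with_cost(exp * 100)
print(short_result)   # ('(a*(b+c*(2-x))+d)', 17)
print(long_result)    # ('(a*(b+c*(2-x))+d)', 17)
```

The scanner examines 17 characters in both cases, even though exp*100
is 2300 characters long, while a substitution over the whole string has
to visit all of them.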

It is good to have yet another proof of the dangers involved with
regular expressions!

Bye,

                                  Michele


