Seeking regex optimizer
Mirco Wahab
peace.is.our.profession at gmx.de
Tue Jun 20 09:05:38 EDT 2006
Thus spoke andrewdalke at gmail.com (on 2006-06-20 01:39):
Hi, are you the A.Dalke from the Schulten group (VMD) as
listed here: http://www.ks.uiuc.edu/Overview/People/former.cgi
> Replying to me Mirco Wahab wrote:
>> If you pull the strings into (?>( ... )) (atomic groups),
>> this would't happen.
>
> Given that Python's re engine doesn't support this feature
> it doesn't really help the original poster's problem.
Yeah, right. I misunderstood the problem
- where it was meant to be 'literally'
a problem of capture group id's. I took
it metaphorical, which meant to me
'problems by backtracking' (which was wrong).
> Even if some future Python did support it, the limit
> to 100 named groups is unaffected by backtracking.
Right for that one, what I had in mind was something
like the following:
# naive regex '\d+9'
# find some number only if it ends by 9
my $str="1009900000000000000000000000000000000000";
print "non-atomic backtracks: $1\n" if $str =~ / ( \d+9 )/x;
print "atomic: matches $1\n" if $str =~ / ( (?>\d+)9 )/x;
Second line would not backtrack and not find the 9 behind \d+
because atomic grouping prevents that, so line 1 matches
10099 and line 2 nothing (but very fast, which is the point).
>>>> import re
>>>> re.compile("(.)"*100)
> Traceback (most recent call last):
> ...
> ...
> AssertionError: sorry, but this version only supports 100 named groups
>>>>
I tried:
my $re = '(\d+)\D' x 10000;
my $text = join ' ', 0..10000;
print $+ if $text =~ /$re/;
print "\n", $1001;
which prints:
9999
1000
Why is there a named group count
restriction in Python? Any reasons
one should know?
Regards & thanks
Mirco
More information about the Python-list
mailing list