index for regex.search() beyond which the RE engine will not go.

Steve D'Aprano steve+python at pearwood.info
Fri Aug 19 10:09:05 EDT 2016


On Fri, 19 Aug 2016 09:14 pm, iMath wrote:

> 
> for
> regex.search(string[, pos[, endpos]])
> The optional parameter endpos is the index into the string beyond which
> the RE engine will not go, while this lead me to believe the RE engine
> will still search on till the endpos position even after it returned the
> matched object, is this Right ?

No.

Once the RE engine finds a match, it stops. You can test this for yourself
with a small timing test, using the "timeit" module.

from timeit import Timer
huge_string = 'aaabc' + 'a'*1000000 + 'dea'
re1 = r'ab.a'
re2 = r'ad.a'

# set up some code to time.
setup = 'import re; from __main__ import huge_string, re1, re2'
t1 = Timer('re.search(re1, huge_string)', setup)
t2 = Timer('re.search(re2, huge_string)', setup)

# Now run the timers.
best = min(t1.repeat(number=1000))/1000
print("Time to locate regex at the start of huge string:", best)
best = min(t2.repeat(number=1000))/1000
print("Time to locate regex at the end of the huge string:", best)



When I run that on my computer, it prints:

Time to locate regex at the start of huge string: 4.9710273742675785e-06
Time to locate regex at the end of the huge string: 0.0038938069343566893


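So it takes about 5 microseconds to find the regex at the beginning of the
string. To find the regex at the end of the string takes about 3894
microseconds, roughly 780 times longer. If the engine kept scanning after
finding a match, both searches would take about the same time.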


The "endpos" parameter is only an upper limit: the RE engine treats the
string as if it were endpos characters long, so it stops at that position if
the regex isn't found before it. It won't go beyond that point.
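
Here is a minimal sketch of that behaviour. The sample string and the names
"text" and "pattern" are made up for illustration. Note that pos and endpos
are accepted by the search() method of a compiled pattern, not by the
module-level re.search() function used in the timing test above.

```python
import re

# Hypothetical sample data: the only "needle" occupies indexes 10..15.
text = "a" * 10 + "needle" + "a" * 10
pattern = re.compile(r"needle")

# No endpos given: the engine may look at the whole string.
full = pattern.search(text)

# endpos=12 makes the engine behave as if the string were only 12
# characters long, so the match (which needs indexes 10..15) is not found.
cut = pattern.search(text, 0, 12)

# endpos=16 is just far enough for the whole match to fit.
exact = pattern.search(text, 0, 16)

print(full.span())
print(cut)
print(exact.span())
```

The first and third searches report the same span, and the second returns
None: endpos limits how far the engine is allowed to look, it does not force
the engine to keep going once a match has been found.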






-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



