Why is re.search() so much faster than re.sub() when there are no matches?

News colin_lipworth at cs.com.au
Tue May 15 20:11:41 EDT 2001


I don't understand why re.sub() is so slow if no substitutions are done:

Under ActivePython build 203 on Windows 2000, the first loop below takes
1.26 seconds and the second loop takes 49.3 seconds.

That's a huge difference. I would have thought that sub() must do a regular
expression search() internally to see if there is anything to substitute,
and I don't see why I can make it 39 times faster by explicitly doing the
search first instead of letting re.sub() do it.

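import re
# The sample line contains no occurrence of "word", so nothing is ever replaced.
line = ("fsfsaf sf saf sdafsfsadf sadfdsafsadfdsafsf fdsf sf sd f s f sf saf "
        "safsfffff sdfsadf  f  sadf sa")
pattern = re.compile(r"\bword\b")
# First loop: search first, and call sub() only if there is a match.
for i in range(1,100000):
    if pattern.search(line):
        line = pattern.sub("new word", line)
# Second loop: call sub() unconditionally.
for i in range(1,100000):
    line = pattern.sub("new word", line)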
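
For anyone who wants to time the two approaches separately, here is a minimal sketch using the standard timeit module. It assumes a current Python 3 interpreter, so the numbers it prints will not match the ActivePython figures quoted above. The names search_then_sub, sub_only and time_both are made up for illustration and are not part of the re module.

```python
import re
import timeit

# Illustrative harness: function names and the repeat count are assumptions,
# not something taken from the re module or from the original measurements.
LINE = ("fsfsaf sf saf sdafsfsadf sadfdsafsadfdsafsf fdsf sf sd f s f sf saf "
        "safsfffff sdfsadf  f  sadf sa")
PATTERN = re.compile(r"\bword\b")


def search_then_sub(text):
    """Substitute only when a search finds something to replace."""
    if PATTERN.search(text):
        return PATTERN.sub("new word", text)
    return text


def sub_only(text):
    """Let sub() decide on its own whether anything matches."""
    return PATTERN.sub("new word", text)


def time_both(repeats=100_000):
    """Return (seconds for search_then_sub, seconds for sub_only)."""
    guarded = timeit.timeit(lambda: search_then_sub(LINE), number=repeats)
    direct = timeit.timeit(lambda: sub_only(LINE), number=repeats)
    return guarded, direct


if __name__ == "__main__":
    guarded, direct = time_both()
    print(f"search then sub: {guarded:.3f} s")
    print(f"sub only:        {direct:.3f} s")
```

Both functions return the input unchanged when the pattern does not match, so any difference in the two timings is pure overhead of the call path rather than a difference in results.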

Perl does not exhibit this behavior and has similar speed to the first loop
in both cases.

I hope that this is not a FAQ, but I did a Deja search and looked in the
Python FAQ and could not find this point covered.

Regards,

Colin Lipworth




