How to escape strings for re.finditer?

David Raymond David.Raymond at tomtom.com
Tue Feb 28 14:40:10 EST 2023


> I wrote my previous message before reading this.  Thank you for the test you ran -- it answers the question of performance.  You show that re.finditer is 30x faster, so that certainly recommends that over a simple loop, which introduces looping overhead.  

>>      def using_simple_loop(key, text):
>>          matches = []
>>          for i in range(len(text)):
>>              if text[i:].startswith(key):
>>                  matches.append((i, i + len(key)))
>>          return matches
>>
>>      using_simple_loop: [0.13952950000020792, 0.13063130000000456, 0.12803450000001249, 0.13186180000002423, 0.13084610000032626]
>>      using_re_finditer: [0.003861400000005233, 0.004061900000124297, 0.003478999999970256, 0.003413100000216218, 0.0037320000001273]


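With a slight tweak, though, the simple loop beats the RE version. Using .find() to jump straight to each match, instead of slicing and testing at every index, makes it roughly 30% faster (about 0.0026 s versus 0.0034 s per run in the timings below).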


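def using_simple_loop2(key, text):
    # Assumes key is non-empty: with an empty key, start never advances
    # and the loop does not terminate.
    matches = []
    keyLen = len(key)
    start = 0
    # str.find() returns -1 when there is no further occurrence.
    while (foundSpot := text.find(key, start)) > -1:
        # Resume after the end of this match, so matches never overlap
        # (the same behaviour as re.finditer).
        start = foundSpot + keyLen
        matches.append((foundSpot, start))
    return matches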
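
As a rough sanity check, the sketch below compares the three approaches on small inputs. The body of using_re_finditer is not shown in this message, so the version here is an assumption: it passes the key through re.escape() so that characters such as "." are matched literally. The sample strings are made up for illustration.

```python
import re


def using_re_finditer(key, text):
    # Assumed shape of the regex version (not shown in this message).
    # re.escape() makes every character of key match literally.
    return [m.span() for m in re.finditer(re.escape(key), text)]


def using_simple_loop(key, text):
    # The original loop from the quoted message: tests every index.
    matches = []
    for i in range(len(text)):
        if text[i:].startswith(key):
            matches.append((i, i + len(key)))
    return matches


def using_simple_loop2(key, text):
    # The .find() version: jumps from one match to the next.
    matches = []
    key_len = len(key)
    start = 0
    while (found := text.find(key, start)) > -1:
        start = found + key_len
        matches.append((found, start))
    return matches


# A key containing a regex metacharacter: the escaped regex and the
# .find() loop agree, while an unescaped pattern also matches "axb".
key, text = "a.b", "a.b axb a.b"
escaped = using_re_finditer(key, text)
found = using_simple_loop2(key, text)
unescaped = [m.span() for m in re.finditer(key, text)]

# A key that can overlap itself: only the index-by-index loop
# reports the overlapping match starting at position 1.
overlap_all = using_simple_loop("aa", "aaaa")
overlap_find = using_simple_loop2("aa", "aaaa")
overlap_re = using_re_finditer("aa", "aaaa")
```

Note that the first simple loop is not strictly equivalent to the other two: it reports overlapping matches, while re.finditer and the .find() loop both skip past each match before searching again.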


using_simple_loop: [0.1732664997689426, 0.1601669997908175, 0.15792609984055161, 0.1573973000049591, 0.15759290009737015]
using_re_finditer: [0.003412699792534113, 0.0032823001965880394, 0.0033694999292492867, 0.003354900050908327, 0.0033336998894810677]
using_simple_loop2: [0.00256159994751215, 0.0025471001863479614, 0.0025424999184906483, 0.0025831996463239193, 0.0025555999018251896]
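
To put numbers on the comparison, the sketch below recomputes the ratios from the timings printed above. It assumes each list holds repeated runs of the same benchmark (the harness is not shown in this message) and takes the minimum of each list, which is the figure the timeit documentation suggests looking at. The dictionary and variable names are only illustrative.

```python
# Timings copied from the output above, in seconds per run.
timings = {
    "using_simple_loop": [
        0.1732664997689426, 0.1601669997908175, 0.15792609984055161,
        0.1573973000049591, 0.15759290009737015,
    ],
    "using_re_finditer": [
        0.003412699792534113, 0.0032823001965880394, 0.0033694999292492867,
        0.003354900050908327, 0.0033336998894810677,
    ],
    "using_simple_loop2": [
        0.00256159994751215, 0.0025471001863479614, 0.0025424999184906483,
        0.0025831996463239193, 0.0025555999018251896,
    ],
}

# Best (lowest) time per function; the minimum is least affected by
# other activity on the machine.
best = {name: min(runs) for name, runs in timings.items()}

# How many times longer each function takes than the .find() loop.
baseline = best["using_simple_loop2"]
relative = {name: t / baseline for name, t in best.items()}

for name, factor in relative.items():
    print(f"{name}: best {best[name]:.6f} s, {factor:.2f}x the .find() loop")
```

On these figures the RE version takes about 1.3 times as long as the .find() loop, and the original index-by-index loop takes roughly 60 times as long.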

