How to escape strings for re.finditer?

Tue Feb 28 16:16:54 EST 2023

David,

Your results suggest we need to be reminded that lots depends on other
factors. There are multiple versions/implementations of python out there
including some written in C but also other underpinnings. Each can often
have sections of pure python code replaced carefully with libraries of
compiled code, or not. So your results will vary.

Just as an example, assume you derive a type of your own as a subclass of
str and you over-ride the find method by writing it in pure python using
loops and maybe add a few bells and whistles. If you used your improved
algorithm using this variant of str, might it not be quite a bit slower?
Imagine how much slower if your improvement also implemented caching and
logging and the option of ignoring case which are not really needed here.

This type of thing can happen in many other scenarios and some module may be
shared that is slow and a while later is updated but not everyone installs
the update so performance stats can vary wildly. 

Some people advocate using some functional programming tactics, in various
languages, partially because the more general loops are SLOW. But that is
largely because some of the functional stuff is a compiled function that
hides the loops inside a faster environment than the interpreter.

-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com at python.org> On
Behalf Of David Raymond
Sent: Tuesday, February 28, 2023 2:40 PM
To: python-list at python.org
Subject: RE: How to escape strings for re.finditer?

> I wrote my previous message before reading this.  Thank you for the test
you ran -- it answers the question of performance.  You show that
re.finditer is 30x faster, so that certainly recommends that over a simple
loop, which introduces looping overhead.  

>>      def using_simple_loop(key, text):
>>          matches = []
>>          for i in range(len(text)):
>>              if text[i:].startswith(key):
>>                  matches.append((i, i + len(key)))
>>          return matches
>>
>>      using_simple_loop: [0.13952950000020792, 0.13063130000000456,
0.12803450000001249, 0.13186180000002423, 0.13084610000032626]
>>      using_re_finditer: [0.003861400000005233, 0.004061900000124297,
0.003478999999970256, 0.003413100000216218, 0.0037320000001273]

With a slight tweak to the simple loop code using .find() it becomes a third
faster than the RE version though.

def using_simple_loop2(key, text):
    matches = []
    keyLen = len(key)
    start = 0
    while (foundSpot := text.find(key, start)) > -1:
        start = foundSpot + keyLen
        matches.append((foundSpot, start))
    return matches

using_simple_loop: [0.1732664997689426, 0.1601669997908175,
0.15792609984055161, 0.1573973000049591, 0.15759290009737015]
using_re_finditer: [0.003412699792534113, 0.0032823001965880394,
0.0033694999292492867, 0.003354900050908327, 0.0033336998894810677]
using_simple_loop2: [0.00256159994751215, 0.0025471001863479614,
0.0025424999184906483, 0.0025831996463239193, 0.0025555999018251896]
-- 
https://mail.python.org/mailman/listinfo/python-list