How to escape strings for re.finditer?

Thomas Passin list1 at tompassin.net
Tue Feb 28 10:41:11 EST 2023


On 2/28/2023 10:05 AM, Roel Schroeven wrote:
> Op 28/02/2023 om 14:35 schreef Thomas Passin:
>> On 2/28/2023 4:33 AM, Roel Schroeven wrote:
>>> [...]
>>> (2) Searching for a string in another string, in a performant way, is 
>>> not as simple as it first appears. Your version works correctly, but 
>>> slowly. In some situations it doesn't matter, but in other cases it 
>>> will. For better performance, string searching algorithms jump ahead 
>>> either when they found a match or when they know for sure there isn't 
>>> a match for some time (see e.g. the Boyer–Moore string-search 
>>> algorithm). You could write such a more efficient algorithm, but then 
>>> it becomes more complex and more error-prone. Using a well-tested 
>>> existing function becomes quite attractive.
>>
>> Sure, it all depends on what the real task will be.  That's why I 
>> wrote "Without knowing how general your expressions will be". For the 
>> example string, it's unlikely that speed will be a factor, but who 
>> knows what target strings and keys will turn up in the future?
> On hindsight I think it was overthinking things a bit. "It all depends 
> on what the real task will be" you say, and indeed I think that should 
> be the main conclusion here.


It is interesting, though, how pre-processing the search pattern can 
improve search times if you can afford the pre-processing.  Here's a 
paper on rapidly finding matches when there may be up to one misspelled 
character.  It's easy enough to implement, though in Python you can't 
take the additional step of tuning it to stay in cache.

https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf



More information about the Python-list mailing list