How to escape strings for re.finditer?

Jen Kris jenkris at tutanota.com
Mon Feb 27 19:13:32 EST 2023


I went to the re module because the specified string may appear more than once in the string (in the code I'm writing).  For example:  

a = "X - abc_degree + 1 + qq + abc_degree + 1"
 b = "abc_degree + 1"
 q = a.find(b)

print(q)
4

So it correctly finds the start of the first instance, but not the second one.  The re code finds both instances.  If I knew that the substring occurred only once then the str.find would be best.  

I changed my re code after MRAB's comment, it now works.  

Thanks much.  

Jen


Feb 27, 2023, 15:56 by cs at cskk.id.au:

> On 28Feb2023 00:11, Jen Kris <jenkris at tutanota.com> wrote:
>
>> When matching a string against a longer string, where both strings have spaces in them, we need to escape the spaces. 
>>
>> This works (no spaces):
>>
>> import re
>> example = 'abcdefabcdefabcdefg'
>> find_string = "abc"
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> That gives me the start and end character positions, which is what I want. 
>>
>> However, this does not work:
>>
>> import re
>> example = re.escape('X - cty_degrees + 1 + qq')
>> find_string = re.escape('cty_degrees + 1')
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> I’ve tried several other attempts based on my reseearch, but still no match. 
>>
>
> You need to print those strings out. You're escaping the _example_ string, which would make it:
>
>  X - cty_degrees \+ 1 \+ qq
>
> because `+` is a special character in regexps and so `re.escape` escapes it. But you don't want to mangle the string you're searching! After all, the text above does not contain the string `cty_degrees + 1`.
>
> My secondary question is: if you're escaping the thing you're searching _for_, then you're effectively searching for a _fixed_ string, not a pattern/regexp. So why on earth are you using regexps to do your searching?
>
> The `str` type has a `find(substring)` function. Just use that! It'll be faster and the code simpler!
>
> Cheers,
> Cameron Simpson <cs at cskk.id.au>
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>



More information about the Python-list mailing list