How to escape strings for re.finditer?

Jen Kris jenkris at tutanota.com
Mon Feb 27 19:39:57 EST 2023


str.count() only tells me there are N instances of the substring; it does not say where they begin and end, as re.finditer does.
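
For illustration, here is a minimal sketch of that difference, using the strings from the quoted message below. The helper name find_spans is hypothetical and does not come from the thread. It assumes the fix is to escape only the pattern being searched for and to leave the text being searched untouched.

```python
import re

def find_spans(needle, haystack):
    """Return (start, end) for each non-overlapping literal occurrence of needle."""
    # Escape only the pattern; the text being searched must stay unmodified.
    pattern = re.escape(needle)
    return [(m.start(), m.end()) for m in re.finditer(pattern, haystack)]

a = "X - abc_degree + 1 + qq + abc_degree + 1"
b = "abc_degree + 1"

print(a.count(b))        # 2 -- how many occurrences, but not where
print(find_spans(b, a))  # [(4, 18), (26, 40)] -- start and end of each one
```

Because the needle is escaped, the `+` and the spaces are matched literally, so the same helper also handles the 'cty_degrees + 1' case from the original question.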

Feb 27, 2023, 16:20 by bobmellowood at gmail.com:

> Would string.count() work for you then?
>
> On Mon, Feb 27, 2023 at 5:16 PM Jen Kris via Python-list <python-list at python.org> wrote:
>
>>
>> I went to the re module because the specified string may appear more than once in the string (in the code I'm writing).  For example: 
>>  
>>  a = "X - abc_degree + 1 + qq + abc_degree + 1"
>>  b = "abc_degree + 1"
>>  q = a.find(b)
>>  
>>  print(q)
>>  4
>>  
>>  So it correctly finds the start of the first instance, but not the second one.  The re code finds both instances.  If I knew that the substring occurred only once then the str.find would be best.  
>>  
>>  I changed my re code after MRAB's comment, and it now works.
>>  
>>  Thanks much.  
>>  
>>  Jen
>>  
>>  
>>  Feb 27, 2023, 15:56 by cs at cskk.id.au:
>>  
>>  > On 28Feb2023 00:11, Jen Kris <jenkris at tutanota.com> wrote:
>>  >
>>  >> When matching a string against a longer string, where both strings have spaces in them, we need to escape the spaces. 
>>  >>
>>  >> This works (no spaces):
>>  >>
>>  >> import re
>>  >> example = 'abcdefabcdefabcdefg'
>>  >> find_string = "abc"
>>  >> for match in re.finditer(find_string, example):
>>  >>     print(match.start(), match.end())
>>  >>
>>  >> That gives me the start and end character positions, which is what I want. 
>>  >>
>>  >> However, this does not work:
>>  >>
>>  >> import re
>>  >> example = re.escape('X - cty_degrees + 1 + qq')
>>  >> find_string = re.escape('cty_degrees + 1')
>>  >> for match in re.finditer(find_string, example):
>>  >>     print(match.start(), match.end())
>>  >>
>>  >> I’ve tried several other approaches based on my research, but still no match.
>>  >>
>>  >
>>  > You need to print those strings out. You're escaping the _example_ string, which would make it:
>>  >
>>  >  X - cty_degrees \+ 1 \+ qq
>>  >
>>  > because `+` is a special character in regexps and so `re.escape` escapes it. But you don't want to mangle the string you're searching! After all, the text above does not contain the string `cty_degrees + 1`.
>>  >
>>  > My secondary question is: if you're escaping the thing you're searching _for_, then you're effectively searching for a _fixed_ string, not a pattern/regexp. So why on earth are you using regexps to do your searching?
>>  >
>>  > The `str` type has a `find(substring)` function. Just use that! It'll be faster and the code simpler!
>>  >
>>  > Cheers,
>>  > Cameron Simpson <cs at cskk.id.au>
>>  > -- 
>>  > https://mail.python.org/mailman/listinfo/python-list
>>  >
>>  
>>
>
>
> -- 
> **** Listen to my CD at http://www.mellowood.ca/music/cedars ****
> Bob van der Poel ** Wynndel, British Columbia, CANADA **
> EMAIL: bob at mellowood.ca
> WWW:   http://www.mellowood.ca
>


