How to escape strings for re.finditer?

Jen Kris jenkris at tutanota.com
Tue Feb 28 12:57:52 EST 2023


The code I sent is correct, and it runs here.  Maybe you received it with a carriage return removed, but on my copy after posting, it is correct:

import re  # needed for re.escape and re.finditer

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

One question: several people have suggested approaches other than regex (unlike the terser regex example you showed below). Is there a reason why regex is not preferred to, for example, a list comp? Performance? Reliability?
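
To make the comparison concrete, here is a minimal sketch of both routes for this fixed-string case. The variable names (target, regex_spans, comp_spans and so on) are mine, not from anyone's post, and the list comprehension is only one of several ways to write it. It says nothing about performance; it only shows that the two agree here and where they can differ.

```python
import re

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
target = 'abc_degree + 1'

# Regex route: escape the literal so '+' is not treated as a quantifier.
regex_spans = [(m.start(), m.end())
               for m in re.finditer(re.escape(target), example)]

# List-comprehension route: test every index with str.startswith.
comp_spans = [(i, i + len(target))
              for i in range(len(example))
              if example.startswith(target, i)]

print(regex_spans)  # [(4, 18), (26, 40)]
print(comp_spans)   # [(4, 18), (26, 40)]

# The two differ when matches can overlap: finditer resumes after each
# match, while the index scan reports every starting position.
overlap_regex = [m.start() for m in re.finditer(re.escape('aa'), 'aaa')]
overlap_comp = [i for i in range(len('aaa')) if 'aaa'.startswith('aa', i)]
print(overlap_regex, overlap_comp)  # [0] [0, 1]
```

So for a literal search string the choice is mostly about readability and whether overlapping hits should count.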


Feb 27, 2023, 18:16 by avi.e.gross at gmail.com:

> Jen,
>
> Can you see what SOME OF US see as ASCII text? We can help you better if we get code that can be copied and run as-is.
>
> What you sent is not terse. It is wrong. It will not run on any Python interpreter because you somehow lost a carriage return and an indent.
>
> This is what you sent:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') for match in re.finditer(find_string, example):
>  print(match.start(), match.end())
>
> This is code indented properly:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') 
> for match in re.finditer(find_string, example):
>  print(match.start(), match.end())
>
> Of course I am sure you wrote and ran code more like the latter version but somewhere in your copy/paste process, ....
>
> And, just for fun, since there is nothing wrong with your code, this minor change is terser:
>
> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
> ...     print(match.start(), match.end())
> ...
> 4 18
> 26 40
>
> But note that once you use regular expressions (though not in your case), a single pattern can match many things that are far from identical, such as any repeated word in any case, including "and and" and "so so", or words that have multiple doubled letters, as in the stereotypical "bookkeeper". In those cases, you may want more than offsets: you may also want to show the exact text that matched, or even some characters before and/or after it for context.
>
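A minimal sketch of that idea, under my own assumptions: the sample sentence, the two patterns, and the CONTEXT width are all made up for illustration and are not from the post above. It reports the offsets, the exact text matched, and a few characters on either side.

```python
import re

text = 'He said that that plan was so so, and And then the bookkeeper left.'

# A word, whitespace, then the same word again; IGNORECASE lets
# "and And" count as a repeat.
repeated_word = re.compile(r'\b(\w+)\s+\1\b', re.IGNORECASE)

# A word containing at least two doubled letters (e.g. "oo" and "kk").
multi_doubled = re.compile(r'\b\w*(\w)\1\w*(\w)\2\w*\b')

CONTEXT = 5  # hypothetical choice: characters of context on each side

hits = []
for m in repeated_word.finditer(text):
    before = text[max(0, m.start() - CONTEXT):m.start()]
    after = text[m.end():m.end() + CONTEXT]
    hits.append((m.start(), m.end(), m.group(), before, after))
    print(m.start(), m.end(), repr(m.group()), repr(before), repr(after))

doubled = [m.group() for m in multi_doubled.finditer(text)]
print(doubled)
```

Here match.group() supplies the matched text, which matters because each hit can differ from the others, unlike a search for one fixed string.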
>
> -----Original Message-----
> From: Python-list <python-list-bounces+avi.e.gross=gmail.com at python.org> On Behalf Of Jen Kris via Python-list
> Sent: Monday, February 27, 2023 8:36 PM
> To: Cameron Simpson <cs at cskk.id.au>
> Cc: Python List <python-list at python.org>
> Subject: Re: How to escape strings for re.finditer?
>
>
> I haven't tested it either but it looks like it would work.  But for this case I prefer the relative simplicity of:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') for match in re.finditer(find_string, example):
>  print(match.start(), match.end())
>
> 4 18
> 26 40
>
> I don't insist on terseness for its own sake, but it's cleaner this way. 
>
> Jen
>
>
> Feb 27, 2023, 16:55 by cs at cskk.id.au:
>
>> On 28Feb2023 01:13, Jen Kris <jenkris at tutanota.com> wrote:
>>
>>> I went to the re module because the specified string may appear more than once in the string (in the code I'm writing).
>>>
>>
>> Sure, but writing a `finditer` for plain `str` is pretty easy (untested):
>>
>>  pos = 0
>>  while True:
>>      found = s.find(substring, pos)
>>      if found < 0:
>>          break
>>      start = found
>>      end = found + len(substring)
>>      # ... do whatever with start and end ...
>>      pos = end
>>
>> Many people go straight to the `re` module whenever they're looking for strings. It is often cryptic, error-prone overkill. Just something to keep in mind.
>>
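One way the loop above might be packaged as a reusable generator, as a sketch only. The name find_spans and the decision to reject an empty substring are my assumptions, not part of the suggestion; without that check, pos would never advance for an empty search string.

```python
def find_spans(s, substring):
    """Yield (start, end) for each non-overlapping occurrence of substring in s."""
    if not substring:
        # Assumption: reject empty needles, since pos would never advance.
        raise ValueError('substring must be non-empty')
    pos = 0
    while True:
        found = s.find(substring, pos)
        if found < 0:
            return
        end = found + len(substring)
        yield found, end
        pos = end  # resume after this match, as re.finditer does


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
for start, end in find_spans(example, 'abc_degree + 1'):
    print(start, end)
```

For this example it prints the same offsets as the re.finditer version (4 18 and 26 40), with no escaping needed.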
>> Cheers,
>> Cameron Simpson <cs at cskk.id.au>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>


