Ask for help on using re

ast ast at invalid
Fri Aug 6 05:18:27 EDT 2021


On 06/08/2021 at 02:57, Jach Feng wrote:
> On Thursday, August 5, 2021 at 11:29:15 PM [UTC+8], ast wrote:
>> On 05/08/2021 at 17:11, ast wrote:
>>> On 05/08/2021 at 11:40, Jach Feng wrote:

>> import regex
>>
>> # regex is more powerful than re
>>>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>>>> regex.findall(r'ch \d++(?!\.)', text)
>>
>> ['ch 4', 'ch 56']
>>
>> ## ++ means "possessive", no backtrack is allowed

> Can someone explain where the difference comes from? I just can't figure it out :-(
> 

+, * and ? are greedy, meaning they try to match as many characters
as possible. But if the rest of the pattern then fails, they give back
characters one at a time and retry the whole match.
That's backtracking.
With ++ (a possessive quantifier), backtracking is not allowed. This works
with the regex module; it is not implemented in the re module.
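A minimal illustration of greedy backtracking with the standard re module. The pattern and the input "12345" are made up for this sketch; they are not from the thread.

```python
import re

# \d+ is greedy: it first grabs all five digits "12345".
# The trailing \d then has nothing left to match, so the engine
# backtracks: \d+ gives back one digit and the whole match succeeds.
m = re.match(r"(\d+)(\d)", "12345")
print(m.groups())  # ('1234', '5')
```

A possessive \d++ would refuse to give that digit back, so the same pattern with \d++ could never match.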

With string = "ch 23." and pattern = r"ch \d+(?!\.)":

On the first try, \d+ matches 23,
but the whole match then fails because the next character is ".",
and the negative lookahead (?!\.) forbids a "." there.

So backtracking happens:

\d+ gives back the 3 and matches only 2,
and now the whole match succeeds because the next character, 3, is not ".".
But this is not what we want.

With ++ there is no backtracking, so there is no match:
"ch 23." is rejected,
which is what we wanted.
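The walk-through can be reproduced with re alone. The second pattern below uses the lookahead-plus-backreference trick (?=(\d+))\1, a common way to emulate a possessive quantifier in engines that lack one; it is a swapped-in technique, not something from the original post. It assumes lookarounds in re are atomic (the engine does not re-enter them on backtracking), which is how CPython's re behaves. Python 3.11 and later also accept \d++ and atomic groups (?>...) directly in re.

```python
import re

s = "ch 23."

# Plain greedy \d+: the engine backtracks from "23" to "2", and the
# lookahead (?!\.) then sees "3", so a wrong match is found.
greedy = re.search(r"ch \d+(?!\.)", s)
print(greedy.group())  # ch 2

# Emulated possessive \d++: the lookahead captures all digits into
# group 1 and is not re-entered on backtracking, so \1 cannot shrink.
possessive = re.search(r"ch (?=(\d+))\1(?!\.)", s)
print(possessive)  # None
```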


Using re only, the best way is probably to forbid a digit as well as a "." right after the number:

re.findall(r"ch \d+(?![.0-9])", text)
['ch 4', 'ch 56']
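A small sketch comparing the lookahead that only forbids "." with the one suggested above, on the same text as in the thread:

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# Forbidding only "." lets backtracking turn "ch 23." into "ch 2".
naive = re.findall(r"ch \d+(?!\.)", text)
print(naive)  # ['ch 2', 'ch 4', 'ch 56']

# Forbidding "." and digits: after \d+ gives back a digit, the next
# character is always a digit, so backtracking can never succeed.
fixed = re.findall(r"ch \d+(?![.0-9])", text)
print(fixed)  # ['ch 4', 'ch 56']
```

One design note: for str patterns \d also matches non-ASCII digits, so (?![.\d]) would be the stricter counterpart of \d+ if the input may contain them.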

