Regular expression question -- exclude substring

Kent Johnson kent37 at tds.net
Mon Nov 7 20:31:40 EST 2005


James Stroud wrote:
> On Monday 07 November 2005 16:18, google at fatherfrost.com wrote:
> 
>>Ya, for some reason your non-greedy "?" doesn't seem to be taking.
>>This works:
>>
>>re.sub('(.*)(00.*?01) target_mark', r'\2', your_string)
> 
> 
> The non-greedy is actually acting as expected. This is because non-greedy 
> operators are "forward looking", not "backward looking". So the non-greedy 
> version finds the first point where a match can start and then the first 
> occurrence of '01' that completes the match; otherwise the greedy operator 
> would match .* as much as it could, gobbling up all '01's before the last 
> one because these also match '.*'. For example:
> 
> py> rgx = re.compile(r"(00.*01) target_mark")
> py> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01')
> ['00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01']
> py> rgx = re.compile(r"(00.*?01) target_mark")
> py> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01')
> ['00 noise1 01 noise2 00 target 01', '00 dowhat 01']

??? not in my Python:
 >>> rgx = re.compile(r"(00.*01) target_mark")
 >>> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01')
['00 noise1 01 noise2 00 target 01']
 >>> rgx = re.compile(r"(00.*?01) target_mark")
 >>> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01')
['00 noise1 01 noise2 00 target 01']

Since target_mark occurs only once in the string, the greedy and non-greedy patterns find the same match in this case.
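Here is a minimal script version of the session above. It adds a second, made-up string (`s2`, hypothetical and not from the thread) in which target_mark occurs twice, which is where the two patterns start to differ. It also runs the re.sub from the quoted message for comparison. The expected output in the comments was worked out by hand against the standard `re` module.

```python
import re

# The string used in the thread.
s = '00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01'

greedy = re.compile(r"(00.*01) target_mark")
lazy = re.compile(r"(00.*?01) target_mark")

# target_mark occurs once, so both patterns find the same single match,
# and both matches start at the leftmost '00' in the string.
print(greedy.findall(s))  # ['00 noise1 01 noise2 00 target 01']
print(lazy.findall(s))    # ['00 noise1 01 noise2 00 target 01']

# Hypothetical string with target_mark twice: the greedy '.*' runs on to the
# last '01 target_mark', while the non-greedy '.*?' stops at the first one,
# so findall can then go on to find a second match.
s2 = '00 a 01 target_mark 00 b 01 target_mark'
print(greedy.findall(s2))  # ['00 a 01 target_mark 00 b 01']
print(lazy.findall(s2))    # ['00 a 01', '00 b 01']

# The pattern from the quoted message: the leading greedy (.*) pushes the
# start of group 2 as far right as possible, so group 2 is the '00 ... 01'
# block immediately before target_mark.
pattern = r'(.*)(00.*?01) target_mark'
print(re.sub(pattern, r'\2', s))     # 00 target 01 00 dowhat 01
print(re.search(pattern, s).group(2))  # 00 target 01
```

Non-greedy matching changes where a match ends, not where it begins: the engine still reports the leftmost match it can find, which is why the extra leading (.*) is what isolates '00 target 01'.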

Kent
