Regular expression question -- exclude substring
Kent Johnson
kent37 at tds.net
Mon Nov 7 20:31:40 EST 2005
James Stroud wrote:
> On Monday 07 November 2005 16:18, google at fatherfrost.com wrote:
>
>>Ya, for some reason your non-greedy "?" doesn't seem to be taking.
>>This works:
>>
>>re.sub('(.*)(00.*?01) target_mark', r'\2', your_string)
>
>
> The non-greedy is actually acting as expected. This is because non-greedy
> operators are "forward looking", not "backward looking". So the non-greedy
> finds the start of the first start-of-the-match it comes across and then
> finds the first occurrence of '01' that makes the complete match, otherwise
> the greedy operator would match .* as much as it could, gobbling up all '01's
> before the last because these match '.*'. For example:
>
> py> rgx = re.compile(r"(00.*01) target_mark")
> py> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01')
> ['00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01']
> py> rgx = re.compile(r"(00.*?01) target_mark")
> py> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01')
> ['00 noise1 01 noise2 00 target 01', '00 dowhat 01']
??? That's not what I get in my Python:
>>> rgx = re.compile(r"(00.*01) target_mark")
>>> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01')
['00 noise1 01 noise2 00 target 01']
>>> rgx = re.compile(r"(00.*?01) target_mark")
>>> rgx.findall('00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01')
['00 noise1 01 noise2 00 target 01']
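The session above can be reproduced as a plain script. This is a minimal sketch that assumes Python 3 (`print()` as a function). Because each pattern has exactly one capturing group, `re.findall` returns that group's text for every match:

```python
import re

# Test string from the thread: ' target_mark' occurs exactly once.
text = '00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01'

# Greedy: .* first runs to the end of the string, then backs off until
# '01 target_mark' can match, so the match ends at the only target_mark.
greedy = re.compile(r"(00.*01) target_mark")

# Non-greedy: .*? grows one character at a time from the first '00',
# but it still has to reach '01 target_mark' before the match succeeds.
lazy = re.compile(r"(00.*?01) target_mark")

print(greedy.findall(text))  # ['00 noise1 01 noise2 00 target 01']
print(lazy.findall(text))    # ['00 noise1 01 noise2 00 target 01']
```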
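Both patterns start at the first '00' (the leftmost place a match can begin), and since target_mark occurs only once in the string, both must end at that one '01 target_mark'. So the greedy and non-greedy matches are the same in this case.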
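The difference only shows up when target_mark occurs more than once. The sketch below uses a hypothetical two-mark string (not from the thread) to show the divergence. It then shows one way to get just '00 target 01' out of the original string: a negative lookahead that forbids '00' inside the match (a "tempered" token). This is a swapped-in technique, not the one proposed earlier in the thread:

```python
import re

greedy = re.compile(r"(00.*01) target_mark")
lazy = re.compile(r"(00.*?01) target_mark")

# Hypothetical string (not from the thread) where ' target_mark'
# occurs twice, so greedy and non-greedy matching can diverge.
two_marks = '00 a 01 target_mark 00 b 01 target_mark'
print(greedy.findall(two_marks))  # ['00 a 01 target_mark 00 b 01']
print(lazy.findall(two_marks))    # ['00 a 01', '00 b 01']

# In the thread's string neither pattern isolates '00 target 01',
# because both matches start at the leftmost '00'. Forbidding a second
# '00' inside the match (a tempered token) moves the start forward.
text = '00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01'
tempered = re.compile(r"(00(?:(?!00).)*?01) target_mark")
print(tempered.findall(text))     # ['00 target 01']
```

The greedy pattern runs on to the last target_mark. The non-greedy one stops at the first, and then `findall` resumes scanning after it.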
Kent
More information about the Python-list mailing list