Regular expression question -- exclude substring

James Stroud jstroud at mbi.ucla.edu
Mon Nov 7 19:38:11 EST 2005


On Monday 07 November 2005 16:18, google at fatherfrost.com wrote:
> Ya, for some reason your non-greedy "?" doesn't seem to be taking.
> This works:
>
> re.sub('(.*)(00.*?01) target_mark', r'\2', your_string)

The non-greedy operator is actually behaving as expected. Non-greedy 
operators are "forward looking", not "backward looking": the match begins at 
the first '00' the engine comes across, and the non-greedy .*? then extends 
only as far as the first '01' that lets the whole pattern match. A greedy .* 
would instead match as much as it could, gobbling up every '01' before the 
last one that completes the match, because those also match '.*'. For example:

py> import re
py> s = '00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01 target_mark'
py> rgx = re.compile(r"(00.*01) target_mark")
py> rgx.findall(s)
['00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01']
py> rgx = re.compile(r"(00.*?01) target_mark")
py> rgx.findall(s)
['00 noise1 01 noise2 00 target 01', '00 dowhat 01']

My understanding is that backward-looking operators are very expensive to 
implement.
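
To make the "forward looking" point concrete, here is a minimal sketch of 
two ways one might capture only the last 00...01 block before target_mark. 
The sample string and the variable names are illustrative assumptions. The 
first workaround is the leading greedy .* from the quoted message; the 
second, a negative lookahead that forbids another '00' inside the block, is 
not from this thread and is offered only as one possible alternative.

```python
import re

# Illustrative sample string with a single ' target_mark'.
s = '00 noise1 01 noise2 00 target 01 target_mark 00 dowhat 01'

# Non-greedy alone: the match still starts at the leftmost '00'.
leftmost = re.search(r'(00.*?01) target_mark', s).group(1)

# A leading greedy .* consumes as much as it can first, so the group
# ends up starting at the last '00' that still allows a full match.
pushed = re.search(r'.*(00.*?01) target_mark', s).group(1)

# Alternative (an assumption, not from the thread): refuse to step
# over another '00' while scanning for the closing '01'.
tempered = re.search(r'(00(?:(?!00).)*?01) target_mark', s).group(1)

print(leftmost)  # 00 noise1 01 noise2 00 target 01
print(pushed)    # 00 target 01
print(tempered)  # 00 target 01
```

The leading .* gets its effect purely from backtracking, so on long inputs 
the lookahead form may behave differently in cost; neither has been 
benchmarked here.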

James

-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/


