Understanding '?' in regular expressions
Jussi Piitulainen
jpiitula at ling.helsinki.fi
Fri Nov 16 03:05:32 EST 2012
krishna.k.kishor3 at gmail.com writes:
> Can someone explain the below behavior please?
>
> >>> re1 = re.compile(r'(?:((?:1000|1010|1020))[ ]*?[\,]?[ ]*?){1,3}')
> >>> re.findall(re_obj,'1000,1020,1000')
> ['1000']
> >>> re.findall(re_obj,'1000,1020, 1000')
> ['1020', '1000']
>
> However when I use "[\,]??" instead of "[\,]?" as below, I see a
> different result
> >>> re2 = re.compile(r'(?:((?:1000|1010|1020))[ ]*?[\,]??[ ]*?){1,3}')
> >>> re.findall(re_obj,'1000,1020,1000')
> ['1000', '1020', '1000']
Those re_obj should be re1 and re2, respectively. With that
correction, the behaviour appears to be as you say.
> I am not able to understand what's causing the difference of
> behavior here, I am assuming it's not 'greediness' of "?"
But the greed seems to be the only difference.
I can't wrap my mind around this (at the moment at least) and I need
to rush away, but may I suggest the removal of all that is not
relevant to the problem at hand. Study these instead:
>>> re.findall(r'(10.0,?){1,3}', '1000,1020,1000')
['1000']
>>> re.findall(r'(10.0,??){1,3}', '1000,1020,1000')
['1000', '1020', '1000']
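A sketch that may help with the two reduced patterns above: look at the
whole match as well as the captured group, using re.finditer instead of
re.findall. The helper name show_matches is made up for this
illustration; everything else is the standard re module.

```python
import re

text = '1000,1020,1000'

def show_matches(pattern, text):
    # For each match, pair the full matched text with the value of
    # group 1. When a group repeats, only its last iteration is kept.
    return [(m.group(0), m.group(1)) for m in re.finditer(pattern, text)]

greedy = show_matches(r'(10.0,?){1,3}', text)
lazy = show_matches(r'(10.0,??){1,3}', text)

print(greedy)  # [('1000,1020,1000', '1000')]
print(lazy)    # [('1000', '1000'), ('1020', '1020'), ('1000', '1000')]
```

As far as I can tell, this is what happens. With the greedy ",?" each
repetition swallows its comma, so the next repetition can start right
away and a single match covers the whole string; findall then reports
only the last capture, '1000'. With the lazy ",??" the comma is skipped
at first, the next repetition cannot start on a comma, and since nothing
follows the group the match can already succeed after one repetition.
The engine never needs to go back and take the comma, so each number
ends up as its own match.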