Understanding '?' in regular expressions

Jussi Piitulainen jpiitula at ling.helsinki.fi
Fri Nov 16 03:05:32 EST 2012


krishna.k.kishor3 at gmail.com writes:

> Can someone explain the below behavior please?
> 
> >>> re1 = re.compile(r'(?:((?:1000|1010|1020))[ ]*?[\,]?[ ]*?){1,3}')
> >>> re.findall(re_obj,'1000,1020,1000')
> ['1000']
> >>> re.findall(re_obj,'1000,1020, 1000')
> ['1020', '1000']
> 
> However when I use "[\,]??" instead of "[\,]?" as below, I see a
> different result
> >>> re2 = re.compile(r'(?:((?:1000|1010|1020))[ ]*?[\,]??[ ]*?){1,3}')
> >>> re.findall(re_obj,'1000,1020,1000')
> ['1000', '1020', '1000']

Those re_obj should be re1 and re2, respectively. With that
correction, the behaviour appears to be as you say.
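For anyone who wants to reproduce it, this is a sketch of the poster's session with that correction applied, as a standalone script. It uses the compiled patterns' own `findall` method, which does the same thing as passing the compiled pattern to `re.findall`. The results in the comments are the ones reported above.

```python
import re

# The two patterns from the question; they differ only in '[\,]?' vs '[\,]??'.
re1 = re.compile(r'(?:((?:1000|1010|1020))[ ]*?[\,]?[ ]*?){1,3}')
re2 = re.compile(r'(?:((?:1000|1010|1020))[ ]*?[\,]??[ ]*?){1,3}')

print(re1.findall('1000,1020,1000'))   # ['1000']
print(re1.findall('1000,1020, 1000'))  # ['1020', '1000']
print(re2.findall('1000,1020,1000'))   # ['1000', '1020', '1000']
```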

> I am not able to understand what's causing the difference of
> behavior here, I am assuming it's not 'greediness' if "?"

But the greed seems to be the only difference.

I can't wrap my mind around this (at the moment, at least) and I need
to rush away, but may I suggest removing everything that is not
relevant to the problem at hand? Study these instead:

>>> import re
>>> re.findall(r'(10.0,?){1,3}', '1000,1020,1000')
['1000']
>>> re.findall(r'(10.0,??){1,3}', '1000,1020,1000')
['1000', '1020', '1000']
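One way to look at these two is to print what each match actually consumed (group 0) next to what the group holds at the end (group 1, which is what findall returns). The helper name `show_matches` is made up for this sketch.

```python
import re

def show_matches(pattern, text):
    """Hypothetical helper: (whole match, span, final group 1) for each match."""
    return [(m.group(0), m.span(), m.group(1)) for m in re.finditer(pattern, text)]

text = '1000,1020,1000'
for pattern in (r'(10.0,?){1,3}', r'(10.0,??){1,3}'):
    print(pattern)
    for whole, span, last in show_matches(pattern, text):
        # A repeated capturing group only keeps its last repetition.
        print('   matched', repr(whole), 'at', span, '-> group 1 =', repr(last))
```

With the greedy `,?` a single match eats all of '1000,1020,1000', and since the group only remembers its last repetition, findall reports just '1000'. With the lazy `,??` the comma is first left unconsumed, the next repetition of `10.0` then fails on that comma, and `{1,3}` settles for one repetition, so each number ends up as a separate match.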
