possible bug in re expression?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sat Apr 26 03:24:35 EDT 2014


On Fri, 25 Apr 2014 14:32:30 -0400, Terry Reedy wrote:

> On 4/25/2014 12:30 PM, Robin Becker wrote:

[...]
>> should
>>
>> re.compile('.{1,+3}')
>>
>> raise an error? It doesn't on python 2.7 or 3.3.
> 
> And it should not because it is not an error. '+' means 'match 1 or more
> occurrences of the preceding re' and the preceding re is ','.

Actually, no. Braces have special meaning, and are used to specify a 
number of matches. R{m,n} matches from m to n repetitions of the 
preceding regex R:


py> re.search('(spam){2,4}', 'spam-spamspamspam-spam').group()
'spamspamspam'
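
To make the bounds explicit, here is a small sketch that checks which repeat
counts the same pattern accepts in full. The names `pattern` and `counts` are
only illustrative, and `fullmatch` assumes Python 3.4 or later:

```python
import re

# (spam){2,4} should accept 2, 3 or 4 repetitions of 'spam' and nothing else.
pattern = re.compile(r'(spam){2,4}')

# Try 1 through 5 repetitions and keep the counts that match the whole string.
counts = [n for n in range(1, 6) if pattern.fullmatch('spam' * n)]
print(counts)
```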


This surprises me:

>  >>> re.match('a{1,+3}', 'a{1,,,3}').group()
> 'a{1,,,3}'


I would have expected that either +3 would have been interpreted as just 
"3", or that it would have been an invalid regex. It appears that what is 
happening is that if the braces cannot be interpreted as a repetition 
count, they are interpreted as regular characters. That sort of silent 
error is why I hate programming in regexes.
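
A minimal sketch of that fallback, assuming the behaviour reported in this
thread for Python 2.7 and 3.3 also holds on your version. The variable names
are only illustrative:

```python
import re

# Well-formed quantifier: 'a' repeated between 1 and 3 times (greedy).
valid = re.compile(r'a{1,3}')
print(valid.match('aaaa').group())

# '+3' is not a valid upper bound, so the braces are taken literally and
# the '+' applies to the comma: 'a', '{', '1', one or more ',', '3', '}'.
odd = re.compile(r'a{1,+3}')
print(odd.match('aaa'))                  # no match: the 'a' is not repeated
print(odd.match('a{1,,,3}').group())

# If literal braces are what you mean, escaping them states it explicitly
# instead of relying on the silent fallback.
explicit = re.compile(r'a\{1,+3\}')
print(explicit.match('a{1,,,3}').group())
```

Escaping the braces gives the same match as the fallback, but a reader of the
pattern no longer has to guess whether a repetition count was intended.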

> I suppose that one could argue that '{' alone should be treated as
> special immediately, and not just when a matching '}' is found, and
> should disable other special meanings. I wonder what JS does if there is
> no matching '}'?

Probably silently do the wrong thing :-)


-- 
Steven D'Aprano
http://import-that.dreamwidth.org/


