Anomalous behaviour when compiling regular expressions?

Fredrik Lundh fredrik at pythonware.com
Mon Mar 13 06:14:39 EST 2006


Harvey.Thomas at informa.com wrote:

> >>> import re
> >>> r = re.compile('(a|b*)+')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "c:\python24\lib\sre.py", line 180, in compile
>     return _compile(pattern, flags)
>   File "c:\python24\lib\sre.py", line 227, in _compile
>     raise error, v # invalid expression
> sre_constants.error: nothing to repeat
>
> but
>
> >>> r = re.compile('(a|b*c*)+')
> >>> r.match('def').group()
> ''
>
> Why is there a difference in behaviour between the two cases? Surely the
> two cases are equivalent to:
>
> >>> r = re.compile('(a|b)*')
> >>> r.match('def').group()
> ''

equivalent?


>>> re.match("(a|b*c*)", "abc").groups()
('a',)

>>> re.match("(a|b)*", "abc").groups()
('b',)


I have no time to sort out why your second example doesn't give the
same error (that might be a bug in the RE compiler), but no, a repeated
group with a min-length of 1 is not, in general, the same thing as a
repeated group with a min-length of zero.

</F>
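
A minimal sketch of the point made in the reply, for anyone who wants to try it. The patterns are the ones quoted in this thread; the variable names and the sample string "abc" are only chosen for illustration. Whether '(a|b*)+' compiles at all seems to depend on the Python version, so the script reports the outcome instead of assuming one.

```python
import re

# Patterns quoted in the thread; variable names are illustrative only.
star_of_nonempty = re.compile(r"(a|b)*")     # group body always consumes one character
plus_of_nullable = re.compile(r"(a|b*c*)+")  # group body can also match the empty string

text = "abc"

# The overall matches differ, so the two patterns are not equivalent.
print(star_of_nonempty.match(text).group())  # stops before the "c"
print(plus_of_nullable.match(text).group())  # consumes the "c" as well

# The two calls shown in the reply above.
print(re.match(r"(a|b*c*)", text).groups())
print(re.match(r"(a|b)*", text).groups())

# Assumption: the "nothing to repeat" error is version dependent, so the
# result is recorded here rather than taken for granted.
try:
    re.compile(r"(a|b*)+")
    compiles = True
except re.error as exc:
    compiles = False
    print("compile failed:", exc)
print("'(a|b*)+' compiles here:", compiles)
```

On "abc" the first pattern can only get as far as "ab", while the second also picks up the "c", which is enough to show the two are not interchangeable.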





