Anomalous behaviour when compiling regular expressions?

Fredrik Lundh fredrik at pythonware.com
Mon Mar 13 06:23:42 EST 2006


Harvey.Thomas at informa.com wrote:

> >>> import re
> >>> r = re.compile('(a|b*)+')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "c:\python24\lib\sre.py", line 180, in compile
>     return _compile(pattern, flags)
>   File "c:\python24\lib\sre.py", line 227, in _compile
>     raise error, v # invalid expression
> sre_constants.error: nothing to repeat
>
> but
>
> >>> r = re.compile('(a|b*c*)+')
> >>> r.match('def').group()
> ''
>
> Why is there a difference in behaviour between the two cases? Surely the
> two cases are equivalent to:
>
> >>> r = re.compile('(a|b)*')
> >>> r.match('def').group()
> ''
>
> and
>
> >>> r = re.compile('(a|b|c)*')
> >>> r.match('def').group()
> ''

your definition of "equivalent" is a bit unusual:

>>> re.match("(a|b*c*)+", "abc").groups()
('',)
>>> re.match("(a|b)*", "abc").groups()
('b',)
>>> re.match("(a|b|c)*", "abc").groups()
('c',)
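
as a rough sketch (written for a current Python 3, not the 2.4 used above), the two patterns that contain no nested repeat can be compared side by side. the overall match differs as well as the captured group, so they cannot be equivalent to each other either. the `results` name is only there for illustration:

```python
import re

# the two "simple" patterns from the question, run on the same sample string
results = {}
for pattern in ("(a|b)*", "(a|b|c)*"):
    m = re.match(pattern, "abc")
    # group() is the whole matched text; groups() keeps only the text
    # captured by the last iteration of the repeated group
    results[pattern] = (m.group(), m.groups())
    print(pattern, repr(m.group()), m.groups())
```

the first pattern stops after "ab" because it has no way to consume the "c"; the second one consumes the whole string.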

that you don't get an error for

> >>> r = re.compile('(a|b*c*)+')
> >>> r.match('def').group()

might be a compiler bug.  running it on 2.3 gives you another error,
though:

>>> re.match("(a|b*c*)+", "abc").groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\python23\lib\sre.py", line 132, in match
    return _compile(pattern, flags).match(string)
RuntimeError: maximum recursion limit exceeded

(a repeated group with a min-length of zero can match the empty string
an infinite number of times at the same position, which is, in general,
not what you want)
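
one way to check for the zero min-length case before adding a repeat is to test whether the group body matches the empty string on its own. this is a minimal sketch, assuming Python 3.4 or later for `re.fullmatch`; `can_match_empty` is a hypothetical helper, not something the re module provides:

```python
import re

def can_match_empty(body):
    """Return True if the pattern `body` can match the empty string.

    Hypothetical helper for illustration; not part of the re module.
    """
    return re.fullmatch(body, "") is not None

# group bodies taken from the patterns discussed in this thread
for body in ("a|b*", "a|b*c*", "a|b", "a|b|c"):
    print(repr(body), can_match_empty(body))
```

the first two bodies can match nothing at all, so wrapping them in `(...)+` asks for the kind of repetition described above; the last two always consume one character per iteration.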

</F>





