regexp compilation error

Vlastimil Brom vlastimil.brom at gmail.com
Fri Sep 30 09:56:20 EDT 2011


2011/9/30 Ovidiu Deac <ovidiudeac at gmail.com>:
> This is only part of a regex taken from an old perl application which
> we are trying to understand/port to our new Python implementation.
>
> The original regex was considerably more complex and it didn't compile
> in python so I removed all the parts I could in order to isolate the
> problem so that I could ask for help here.
>
> So the problem is that this regex doesn't compile. On the other hand
> I'm not really sure it should. It's an anchor on which you apply *.
> I'm not sure if this is legal.
>
> On the other hand if I remove one of the * it compiles.
>
>>>> re.compile(r"""^(?: [^y]* )*""", re.X)
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "/usr/lib/python2.6/re.py", line 190, in compile
>    return _compile(pattern, flags)
>  File "/usr/lib/python2.6/re.py", line 245, in _compile
>    raise error, v # invalid expression
> sre_constants.error: nothing to repeat
>>>> re.compile(r"""^(?: [^y] )*""", re.X)
> <_sre.SRE_Pattern object at 0x7f4069cc36b0>
>>>> re.compile(r"""^(?: [^y]* )""", re.X)
> <_sre.SRE_Pattern object at 0x7f4069cc3730>
>
> Is this a bug in python regex engine? Or maybe some incompatibility with Perl?
>
> On Fri, Sep 30, 2011 at 12:29 PM, Chris Angelico <rosuav at gmail.com> wrote:
>> On Fri, Sep 30, 2011 at 7:26 PM, Ovidiu Deac <ovidiudeac at gmail.com> wrote:
>>> $ python --version
>>> Python 2.6.6
>>
>> Ah, I think I was misinterpreting the traceback. You do actually have
>> a useful message there; it's the same error that my Py3.2 produced:
>>
>> sre_constants.error: nothing to repeat
>>
>> I'm not sure what your regex is trying to do, but the problem seems to
>> be connected with the * at the end of the pattern.
>>
>> ChrisA
>> --

I believe this is a limitation of the built-in re engine concerning
nested unbounded quantifiers, (...*)*, in your pattern.
You can try a more powerful, recent regex implementation, which appears
to handle it:

http://pypi.python.org/pypi/regex

With the VERBOSE flag (re.X), all unescaped whitespace outside of
character classes is ignored, see
http://docs.python.org/library/re.html#re.VERBOSE
so the pattern should be equivalent to:
r"^(?:[^y]*)*"
i.e. you are not actually gaining anything with the double quantifier, as
there isn't anything "real" in the pattern outside [^y]*; it matches the
same text as the plain r"^[^y]*".
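A small check of that equivalence, as a sketch: it compares the pattern as posted (with re.X), the hand-compacted form and the plain r"^[^y]*" on a few sample strings. It assumes an interpreter whose stdlib re accepts the nested quantifier; the Python 2.6 re from this thread raises "nothing to repeat" on the first two, while Python 3.7 and later compile them. The helper name leading_non_y and the sample strings are made up for illustration.

```python
import re

# The pattern as posted; under re.X the spaces are layout, not literals.
VERBOSE = r"""^(?: [^y]* )*"""
# The same pattern with the ignorable whitespace removed by hand.
COMPACT = r"^(?:[^y]*)*"
# The outer group and its * add nothing, so this is the simplest equivalent.
SIMPLE = r"^[^y]*"


def leading_non_y(pattern, text, flags=0):
    """Return the text matched at the start of `text`, or None if no match."""
    match = re.match(pattern, text, flags)
    return match.group(0) if match else None


for sample in ["a bcd e", "abyc", "yes", ""]:
    # Collect the three results in a set: a single element means they agree.
    results = {
        leading_non_y(VERBOSE, sample, re.X),
        leading_non_y(COMPACT, sample),
        leading_non_y(SIMPLE, sample),
    }
    print(repr(sample), "->", results)
```

Each printed set should hold a single string, e.g. {'ab'} for "abyc", since [^y]* stops at the first "y" in every form.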

It appears that you have oversimplified the pattern (if it worked
in the original app); however, you may simply try
import regex as re
and see if it helps.
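If you go the "import regex as re" route, a common way to keep the code running where the third-party module is not installed is an import fallback, sketched below. It assumes regex (from the PyPI link above) is installed with "pip install regex" when available; the function name find_leading is hypothetical, and the except re.error branch only matters on an old stdlib re such as the 2.6 one in this thread.

```python
try:
    # Third-party engine from PyPI ("pip install regex"); largely re-compatible.
    import regex as re
except ImportError:
    # Fall back to the standard-library engine.
    import re

PATTERN = r"""^(?: [^y]* )*"""  # the pattern from the thread, written for re.X


def find_leading(text):
    """Run findall() with PATTERN; return None if the engine rejects the pattern."""
    try:
        return re.findall(PATTERN, text, re.X)
    except re.error:
        # An old stdlib re (e.g. Python 2.6) raises "nothing to repeat" here.
        return None


print(re.__name__, find_leading("a bcd e"))
```

Printing re.__name__ shows which engine was actually picked up.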

Cf:
>>>
>>> regex.findall(r"""^(?: [^y]* )*""", "a bcd e", re.X)
['a bcd e']
>>> re.findall(r"""^(?: [^y]* )*""", "a bcd e", re.X)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "re.pyc", line 177, in findall
  File "re.pyc", line 244, in _compile
error: nothing to repeat
>>>
>>> re.findall(r"^(?:[^y]*)*", "a bcd e")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "re.pyc", line 177, in findall
  File "re.pyc", line 244, in _compile
error: nothing to repeat
>>> regex.findall(r"^(?:[^y]*)*", "a bcd e")
['a bcd e']
>>> regex.findall(r"^[^y]*", "a bcd e")
['a bcd e']
>>>


hth,
  vbr



More information about the Python-list mailing list