[Python-bugs-list] [ python-Bugs-786970 ] re doesn't like (^$)*

SourceForge.net noreply at sourceforge.net
Mon Aug 11 14:21:43 EDT 2003


Bugs item #786970, was opened at 2003-08-11 14:21
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=786970&group_id=5470

Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew Dalke (dalke)
Assigned to: Fredrik Lundh (effbot)
Summary: re doesn't like (^$)*

Initial Comment:
Nor, for that matter, does it like "(^)*"

% python
Python 2.3 (#1, Aug  3 2003, 02:47:49) 
[GCC 3.1 20020420 (prerelease)] on darwin
>>> import re
>>> re.compile("(^$)*").match("")
Segmentation fault
%

It's trying really hard to match 0 characters an infinite
number of times.  :)

The segfault is caused in part by the low stacksize limit
on my OS X machine. Raising the limit turns the crash into
an exception:

% limit stacksize
stacksize       512 kbytes
% limit stacksize 2000kbytes
% limit stacksize
stacksize       2000 kbytes
% python
Python 2.3 (#1, Aug  3 2003, 02:47:49) 
[GCC 3.1 20020420 (prerelease)] on darwin
Type "help", "copyright", "credits" or "license" for more 
information.
>>> import re
>>> re.compile("(^$)*").match("")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
RuntimeError: maximum recursion limit exceeded
>>> 

which suggests that the stack recursion limit
test for the re library is not the same as the one
used for the rest of Python.  (def f(): f() gives
me the expected recursion-limit error, and not a
segfault.)
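
For comparison, the plain-Python recursion check mentioned above can be
sketched as follows. The helper name hits_recursion_limit is made up for
illustration; the only assumption is that unbounded recursion in ordinary
Python code raises RuntimeError (or its subclass RecursionError on newer
interpreters) rather than crashing the process.

```python
def f():
    # Unbounded recursion in ordinary Python code.
    f()

def hits_recursion_limit():
    """Return True if the interpreter stops f() with an exception."""
    try:
        f()
    except RuntimeError:
        # Newer interpreters raise RecursionError, a RuntimeError subclass.
        return True
    return False

print(hits_recursion_limit())
```

If this returns normally while the re call above segfaults, the two
recursion checks are evidently not the same.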

Seems like the bug could be in several places:
  - the compiler doesn't handle infinite loops of
      zero-character tests well (it could convert
      them to a finite-loop test)
  - the re matcher doesn't check that it's been
      in the same place several times without
      advancing any character positions
  - the re matcher doesn't use the same stack
     check used elsewhere in Python
  - the Mac stacksize default is too low for
     Python's 
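
The second possibility above (noticing that the matcher has been in the
same place without advancing) can be illustrated with a toy repeat loop.
This is only a sketch of the idea, not how the re engine is implemented:
match_star, empty_line and letter_a are hypothetical names, and the toy
does no backtracking.

```python
def match_star(match_item, text, pos):
    """Greedy X*: apply match_item repeatedly, return the final position.

    match_item(text, pos) returns the new position on success, None on failure.
    """
    while True:
        new_pos = match_item(text, pos)
        if new_pos is None:
            return pos          # item failed: X* still succeeds here
        if new_pos == pos:
            # Zero-width iteration: repeating it again cannot change
            # anything, so stop instead of looping (or recursing) forever.
            return pos
        pos = new_pos

def empty_line(text, pos):
    # Hypothetical stand-in for ^$ on a single-line string:
    # succeeds without consuming anything only on an empty string.
    return pos if pos == 0 and len(text) == 0 else None

def letter_a(text, pos):
    # Stand-in for the one-character pattern "a".
    return pos + 1 if text.startswith("a", pos) else None

print(match_star(empty_line, "", 0))
print(match_star(letter_a, "aaab", 0))
```

Without the new_pos == pos test, the first call would never terminate,
which is the same situation as the unbounded recursion in the report.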

BTW, checking pcre ...

>>> import pre
/usr/local/lib/python2.3/pre.py:94: DeprecationWarning: 
Please use the 're' module, not the 'pre' module
  DeprecationWarning)
>>> pre.compile("(^$)*").match("")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.3/pre.py", line 251, in compile
    code=pcre_compile(pattern, flags, groupindex)
pcre.error: ('operand of unlimited repeat could match the 
empty string', 4)
>>> 

which is true, but the pattern I used should (IMHO)
be allowed to match the empty string.
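
The expected behaviour can be written down as a small check. It assumes
an re implementation that stops repeating once an iteration matches zero
characters; on the Python 2.3 build shown above the same calls crash or
raise instead. The helper name matches_empty is made up for this sketch.

```python
import re

def matches_empty(pattern):
    """Return True if pattern matches the empty string with an empty match."""
    m = re.compile(pattern).match("")
    return m is not None and m.group(0) == ""

for pattern in ("(^$)*", "(^)*"):
    print(pattern, matches_empty(pattern))
```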

