[Python-bugs-list] [Bug #114660] sre hangs with "collapse whitespace" RE

noreply@sourceforge.net noreply@sourceforge.net
Sun, 17 Sep 2000 18:18:51 -0700


Bug #114660, was updated on 2000-Sep-17 17:38
Here is a current snapshot of the bug.

Project: Python
Category: Modules
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Summary: sre hangs with "collapse whitespace" RE

Details: In Python 1.5.2, 1.6, and 2.0b1 using "pre", the following regex substitution works fine:

>>> s = "hello      there     how are you?"
>>> re.sub (r'(\S)\s+(\S)', r'\1 \2', s)
'hello there how are you?'

But in 2.0b1 using the standard "re" module, Python goes into an apparently infinite loop.  (CPU is pegged, memory doesn't increase at all.)

Follow-Ups:

Date: 2000-Sep-17 18:18
By: tim_one

Comment:
It's in an infinite loop trying to parse the \1 in the replacement pattern (sre_parse.py, function parse_template).  It recognizes that "1" is a valid group name, but apparently doesn't recognize that "1 " (i.e., the digit one followed by a blank) is not a valid group name, in turn perhaps because int("1 ") returns 1.

I can fix it by adding the clause

s.next not in DIGITS or

after the existing

if (not s.next or

on line 637 and then this gets the result you expect, but I'll defer to /F on whether that's the right way to fix this.
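
To illustrate the int("1 ") point above, here is a small sketch in present-day Python.  It is not the sre source; is_group_number is a made-up helper name used only to contrast a per-character digit check with relying on int() alone.

```python
# int() strips surrounding whitespace before converting, so "1 " is
# accepted even though it is not a bare run of digits.
assert int("1 ") == 1

DIGITS = "0123456789"

def is_group_number(name):
    # Hypothetical helper (not part of sre): accept only non-empty
    # strings made up entirely of decimal digits.
    return name != "" and all(ch in DIGITS for ch in name)

print(is_group_number("1"))    # the plain group reference
print(is_group_number("1 "))   # digit followed by a blank
```

A check of this kind rejects "1 " where int() would quietly accept it, which is the distinction the proposed extra clause is after.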

I hope you (Greg) aren't actually using this regexp to normalize whitespace!  split followed by join is the usual way to do that, and this regexp-based way looks wrong (e.g., if somewhere in the middle there's a single letter with mounds of whitespace on each side, the whitespace following the letter won't get collapsed, because the letter was already consumed as the second group of the previous match).
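
A short sketch of the difference, assuming a Python where the substitution itself completes.  The name collapse_ws and the test string are illustrative, not taken from the report.

```python
import re

def collapse_ws(text):
    # split() with no arguments splits on runs of whitespace and drops
    # leading/trailing whitespace; join() puts single blanks back.
    return " ".join(text.split())

# A single letter with runs of whitespace on both sides.
s = "hello      a     there"

# The regex consumes "a" as group 2 of the first match, so the run of
# blanks after it has no preceding \S left to start a second match.
regex_result = re.sub(r'(\S)\s+(\S)', r'\1 \2', s)

print(repr(regex_result))     # 'hello a     there'
print(repr(collapse_ws(s)))   # 'hello a there'
```

Note that split followed by join also removes leading and trailing whitespace, which the regexp version leaves alone.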
-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=114660&group_id=5470