[ python-Bugs-1662581 ] the re module can perform poorly: O(2**n) versus O(n**2)

SourceForge.net noreply at sourceforge.net
Thu Feb 22 09:51:06 CET 2007


Bugs item #1662581, was opened at 2007-02-17 15:39
Message generated for change (Comment added) made by josiahcarlson
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1662581&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Performance
Group: None
Status: Open
Resolution: None
Priority: 4
Private: No
Submitted By: Gregory P. Smith (greg)
Assigned to: Nobody/Anonymous (nobody)
Summary: the re module can perform poorly: O(2**n) versus O(n**2)

Initial Comment:
in short, the re module can degenerate to really really horrid performance.  See this for how and why:

 http://swtch.com/~rsc/regexp/regexp1.html

exponential decline instead of squared.

I don't have a patch so i'm filing this bug as a starting point for future work.  The Modules/_sre.c files implementation could be updated to use the parallel stepping Thompson approach instead of recursive backtracking.

filing this as a bug until me or someone else comes up with a patch.

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2007-02-22 00:51

Message:
Logged In: YES 
user_id=341410
Originator: NO

I would file this under "feature request"; the current situation isn't so
much buggy, as slow.  While you can produce a segfault with the current
regular expression engine (due to stack overflow), you can do the same
thing with regular Python on Linux (with sys.setrecursionlimit), ctypes,
etc., and none of those are considered as buggy.

My only concern with such a change is that it may or may not change the
semantics of the repeat operators '*' and '+', which are currently defined
as "greedy".  If I skimmed the article correctly late at night, switching
to a Thompson family regular expression engine may result in those
operators no longer being greedy.  Please correct me if I am wrong.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1662581&group_id=5470


More information about the Python-bugs-list mailing list