[ python-Feature Requests-1701452 ] Feature request: Comparing regexps

Wed Apr 18 16:33:24 CEST 2007

Feature Requests item #1701452, was opened at 2007-04-16 14:28
Message generated for change (Comment added) made by thomasda
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1701452&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Thomas Dybdahl Ahle (thomasda)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: Feature request: Comparing regexps

Initial Comment:
It would be very nice with a function in the re module to compare two regular expressions and see how they overlap.
Return value would perhaps be an a constant like DISJOINT, SUBSET etc.

----------------------------------------------------------------------

>Comment By: Thomas Dybdahl Ahle (thomasda)
Date: 2007-04-18 16:33

Message:
Logged In: YES 
user_id=1304417
Originator: YES

I talked with the guy who wrote the ZZ comparator.

"""I can give you the source code under the GPL if you like. However, I  
think it would be difficult to port to python. It's 1100 lines of  
very SML-style mostly-uncommented code. Do you know SML?

It's available at svn://mlton.org/mltonlib/trunk/ca/terpstra/regexp.  
I would expect credit for the algorithm. :-) Let me know if you  
succeed in porting it."""

I don't know any SML though.
If anybody does or can write a python equaliant of the algorithm:

"""1. Parse the regular expression (in GNU regular expression syntax)
2. Convert that parse tree into an NFA
3. Convert the NFA into a DFA by an algorithm I invented (it's why I
wrote this program; I thought of the algorithm and found it amusing
to see it in action)

For comparing regular expressions I take the following additional  
steps
4. Combine the two DFA's (with the cross product)
4a. For finding common words, I intersect them
4b. For finding words in A, but not B, I intersect A with the negated
DFA of B
4c. ...
5. Find the minimal DFA corresponding to this intersection
6. Run a weighted depth-first search (similar to Dijkstra's) to find
the least-weighted path from the initial state to an accepting state
(the weighting makes it prefer human readable characters in the
examples)"""

It could be cool.
Otherwise I can also try to learn sml and port it.

----------------------------------------------------------------------

Comment By: Thomas Dybdahl Ahle (thomasda)
Date: 2007-04-17 09:51

Message:
Logged In: YES 
user_id=1304417
Originator: YES

I found this page with the function to do so:
http://terpstra.ca/compare.html

I also found this thread:
http://bumppo.net/lists/fun-with-perl/1999/09/msg00000.html which discusses
how to do this with some canonical formed expressions. 

A function like this would really be able to speed some applications up,
which matches a lot of strings with a number of expressions. If you know
which ones are disjoint, you can break the test when one test matches.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2007-04-16 22:43

Message:
Logged In: YES 
user_id=80475
Originator: NO

Can this be done in the existing implementation by comparing the racetrack
diagrams (character transitions)?

Thomas, do you know of anywhere this have been done (third-party modules
or in other languages)?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1701452&group_id=5470