[ python-Feature Requests-1701452 ] Feature request: Comparing regexps
SourceForge.net
noreply at sourceforge.net
Wed Apr 18 16:33:24 CEST 2007
Feature Requests item #1701452, was opened at 2007-04-16 14:28
Message generated for change (Comment added) made by thomasda
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1701452&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Thomas Dybdahl Ahle (thomasda)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: Feature request: Comparing regexps
Initial Comment:
It would be very nice with a function in the re module to compare two regular expressions and see how they overlap.
Return value would perhaps be an a constant like DISJOINT, SUBSET etc.
----------------------------------------------------------------------
>Comment By: Thomas Dybdahl Ahle (thomasda)
Date: 2007-04-18 16:33
Message:
Logged In: YES
user_id=1304417
Originator: YES
I talked with the guy who wrote the ZZ comparator.
"""I can give you the source code under the GPL if you like. However, I
think it would be difficult to port to python. It's 1100 lines of
very SML-style mostly-uncommented code. Do you know SML?
It's available at svn://mlton.org/mltonlib/trunk/ca/terpstra/regexp.
I would expect credit for the algorithm. :-) Let me know if you
succeed in porting it."""
I don't know any SML though.
If anybody does or can write a python equaliant of the algorithm:
"""1. Parse the regular expression (in GNU regular expression syntax)
2. Convert that parse tree into an NFA
3. Convert the NFA into a DFA by an algorithm I invented (it's why I
wrote this program; I thought of the algorithm and found it amusing
to see it in action)
For comparing regular expressions I take the following additional
steps
4. Combine the two DFA's (with the cross product)
4a. For finding common words, I intersect them
4b. For finding words in A, but not B, I intersect A with the negated
DFA of B
4c. ...
5. Find the minimal DFA corresponding to this intersection
6. Run a weighted depth-first search (similar to Dijkstra's) to find
the least-weighted path from the initial state to an accepting state
(the weighting makes it prefer human readable characters in the
examples)"""
It could be cool.
Otherwise I can also try to learn sml and port it.
----------------------------------------------------------------------
Comment By: Thomas Dybdahl Ahle (thomasda)
Date: 2007-04-17 09:51
Message:
Logged In: YES
user_id=1304417
Originator: YES
I found this page with the function to do so:
http://terpstra.ca/compare.html
I also found this thread:
http://bumppo.net/lists/fun-with-perl/1999/09/msg00000.html which discusses
how to do this with some canonical formed expressions.
A function like this would really be able to speed some applications up,
which matches a lot of strings with a number of expressions. If you know
which ones are disjoint, you can break the test when one test matches.
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2007-04-16 22:43
Message:
Logged In: YES
user_id=80475
Originator: NO
Can this be done in the existing implementation by comparing the racetrack
diagrams (character transitions)?
Thomas, do you know of anywhere this have been done (third-party modules
or in other languages)?
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1701452&group_id=5470
More information about the Python-bugs-list
mailing list