sre_dump.py

Andrew Dalke adalke@mindspring.com
Thu, 7 Aug 2003 14:13:05 -0600


sre_dump.py -- http://www.dalkescientific.com/Python/
  Converts an sre_parse parse tree back into a regular expression,
  and includes information about where subexpressions can be found
  in that string.

I have a certain fondness for writing regular-expression-based parsers.
/F's sre_parse module (an internal part of the standard library) has
been a lot of fun to use in these projects.  It turns a regular
expression into a simple parse tree, which I use to generate my own
parsers.
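
For readers who have not looked at such a parse tree, here is a minimal
sketch of what sre_parse returns for a small pattern.  It is not part of
sre_dump.  It assumes a Python 3 interpreter, where the same internal
module is also importable as re._parser (a private, undocumented name that
may change), and the helper name top_level_nodes is made up for this
illustration.

```python
# Sketch only: sre_parse is an internal module, and its location is an
# assumption about the interpreter version (re._parser on newer Pythons,
# sre_parse on older ones).
try:
    from re import _parser as sre_parse, _constants as sre_constants
except ImportError:
    import sre_parse
    import sre_constants


def top_level_nodes(pattern):
    """Return the (opcode, argument) pairs at the top of the parse tree.

    Hypothetical helper, not part of sre_dump.
    """
    tree = sre_parse.parse(pattern)
    # The returned SubPattern supports len() and integer indexing.
    return [tree[k] for k in range(len(tree))]


if __name__ == "__main__":
    for opcode, argument in top_level_nodes("AB|CD"):
        print(opcode, argument)
```

For "AB|CD" the top level should hold a single BRANCH node whose argument
carries one sub-pattern per alternative, each made of two LITERAL nodes.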

When debugging, I want to know which part of the regular expression
produced the part of the parser that has the bug.  The sre_dump module
helps by using the parse tree to recreate the original pattern while
recording where each subexpression falls in that string.  With it I
can do:

>>> s, offsets = sre_dump.dump_and_offsets("AB|CD")
>>> def show_offsets(s, offsets):
...     print s
...     for expr, i, j, text in offsets:
...         print " "*i + "-"*(j-i) + " "*(len(s)-j+1), s[i:j]
...
>>> show_offsets(s, offsets)
AB|CD
-      A
 -     B
   -   C
    -  D
-----  AB|CD
>>>
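
The session above uses the Python 2 print statement.  As a rough sketch
for Python 3, the same display can be built as a list of lines.  The
function name format_offsets and the sample data are made up here: the
sample tuples are typed by hand to mirror the output shown above, with
None standing in for the parse-tree node, and are not real output from
sre_dump.dump_and_offsets.

```python
def format_offsets(s, offsets):
    """Return the display lines for a pattern and its offset records.

    Each record is assumed to be (expr, i, j, text), as unpacked in the
    session above; only i and j are used to place the marker.
    """
    lines = [s]
    for expr, i, j, text in offsets:
        # Dashes under s[i:j], padded to one column past the pattern.
        marker = " " * i + "-" * (j - i) + " " * (len(s) - j + 1)
        lines.append(marker + " " + s[i:j])
    return lines


# Hand-written stand-in for the offsets of "AB|CD" (an assumption).
SAMPLE_OFFSETS = [
    (None, 0, 1, "A"),
    (None, 1, 2, "B"),
    (None, 3, 4, "C"),
    (None, 4, 5, "D"),
    (None, 0, 5, "AB|CD"),
]

if __name__ == "__main__":
    print("\n".join(format_offsets("AB|CD", SAMPLE_OFFSETS)))
```

Returning the lines rather than printing them makes the layout easy to
compare against an expected listing when chasing a parser bug.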

                    Andrew
                    dalke@dalkescientific.com