using regular expressions to analyze lisp code
Tim Chase
python.list at tim.thechases.com
Thu Oct 4 13:50:01 EDT 2007
> i've spent couple of hours trying to figure out the correct regular
> expression to catch a VisualLisp
[snipped]
> "(defun foo", but it is hard to find the ")" at the end of code block.
> if eventually i can't come up with the solution using regular
> expression only, what i was thinking is after finding the beginning
> part, which is "(defun foo" in this case, i can count the parenthesis,
> ignoring anything inside "" and any line for comment, until i find the
> closing ")".
"""
Some people, when confronted with a problem, think
"I know, I'll use regular expressions!"
Now they have two problems.
"""
Regular expressions are a wonderful tool when the domain is
correct. However, when your domain involves processing
arbitrarily nested syntax, regexps are not your friend. It is
sometimes feasible to mung them into a fixed-depth-nesting
parser, but it's always fairly painful, and the fixed-depth is an
annoying limitation.
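
To illustrate the fixed-depth limitation, here is a rough sketch that
builds a regexp for balanced parens up to a chosen depth. The name
nested_paren_pattern is made up for this example, and it ignores
strings and comments entirely:

```python
import re

def nested_paren_pattern(depth):
    """Compile a regexp matching parens nested at most `depth` levels."""
    # Innermost level: a pair of parens with no parens inside.
    pattern = r"\([^()]*\)"
    # Each pass wraps the previous pattern in one more level of parens.
    for _ in range(depth - 1):
        pattern = r"\((?:[^()]|" + pattern + r")*\)"
    return re.compile(pattern)

two_deep = nested_paren_pattern(2)
print(bool(two_deep.fullmatch("(defun foo (x) x)")))  # two levels: fits
print(bool(two_deep.fullmatch("(a (b (c)))")))        # three levels: too deep
```

The pattern gets longer with every level you add, and anything nested
one level deeper than you planned for simply fails to match.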
Use a parsing lib. I've tinkered a bit with PyParsing[1] which
is fairly easy to pick up, but powerful enough that you're not
banging your head against limitations. There are a number of
other parsing libraries[2] with various domain-specific features
and audiences, but I'd go browsing through them only if PyParsing
doesn't fill the bill.
You don't detail what you want to do with the content or how
pathological the input can be, but you might be able to get away
with just skimming through the input and counting open-parens and
close-parens, stopping when they've been balanced, skipping lines
with comments.
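
A minimal sketch of that paren-counting approach might look like the
following. The function name find_form_end and the sample input are
made up for illustration. It assumes comments run from ";" to the end
of the line and that strings are double-quoted with backslash
escapes; it does not try to handle block comments or any other
syntax your dialect may have:

```python
def find_form_end(source, start):
    """Return the index just past the ")" closing the "(" at source[start].

    Returns None if the parens never balance.
    """
    depth = 0
    in_string = False
    i = start
    while i < len(source):
        ch = source[i]
        if in_string:
            if ch == "\\":
                i += 1              # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == ";":
            # Assumed comment syntax: ignore the rest of the line.
            newline = source.find("\n", i)
            if newline == -1:
                break
            i = newline
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return None

code = (
    "(defun foo (x)\n"
    "  ; a stray ) in a comment\n"
    '  (princ "a ( in a string")\n'
    "  x)\n"
    "(defun bar () nil)\n"
)
start = code.find("(defun foo")
end = find_form_end(code, start)
print(code[start:end])
```

Finding the "(defun foo" start with str.find (or a simple regexp) and
then handing off to the counter keeps the regexp part trivial.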
-tkc
[1] http://pyparsing.wikispaces.com/
[2] http://nedbatchelder.com/text/python-parsers.html