using regular expressions to analyze lisp code

Thu Oct 4 13:50:01 EDT 2007

> i've spent couple of hours trying to figure out the correct regular
> expression to catch a VisualLisp 
[snipped]
> "(defun foo", but it is hard to find the ")" at the end of code block.
> if eventually i can't come up with the solution using regular
> expression only, what i was thinking is after finding the beginning
> part, which is "(defun foo" in this case, i can count the parenthesis,
> ignoring anything inside "" and any line for comment, until i find the
> closing ")".

"""
	Some people, when confronted with a problem, think
	"I know, I'll use regular expressions!"
	Now they have two problems.
"""

Regular expressions are a wonderful tool when the domain is 
correct.  However, when your domain involves processing 
arbitrarily nested syntax, regexps are not your friend.  It is 
sometimes feasible to mung them into a fixed-depth-nesting 
parser, but it's always fairly painful, and the fixed-depth is an 
annoying limitation.
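
To make the fixed-depth limitation concrete, here is a rough sketch 
using only the stdlib re module.  The helper name 
nested_paren_pattern is made up for illustration, and it knows 
nothing about strings or comments, so treat it as a demonstration 
of the limitation rather than something to build on.

```python
import re


def nested_paren_pattern(depth):
    """Regex source for a parenthesized group nested at most `depth` levels.

    Hypothetical helper for illustration; ignores strings and comments.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    pattern = r"\([^()]*\)"  # depth 1: a group with no inner parens
    for _ in range(depth - 1):
        # Wrap the previous level: non-paren chars or a shallower group.
        pattern = r"\((?:[^()]|" + pattern + r")*\)"
    return pattern


two_deep = re.compile(nested_paren_pattern(2))
three_deep = re.compile(nested_paren_pattern(3))

shallow = "(setq a (+ 1 2))"        # nests two levels
deeper = "(setq a (+ 1 (* 2 3)))"   # nests three levels

print(bool(two_deep.fullmatch(shallow)))
print(bool(two_deep.fullmatch(deeper)))
print(bool(three_deep.fullmatch(deeper)))
```

The two-level pattern matches the first form but not the second, 
and the pattern has to be rebuilt (and gets uglier) for every extra 
level you want to support.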

Use a parsing lib.  I've tinkered a bit with PyParsing[1] which 
is fairly easy to pick up, but powerful enough that you're not 
banging your head against limitations.  There are a number of 
other parsing libraries[2] with various domain-specific features 
and audiences, but I'd go browsing through them only if PyParsing 
doesn't fill the bill.

You don't detail what you want to do with the content or how 
pathological the input can be, but you might be able to get away 
with just skimming through the input and counting open-parens and 
close-parens, stopping when they've been balanced, and skipping 
comments and anything inside string literals along the way.
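
The paren-counting approach could be sketched roughly like this.  
It assumes comments start with ";" and run to the end of the line, 
and that strings are double-quoted with backslash escapes; both are 
assumptions on my part about your input, and block comments (if 
your dialect has them) are not handled.  The function name 
find_form_end is just something I made up.

```python
def find_form_end(text, start):
    """Return the index just past the ')' closing the '(' at text[start].

    Returns -1 if the parens never balance.  Assumes ';' line comments
    and double-quoted strings with backslash escapes (not verified
    against any particular Lisp dialect).
    """
    if text[start] != "(":
        raise ValueError("text[start] must be an opening paren")
    depth = 0
    in_string = False
    i = start
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1          # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == ";":
            # Comment: jump to the end of the current line.
            newline = text.find("\n", i)
            if newline == -1:
                break
            i = newline
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return -1


source = '''(defun foo (x)
  ; a stray ) in a comment
  (princ "a ) in a string")
  (+ x 1))
(defun bar () nil)
'''

start = source.find("(defun foo")
end = find_form_end(source, start)
print(source[start:end])
```

Finding the start with a plain str.find (or a simple regexp) and 
then handing off to the counter keeps the regexp part trivial, 
which is about the only place it earns its keep here.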

-tkc

[1] http://pyparsing.wikispaces.com/
[2] http://nedbatchelder.com/text/python-parsers.html