Python syntax in Lisp and Scheme

Sun Oct 12 14:26:19 EDT 2003

"Andrew Dalke" <adalke at mindspring.com> writes:

> Peter Seibel:
> >So, to write a new test function, here's what I
> > write:
> >
> >   (deftest foo-tests ()
> >     (check
> >      (= (foo 1 2 3) 42)
> >      (= (foo 4 5 6) 99)))
> 
> Python bases its unit tests on introspection.  Including the
> full scaffolding, the equivalent for Python would be
> 
>         import unittest
>         import foo_module  # I'm assuming 'foo' is in some other module
> 
>         class FooTestCase(unittest.TestCase):
>             def testFoo(self):
>                 self.assertEquals(foo_module.foo(1, 2, 3), 42)
>                 self.assertEquals(foo_module.foo(4, 5, 6), 99)
> 
>         if __name__ == '__main__':
>             unittest.main()
> 

Yup. I'd certainly be loath to claim that there's anything that *only*
be done using macros. And it *is* the case that dynamic and
introspective features make lots of things that might otherwise be
done with macros less painful.

So here's another example of using macros to define a domain specific
language. In this case the domain is lexing and parsing. In this case
I wrote a parser generator (like ANTLR in Java on YACC in C) by
defining the macros DEFPROD, DEFCHARTYPE, and DEFLEXE macros that
allow me to define the gramatical rules for a lexer in an s-expression
notation and then stitch them together into a procedure that
implements a tokenizer based on the grammar rules. The notation may
look a bit funny if you're not used to Lisp syntax but you can
probably observe that what I have hear is pretty much a translation
from the BNF in the Java language standard to my own s-exp notation.
(I'll give just a sample of the production rules, the rest are
similar. The code required to implement the lexer proper is 91 lines
of actual code, not counting a preprocessor function that translates
Java's special unicode escape syntax.)

In other languages parser generators usually have to define their own
grammar language and write a program that parsers *that* format in
order to generate the parsing code. Macros let me do the same thing
with much less work because Lisp does parsing for me--what follows
*is* a Lisp program given that I've defined the macros it uses.

As with DEFTEST, the point is not that this is the only way to do it
but rather that macros give me a way to concisely and directly express
the stuff I *care* about (i.e. what is the grammar I'm trying to
parse) while abstracting away the mechanims by which it gets
translated into efficient code that actually does the work.

(A sample of the productions from the code that generates a Java lexer.)

  ;; 3.4 Line terminators

  (defprod line-terminator () (/ #\newline (#\return (? #\newline))))

  (defchartype input-character
    '(and character (not (member #\newline #\return))))

  ;; 3.5 Input Elements and Tokens

  (defprod input () ((* input-element) (? #\Sub)))

  (defprod input-element () (/ white-space comment token))

  (defprod token () (/ identifier java-keyword literal separator operator))

  ;; 3.6 White space

  (defprod white-space () (/ #\space #\tab #\page line-terminator))

  ;; 3.9 Keywords
  (defprod java-keyword ()
    (/
     "abstract" "boolean" "break" "byte" "case" "catch" "char" "class" "const"
     "continue" "default" ("do" (? "uble")) "else" "extends" ("final" (? "ly"))
     "float" "for" "goto" "if" "implements" "import" "instanceof"
     ("int" (? "erface")) "long" "native" "new" "package" "private" "protected"
     "public" "return" "short" "static" "strictfp" "super" "switch"
     "synchronized" "this" ("throw" (? "s")) "transient" "try" "void"
     "volatile" "while"))

  ;; 3.10.2 Floating-Point Literals

  (defprod floating-point-literal ()
    (/
     ((+ digit)
      (/
       (#\. (* digit) (? exponent-part) (? float-type-suffix))
       (exponent-part (? float-type-suffix))
       float-type-suffix))
     (#\. (+ digit) (? exponent-part) (? float-type-suffix))))

  (defprod exponent-part () (exponent-indicator signed-integer))
  (defchartype exponent-indicator '(member #\e #\E))
  (defprod signed-integer () ((? sign) (+ digit)))
  (defchartype sign '(member #\+ #\-))
  (defchartype float-type-suffix '(member #\f #\F #\d #\D))

  ;; 3.12 Operators

  (defprod operator ()
    (/
     ":"
     "?"
     "~"
     ("!" (? "="))
     ("%" (? "="))
     ("&" (? (/ "=" "&")))
     ("*" (? "="))
     ("+" (? (/ "=" "+")))
     ("-" (? (/ "=" "-")))
     ("/" (? "="))
     ("<" (? "<") (? "="))
     ("=" (? "="))
     (">" (? ">" (? ">")) (? "="))
     ("^" (? "="))
     ("|" (? (/ "=" "|")))))

  ;; This macro actually expands into a fuction java-lexer that takes
  ;; a string of text and tokenizes it according to the rules defined
  ;; above, starting from the "input" rule and returning as tokens
  ;; objects that represent identifiers, java-keywords, literals,
  ;; separators, and operators. Comments and whitespace are, thus,
  ;; discarded.

  (deflexer java-lexer 
    ((:start-rule input)
     (:tokens identifier java-keyword literal separator operator)))

> The Python code is more verbose in that regard because == isn't a
> way to write a function. I assume you also have tests for things
> like "should throw exception of type X" and "should not throw
> expection" and "floating point within epsilon of expected value"?

Sure, you can put any boolean expression in a check and have it
treated as a test case. The macro code can figure out whether it's a
function call and if it is arranges to evaluate the arguments itself
so it can see what their values are and then passes the values to the
function. Because all the comparators such as =, EQL (object
equality), <, >, string<, etc. are functions, this works great.

[snip]

> I expect the usefulness of showing the full expression to be smaller
> when the expression is large, because it could be an intermediate in
> the expression which has the problem, and you don't display those
> intermediates.

Yeah. Though that's just because I didn't get around to implementing
that. As long as the expression consists of a tree of function calls,
I could do the same thing to sub-expressions I do to the top-level
expressions in the CHECK, and display them as well. If I really wanted
I could write a GUI to browse through the whole tree of function
calls. But I decided that the 80/20 rule applies and if I can't figure
out from the top-level expressions what's going on then I can always
resort to normal debugging. (Also the test framework can dump you into
the debugger at the point of a test failure so you can poke around
with the debugger to see any values you care about.)

> > So what is the equivalent non-macro code? Well the equivalent code
> > to the DEFTEST form (i.e. the macro expansion) is not *that* much
> > more complex--it just has to do the stuff I mentioned; binding the
> > test name variable and registering the test function. But it's
> > complex enough that I sure wouldn't want to have to type it over
> > and over again each time I write a test:
> 
> Python's introspection approach works by looking for classes of a
> given type (yes, classes, not instances), then looking for methods
> in that class which have a given prefix.  These methods become the
> test cases.  I imagine Lisp could work the same way, except that
> because other solutions exist (like macros), there's a prefered
> reason to choose another style.

The other difference is that runtime introspection happens at runtime.
That's probably fine for a test framework. I've written several Java
test frameworks that use reflection in similar ways so I know about
that approach too. But for other tasks, such as my parsing example
above, even if they could be done using introspection, there's a big
advantage for doing a lot of the work once, at compile time.

[snip]

> A solution which would get what you want without macros is the
> addition of more parse tree information, like the start/end
> positions of each expression. In that way the function could look up
> the stack, find the context from which it was called, then get the
> full text of the call. This gets at the code "from the other
> direction", that is, from looking at the code after it was parsed
> rather than before.

Heh. That's just a sort of, pardon my French, kludgy reimplementation
of macros. Macros are nothing more (or less) than a mechanism whereby
you can--at compile time--get a hold of the parse tree in a form that
your code can manipulate it to generate other code. (Your's is still
at runtime which makes it somewhat less useful, though it does work
for the test framework case.)

> I'll let you decide if Lisp's introspection abilities provide an
> alternate non-macro way to handle building test cases which is just
> as short.

Sure. I could use Lisp's meta object protocol (MOP) to define all
sorts of crazy things. But I'd probably still end up wrapping them in
some nice syntactic abstractions (macros) to make them as expressive
as possible. The thing is, I'm not looking for a "non-macro way" to do
these things because macros (in Lisp) are no more problematic than
functions--they're just another way of definining abstractions. Like
any mechanism for defining abstractions they need to be used with good
design sense because bad abstractions can be worse than no
abstractions at all--at least if things are concrete you can see how
they work, even if you do have to wade through pages of code to see
it. But good abstractions--whether functions, classes, or macros--make
it even easier to understand the code.

-Peter

P.S. Pythonistas--I was originally following this thread in c.l.lisp.
Somewhere along the lines the follow-ups got set to c.l.python which
is fine with me as the Lisp guys (should) already know this. But feel
free let me know if you want me to shut up about this.

-- 
Peter Seibel                                      peter at javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp