[Python-ideas] AST Transformation Hooks for Domain Specific Languages

Sun Apr 10 09:27:58 CEST 2011

On Sun, Apr 10, 2011 at 3:00 AM, Eugene Toder <eltoder at gmail.com> wrote:
>>Yep - it's the move to compile time metaprogramming that makes ideas
>>like this one and tools like Mython interesting.
> In python compile time and run time are less separated than in many
> other languages. I don't see a big difference between doing
> transformation at compile time proper vs. at import time (except for
> start-up time, if the transformation is taking seconds). Technically,
> decorator that parses docstring is as powerful as an AST
> transformation. The difference is mostly about end user friendliness
> and aesthetics -- whether you feel like you're using a feature or a
> cheap hack.

The difference is actually pretty huge. If it is done at compile time:

1. You consume the entire module at once, permitting non-local
effects. This is significant, as it allows you to reference variables
in outer scopes, converting them into cell references (e.g. see the
"all_nonlocal" example from my original post). Runtime is far too late
to do that, since the compiler has already finished the symbol table
analysis and code generation that handles nested scopes.

2. Compile time operations can have their results cached in the
generated PYC files. This cannot happen with runtime operations.
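
Both points can be observed from within Python itself. The following is a minimal sketch, assuming a current CPython 3; the source string and the names outer, inner and x are made up for illustration. It shows that the cell/free variable decision is already recorded in the code objects produced by compile(), and that those code objects can be serialised with marshal, the same serialisation used for the payload of PYC files.

```python
import marshal

SOURCE = '''
def outer():
    x = 1
    def inner():
        return x
    return inner
'''

# Compile time: symbol table analysis and code generation happen here.
module_code = compile(SOURCE, "<example>", "exec")

# Nothing has been executed yet, but the code objects can already be
# serialised, which is what allows the result to be cached on disk.
restored_code = marshal.loads(marshal.dumps(module_code))

# Only now does any of the example source actually run.
namespace = {}
exec(restored_code, namespace)
outer = namespace["outer"]
inner = outer()

# The compiler already decided that "x" lives in a cell shared by both
# functions; runtime code merely uses that decision.
cell_names = outer.__code__.co_cellvars   # names captured by nested scopes
free_names = inner.__code__.co_freevars   # names resolved through those cells
print(cell_names, free_names, inner())
```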

If this were handled as a runtime operation, you couldn't really do
anything that can't already be done with decorators and metaclasses.

People sometimes get confused about how Python's compilation and
execution model differs from that of a static language like C. To give
a quick rundown of the different major phases:

C build-and-execution model:
  - compile time (.c -> .o)
  - link time (multiple .o -> executable)
  - dynamic linking (loading additional modules at runtime)
  - runtime (actual code execution)

Only the last two happen when the program is executed.

Python build-and-execution model:
  - compile time (.py -> bytecode)
  - definition time (only significant for functions and classes - time
when the "def" or "class" statement is executed)
  - runtime (actual code execution, guaranteed to be after definition
time for code inside a function body and during definition time for a
class body)

Since compilation is implicit, and there is no pre-linking step, all
of these steps happen when the program is executed (although the first
step can optionally be performed in advance).
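
The three Python phases can be made visible with a small experiment. This is only an illustrative sketch: the events list and the names decorate, greet and Example are invented for the demonstration. The source is compiled first (nothing runs), then executed, and each phase records when it happens.

```python
events = []

SOURCE = '''
def decorate(func):
    events.append("definition time: decorator applied to " + func.__name__)
    return func

@decorate
def greet(name=events.append("definition time: default evaluated")):
    events.append("runtime: function body executed")

class Example:
    events.append("definition time: class body executed")

greet()
'''

# Compile time: the source becomes bytecode, but none of it is executed.
code = compile(SOURCE, "<phases>", "exec")
events_after_compile = list(events)

# Executing the module body triggers definition time for "def" and "class"
# statements, and runtime for the function body once greet() is called.
exec(code, {"events": events})

for event in events:
    print(event)
```

The default argument is evaluated before the decorator is applied, and the function body only runs when greet() is finally called at the end of the module body.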

It's the separation of compile and definition time that is the major
difference between a scripting language like Python and a more
traditional language like C. In a traditional language, the
"top-level" code is handled entirely by the compiler, and never
actually touched at runtime, so you can't do things like use loops or
conditional logic or exception handling to affect how your program is
defined (and if you can, the syntax will typically be completely
different from the "normal" syntax of the language). In a scripting
language, top-level code has access to all the same constructs as code
inside functions (and, typically, vice-versa - hence first class
functions and type definitions).
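
For instance, ordinary loops and exception handling can decide what a module defines. The snippet below is a small sketch with made-up names (converters, to_metres); the isqrt fallback is only a rough stand-in for illustration.

```python
import math

# A loop at module level that defines several functions.
converters = {}
for unit, factor in (("km", 1000.0), ("cm", 0.01), ("mm", 0.001)):
    def to_metres(value, _factor=factor):
        # The default argument captures the factor at definition time.
        return value * _factor
    converters[unit] = to_metres

# Exception handling choosing between two definitions of the same name.
try:
    from math import isqrt          # available on Python 3.8 and later
except ImportError:
    def isqrt(n):
        return int(math.sqrt(n))    # rough fallback, for illustration only

print(converters["km"](2.5), isqrt(17))
```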

Currently Python lets you do lots of things at runtime (i.e. most
code) and at definition time (decorators, metaclasses, default
arguments, annotations). There are, however, no compile time hooks
other than creating your own import hook as Mython does, and
completely taking over the compilation process.
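
To give a feel for what taking over compilation involves, here is a minimal sketch of the parse, transform, compile pipeline using the standard ast module. It is not Mython's implementation; InlineConstants and compile_with_transform are hypothetical names, and the transformation (replacing selected names with constants) is chosen only because it is short.

```python
import ast


class InlineConstants(ast.NodeTransformer):
    """Replace loads of selected names with literal constants."""

    def __init__(self, constants):
        self.constants = constants

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in self.constants:
            replacement = ast.Constant(value=self.constants[node.id])
            return ast.copy_location(replacement, node)
        return node


def compile_with_transform(source, filename, transformer):
    """Parse the source, rewrite the AST, then hand it to the compiler."""
    tree = ast.parse(source, filename)
    tree = transformer.visit(tree)
    ast.fix_missing_locations(tree)
    return compile(tree, filename, "exec")


# RATE is never defined at runtime; it only exists for the transformer.
SOURCE = "result = RATE * 100\n"
code = compile_with_transform(SOURCE, "<dsl>", InlineConstants({"RATE": 3}))

namespace = {}
exec(code, namespace)
print(namespace["result"])
```

An import hook would wrap the same steps so that they apply automatically to every module it loads.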

>> So a simple "python -mdsl.sql myfile.py" would run a file that uses
>> the DSL, while "python -i -mdsl.sql" would get you an interactive
>> interpreter that understood that DSL dialect.
> Ok, it's easy to do, but to me the fact that I have to even think
> about it makes it less convenient.

Yes, but it's the only way to make this work as a compile-time
operation (since compilation is completed before module execution
starts). If it's runtime only, then there's no point in doing it at
all. Decorators and metaclasses have that space well and truly
covered.
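
A runner module of that kind could be quite small. The sketch below is hypothetical (no dsl.sql module exists; transform and run_file are invented names) and assumes the DSL-specific work is an AST rewrite applied before the ordinary compiler runs.

```python
import ast


def transform(tree):
    """Placeholder for a DSL-specific AST rewrite (hypothetical)."""
    return tree


def run_file(path):
    """Compile a file through the transform and run it as __main__."""
    with open(path, encoding="utf-8") as source_file:
        source = source_file.read()
    tree = transform(ast.parse(source, path))
    ast.fix_missing_locations(tree)
    code = compile(tree, path, "exec")
    namespace = {"__name__": "__main__", "__file__": path}
    exec(code, namespace)
    return namespace
```

The module's command line entry point would then pass sys.argv[1] to run_file(), so the transformation is complete before any of the target file's code executes.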

>> Reusing Python syntax would still be easy - you'd simply invoke
>> ast.parse() on the stringified body.
> Right, otherwise I would know that I *don't* like the idea. At the
> moment I don't know if I like it :)
> Using Python parser first leads to a slippery slope: Python syntax is
> only tuned for one language -- Python. If we start using it as a base
> for DSLs more widely, we'll want some generic extensions, which would
> give syntax error in python, but would produce AST nodes for DSLs to
> transform. E.g. space delimited expression list ('select foo') or
> custom suite ('join:\nleft\nright\ncond').
> Stringifying function body avoids this problem (though one would have
> to write a full parser even if one wants only a very small tweak to python
> syntax), but it allows completely non-python syntax mixed with python.
> I don't know if it's a big problem, though.

Python-AST is a reasonable place to start though, since non-Python
syntax can easily be written inside a docstring. It also creates a
subtle social pressure in favour of staying within the spirit of
Python syntax and semantics, and clearly demarcating (via
triple-quoted strings) when you're straying away from that.
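
As a small illustration of the docstring route, the standard ast module can already pull the non-Python text out of an otherwise ordinary function definition. The function name lookup_address and the SQL text are made up for the example.

```python
import ast

SOURCE = '''
def lookup_address(name, dob):
    """select address
    from people
    where name = {name} and dob = {dob}"""
'''

tree = ast.parse(SOURCE)
func = tree.body[0]

# The surrounding code is parsed as Python; the DSL text is just a string
# that a transformation hook could hand to its own parser.
dsl_text = ast.get_docstring(func)
parameters = [arg.arg for arg in func.args.args]

print(parameters)
print(dsl_text)
```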

An SQL DSL, for example, would most likely go the route of
triple-quoting the entire SQL statement, but could also do something
more novel like using assignments to define SQL clauses:

    select = address
    tables = people
    where = name == {name} and dob == {dob}

Such is the power and danger of compile-time metaprogramming :)
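
The assignment form above happens to be syntactically valid Python ({name} parses as a set display), so ast.parse() accepts it unchanged. The following is a rough sketch of how a transformation might map it to SQL text; to_sql and build_query are hypothetical helpers, the ":name" placeholder style is an assumption, and only the handful of node types used in the example are supported.

```python
import ast

BODY = """
select = address
tables = people
where = name == {name} and dob == {dob}
"""


def to_sql(node):
    """Translate a tiny subset of Python expressions into SQL text."""
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Set) and len(node.elts) == 1
            and isinstance(node.elts[0], ast.Name)):
        # {name} is treated as a named query parameter (assumed convention).
        return ":" + node.elts[0].id
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)):
        return to_sql(node.left) + " = " + to_sql(node.comparators[0])
    if isinstance(node, ast.BoolOp) and isinstance(node.op, ast.And):
        return " AND ".join(to_sql(value) for value in node.values)
    raise ValueError("unsupported construct: " + type(node).__name__)


def build_query(body):
    """Collect 'clause = expression' assignments into one SELECT statement."""
    clauses = {}
    for stmt in ast.parse(body).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise ValueError("only simple assignments are supported")
        clauses[stmt.targets[0].id] = to_sql(stmt.value)
    return "SELECT {select} FROM {tables} WHERE {where}".format(**clauses)


query = build_query(BODY)
print(query)
```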

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia