[Python-ideas] [Python-Dev] AST Transformation Hooks for Domain Specific Languages

Nick Coghlan ncoghlan at gmail.com
Sat Apr 9 02:19:25 CEST 2011


(Fixed list to be python-ideas)

On Sat, Apr 9, 2011 at 2:50 AM, David Malcolm <dmalcolm at redhat.com> wrote:
> On Fri, 2011-04-08 at 21:29 +1000, Nick Coghlan wrote:
>> AST Transformation Hooks for Domain Specific Languages
>> ======================================================
>
> This reminds me a lot of Mython:
>  http://mython.org/
> If you haven't seen it, it's well worth a look.

Ah, very interesting - indeed, compile-time metaprogramming is
definitely what this idea is about (with all of the tremendous power
and potentially major liabilities for readability and debugging that
entails).

> My favourite use case for this kind of thing is having the ability to
> embed shell pipelines into Python code, by transforming bash-style
> syntax into subprocess calls (it's almost possible to do all this in
> regular Python by overloading the | and > operators, but not quite).
>
>> Consider:
>>
>> # In some other module
>> ast.register_dsl("dsl.sql", dsl.sql.TransformAST)
>
> Where is this registered?   Do you have to import this "other module"
> before importing the module using "dsl.sql" ?   It sounds like this is
> global state for the interpreter.

Yep - the registration would be with the compiler itself. Technically
a "flat" namespace, but allowing dots in the names means it can mirror
the general Python module namespace.

An alternative would be to use the module namespace rather than a
dedicated one, in which case registration might not be necessary.
However, requiring an explicit registration step to support DSLs
doesn't really bother me, even if it means they can't be used as
standalone scripts, but instead must be executed via a module that
registers the DSL. With runpy.run_module and runpy.run_path available,
supporting execution of other scripts isn't a great burden.
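
The launcher pattern described above might look roughly like the
following sketch. The names register_dsl, lookup_dsl and run_with_dsls
are hypothetical stand-ins: nothing like ast.register_dsl exists in the
standard library, and this registry is a plain dict that the real
compiler never consults. Only runpy.run_path is an existing API.

```python
import runpy

# Hypothetical stand-in for the proposed compiler-level registry.
# Dotted names are just keys in a flat dict; they only *look* like
# module paths.
_DSL_REGISTRY = {}


def register_dsl(name, transformer):
    """Map a dotted DSL name to a callable that transforms an AST."""
    if name in _DSL_REGISTRY:
        raise ValueError(f"DSL {name!r} is already registered")
    _DSL_REGISTRY[name] = transformer


def lookup_dsl(name):
    """Return the transformer registered under a dotted name."""
    return _DSL_REGISTRY[name]


def run_with_dsls(path, dsls):
    """Register each DSL, then execute the script at `path`.

    Returns the globals of the executed script, as runpy.run_path does.
    """
    for name, transformer in dsls.items():
        register_dsl(name, transformer)
    return runpy.run_path(path, run_name="__main__")
```

The point of the sketch is only the ordering: registration happens in
the launcher, before the script that relies on the DSL is compiled.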

>> # In a module using that DSL
>
> How is this usage expressed?  via the following line?
>
>> import dsl.sql
>
> I see the "import dsl.sql" here, but surely you have to somehow process
> the "import" in order to handle the rest of the parsing.

No, it's expressed by the "from dsl.sql" at the end of the line:

    def lookup_address(name : dsl.sql.char, dob : dsl.sql.date) from dsl.sql:

The import is included in the example, simply because that was a
pattern I used in my sample expansion.

As Jon pointed out in his reply, it's basically a very similar idea to
Mython's "quote" block, but leveraging off the existing Function node
to pick up other fun tricks like arguments, decorators and annotations
rather than defining a completely separate statement.
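
To illustrate what the existing function node already carries, here is
a small sketch using today's ast module (ast.unparse needs Python 3.9
or later). The "from dsl.sql" suffix is not valid syntax, so it is left
out; the decorator name "cached" and the docstring contents are
made up for the example. Parsing does not resolve names, so dsl.sql
does not need to exist.

```python
import ast

source = '''
@cached
def lookup_address(name: dsl.sql.char, dob: dsl.sql.date):
    """select address from people where name = {name} and dob = {dob}"""
'''

tree = ast.parse(source)

# Grab the (only) function definition in the module.
func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))

# Everything a DSL transformer might want is already on the node.
arg_names = [a.arg for a in func.args.args]
annotations = [ast.unparse(a.annotation) for a in func.args.args]
decorators = [ast.unparse(d) for d in func.decorator_list]
docstring = ast.get_docstring(func)

print(arg_names)
print(annotations)
print(decorators)
print(docstring)
```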

> Where and how would the bytes of the file using the DSL get converted to
> an in-memory tree representation?
>
> IIRC, manipulating AST nodes in CPython requires some care: the parser
> has its own allocator (PyArena), and the entities it allocates have a
> shared lifetime that ends when PyArena_Free occurs.

They survive if you use PyCF_ONLY_AST (or otherwise get hold of the
AST from Python). The arenas are just Python lists and returning the
AST for use by Python code bumps the reference count of the head node.
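
A minimal sketch of the Python-level behaviour, using only existing
APIs (compile, ast.PyCF_ONLY_AST, ast.fix_missing_locations). It shows
that the tree handed back to Python code stays alive and can be
modified and compiled later; it says nothing about how the C-level
arena is implemented. The source string and the "+ 1" rewrite are
arbitrary choices for the example.

```python
import ast

source = "result = 6 * 7\n"

# With PyCF_ONLY_AST, compile() stops after parsing and returns the tree.
tree = compile(source, "<example>", "exec", ast.PyCF_ONLY_AST)

# The tree is an ordinary Python object graph, so it can be mutated:
# rewrite `6 * 7` into `6 * 7 + 1`.
assign = tree.body[0]
assign.value = ast.BinOp(
    left=assign.value,
    op=ast.Add(),
    right=ast.Constant(value=1),
)

# Newly created nodes have no line/column info; fill it in before compiling.
ast.fix_missing_locations(tree)

code = compile(tree, "<example>", "exec")
namespace = {}
exec(code, namespace)
print(namespace["result"])
```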

>> So there you are, that's the crazy idea. The stoning of the heretic
>> may now commence :)
>
> Or, less violently, take it to python-ideas?  (though I'm not subscribed
> there, fwiw, make of that what you will)

Yeah, sorry about that - the posting to python-dev was a mistake due
to not properly checking the address auto-complete in Gmail.

> One "exciting" aspect of this is that if someone changes the DSL file,
> the meaning of all of your code changes from under you.  This may or may
> not be a sane approach to software development :)
> (I also worry what this means e.g. for people writing text editors,
> syntax highlighters, etc; insert usual Alan Perlis quote about syntactic
> sugar causing cancer of the semicolon)

Yeah, there's a way to experiment with this that's much friendlier to
those systems: require that the body of the statement *also* be
standard Python code. Support for non-Python syntax would then be
handled by explicitly embedding it in a string (potentially even the
docstring).

The advantage this would have over existing string parsing techniques
is that the contents would be checked at compile time rather than
runtime, just as Mython does.
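
A rough sketch of that friendlier variant, under several assumptions:
the DSL text lives in the docstring, functions opt in through a marker
decorator named sum_dsl (hypothetical, standing in for the proposed
"from" clause), and the "DSL" is a toy that sums whitespace-separated
integers. SumDSL and compile_with_dsl are names invented for this
example. Malformed DSL text is rejected while the AST is being
transformed, before any of the module's code runs.

```python
import ast


def _is_marker(node):
    """True for a bare `@sum_dsl` decorator (hypothetical marker name)."""
    return isinstance(node, ast.Name) and node.id == "sum_dsl"


class SumDSL(ast.NodeTransformer):
    """Toy transformer: the docstring of a marked function is the DSL body."""

    def visit_FunctionDef(self, node):
        if not any(_is_marker(d) for d in node.decorator_list):
            # Not ours; keep looking inside for nested functions.
            return self.generic_visit(node)
        text = ast.get_docstring(node) or ""
        try:
            total = sum(int(token) for token in text.split())
        except ValueError as exc:
            # Raised at transformation time, not when the function is called.
            raise SyntaxError(f"bad sum_dsl body in {node.name}: {exc}") from None
        # Drop the marker and replace the body with the computed result.
        node.decorator_list = [d for d in node.decorator_list if not _is_marker(d)]
        node.body = [ast.Return(value=ast.Constant(value=total))]
        return node


def compile_with_dsl(source, filename="<dsl>"):
    """Parse, apply the transformer, and compile to a code object."""
    tree = SumDSL().visit(ast.parse(source, filename))
    ast.fix_missing_locations(tree)
    return compile(tree, filename, "exec")


good = '''
@sum_dsl
def total():
    """1 2 3 4"""
'''

namespace = {}
exec(compile_with_dsl(good), namespace)
print(namespace["total"]())
```

Because the body is still an ordinary docstring, editors and syntax
highlighters see nothing but standard Python.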

> Also, insert usual comments about the need to think about how
> non-CPython implementations of Python would go about implementing such
> ideas.

The transformation also happens at AST time, so the back end shouldn't
make a big difference.

>> Where this idea came from was the various discussions about "make
>> statement" style constructs and a conversation I had with Eric Snow at
>> Pycon about function definition time really being *too late* to do
>> anything particularly interesting that couldn't already be handled
>> better in other ways. Some tricks Dave Malcolm had done to support
>> Python level manipulation of the AST during compilation also played a
>> big part, as did Eugene Toder's efforts to add an AST optimisation
>> step to the compilation process.
>
> Like I said earlier, have a look at Mython

Indeed, in a lot of ways, this idea is really just an alternative
syntax proposal for Mython "quote" statements, along with the concept
of integrating it into the main compiler.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


