Looking for tips on how to implement a "ruby-style" Domain Specific Language in Python

Fri Jan 9 02:54:56 EST 2009

O.K., Mark. Since you seem to accept the basic requirement of building an
*external* DSL, I can provide some help. I'm the author of EasyExtend
(EE), a system for building external DSLs for Python.

http://www.fiber-space.de/EasyExtend/doc/EE.html

EE is very much a work in progress, and over the last year I have been
more engaged with increasing its power than with improving accessibility
for beginners. So be warned.

A DSL in EE is called a *langlet*. Download the EE package and import
it in a Python shell. A langlet can then be built this way:

>>> import EasyExtend
>>> EasyExtend.new_langlet("my_langlet", prompt="myl> ", source_ext=".dsl")

This creates a set of files in the directory

<site-packages-path>/EasyExtend/langlets/my_langlet

Among them are run_my_langlet.py and langlet.py. You can cd to that
directory and run

   $ python run_my_langlet.py

which opens a console with the prompt 'myl> '. Each langlet is immediately
interactive. A user can also run a langlet-specific module like

   $ python run_my_langlet.py mod.dsl

where .dsl is the suffix passed as source_ext to the langlet builder
function. Any module xxx.dsl can be imported from other modules of the
my_langlet langlet, since EE provides a generic import hook for
user-defined suffixes.
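EE's own import hook is not shown here. As a rough illustration of the idea only, the sketch below uses Python 3's importlib machinery (which postdates the Python 2.5 setup in this post) to make .dsl files importable, passing their source through a transform function before compiling it. DSLLoader, DSLFinder, install and transform are hypothetical names, and transform is an identity placeholder where a langlet's real DSL-to-Python translation would go.

```python
import importlib.abc
import importlib.util
import os
import sys


def transform(source):
    """Hypothetical DSL-to-Python translation (identity placeholder)."""
    return source


class DSLLoader(importlib.abc.Loader):
    """Executes a .dsl file after passing its source through transform()."""

    def __init__(self, path):
        self.path = path

    def create_module(self, spec):
        return None  # use Python's default module creation

    def exec_module(self, module):
        with open(self.path, encoding="utf-8") as f:
            source = f.read()
        code_obj = compile(transform(source), self.path, "exec")
        exec(code_obj, module.__dict__)


class DSLFinder(importlib.abc.MetaPathFinder):
    """Finds NAME.dsl files on sys.path (or inside a package's __path__)."""

    def find_spec(self, fullname, path, target=None):
        name = fullname.rpartition(".")[2]
        for entry in path or sys.path:
            candidate = os.path.join(entry, name + ".dsl")
            if os.path.isfile(candidate):
                return importlib.util.spec_from_file_location(
                    fullname, candidate, loader=DSLLoader(candidate))
        return None  # let other finders handle everything else


def install():
    """Register the finder once, after Python's own finders."""
    if not any(isinstance(f, DSLFinder) for f in sys.meta_path):
        sys.meta_path.append(DSLFinder())
```

After install(), a file mod.dsl on sys.path can be loaded with a plain `import mod`. Because the finder is appended after Python's own finders, an ordinary mod.py with the same name still takes precedence.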

To do anything meaningful, one has to implement langlet transformations
in the langlet.py module. The main transformations are defined in a
class called LangletTransformer, which defines a set of visitor methods
marked with a decorator called @trans. Each @trans method is named after
a terminal or non-terminal of the grammar and is invoked for the
corresponding node of the parse tree as the tree is traversed. The
structure of the parse tree is the same as the one you'd get from
Python's builtin parser, and it is entirely determined by four files:

- Grammar, which is exactly the Python grammar found in the Python
  source distribution.
- Grammar.ext, which defines new non-terminals and overrides existing
  ones.
- Token, which defines Python's tokens.
- Token.ext, the analogue of Grammar.ext for token definitions.

The Grammar.ext file lives in the directory my_langlet/parsedef; there
is an analogous lexdef directory for Token.ext.
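The name-based dispatch of @trans methods can be illustrated in a few lines of plain Python. This is a sketch under stated assumptions, not EE's LangletTransformer: the trans decorator and Transformer class here are hypothetical, and the parse tree is modelled as nested (name, child, ...) tuples with strings as leaves.

```python
def trans(method):
    """Mark a method as the transformation for the node named like it."""
    method._is_trans = True
    return method


class Transformer:
    """Walks a parse tree and calls @trans methods by node name.

    Trees are (name, child, ...) tuples; leaves are plain strings.
    """

    def __init__(self):
        # Collect all @trans-marked methods, keyed by method name.
        self.handlers = {}
        for name in dir(type(self)):
            attr = getattr(self, name)
            if getattr(attr, "_is_trans", False):
                self.handlers[name] = attr

    def run(self, node):
        if isinstance(node, str):
            return node
        name, *children = node
        # Transform the children first (bottom-up), then the node itself.
        node = (name, *[self.run(child) for child in children])
        handler = self.handlers.get(name)
        return handler(node) if handler else node


class UpperNames(Transformer):
    @trans
    def NAME(self, node):
        # ("NAME", "website") -> ("NAME", "WEBSITE")
        return ("NAME", node[1].upper())


tree = ("atom", ("NAME", "website"))
print(UpperNames().run(tree))  # ('atom', ('NAME', 'WEBSITE'))
```

Nodes without a matching handler are passed through unchanged, so a transformer only needs methods for the node types it actually rewrites.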

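A possible Grammar.ext extension of the Python grammar overrides two of
its non-terminals: the trailer rule additionally accepts a bare NAME,
NUMBER or STRING after an atom, and the atom rule is Python's own except
that it takes a single STRING where the standard grammar has STRING+.
It looks like this: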

Grammar.ext
-----------

trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME | NAME | NUMBER | STRING

atom: ('(' [yield_expr|testlist_gexp] ')' |
       '[' [listmaker] ']' |
       '{' [dictmaker] '}' |
       '`' testlist1 '`' |
       NAME | NUMBER | STRING)

-----------

Once this has been defined, you can start a new my_langlet session and
type

myl> navigate_to 'www.openstreetmap.org' website
Traceback (most recent call last):
  File "C:\lang\Python25\lib\site-packages\EasyExtend\eeconsole.py", line 270, in compile_cst
    _code = compile(src,"<input>","single", COMPILER_FLAGS)
  File "<input>", line 1
    navigate_to 'www.openstreetmap.org' website
                                      ^
SyntaxError: invalid syntax
myl>

It raises a SyntaxError, but notice that this error stems from the
*compiler*, not the parser. The parser accepts the modified
non-terminals without complaint and produces a parse tree. That parse
tree still has to be transformed into a valid Python parse tree that
Python's bytecode compiler can accept.
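One plausible Python target for `navigate_to 'www.openstreetmap.org' website` is the call `navigate_to('www.openstreetmap.org', website)`. EE performs this kind of rewrite on the parse tree; the sketch below swaps in a simpler technique, a token-level rewrite with the standard tokenize module, and assumes the command form is a single line of names, strings and numbers that starts with a non-keyword name. command_to_call is a hypothetical helper.

```python
import io
import keyword
import tokenize

# Token types that may appear in a command-style line such as: f 'arg' name
WORD_TYPES = {tokenize.NAME, tokenize.STRING, tokenize.NUMBER}
# Structural tokens that tokenize adds around any line; they are skipped.
SKIPPED = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
           tokenize.DEDENT, tokenize.ENDMARKER}


def command_to_call(line):
    """Rewrite a single line 'f a b' into 'f(a, b)'.

    Lines that are not a plain sequence of names, strings and numbers
    (or that contain Python keywords) are returned unchanged.
    """
    try:
        tokens = [tok for tok in
                  tokenize.generate_tokens(io.StringIO(line).readline)
                  if tok.type not in SKIPPED]
    except (tokenize.TokenError, SyntaxError):
        return line
    if len(tokens) < 2 or tokens[0].type != tokenize.NAME:
        return line
    for tok in tokens:
        if tok.type not in WORD_TYPES or keyword.iskeyword(tok.string):
            return line
    # Keep any leading indentation of the original line.
    indent = line[:len(line) - len(line.lstrip())]
    func, *args = (tok.string for tok in tokens)
    return "%s%s(%s)" % (indent, func, ", ".join(args))


print(command_to_call("navigate_to 'www.openstreetmap.org' website"))
# navigate_to('www.openstreetmap.org', website)
```

Lines that are already ordinary Python, such as `x = 1` or `import os`, pass through unchanged, so the helper can be applied line by line before the source reaches compile().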

I won't go into detail here, but I recommend reading the tutorial

http://www.fiber-space.de/EasyExtend/doc/tutorial/EETutorial.html

which walks through a complete example that defines a few terminals,
non-terminals and the corresponding transformations. It also shows how
to use command-line options to display parse tree properties, unparse
parse trees back into source code (so you can eliminate DSL code from
the code base entirely and replace it with equivalent Python code), run
some validation on transformed parse trees, etc.
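The unparsing step has a close analogue in today's standard library, ast.unparse (Python 3.9+). The sketch below is not EE's unparser and works on Python's abstract syntax tree rather than EE's concrete parse tree; RenameCalls and to_python_source are hypothetical, and renaming one function call merely stands in for a real langlet transformation. Note that visit_Call is dispatched by node type name, much like an @trans method.

```python
import ast


class RenameCalls(ast.NodeTransformer):
    """Rewrite calls to one function name into calls to another.

    Stand-in for a langlet transformation; both names are hypothetical.
    """

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Call(self, node):
        self.generic_visit(node)  # transform nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            node.func = ast.Name(id=self.new, ctx=ast.Load())
        return node


def to_python_source(source, old, new):
    """Parse, transform and unparse back into plain Python source."""
    tree = RenameCalls(old, new).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(to_python_source("navigate_to('www.openstreetmap.org', website)",
                       "navigate_to", "open_url"))
# open_url('www.openstreetmap.org', website)
```

The output is ordinary Python source, which is what makes it possible to drop the DSL layer and keep only the generated code.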