Using PLY

Michael Sparks zathras at thwackety.com
Sun Sep 19 21:34:38 EDT 2004


Maurice LING wrote:
...
> Another thing that I am quite puzzled by is the yacc part of PLY. Most
> of the examples are showing calculators and the yacc part does the
> calculations such as,
> 
>     def p_expression_group(self, p):
>          'expression : LPAREN expression RPAREN'
>          p[0] = p[2]
> 
> this is a bad example, I know. 

Simple examples of lex/yacc type things tend to have this though.

> But how do I get it to output some 
> intermediate representations, like AST, or an intermediate code
> (byte-code type).
> 
> Is
> 
>     def p_expression_group(self, p):
>          'expression : LPAREN expression RPAREN'
>          p[0] = p[2]
>          print "byte_x" + p[0] 
> 
> or something like this legal?

It's legal, but probably not what you want.

Normally you have Lex --(token) --> Parse --(AST)--> Something Interesting.

If Something Interesting is simple, you can do it directly in the parser
actions instead of building an AST, which is what the examples do.

If you wanted to modify the example/calc/calc.py in the PLY distribution to
return an AST to play with, you would change its rules to store the parsed
structure rather than do the work. Taking the route of minimal change to
try to make it obvious what I've changed:

def p_statement_assign(p):
    'statement : NAME EQUALS expression'
    p[0] = [ "assignment", p[1], p[3] ] # names[p[1]] = p[3]

def p_statement_expr(p):
    'statement : expression'
    p[0] = [ "expr_statement", p[1] ] # print p[1]

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression'''
    p[0] = ["binop_expr", p[2], p[1], p[3] ] # long if/elif evaluation

def p_expression_uminus(p):
    'expression : MINUS expression %prec UMINUS'
    p[0] = ["uminus_expr", p[2]]   #   p[0] = -p[2]

def p_expression_group(p):
    'expression : LPAREN expression RPAREN'
    p[0] = ["expression", p[2] ]   # p[0] = p[2]

def p_expression_number(p):
    'expression : NUMBER'
    p[0] = ["number", p[1]]        # p[0] = p[1]

def p_expression_name(p):
    'expression : NAME'
    p[0] = ["name", p[1] ]         # p[0] = names[p[1]], with error handling

A sample AST this could generate would be:

[ "assignment", 
   "BOB", 
   ["expression", 
      ["binop_expr", 
         "*", 
         ["number", 7], 
         ["number", 9] 
      ]
   ] 
]

In example/calc/calc.py this value would be returned here:

while 1:
    try:
        s = raw_input('calc > ')
    except EOFError:
        break
    AST = yacc.parse(s) #### <- ------ HERE!

(NB: the line marked #### is slightly changed from the original.)

This is a very boring, not that great AST, but should hopefully get you
started. You should be able to see that by traversing this tree you could
get the same result as the original code, or could spit out code that
performs this functionality. Often it's nice to have some simplification of
the tree as well, since this sort of thing can be rather unwieldy for
realistic languages.
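
As a rough sketch of what traversing the tree might look like (my own
illustration, not code from the PLY distribution): evaluate() walks the
list-based nodes built by the rules above and gets the calculator's results
back, and emit() spits out instructions for an imaginary stack machine. The
opcode names (PUSH, LOAD, STORE, BINOP, NEG, PRINT) are made up. I'm assuming
the assignment target is the bare NAME string, as p_statement_assign stores
it, and that the operator tokens carry their matched text ("+", "*", ...).

```python
import operator

# Operator text stored in "binop_expr" nodes -> function applying it.
# Assumption: true division for "/"; calc.py itself just uses Python's "/".
BINOPS = {"+": operator.add, "-": operator.sub,
          "*": operator.mul, "/": operator.truediv}

def evaluate(node, names):
    """Walk one AST node and return its value (None for an assignment)."""
    kind = node[0]
    if kind == "assignment":
        # node[1] is the bare NAME string, node[2] the expression subtree
        names[node[1]] = evaluate(node[2], names)
        return None
    if kind in ("expr_statement", "expression"):
        return evaluate(node[1], names)
    if kind == "binop_expr":
        return BINOPS[node[1]](evaluate(node[2], names),
                               evaluate(node[3], names))
    if kind == "uminus_expr":
        return -evaluate(node[1], names)
    if kind == "number":
        return node[1]
    if kind == "name":
        return names[node[1]]  # KeyError if the name was never assigned
    raise ValueError("unknown node type: %r" % (kind,))

def emit(node, out):
    """Append instructions for an imaginary stack machine to out; return out."""
    kind = node[0]
    if kind == "assignment":
        emit(node[2], out)
        out.append(("STORE", node[1]))
    elif kind == "expr_statement":
        emit(node[1], out)
        out.append(("PRINT",))
    elif kind == "expression":
        emit(node[1], out)
    elif kind == "binop_expr":
        emit(node[2], out)   # left operand first
        emit(node[3], out)   # then right operand
        out.append(("BINOP", node[1]))
    elif kind == "uminus_expr":
        emit(node[1], out)
        out.append(("NEG",))
    elif kind == "number":
        out.append(("PUSH", node[1]))
    elif kind == "name":
        out.append(("LOAD", node[1]))
    else:
        raise ValueError("unknown node type: %r" % (kind,))
    return out
```

Both functions take the value returned by yacc.parse(s) directly; evaluate()
also needs a dictionary to play the role of calc.py's names table.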

It's also worth noting that the calc.py example is very much a toy in that
it matches single lines using the parser rather than collections of lines
(i.e. the parser has no conception of a piece of code containing more than
one statement).

> I'm trying to parse what looks like a 4GL source code.

FWIW, start small - start with matching the simplest expressions you can and
work forward from there (unless you're lucky enough to have a LALR(1) or
SLR(1) grammar for it suitable for PLY already). Test-first style coding
for grammars feels intuitively wrong, but seems to work really well in
practice - just make sure that each time you get a test working, you check
the result in to CVS/your favourite version control system :-)

One other tip you might find useful - rather than sending the lexer whole
files as PLY seems to expect, do line handling yourself and send it lines
instead - it works much more like Flex/lex that way.
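
A minimal sketch of that line handling, assuming each line can be parsed on
its own (as in calc.py). The helper name parse_by_line and its parse_line
argument are hypothetical; parse_line is whatever callable does the parsing
for a single line.

```python
def parse_by_line(lines, parse_line):
    """Feed non-blank lines one at a time to parse_line.

    Returns a list of (line number, result) pairs so that errors and
    results can be tied back to the source line.
    """
    results = []
    for lineno, line in enumerate(lines, 1):
        text = line.rstrip("\r\n")
        if not text.strip():
            continue  # skip blank lines rather than handing them to the parser
        results.append((lineno, parse_line(text)))
    return results
```

With calc.py, parse_line could be yacc.parse and lines an open file object.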

Regards,


Michael.



