Using PLY

Maurice LING mauriceling at acm.org
Mon Sep 20 00:27:14 EDT 2004


Michael Sparks wrote:

> Maurice LING wrote:
> ...
> 
>>Another thing that I am quite puzzled by is the yacc part of PLY. Most
>>of the examples show calculators, and the yacc part does the
>>calculations, such as:
>>
>>    def p_expression_group(self, p):
>>         'expression : LPAREN expression RPAREN'
>>         p[0] = p[2]
>>
>>This is a bad example, I know.
> 
> 
> Simple examples of lex/yacc type things tend to have this though.
> 
> 
>>But how do I get it to output an intermediate representation,
>>such as an AST or intermediate code (byte-code style)?
>>
>>Is
>>
>>    def p_expression_group(self, p):
>>         'expression : LPAREN expression RPAREN'
>>         p[0] = p[2]
>>         print("byte_x" + str(p[0]))
>>
>>or something like this legal?
> 
> 
> It's legal, but probably not what you want.
> 
> Normally you have Lex --(tokens)--> Parse --(AST)--> Something Interesting.
> 
> If Something Interesting is simple, you can do it directly in the parser
> actions instead of building an AST, which is what the examples do.
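A minimal sketch of the two styles side by side, for the binary-operator rule only. PLY calls each rule function with a production object p, where p[1], p[2], ... are the matched symbols and p[0] receives the result; here a plain Python list stands in for p so the code runs without PLY. The function names evaluate_binop and build_binop_node are made up for illustration (they deliberately avoid the p_ prefix, so PLY would not mistake them for grammar rules).

```python
# A plain list stands in for PLY's production object p:
# p[1], p[2], p[3] are the matched symbols; p[0] receives the result.


def evaluate_binop(p):
    # Calculator style: do the work immediately, while parsing.
    op, left, right = p[2], p[1], p[3]
    if op == '+':
        p[0] = left + right
    elif op == '-':
        p[0] = left - right
    elif op == '*':
        p[0] = left * right
    elif op == '/':
        p[0] = left / right          # Python 3 true division


def build_binop_node(p):
    # AST style: only record what was matched; a later pass does the work.
    p[0] = ["binop_expr", p[2], p[1], p[3]]


p = [None, 7, '*', 9]                # as if "7 * 9" had just been matched
evaluate_binop(p)
print(p[0])                          # 63

# In the AST style the operands are already nodes built by other rules.
p = [None, ["number", 7], '*', ["number", 9]]
build_binop_node(p)
print(p[0])          # ['binop_expr', '*', ['number', 7], ['number', 9]]
```

The only difference is what gets stored in p[0]: a finished value, or a description of the match that something later can interpret or compile.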
> 
> If you wanted to modify example/calc/calc.py in the PLY distribution to
> return an AST to play with, you would change its rules to store the parsed
> structure rather than do the work. Taking the route of minimal change, to
> make it obvious what I've changed:
> 
> def p_statement_assign(p):
>     'statement : NAME EQUALS expression'
>     p[0] = [ "assignment", p[1], p[3] ] # names[p[1]] = p[3]
> 
> def p_statement_expr(p):
>     'statement : expression'
>     p[0] = [ "expr_statement", p[1] ] # print p[1]
> 
> def p_expression_binop(p):
>     '''expression : expression PLUS expression
>                   | expression MINUS expression
>                   | expression TIMES expression
>                   | expression DIVIDE expression'''
>     p[0] = ["binop_expr", p[2], p[1], p[3] ] # long if/elif evaluation
> 
> def p_expression_uminus(p):
>     'expression : MINUS expression %prec UMINUS'
>     p[0] = ["uminus_expr", p[2]]   #   p[0] = -p[2]
> 
> def p_expression_group(p):
>     'expression : LPAREN expression RPAREN'
>     p[0] = ["expression", p[2] ]   # p[0] = p[2]
> 
> def p_expression_number(p):
>     'expression : NUMBER'
>     p[0] = ["number", p[1]]        # p[0] = p[1]
> 
> def p_expression_name(p):
>     'expression : NAME'
>     p[0] = ["name", p[1] ]         # p[0] = names[p[1]], with error handling
> 
> A sample AST this could generate, for the input "BOB = (7 * 9)", would be:
> 
> [ "assignment",
>    "BOB",
>    ["expression", 
>       ["binop_expr", 
>          "*", 
>          ["number", 7], 
>          ["number", 9] 
>       ]
>    ] 
> ]
> 
> In example/calc/calc.py this value would be returned here:
> 
> while True:
>     try:
>         s = input('calc > ')   # raw_input() in Python 2
>     except EOFError:
>         break
>     AST = yacc.parse(s) #### <- ------ HERE!
> 
> (NB: the line marked #### is slightly changed, so that the value returned
> by yacc.parse() is kept.)
> 
> This is a fairly boring and not especially good AST, but it should
> hopefully get you started. You should be able to see that by traversing
> this tree you could get the same result as the original code, or could spit
> out code that performs the same computation. Often it's nice to simplify
> the tree as well, since this sort of thing can get rather unwieldy for
> realistic languages.
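As a rough sketch of that traversal, the code below walks trees with the node shapes built by the AST-building rules in this message: once to compute the same values calc.py's actions compute, and once to emit instructions for a simple stack machine. The opcodes (PUSH, LOAD, STORE, BINOP, NEG, PRINT) are invented for this example and are not a real bytecode format; "/" uses Python 3 true division, whereas calc.py under Python 2 truncated integer division.

```python
import operator

# Operator token text (as stored in "binop_expr" nodes) -> Python function.
BINOPS = {'+': operator.add, '-': operator.sub,
          '*': operator.mul, '/': operator.truediv}


def evaluate(node, names):
    """Interpret a node, computing what calc.py's actions computed."""
    kind = node[0]
    if kind == "assignment":                       # ["assignment", NAME, expr]
        names[node[1]] = evaluate(node[2], names)
        return None
    if kind in ("expr_statement", "expression"):   # single-child wrappers
        return evaluate(node[1], names)
    if kind == "binop_expr":                       # ["binop_expr", op, l, r]
        return BINOPS[node[1]](evaluate(node[2], names),
                               evaluate(node[3], names))
    if kind == "uminus_expr":                      # ["uminus_expr", expr]
        return -evaluate(node[1], names)
    if kind == "number":                           # ["number", value]
        return node[1]
    if kind == "name":                             # ["name", NAME]
        return names[node[1]]                      # KeyError if undefined
    raise ValueError("unknown node type: %r" % (kind,))


def compile_node(node, code):
    """Append stack-machine instructions (invented opcodes) to `code`."""
    kind = node[0]
    if kind == "assignment":
        compile_node(node[2], code)
        code.append(("STORE", node[1]))
    elif kind == "expr_statement":
        compile_node(node[1], code)
        code.append(("PRINT", None))
    elif kind == "expression":
        compile_node(node[1], code)                # parentheses emit nothing
    elif kind == "binop_expr":
        compile_node(node[2], code)                # left operand first
        compile_node(node[3], code)
        code.append(("BINOP", node[1]))
    elif kind == "uminus_expr":
        compile_node(node[1], code)
        code.append(("NEG", None))
    elif kind == "number":
        code.append(("PUSH", node[1]))
    elif kind == "name":
        code.append(("LOAD", node[1]))
    else:
        raise ValueError("unknown node type: %r" % (kind,))
    return code


tree = ["assignment", "BOB",
        ["expression", ["binop_expr", "*", ["number", 7], ["number", 9]]]]
names = {}
evaluate(tree, names)
print(names)                  # {'BOB': 63}
print(compile_node(tree, []))
# [('PUSH', 7), ('PUSH', 9), ('BINOP', '*'), ('STORE', 'BOB')]
```

Both functions dispatch on the tag in position 0 of each node. That keeps the sketch short; for a realistic language a table of handler functions, or proper node classes, would probably scale better than a long if-chain.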
> 
> It's also worth noting that the calc.py example is very much a toy, in that
> it parses single lines rather than collections of lines (i.e. the parser
> has no conception of a piece of code containing more than one statement).
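One common way to lift that restriction is a left-recursive list rule whose actions collect statements into a single node. The rules below (program, p_program_single, p_program_multi) and the SEMI separator token are hypothetical, not part of calc.py; in a real PLY module you would also need the lexer to produce SEMI, and probably `start = 'program'` in the module, since PLY otherwise takes the first rule as the start symbol. The bottom of the sketch fakes the reductions the parser would make, with plain lists standing in for p, so it runs without PLY.

```python
# Hypothetical rules for a sequence of statements separated by a SEMI token.
# In a PLY module the docstrings are the grammar.


def p_program_single(p):
    'program : statement'
    p[0] = ["program", p[1]]


def p_program_multi(p):
    'program : program SEMI statement'
    # Left recursion: the statements seen so far are already in p[1],
    # so append the new one instead of nesting deeper.
    p[1].append(p[3])
    p[0] = p[1]


# Simulate the reductions for:  BOB = 7 * 9 ; BOB + 1
s1 = ["assignment", "BOB", ["binop_expr", "*", ["number", 7], ["number", 9]]]
s2 = ["expr_statement", ["binop_expr", "+", ["name", "BOB"], ["number", 1]]]

p = [None, s1]
p_program_single(p)
tree = p[0]

p = [None, tree, ';', s2]
p_program_multi(p)
tree = p[0]

print(tree[0], len(tree) - 1)   # program 2
```

Left recursion is the natural choice here because LR parsers such as PLY's handle it without the parse stack growing with the number of statements.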
> 
> 
>>I'm trying to parse what looks like a 4GL source code.
> 
> 
> FWIW, start small - start by matching the simplest expressions you can and
> work forward from there (unless you're lucky enough to already have an
> LALR(1) or SLR(1) grammar for the language that suits PLY). Test-first
> coding for grammars feels intuitively wrong, but seems to work really well
> in practice - just make sure that each time you get a test working, you
> check the result in to CVS/your favourite version control system :-)

I've worked out my grammar in BNF, so it is context free; I hope it is also
LALR(1), which is what PLY needs.
> 
> One other tip you might find useful: rather than sending the lexer whole
> files, as PLY seems to expect, do the line handling yourself and send it
> lines instead - it works much more like Flex/lex that way.
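A minimal sketch of such a line-at-a-time driver, assuming the parse function is passed in as a parameter (with PLY it could be yacc.parse, or the parse method of a parser object). The name parse_lines, skipping blank lines, and wrapping failures in SyntaxError are all illustrative choices, not part of PLY. Keeping the line number yourself makes it easy to say where a failure happened.

```python
def parse_lines(lines, parse):
    """Feed source to `parse` one line at a time.

    `lines` is any iterable of strings (an open file works); `parse` is any
    callable taking one line of text. Returns (line_number, result) pairs.
    """
    results = []
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue                      # skip blank lines entirely
        try:
            results.append((lineno, parse(text)))
        except Exception as exc:
            # Re-raise with the line number attached to the message.
            raise SyntaxError("line %d: %s" % (lineno, exc)) from exc
    return results


# Usage with a stand-in parse function (str.split) instead of a real parser:
source = ["BOB = 7 * 9\n", "\n", "BOB + 1\n"]
print(parse_lines(source, str.split))
# [(1, ['BOB', '=', '7', '*', '9']), (3, ['BOB', '+', '1'])]
```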
> 
> Regards,
> 
> 
> Michael.
> 


Thank you, this really helped my understanding.

maurice



More information about the Python-list mailing list