Pretty Scheme, ??? Python

Mon Jul 2 19:28:58 EDT 2007

On Jul 2, 3:56 pm, Neil Cerutti <horp... at yahoo.com> wrote:
> On 2007-07-02, Laurent Pointal <laurent.poin... at wanadoo.fr> wrote:
>
> > Neil Cerutti wrote:
> >> How can I make the Python more idiomatic Python?
>
> > Have you taken a look at pyparsing ?
>
> Yes, I have it. PyParsing has, well, so many convenience features
> they seem to shout down whatever the core features are, and I
> don't know quite how to get started as a result.
>
> Hardest of all was modifying a working PyParsing program.
>
> As a result, I've found writing my own recursive descent parsers
> much easier.
>
> I'm probably wrong, though. ;)
>
> --
> Neil Cerutti

from pyparsing import Forward, Group, Literal, Word, alphas, nums, oneOf

"""
Neil -

OK, here is a step-by-step walkthrough, beginning with your posted BNF.
(Based on your test cases, I think the '{}'s are really supposed to be
'()'s.)

; <WAE> ::=
;   <num>
;   | { + <WAE> <WAE> }
;   | { - <WAE> <WAE> }
;   | {with {<id> <WAE>} <WAE>}
;   | <id>
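
For comparison with the hand-written recursive descent parsers
mentioned above, a minimal sketch of one for this grammar in plain
Python (no pyparsing) might look like the following.  It assumes the
'()' reading of the BNF, that tokens are parentheses, integers, '+',
'-' and alphabetic names, and that "with" is reserved.  The function
names tokenize, expect, parse_wae and parse are hypothetical.  It
returns the same nested-list shape as the pyparsing results shown
later in this post, without their outer list.

```python
def tokenize(text):
    # Pad the parentheses with spaces so a plain split() isolates every token.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def expect(tokens, want):
    # Consume one token and fail if it is not the one the grammar requires.
    tok = tokens.pop(0) if tokens else None
    if tok != want:
        raise ValueError("expected %r, got %r" % (want, tok))


def parse_wae(tokens):
    # Consume exactly one <WAE> from the front of tokens, one branch per rule.
    if not tokens:
        raise ValueError("unexpected end of input")
    tok = tokens.pop(0)
    if tok.isdigit():                        # <num>
        return tok
    if tok.isalpha() and tok != "with":      # <id>
        return tok
    if tok != "(":
        raise ValueError("unexpected token %r" % (tok,))
    head = tokens.pop(0) if tokens else None
    if head in ("+", "-"):                   # ( + <WAE> <WAE> ), ( - <WAE> <WAE> )
        node = [head, parse_wae(tokens), parse_wae(tokens)]
    elif head == "with":                     # ( with ( <id> <WAE> ) <WAE> )
        expect(tokens, "(")
        name = tokens.pop(0) if tokens else None
        if name is None or not name.isalpha():
            raise ValueError("expected an identifier, got %r" % (name,))
        bound = parse_wae(tokens)
        expect(tokens, ")")
        node = ["with", name, bound, parse_wae(tokens)]
    else:
        raise ValueError("unknown form %r" % (head,))
    expect(tokens, ")")
    return node


def parse(text):
    # Parse a whole string, rejecting anything left over after one <WAE>.
    tokens = tokenize(text)
    tree = parse_wae(tokens)
    if tokens:
        raise ValueError("trailing tokens: %r" % (tokens,))
    return tree


print(parse("(with (x (+ 5 5)) (+ x x))"))
# ['with', 'x', ['+', '5', '5'], ['+', 'x', 'x']]
```

Each BNF alternative becomes one branch of parse_wae.  That one-to-one
correspondence is what makes recursive descent easy to write by hand.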

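The most basic building blocks in pyparsing are Literal and Word.
With these, you compose "compound" elements using And and MatchFirst,
which are bound to the operators '+' and '|'.  And requires its pieces
to match in sequence; MatchFirst tries its alternatives left to right
and takes the first one that matches.  (On occasion Or is required,
bound to operator '^'; it tries every alternative and takes the longest
match, but this simple parser doesn't need it.)  Since you have a
recursive grammar, you will also need Forward, which lets you declare
an expression before you define it.  Whitespace between tokens is
skipped implicitly.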
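
A minimal sketch of these operators, assuming a pyparsing 2.x or 3.x
install (both still accept the camelCase names used in this post).
The expressions first, longest, pair and nested are illustrative only,
not part of the WAE grammar.

```python
from pyparsing import Forward, Group, Literal, Word, nums

# '|' builds a MatchFirst: the first alternative that matches wins,
# even when a later one would match more of the input.
first = Literal("a") | Literal("ab")
print(first.parseString("ab").asList())      # ['a']

# '^' builds an Or: every alternative is tried and the longest match wins.
longest = Literal("a") ^ Literal("ab")
print(longest.parseString("ab").asList())    # ['ab']

# '+' builds an And: the pieces must match in order, and the whitespace
# between them is skipped automatically.
pair = Word(nums) + "," + Word(nums)
print(pair.parseString("1 ,  2").asList())   # ['1', ',', '2']

# Forward is declared first and filled in later with '<<', so the
# definition can refer to itself (here, arbitrarily nested parentheses).
nested = Forward()
nested << (Word(nums) | Group(Literal("(").suppress() + nested + Literal(")").suppress()))
print(nested.parseString("((7))").asList())  # [[['7']]]
```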

Only slightly more advanced is the Group class, which imparts
hierarchy and structure to the results; otherwise, everything comes
out as one flat list of tokens.  Depending on your results after steps
1 and 2 of the "left for the student" part below, you may be able to
remove the Groups from the final parser, but they are here to help
show the structure of the parsed tokens.
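
To make the effect of Group concrete, here is a small comparison on a
made-up input; item, flat and grouped are hypothetical names.  The
same tokens come back either way, but only the Group version keeps
each name/number pair together.

```python
from pyparsing import Group, Literal, Word, alphas, nums

LPAR = Literal("(").suppress()
RPAR = Literal(")").suppress()
item = Word(alphas) + Word(nums)        # a name followed by a number

flat = LPAR + item + item + RPAR
grouped = LPAR + Group(item) + Group(item) + RPAR

text = "(x 1 y 2)"
print(flat.parseString(text).asList())     # ['x', '1', 'y', '2']
print(grouped.parseString(text).asList())  # [['x', '1'], ['y', '2']]
```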

As convenience functions go, I think the most common are oneOf and
delimitedList.  oneOf might be useful here if you want id to match a
single-character variable name; otherwise, just use Word(alphas).
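
A quick check of how the two choices for id differ on a two-letter
input (single and word are illustrative names):

```python
from pyparsing import Word, alphas, oneOf

single = oneOf(list(alphas))   # exactly one letter, as in the script below
word = Word(alphas)            # a run of one or more letters

print(single.parseString("xy").asList())  # ['x']
print(word.parseString("xy").asList())    # ['xy']
```

With the one-letter version, a name like "xy" would not be accepted as
a single id inside the grammar.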

At this point you should be able to write a parser for this WAE
grammar, like the following 9-liner:
"""

LPAR = Literal("(").suppress()   # match "(" but drop it from the results
RPAR = Literal(")").suppress()

wae = Forward()                  # declared now, defined below, so it can recurse
num = Word(nums)
id = oneOf(list(alphas))         # any single letter (note: shadows builtin id)
addwae = Group(LPAR + "+" + wae + wae + RPAR)
subwae = Group(LPAR + "-" + wae + wae + RPAR)
withwae = Group(LPAR + "with" + LPAR + id + wae + RPAR + wae + RPAR)

wae << (addwae | subwae | withwae | num | id)   # MatchFirst: first match wins

tests = """\
 3
 (+ 3 4)
 (with (x (+ 5 5)) (+ x x))""".splitlines()

for t in tests:
    print(t)
    waeTree = wae.parseString(t)
    print(waeTree.asList())
    print()

"""
If you extract and run this script, here are the results:
 3
['3']

 (+ 3 4)
[['+', '3', '4']]

 (with (x (+ 5 5)) (+ x x))
[['with', 'x', ['+', '5', '5'], ['+', 'x', 'x']]]
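
These nested lists already hold everything needed to compute a value.
As a minimal sketch, assuming "with" evaluates its named expression
once before the body (eager substitution) and using a hypothetical
function calc_list, a tree-walking evaluator over the asList() output
could look like this.  The exercise below takes a different route,
building objects during the parse with setParseAction.

```python
def calc_list(tree, env=None):
    # Evaluate one node of the nested-list tree; env maps names to values.
    env = {} if env is None else env
    if isinstance(tree, str):
        # A leaf is either a <num> or an <id> bound by an enclosing with.
        return int(tree) if tree.isdigit() else env[tree]
    op = tree[0]
    if op == "+":
        return calc_list(tree[1], env) + calc_list(tree[2], env)
    if op == "-":
        return calc_list(tree[1], env) - calc_list(tree[2], env)
    if op == "with":
        _, name, bound, body = tree
        inner = dict(env)
        inner[name] = calc_list(bound, env)   # evaluate the binding once
        return calc_list(body, inner)
    raise ValueError("unknown form %r" % (op,))


# asList() wraps the whole expression in one outer list, hence the [0].
results = [['with', 'x', ['+', '5', '5'], ['+', 'x', 'x']]]
print(calc_list(results[0]))  # 20
```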

Left as an exercise for the student:
1. Define classes NumWAE, IdWAE, AddWAE, SubWAE, and WithWAE whose
__init__ methods take a ParseResults object named tokens (which you
can treat as a list of tokens), each with a calc() method that
evaluates it accordingly.
2. Attach each class to the corresponding parser expression (num, id,
addwae, subwae, withwae) using setParseAction.
Hint: here is one done for you:  num.setParseAction(NumWAE)
3. Modify the test loop to evaluate each parsed tree.
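
One possible sketch of steps 1 through 3, under a few assumptions
the exercise leaves open.  calc() takes a dict mapping identifier
names to values, and "with" evaluates its named expression eagerly.
The post's id expression is renamed ident here to avoid shadowing
Python's built-in id().  Because addwae, subwae and withwae are
wrapped in Group, their parse actions find the whole form in
tokens[0]; num and ident get a single plain token.

```python
from pyparsing import Forward, Group, Literal, Word, alphas, nums, oneOf


class NumWAE:
    def __init__(self, tokens):
        self.value = int(tokens[0])

    def calc(self, env):
        return self.value


class IdWAE:
    def __init__(self, tokens):
        self.name = tokens[0]

    def calc(self, env):
        # Look the name up in the bindings made by enclosing withs.
        return env[self.name]


class AddWAE:
    def __init__(self, tokens):
        # Group() makes tokens[0] the whole form: ['+', lhs, rhs].
        _, self.lhs, self.rhs = tokens[0]

    def calc(self, env):
        return self.lhs.calc(env) + self.rhs.calc(env)


class SubWAE:
    def __init__(self, tokens):
        _, self.lhs, self.rhs = tokens[0]

    def calc(self, env):
        return self.lhs.calc(env) - self.rhs.calc(env)


class WithWAE:
    def __init__(self, tokens):
        # ['with', IdWAE, bound, body]; ident's parse action has already run.
        _, id_node, self.bound, self.body = tokens[0]
        self.name = id_node.name

    def calc(self, env):
        inner = dict(env)
        inner[self.name] = self.bound.calc(env)
        return self.body.calc(inner)


# Same grammar as the script above, with step 2's parse actions attached.
LPAR = Literal("(").suppress()
RPAR = Literal(")").suppress()
wae = Forward()
num = Word(nums).setParseAction(NumWAE)
ident = oneOf(list(alphas)).setParseAction(IdWAE)
addwae = Group(LPAR + "+" + wae + wae + RPAR).setParseAction(AddWAE)
subwae = Group(LPAR + "-" + wae + wae + RPAR).setParseAction(SubWAE)
withwae = Group(LPAR + "with" + LPAR + ident + wae + RPAR + wae + RPAR).setParseAction(WithWAE)
wae << (addwae | subwae | withwae | num | ident)

# Step 3: each parse yields one WAE object; evaluate it with no bindings.
for t in ["3", "(+ 3 4)", "(with (x (+ 5 5)) (+ x x))"]:
    tree = wae.parseString(t)[0]
    print(t, "=>", tree.calc({}))
```

Building the objects in parse actions means the tree comes out of
parseString ready to evaluate, with no separate walk over lists.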

Extra credit: why is id last in the set of alternatives defined for
the wae expression?

-- Paul
"""