pyparser and recursion problem

Paul McGuire ptmcg at austin.rr.com
Thu Jul 26 22:54:15 EDT 2007


On Jul 26, 3:27 pm, Neil Cerutti <horp... at yahoo.com> wrote:
>
> Hopefully I'll have time to help you a bit more later, or Paul
> MaGuire will swoop down in his pyparsing powered super-suit. ;)
>
There's no need to fear...!

Neil was dead on, and your parser is almost exactly right.
Congratulations for delving into the arcane Dict class, not an easy
element for first-time pyparsing users!

Forward() would have been okay if you were going to have your macros
defined in advance of referencing them.  However, since you are
defining them after-the-fact, you'll have to wait until all the text
is parsed into a tree to start doing the macro substitution.
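
To see why the substitution has to wait, here is a minimal sketch in
plain Python (no pyparsing; the two-line input and the names "seen" and
"unresolved" are made up for illustration). It reads definitions top to
bottom and records every all-caps word that has no definition yet at the
moment it is encountered:

```python
text = """
:Start: first SECOND
:SECOND: second1 | second2
"""

seen = {}          # definitions collected so far: name -> list of alternatives
unresolved = []    # (referencing name, macro name) pairs not yet defined when read

for line in text.strip().splitlines():
    # ":NAME: body" splits into ['', 'NAME', ' body']
    _, name, body = line.split(":", 2)
    alternatives = [alt.split() for alt in body.split("|")]
    for alt in alternatives:
        for word in alt:
            if word.isupper() and word not in seen:
                unresolved.append((name, word))
    seen[name] = alternatives

print(unresolved)
print("SECOND" in seen)
```

While "Start" is being read, SECOND is still unknown, so it lands in
"unresolved"; once the loop finishes, every name is in "seen" and the
substitution can be done safely.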

Your grammar as-is was almost exactly right (I've shown the minimal
mod needed to make this work, plus an alternative grammar that might
be a bit neater-looking).  To perform some work after the tree is
built, you attach a parse action to the top-level doc element.  This
parse action's job is to begin with the "Start" element, and
recursively replace words in all caps with their corresponding
substitution.  As you surmised, the Dict class automagically builds
the lookup dictionary for you during the parsing phase.

After the parse action runs, the resulting res["Start"] element gives
the desired results.  (This looks vaguely YAML-ish, am I right?)
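
The recursive replacement itself can be sketched without pyparsing,
using plain nested lists in place of ParseResults and an ordinary dict in
place of the one Dict builds. The "expand" function, the "active" tuple
and the small "lookup" table are hypothetical names for this sketch only;
the guard against a macro that refers back to itself is my addition here
and is not part of the code posted in this message:

```python
def expand(item, lookup, active=()):
    """Replace each all-caps word with its recursively expanded definition."""
    if isinstance(item, list):
        # A group of words or alternatives: expand each member in turn.
        return [expand(sub, lookup, active) for sub in item]
    if item.isupper():
        # "active" holds the macros currently being expanded, so a macro
        # that (directly or indirectly) refers to itself is reported
        # instead of recursing forever.
        if item in active:
            raise ValueError("macro %s refers to itself" % item)
        return expand(lookup[item], lookup, active + (item,))
    return item


lookup = {
    "Start": [["first", "SECOND", "fourth"]],
    "SECOND": [["second1"], ["THIRD"]],
    "THIRD": [["third1", "third2"]],
}

result = expand(lookup["Start"], lookup)
print(result)
```

Note that "Start" is mixed case, so str.isupper() leaves it alone; only
the all-caps names are treated as macro references.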

-- Paul

Here is your working code:

from pyparsing import Word, Optional, OneOrMore, Group,  alphas, \
alphanums, Suppress, Dict, Combine, delimitedList, traceParseAction, \
ParseResults


import string


def allIn( chars, members ):
    """Tests that all elements of chars are in members"""
    for c in chars:
        if c not in members:
            return False
    return True


def allUpper( chars ):
    """Tests that all characters in chars are uppercase ASCII letters"""
    return allIn( chars, string.ascii_uppercase )


def getItems(myArray, myDict):
    """Recursively get the items for each CAPITAL word"""
    myElements=[]
    for element in myArray:
        myWords=[]
        for word in element:
            if allUpper(word):
                items = getItems(myDict[word], myDict)
                myWords.append(items)
            else:
                myWords.append(word)
        myElements.append(myWords)


    return myElements


testData = """
:Start: first SECOND THIRD  fourth FIFTH

:SECOND: second1_1 second1_2 | second2 | second3

:THIRD: third1 third2 | SIXTH

:FIFTH: fifth1 | SEVENTH

:SIXTH: sixth1_1 sixth1_2 | sixth2

:SEVENTH: EIGHTH | seventh1

:EIGHTH: eighth1 | eighth2

"""

#> original grammar - very close!
#> just needed to enclose definition of data in a Group
label = Suppress(":") + Word(alphas + "_") + Suppress(":")
words = Group(OneOrMore(Word(alphanums + "_"))) + \
    Suppress(Optional("|"))
#~ data = ~label + OneOrMore(words)
data = Group( OneOrMore(words) )
line = Group(label + data)
doc = Dict(OneOrMore(line))

#> suggested alternative grammar
#> - note use of Combine and delimitedList
#~ COLON = Suppress(":")
#~ label = Combine( COLON + Word(alphas + "_") + COLON )
#~ entry = Word(alphanums + "_")
#~ data = delimitedList( Group(OneOrMore(entry)), delim="|" )
#~ line = Group(label + data)
#~ doc = Dict(OneOrMore(line))

# recursive reference fixer-upper
def fixupRefsRecursive(tokens, lookup):
    if isinstance(tokens, ParseResults):
        subs = [ fixupRefsRecursive(t, lookup) for t in tokens ]
        tokens = ParseResults( subs )
    else:
        if tokens.isupper():
            tokens = fixupRefsRecursive(lookup[tokens], lookup)
    return tokens

#> add this parse action to doc, which invokes recursive
#> reference fixer-upper
def fixupRefs(tokens):
    tokens["Start"] = fixupRefsRecursive( tokens["Start"], tokens )

doc.setParseAction( fixupRefs )

res = doc.parseString(testData)

# This prints out what pyparsing gives us
#~ for line in res:
    #~ print(line)
#> not really interested in all of res, just the fixed-up
#> "Start" entry
print(res["Start"][0].asList())

print("")

startString = res["Start"]
items = getItems([startString], res)[0]
# This prints out what we want
for line in items:
    print(line)

Prints:
['first', [['second1_1', 'second1_2'], ['second2'], ['second3']],
[['third1', 'third2'], [[['sixth1_1', 'sixth1_2'], ['sixth2']]]],
'fourth', [['fifth1'], [[[[['eighth1'], ['eighth2']]],
['seventh1']]]]]

['first', [['second1_1', 'second1_2'], ['second2'], ['second3']],
[['third1', 'third2'], [[['sixth1_1', 'sixth1_2'], ['sixth2']]]],
'fourth', [['fifth1'], [[[[['eighth1'], ['eighth2']]],
['seventh1']]]]]



