Is pyparsing really a recursive descent parser?

Sat Nov 3 01:33:41 EDT 2007

"Paul McGuire" <ptmcg at austin.rr.com> wrote in message 
news:1194007658.200713.178210 at 57g2000hsv.googlegroups.com...
> On Nov 2, 5:47 am, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
>>
>> Pyparsing is no recursive descent parser.  It doesn't go back in the 
>> input
>> stream.  The ``OneOrMore(Word(alphas))`` part "eats" the 'end' and when 
>> it
>> can't get more, the parser moves to the ``Literal('end')`` part which
>> fails because the 'end' is already gone.
>>
>> >  Is there a way to get pyparsing to parse a grammar like this?
>>
>> Negative lookahead maybe:
>>
>> grammar = (OneOrMore(NotAny(Literal('end')) + Word(alphas))
>>            + Literal('end'))
>>
>> Ciao,
>>         Marc 'BlackJack' Rintsch- Hide quoted text -
>>
>> - Show quoted text -
>
> Well I'll be darned!  All this time, I thought "recursive descent"
> described the recursive behavior of the parser, which pyparsing
> definitely has.  I never knew that backing up in the face of parse
> mismatches was a required part of the picture.

    It has recursion in it but that's not sufficient to call it a recursive 
descent parser any more than having a recursive implementation of the 
factorial function.  The important part is what it recurses through...


> In pyparsing, each construction gets composed from more fine-grained
> constructions, and they are called recursively to match or not match
> against the input string.
>
> For example, taking your working parser example:
>
> grammar = (OneOrMore(NotAny(Literal('end')) + Word(alphas))
>           + Literal('end'))
>
> This creates the following data structure:
>
> - And
>  - OneOrMore
>    - And
>      - NotAny
>        - Literal('end')
>      - Word(alphas)
>  - Literal('end')
>
> Every instance in this structure derives from the base class
> ParserElement, which defines a method parse().  parse() in turn calls
> preParse(), parseImpl(), and postParse().  If parseImpl succeeds, it
> returns a ParseResults object and the next location within the input
> string to continue parsing.
>
> The parseImpl methods are most often overridden in the subclasses (a
> few override postParse as well), such as:
> - And.parseImpl invokes parse() (note the recursion) on each of the
> expressions in its list.  All must succeed or And.parseImpl throws an
> exception.
> - OneOrMore.parseImpl invokes parse() on its contained expression,
> which must succeed at least once; if not, the exception is rethrown.
> If the contained expression succeeds once, then its parse() method is
> called again and again until it fails, but no exceptions are rethrown,
> since only one match was actually required.
> - NotAny inverts the success/failure of its contained expression.  If
> the expression's parse() method succeeds, NotAny.parseImpl throws an
> exception.  If the contained expression's parse() method throws an
> exception, NotAny returns successfully.  (NotAny does not advance the
> parse location, nor does it return any tokens.)
> - Literal and Word are terminals, in that they do not invoke any other
> expression's parse() method.  Literal.parseImpl tests whether its
> string exists at the current parse location, and throws an exception
> if it doesn't.  Word.parseImpl tests whether the current parse
> location contains a letter in the Word instance's set of valid initial
> characters - if so success; if not, throws an exception.  It then
> advances through the input string, matching characters in the Word
> instance's set of valid body characters.  The entire matched string is
> then returned, along with an updated string index at which to continue
> parsing.

    Thank you for the detailed description of pyparsing's implementation.


> In my concept of "recursive descent" parsing, I was under the
> impression that pyparsing's use of this data structure, and
> *recursive* calls of parse() as it *descends* through the data
> structure, constituted a recursive descent parser.  What the OP
> requested was a more regular expression-ish matcher, with backup (or
> backtracking).

    In my concept of "recursive descent" parsing, I was under the impression 
that one should recurse through all rule combinations to ensure that the 
grammar is fully applied.  As I have mentioned before, merely having 
recursion in your algorithm is insufficient.  What you recurse through is 
key.  pyparsing recurses through rules but what's important is to recurse 
through rule combinations.  Otherwise, the parser won't be correct.  That is 
to say, there will be strings that are grammatically correct for which the 
parser will fail to parse.  Honestly, what is the point of a recursive 
descent parser if it doesn't parse correct string?  If you're willing to 
have correct strings unparsed, you might as well go for LALR parsing, or 
some such...
    In my opinion, the rule set I mentioned in my original post:


grammar = OneOrMore(Word(alphas)) + Literal('end')


    ...should be translated (internally) to something like this:


word = Word(alphas)
grammar = Forward()
grammar << ((word + grammar) | (word + Literal(end)))


    This allows a recursive search to find the string correct without any 
need for "backtracking," if I understand what you mean by that.  Couldn't 
pyparsing have done something like this?


> Your example did not include either of the alternation classes,
> MatchFirst or Or.  These classes implement a form of backtracking on
> the stack, that if we descend into an expression on the list of
> alternatives, and that expression fails, then MatchFirst or Or will
> back up to the same starting location and try the next alternative.
> (MatchFirst is an "eager" matcher, in that it will pick the first
> matching expression in its list - Or is an "exhaustive" matcher, in
> that it will evaluate all of the alternatives, and select the longest
> match.)

    I guess I was hoping that I could simply specify, mathematically, a 
grammar and have the software Do The Right Thing(tm)...


> The Wikipedia entry for "Recursive Descent Parser" describes this
> parser model as a "predictive parser", and later goes on to say that
> some (uncited) authors equate "predictive parser" with "recursive
> descent parsers".  The article makes a special distinction for a
> "recursive parser with backup", which is what I believe the OP was
> asking for.  Yes, pyparsing is definitely *not* a "recursive descent
> parser with backup."  The solution, as you correctly posted, is to add
> a negative lookahead.  NotAny is also represented using the '~'
> operator.
>
> So I will take my stance with the uncited authors who lump predictive
> parsers in with recursive descent parsers.

    Well, the good thing about Wikipedia is that, if you don't like the 
definition on the page, you're welcome to change it!  Although, I'd 
recommend discussing it on the talk page before you do, just to make sure 
there isn't a good reason for the page to be as it is...
    Thank you...