Is pyparsing really a recursive descent parser?

Sat Nov 3 15:00:36 EDT 2007

On 2007-11-03, Paul McGuire <ptmcg at austin.rr.com> wrote:
> On Nov 3, 12:33 am, "Just Another Victim of the Ambient Morality"
><ihates... at hotmail.com> wrote:
>>     It has recursion in it but that's not sufficient to call it a recursive
>> descent parser any more than having a recursive implementation of the
>> factorial function.  The important part is what it recurses through...
>
><snip>
>
>> In my opinion, the rule set I mentioned in my original post:
>>
>> grammar = OneOrMore(Word(alphas)) + Literal('end')
>>
>>     ...should be translated (internally) to something like this:
>>
>> word = Word(alphas)
>> grammar = Forward()
>> grammar << ((word + grammar) | (word + Literal(end)))
>>
>>     This allows a recursive search to find the string correct without any
>> need for "backtracking," if I understand what you mean by that.  Couldn't
>> pyparsing have done something like this?
>>
>
> Dear JAVotAM -
>
> This was a great post!  (Although I'm not sure the comment in the
> first paragraph was entirely fair - I know the difference between
> recursive string parsing and recursive multiplication.)
>
> You really got me thinking more about how this recursion actually
> behaves, especially with respect to elements such as OneOrMore.  Your
> original question is really quite common, and up until now, my stock
> answer has been to use negative lookahead.  The expression you posted
> is the first time I've seen this solution, and it *does* work.
>
> I was all set to write a cocky response on why your expression
> wouldn't work.  I've seen it many times before, where people (usually
> coming from EBNF experience) implement listOfXs = OneOrMore(x) as:
>
> listOfXs = Forward()
> listOfXs << ( x + listOfXs | x )

>
> Actually, what they usually write is:
>
> listOfXs << ( listOfXs + x )
>
> but this sends pyparsing into a recursive tailspin.
>
> So I fired up SciTE and copy/pasted your code into the editor and ran
> it, and it worked just fine - this was a shock!  I studied this for a
> few minutes, and then realized what was happening.  First of all, I
> misread what you posted.  You posted this:
>
> grammar << ((word + grammar) | (word + Literal(end)))
>
> which works.  I *thought* you posted this:
>
> grammar << ((word + grammar) | word) + Literal(end)
>
> which doesn't work.  In fact this behaves the same as your original
> post, except it iterates over the input string's words recursively,
> vs. repetition ins a for-loop, as is done by OneOrMore.
>
> So going back to your working version, I had to see why it works.
> Initially, the first term in the MatchFirst (the '|' operator creates
> MatchFirst instances) is just the same, and by grammar referencing
> itself, it just goes word by word through the input trying to find a
> match.  I'll try to visualize the steps:
>
> level    "First Second Third end"
> 1         word  grammar
> 2               word   grammar
> 3                      word  grammar
> 4                            word grammar <- fails!
> 4                            word end <- fails!
>                              (therefore level 3 grammar fails!)
> 3                      word  end    <-- success!!!
>
> grammar has 2 options: to match a word followed by a grammar, or to
> match a word followed by 'end'.  At 4 levels deep into the Forward's
> recursion, the first option fails, because after parsing "end" as the
> initial word, there is no more text to try to match against grammar.
> Level 4's Forward then also tries to match a word followed by 'end',
> but this fails for the same reason.  So at this point, the 4th level
> Forward fails to match either of its options, so it throws its
> exception back up to level 3, indicating that the first alternative,
> word followed by grammar, failed.  Level 3 then moves on to see if
> word followed by the literal 'end' matches, and it does - success!
>
> Here's where I am stuck now.  In the original grammar that you posted,
> which you want to render into this behavior, grammar is defined as:
>
> grammar = OneOrMore(Word(alphas)) + Literal('end')

Is there not an ambiguity in the grammar?

In EBNF:

  goal --> WORD { WORD } END

  WORD is '[a-zA-Z]+'
  END is 'end'

I think it is fine that PyParsing can't guess what the composer
of that grammar meant.

-- 
Neil Cerutti