Is pyparsing really a recursive descent parser?

Neil Cerutti horpner at yahoo.com
Sun Nov 4 13:02:59 EST 2007


On 2007-11-04, Just Another Victim of the Ambient Morality
<ihatespam at hotmail.com> wrote:
>> Consider writing a recursive decent parser by hand to parse
>> the language '[ab]+b'.
>>
>>  goal --> ab_list 'b'
>>  ab_list --> 'a' list_tail
>>  ab_list --> 'b' list_tail
>>  list_tail --> 'a' list_tail
>>  list_tail --> 'b' list_tail
>>  list_tail --> null
>>
>>
>> The above has the exact same bug (and probably some others--I'm
>> sorry unable to test it just now) as the PyParsing solution.
>>
>> The error is in the grammar. It might be fixed by specifying that
>> 'b' must be followed by EOF, and then it could be coded by using
>> more than one character of lookahead.
>
> I don't exactly understand the syntax you used to describe the 
> productions of your recursive descent parser so not only did I not follow it 
> but I couldn't make out the rest of your post.  Could you explain in a 
> little more detail?  The last part that points to 'null' is especially 
> confusing...

It's the BNF spelling of
  
  goal --> ab_list 'b'
  ab_list --> ab { ab }
  ab --> 'a' | 'b'

The null is to say that list_tail can match nothing, i.e, an
empty string.

Then, in the Parser class, every method (except for match, which
is used as a central place to consume characters) corresponds to
one of the productions in the BNF. Breaking things down into 
BNF-based productions often makes implementation, debugging and
code generation easier.

PyParsing saves me that stop, since I can often directly
implement the EBNF using PyParsing.

> As demonstrated earlier, it's not just the grammar.  There are
> situations that are unambiguous that pyparsing can't parse
> simply and there's no reason for it.

Yes, many parser generators have many more limitations than just
the requirement of an unambiguous grammar.

> Besides, ambiguous grammars are a fact of life and some of us
> need to parse them.  It's usually okay, too.  Consider a
> previous example:
>
> grammar = OneOrMore(Word(alphas)) + Literal('end')
>
> While you may consider this inherently ambiguous, it's usually
> not. That is to say, as long as it is rare that 'end' is used
> not at the end of the string, this will simply parse and, yet,
> pyparsing will consistently fail to parse it...

I believe there's no cure for the confusion you're having except
for implementing a parser for your proposed grammar.
Alternatively, try implementing your grammar in one of your other
favorite parser generators.

-- 
Neil Cerutti



More information about the Python-list mailing list