Is pyparsing really a recursive descent parser?

Just Another Victim of the Ambient Morality ihatespam at hotmail.com
Sun Nov 4 16:44:32 EST 2007


"Neil Cerutti" <horpner at yahoo.com> wrote in message 
news:nPnXi.39845$G23.30920 at newsreading01.news.tds.net...
> On 2007-11-04, Just Another Victim of the Ambient Morality
> <ihatespam at hotmail.com> wrote:
>>> Consider writing a recursive decent parser by hand to parse
>>> the language '[ab]+b'.
>>>
>>>  goal --> ab_list 'b'
>>>  ab_list --> 'a' list_tail
>>>  ab_list --> 'b' list_tail
>>>  list_tail --> 'a' list_tail
>>>  list_tail --> 'b' list_tail
>>>  list_tail --> null
>>>
>>>
>>> The above has the exact same bug (and probably some others--I'm
>>> sorry unable to test it just now) as the PyParsing solution.
>>>
>>> The error is in the grammar. It might be fixed by specifying that
>>> 'b' must be followed by EOF, and then it could be coded by using
>>> more than one character of lookahead.
>>
>> I don't exactly understand the syntax you used to describe the
>> productions of your recursive descent parser so not only did I not follow 
>> it
>> but I couldn't make out the rest of your post.  Could you explain in a
>> little more detail?  The last part that points to 'null' is especially
>> confusing...
>
> It's the BNF spelling of
>
>  goal --> ab_list 'b'
>  ab_list --> ab { ab }
>  ab --> 'a' | 'b'
>
> The null is to say that list_tail can match nothing, i.e, an
> empty string.
>
> Then, in the Parser class, every method (except for match, which
> is used as a central place to consume characters) corresponds to
> one of the productions in the BNF. Breaking things down into
> BNF-based productions often makes implementation, debugging and
> code generation easier.
>
> PyParsing saves me that stop, since I can often directly
> implement the EBNF using PyParsing.

    Okay, I see that now, thank you.
    Your statement from the previous post:


>> Consider writing a recursive decent parser by hand to parse
>> the language '[ab]+b'.
>>
>>  goal --> ab_list 'b'
>>  ab_list --> 'a' list_tail
>>  ab_list --> 'b' list_tail
>>  list_tail --> 'a' list_tail
>>  list_tail --> 'b' list_tail
>>  list_tail --> null
>>
>>
>> The above has the exact same bug (and probably some others--I'm
>> sorry unable to test it just now) as the PyParsing solution.


    ...merely demonstrates that this grammar is similarly ambiguous.  There 
are many ways to parse this correctly and pyparsing chooses none of these! 
Instead, it returns the same error it does when the string has no 
solutions...


>> As demonstrated earlier, it's not just the grammar.  There are
>> situations that are unambiguous that pyparsing can't parse
>> simply and there's no reason for it.
>
> Yes, many parser generators have many more limitations than just
> the requirement of an unambiguous grammar.

    Yes, but a recursive descent parser?  I expect such things from LALR and 
others, but not only do I expect a recursive descent parser to correctly 
parse grammars but I expect it to even parse ambiguous ones, in that it is 
the only technique prepared to find more than one solution...


>> Besides, ambiguous grammars are a fact of life and some of us
>> need to parse them.  It's usually okay, too.  Consider a
>> previous example:
>>
>> grammar = OneOrMore(Word(alphas)) + Literal('end')
>>
>> While you may consider this inherently ambiguous, it's usually
>> not. That is to say, as long as it is rare that 'end' is used
>> not at the end of the string, this will simply parse and, yet,
>> pyparsing will consistently fail to parse it...
>
> I believe there's no cure for the confusion you're having except
> for implementing a parser for your proposed grammar.
> Alternatively, try implementing your grammar in one of your other
> favorite parser generators.

    I believe there is a cure and it's called recursive descent parsing. 
It's slow, obviously, but it's correct and, sometimes (arguably, often), 
that's more important the execution speed.

    I spent this morning whipping up a proof of concept parser whose 
interface greatly resembles pyparsing but, baring unknown bugs, works and 
works as I'd expect a recursive descent parser to work.  I don't know Python 
very well so the parser is pretty simple.  It only lexes single characters 
as tokens.  It only supports And, Or, Optional, OneOrMore and ZeroOrMore 
rules but I already think this is a rich set of rules.  I'm sure others can 
be added.  Finally, I'm not sure it's safely copying all its parameter input 
the same way pyparsing does but surely those bugs can be worked out.  It's 
merely a proof of concept to demonstrate a point.
    Everyone, please look it over and tell me what you think. 
Unfortunately, my news client is kind of poor, so I can't simply cut and 
paste the code into here.  All the tabs get turned into single spacing, so I 
will post this link, instead:


http://theorem.ca/~dlkong/new_pyparsing.zip


    I hope you can all deal with .zip files.  Let me know if this is a 
problem.
    Thank you...






More information about the Python-list mailing list