Is pyparsing really a recursive descent parser?

Just Another Victim of the Ambient Morality ihatespam at hotmail.com
Sun Nov 4 09:55:46 EST 2007


"Neil Cerutti" <horpner at yahoo.com> wrote in message 
news:wP9Xi.39822$G23.18573 at newsreading01.news.tds.net...
> On 2007-11-04, Just Another Victim of the Ambient Morality
> <ihatespam at hotmail.com> wrote:
>>
>> "Neil Cerutti" <horpner at yahoo.com> wrote in message
>> news:oz3Xi.39812$G23.23308 at newsreading01.news.tds.net...
>>>
>>> Is there not an ambiguity in the grammar?
>>>
>>> In EBNF:
>>>
>>>  goal --> WORD { WORD } END
>>>
>>>  WORD is '[a-zA-Z]+'
>>>  END is 'end'
>>>
>>> I think it is fine that PyParsing can't guess what the composer
>>> of that grammar meant.
>>
>> One interpretation conforms to the grammar while the other
>> doesn't. You would assume that the interpretation that agrees
>> with the grammar would be the preferable choice and so should
>> the program. Secondly, even if it is an ambiguity... so what?
>> pyparsing's current behaviour is to return a parse error,
>> pretending that the string can't be parsed.  Ideally, perhaps
>> it should alert you to the ambiguity but, surely, it's better
>> to return _a_ valid parsing than to pretend that the string
>> can't be parsed at all...
>
> I wouldn't characterize it as pretending. How would you parse:
>
>  hello end hello end
>
> "WORD END WORD END" and "WORD WORD WORD END" are both valid
> interpretations, according to the grammar.

    ...and it would be nice if the parser were to parse one of them since 
they are both right.  Having more than one right answer is not the same as 
having no answer, which is what pyparsing claims...


> As soon as you remove the ambiguity from the grammar, PyParsing
> starts to work correctly.

    This is simply not true.  Try this:


grammar = OneOrMore(Word(alphas)) + Literal('end') + Literal('.')
grammar.parseString('First Second Third end.')


    ...again, this will fail to parse.  Where's the ambiguity?
    Besides, parsing ambiguous grammars is a useful feature.  Not all 
grammars being parsed are designed by those doing the parsing...


> Consider writing a recursive decent parser by hand to parse the
> language '[ab]+b'.
>
>  goal --> ab_list 'b'
>  ab_list --> 'a' list_tail
>  ab_list --> 'b' list_tail
>  list_tail --> 'a' list_tail
>  list_tail --> 'b' list_tail
>  list_tail --> null
>
>
> The above has the exact same bug (and probably some others--I'm
> sorry unable to test it just now) as the PyParsing solution.
>
> The error is in the grammar. It might be fixed by specifying that
> 'b' must be followed by EOF, and then it could be coded by using
> more than one character of lookahead.

    I don't exactly understand the syntax you used to describe the 
productions of your recursive descent parser so not only did I not follow it 
but I couldn't make out the rest of your post.  Could you explain in a 
little more detail?  The last part that points to 'null' is especially 
confusing...
    As demonstrated earlier, it's not just the grammar.  There are 
situations that are unambiguous that pyparsing can't parse simply and 
there's no reason for it.
    Besides, ambiguous grammars are a fact of life and some of us need to 
parse them.  It's usually okay, too.  Consider a previous example:


grammar = OneOrMore(Word(alphas)) + Literal('end')


    While you may consider this inherently ambiguous, it's usually not. 
That is to say, as long as it is rare that 'end' is used not at the end of 
the string, this will simply parse and, yet, pyparsing will consistently 
fail to parse it...








More information about the Python-list mailing list