Is pyparsing really a recursive descent parser?

Neil Cerutti horpner at yahoo.com
Sun Nov 4 21:09:08 EST 2007


On 2007-11-05, Just Another Victim of the Ambient Morality <ihatespam at hotmail.com> wrote:
>
> "Neil Cerutti" <horpner at yahoo.com> wrote in message 
> news:qzsXi.39852$G23.9814 at newsreading01.news.tds.net...
>> On 2007-11-04, Just Another Victim of the Ambient Morality
>> <ihatespam at hotmail.com> wrote:
>>> "Neil Cerutti" <horpner at yahoo.com> wrote in message
>>> news:nPnXi.39845$G23.30920 at newsreading01.news.tds.net...
>>>> I believe there's no cure for the confusion you're having except
>>>> for implementing a parser for your proposed grammar.
>>>> Alternatively, try implementing your grammar in one of your other
>>>> favorite parser generators.
>>>
>>>     I believe there is a cure and it's called recursive descent
>>> parsing. It's slow, obviously, but it's correct and, sometimes
>>> (arguably, often), that's more important the execution speed.
>>>
>>>     I spent this morning whipping up a proof of concept parser
>>> whose interface greatly resembles pyparsing but, baring unknown
>>> bugs, works and works as I'd expect a recursive descent parser
>>> to work.  I don't know Python very well so the parser is pretty
>>> simple.  It only lexes single characters as tokens.  It only
>>> supports And, Or, Optional, OneOrMore and ZeroOrMore rules but
>>> I already think this is a rich set of rules.  I'm sure others
>>> can be added.  Finally, I'm not sure it's safely copying all
>>> its parameter input the same way pyparsing does but surely
>>> those bugs can be worked out.  It's merely a proof of concept
>>> to demonstrate a point.
>>>     Everyone, please look it over and tell me what you think.
>>> Unfortunately, my news client is kind of poor, so I can't
>>> simply cut and paste the code into here.  All the tabs get
>>> turned into single spacing, so I will post this link, instead:
>>>
>>> http://theorem.ca/~dlkong/new_pyparsing.zip
>>
>> Your program doesn't necessarily address the ambiguity in the
>> grammar in question, since right now it is only a recognizer.
>> Will it be hard to get it to return a parse tree?
>
>     Hey, it's only a proof of concept.  If you can parse the tree, surely 
> you can record what you parsed, right?
>     Did you notice that the parse() functions have the rather serious bug of 
> not returning how much of the string they could parse?  

Unfortunately I haven't had much time to play with it today; just
barely enough to put it through a very few paces.

> It just so happens that the contstructions that I made only
> ever had to increment the matches by one, so they just happen
> to work.  That's an easy bug to fix but a pretty major one to
> have overlooked.  Hence, my enthusiasm for input...
>
>> The grammar in your implementation is:
>>
>>>>> goal = OneOrMore(RuleAnd('a') | RuleAnd('b')) + RuleAnd('b')
>>>>> goal.parse(0, 'ab')
>> True
>>>>> goal.parse(0, 'ba')
>> False
>>>>> goal.parse(0, 'b')
>> False
>>>>> goal.parse(0, 'aaab')
>> True
>>>>> goal.parse(0, 'abc')
>> True
>>
>> So far so good. :)
>
>     Good!  Keep hammering at it!
>     More importantly, study it to understand the idea I'm
> trying to convey. This is what I thought a recursive descent
> parser would do...

Kay has pointed out how it works. Strangely enough, I've never
studied a backtracking RDP before (trying to teach yourself a
subject like parsing can be tricky--I've had to somehow avoid all
the texts that overuse Greek letters--those incomprehensible
symbols confuse the hell out of me). It does simplify the job of
the grammar designer, but Kay's message makes it sound like it
won't scale very well.

It might, perhaps, be an interesting feature for PyParsing to
entertain by setting a 'backtracking' option, for when you're
writing a quick script and don't want to fuss too much with a
non-conformant grammar.

I'll have more time to look at it tomorrow.

-- 
Neil Cerutti



More information about the Python-list mailing list