Is pyparsing really a recursive descent parser?

Sun Nov 4 23:30:13 EST 2007

"Neil Cerutti" <horpner at yahoo.com> wrote in message 
news:6xvXi.39855$G23.18679 at newsreading01.news.tds.net...
> On 2007-11-05, Just Another Victim of the Ambient Morality 
> <ihatespam at hotmail.com> wrote:
>>
>> "Kay Schluehr" <kay.schluehr at gmx.net> wrote in message
>> news:1194223837.104424.242360 at o3g2000hsb.googlegroups.com...
>>> On Nov 4, 10:44 pm, "Just Another Victim of the Ambient Morality"
>>> <ihates... at hotmail.com>
>>>
>>>> I believe there is a cure and it's called recursive descent parsing.
>>>> It's slow, obviously, but it's correct and, sometimes (arguably, 
>>>> often),
>>>> that's more important the execution speed.
>>>
>>> Recursive decendent parsing is not necessarily slow but from your
>>> remarks above I infer you want a general RD parser with backtracking:
>>> when one rule doesn't match, try another one to derive the current
>>> symbol in the input stream.
>>
>>     I think I've just discovered a major hurdle in my understand of the
>> problem.
>>     You keep saying "with backtracking."  Why?  Isn't "backtracking"
>> inherent in recursion?  So, why can't these alleged "recursive descent
>> parsers" find valid parsings?  How are they not already backtracking? 
>> What
>> was the point of being recursive if not to take advantage of the inherent
>> backtracking in it?
>>     Obviously, these parsers aren't recursing through what I think they
>> should be recursing.  The question is "why not?"
>
> There are different kinds of recursion. Compare:
>
>  def fac1(x, y=1):
>    """ Compute factorials with a recursive function (it calls
>    itself), but the stack is not actually used for storing
>    anything important, i.e., it is tail-recursive. """
>    if x < 0:
>      raise ValueError('non-negative integer')
>    elif x == 0:
>      return y
>    else:
>      return fac1(x-1, y*x)
>
> to
>
>  def fac2(x):
>    """ Computes factorials with a recursive process, keeping
>    the state of the calculation on the stack. """
>    if x < 0:
>      raise ValueError('non-negative integer')
>    if x == 0:
>      return 1
>    else:
>      return fac2(x-1) * x
>
> to
>
>  def Ack(x, y):
>    """ The Ackermann function. Creates a humongous mess even
>    with quite tiny numbers. """
>    if x < 0 or y < 0:
>      raise ValueError('non-negative integer')
>    elif x == 0:
>      return y + 1
>    elif y == 0:
>      return foo3(x-1, 1)
>    else:
>      return foo3(x-1, foo3(x, y-1))
>
> There's probably a word for the type of recursive process built
> by fac2; the RDP's I'm most familiar with create a fac2 sort of
> process, which stores valuable info on the stack.
>
> And even though fac1 defines an iterative process, the code
> itself is recursive, and you can call it a recursive function if
> you wish (and in Python you might as well).

    While interesting, none of this actually addresses the point I was 
making.  I wasn't saying that there was no recursion (at least, not in this 
paragraph), I was saying that it wasn't recursing through what I thought it 
should be recursing through.  It recurses through a set of rules without any 
regard to how these rules interact with each other.  That's why it fails to 
parse valid strings.  In my opinion, it should recurse through appropriate 
combinations of rules to determine validity, rather than by arbitrary 
categorization...


>>     Correct me if I'm wrong but I'm beginning to think that
>> pyparsing doesn't typically use recursion, at all.  It only
>> employs it if you create one, using the Forward class.
>> Otherwise, it does everything iteratively, hence the lack of
>> "backtracking."
>
> It's recursive because each production rule calls other
> production rules to define itself. A rule regularly ends up
> calling itself. Consider the Parser class I built earlier.
> list_tail keeps calling itself to continue consuming characters
> in an ab_list. The stack is used to keep track of where we are in
> the grammar; at any time you can look up the stack and see how
> you got where you are--you 'descend' down from the topmost
> productions to the most primitive productions, and then back up
> once everything has been sorted out. Take another look
> at the exception raised in my Parsing class example for an
> illustrative traceback.

    I guess that all the And and Or class in pyparsing call methods of each 
other from each other, even if they are doing so from different 
instantiations.  I still say they're not recursing through the right 
things...


>>> I'm not sure one needs to start again with a naive approach just to
>>> avoid any parser theory. For a user of a parser it is quite important
>>> whether she has to wait 50 seconds for a parse to run or 50
>>> milliseconds. I don't like to compromise speed for implementation
>>> simplicity here.
>>
>>     Finally, I can't believe you complain about potential speed
>> problems. First, depending on the size of the string, it's
>> likely to be the difference between 2ms and 200ms.  Secondly,
>> if speed were an issue, you wouldn't go with a recursive
>> descent parser.  You'd go with LALR or the many other parsing
>> techniques available.  Recursive descent parsing is for those
>> situations where you need correctness, regardless of execution
>> time.  These situations happen...
>
> RDP is plenty fast; speed has never been one of it's
> disadvantages, as far as I know. Today there are many
> excellent parser generators and compiler builders that compose an
> RDP under the hood, e.g., Antlr and Gentle.

    I think I read somewhere that LALR was O(n) while RDP was O(e^n).  Most 
people would consider that, at least, slower...
    I think your examples may exemplify how little speed matters rather than 
how fast RDPs are...


>>     I've said this before, albeit for a different language, but
>> it applies to Python just as well.  I don't use Python to write
>> fast code, I use it to write code fast.
>>     If _you_ "don't like to compromise speed for implementation
>> simplicity" then you have a plethora choices available to you.
>> What about the guy who needs to parse correctly and is
>> unconcerned about speed?
>
> You have to be concerned about speed when something runs so
> slowly in common circumstances compared to other well-known
> algotithms that you can't practically wait for an answer. Would
> you consider bubble-sort a suitable general-purpose sorting
> algorithm for Python?

    What you've stated is an hypothetical.  You need to be concerned about 
speed when "you can't practically wait for an answer."  Encryption is based 
on this, for example.  I'm not sure what you're getting at with "compared to 
other well known algorithms."  Generally, the Python virtual machine is 
pitifully slow when compared to a compiled C program, however, the 
comparison is unnecessary.  For most tasks, Python is plenty fast.  Context 
is key.  Things don't have to be fast, they just need to be fast enough...
    As I have said before, if you need speed, there are many options. 
Often, speed is not important, especially with the power of computing 
available today.  Development time is increasingly valuable.  So, for us who 
are not overly concerned with speed, what are our options?
    Now, if my RDP takes two minutes to parse four characters of text, I'd 
sympathize with your point of view.  However, it really isn't noticeably 
slow.  I may be nervous parsing an entire text file with it but I'd feel 
safe parsing individual lines of a .conf file, for instance...
    And I'll have you know that when I was still learning to program (that 
is, before I knew sort() was part of the C standard), I would routinely 
implement bubble sort 'cause it was so trivial to implement.  I wouldn't 
even short cut, necessarily.  Just iterate it as many times as there are 
elements in the list!
    Also, it should probably be noted that bubble sort is very fast for 
nearly sorted lists; much faster than quicksort.  So, while it shouldn't be 
the default sorting algorithm, you could have it somewhere in the library...