recursion in grammar?

Sun Dec 28 21:23:10 EST 2003

On Sat, 27 Dec 2003 13:04:15 -0500, "Terry Reedy" <tjreedy at udel.edu>
wrote:

>
>"Peri" <peri12345 at poczta.onet.pl> wrote in message
>news:34ea966a.0312270509.5ecc2095 at posting.google.com...
>> I'm trying to create Python parser/interpreter using ANTLR.
>> Reading grammar from language refference I found:
>> or_expr::= xor_expr | or_expr "|" xor_expr
>>
>> For me it looks like infinite recursion.
>
>Better to call it unbounded recursion.  This is what makes general CF
>languages/grammars more extensive/powerful that regular languages.
>
>In particular, the recursion is written as left recursion.
>The equivalent right recursion would be
>or_expr::= xor_expr | xor_expr "|" or_expr
>The recursive 'call' is to the right of the connector literal instead of
>the left.
>
>There are two types of linear CF parsers: bottom-up LR and top-down LL.
>Each handles recursion written one way and chokes on recursion written the
>other way.  The two types have opposite requirements.  (I forget which is
>which.)  Since Antler choked on left recursion, I presume that it is the
>'other' type than Python's and that it requires right rather than left
>recursion.  So try reversing all the recursive definitions.  (Or try first
>the one that gives first error message and see if you get farther.)

Unless I am seriously mistaken, bottom-up LR parsers do not choke on
either left or right recursion. Right recursion, IIRC, requires
unbounded stack space in principle, but works OK in practice for the
simple reason that the input always defines a finite bound. It is
certainly less efficient, and carrys the risk that a complex input
might overflow the available stack space, but those are rarely serious
issues these days.

If LR parsing could not handle both left and right recursion, it
equally could not handle both left-associative and right-associative
operators. Of course most real parsers have specialised support to
optimise these common cases, but pure-form bottom-up LR parsing
algorithms such as LR(1) and LALR(1) are perfectly capable of handling
both left and right associative operators, and they do so using left
and right recursion as appropriate.

Which raises an important point - if you change the grammar to swap
the recursions from left to right or whatever, that results in a major
code-breaking change in how the parser interprets the language.
Left-associative expressions will be parsed as right-associative and
visa versa. Maybe worthwhile as a test to see what happens, but no
good for a final parser that is intended to work on real Python code.
Well, not without some post-processing of the resulting AST anyway.

ANTLR definitely uses LL parsing. I don't know about Pythons parsing
engine, though I suspect it uses LR-style parsing. It is a basic fact
of life that LL parsers and LR parsers do have quite different
limitations, and if Python uses LR parsing it will probably be
impossible to build a parser for the (unmodified) Python grammar using
an LL tool such as ANTLR. However, I find it hard to believe that
ANTLR has no means of working around these issues with some relatively
simple tweaks to the grammar specification.

I would raise the question in comp.compilers, or check if ANTLR has a
mailing list.

-- 
Steve Horne

steve at ninereeds dot fsnet dot co dot uk