[Python-Dev] Grammar help.

Greg Ward gward@mems-exchange.org
Thu, 6 Jul 2000 18:22:56 -0400


On 07 July 2000, Thomas Wouters said:
> Argh, my head hurts ! :P

Yeah, but try writing your *own* parser-generator.  >grin<

> Here's what I did. The current definition of 'atom' is:
> 
> atom: '(' [testlist] ')' | '[' [testlist] ']' | << ... >>
> testlist: test (',' test)* [',']
> 
> I tried adding it like this:
> 
> atom: '(' [testlist] ')' | '[' [listmaker] ']' | << ... >>
> testlist: test (',' test)* [',']
> listmaker: testlist | rangemaker
> rangemaker: [test] ':' test

I think the problem is this:
  * one possible candidate for atom is: '[' listmaker
  * two candidates for listmaker: testlist or rangemaker
  * but both of these start with a test; the fact that one of them
    is optional is probably beside the point
  * the parser needs to see ',' or ':' to know if it's got a
    testlist or a rangemaker: but because a test could be any number
    of tokens, it would need infinite lookahead.  (I think.)  Since
    this is a recursive-descent parser, it is most likely LL(1) --
    at most one token of lookahead.  Not good enough.

Another, entirely hypothetical way of looking at it (based on experience
with a completely different recursive descent parser generator
(PCCTS/ANTLR) and dim recollections of the Dragon Book):
  * parser sees a '[' token and enters the 'atom' state
    (ie. calls a subroutine 'atom()')
  * swallows the '[' and immediately enters 'listmaker' (no other option)
  * now it wants to enter either 'testlist' or 'rangemaker': but the
    FIRST(testlist) == FIRST(test) (testlist can only start with a
    test), and FIRST(rangemaker) == FIRST(test) + ':' (rangemaker
    can start with a test or the ':' token)
  * I suspect this is the ambiguity: overlap in the FIRST sets of
    testlist and rangemaker

The fix?  Hmmm, I dunno.  Never hacked on the Python grammar myself, and 
it's been a while since I hacked seriously on any grammar.  Maybe the
people (person?) who really understands this stuff will have an idea...

        Greg
-- 
Greg Ward - software developer                gward@mems-exchange.org
MEMS Exchange / CNRI                           voice: +1-703-262-5376
Reston, Virginia, USA                            fax: +1-703-262-5367