grammar question

Wed Feb 27 20:48:38 EST 2002

"Martin v. Loewis" wrote:
> 
> Notice that regular expressions also define LL(1) languages,
> so the entire pgen parser is a "regular expression matching engine"
> :-)

Well, yes. But what I was trying to say is that the
Python parser generator must be doing some sort of
NFA-to-DFA processing on the productions in order
to generate its tables.

If you want to feed the Python grammar into
something which doesn't understand productions with
REs in them, you have to split things like

  arglist: (argument ',')* (argument [',']| 
    '*' test [',' '**' test] | '**' test)

up into several productions. If you do that naively,
you get something like

  arglist: argspart1 argspart2
  argspart1: argument ',' argspart1 | empty
  argspart2: argument opt_comma | '*' test opt_kw | kw
  opt_kw: kw | empty
  kw: '**' test
  opt_comma: ',' | empty

which is ambiguous. To get it into an unambiguous LL(1)
form, some rearrangement is necessary.

I don't think it really makes sense to ask whether
the Python grammar *as written in the Grammar/Grammar 
file* is LL(1). The textbook theory about classification
of grammars all assumes that the grammar is written
using the simple Yacc-like form of productions.

The Python Grammar file is written in a higher-level
language than that, and happens to have been written
in a way that does not admit a trivial translation
into a lower-level form that is LL(1).

-- 
Greg Ewing, Computer Science Dept, University of Canterbury,	  
Christchurch, New Zealand
To get my email address, please visit my web page:	  
http://www.cosc.canterbury.ac.nz/~greg