grammar question

Greg Ewing greg at cosc.canterbury.ac.nz
Tue Feb 26 23:36:34 EST 2002


Fernando Pereira wrote:
> 
> This production cannot be disambiguated top-down with *any* lookahead, for
> the reason you suggest. Thus, it doesn't seem Python's grammar is LL(1).

Python's parser is an interesting beast. I haven't looked at
the internals of the parser generator, so I can't be sure,
but it seems that, although it may be using a recursive
descent algorithm at the production level, *within* a
production it must use some sort of regular expression 
matching engine.

Another thing is that all the items it collects while
parsing a given production are simply put into a tuple,
regardless of the internal structure of the production
used to parse it. Later on, while traversing the parse
tree, the compiler performs a *second* parsing operation
on the contents of the tuple, at which point it has
everything laid out before it, and can look ahead as
much as it wants.

For these reasons, saying just "Python's grammar is
LL(1)" is oversimplifying things considerably.

As for Nick Collier's problem, it seems likely that the
target parser generator isn't smart enough to cope with
the regular-expression-within-a-production thing that
the Python parser generator does. It should be possible
to transform it into something acceptable without too
much difficulty, though.

As it happens, I'm engaged in doing exactly that myself
at the moment, as I'm writing a recursive-descent LL(1)
parser for a superset of Python. I'm not using any parser
generator, just writing it by hand. I did all of the
expression stuff the other day, including the arglist
production, without any particular difficulty. So I 
know it can be done!

-- 
Greg Ewing, Computer Science Dept, University of Canterbury,	  
Christchurch, New Zealand
To get my email address, please visit my web page:	  
http://www.cosc.canterbury.ac.nz/~greg



More information about the Python-list mailing list