[Python-Dev] Python Grammar Ambiguity

Guido van Rossum guido at python.org
Mon Apr 24 20:01:51 CEST 2006


This is probably because we have a similar ambiguity in assignments:
the grammar says something like

  exprlist ('=' exprlist)*

but what is actually desired is

  (varlist '=')* exprlist

Unfortunately the latter is not LL1 so we lie to the parser and tell
it the first form, and then in the code generator checks that the
actual parse tree for all exprlists except for the last conforms to
the more limited syntax for varlist (which is only in our head). If
not, it issues a "phase 2" SyntaxError. (One that doesn't have the
exact column information.)

We didn't have to do this for the syntax of the for-statement,
list-comprehensions etc. because the ambiguity doesn't exist there
(the 'for' keyword disambiguates the situation); but the current
approach allows more code sharing in the code generator.

I suggest you go ahead and write the second form for your own parser.
Coming up with the correct rules for varlist is not hard.

--Guido

On 4/24/06, Michael Foord <michael.foord at resolversystems.com> wrote:
> (oops - should have gone to list)
>
> Guido van Rossum wrote:
> > Well, yes, the syntax is supposed to be something like "for varlist in
> > testlist". Could you report this as a doc bug (if you found this
> > information in the docs)?
> >
>
> I think the documentation (which does put expression_list there)
> reflects the current state of the parser.
>
> First of all the grammar in SVN also has expression_list there *and* the
> following does successfully parse to an ast (but fails when you compile
> the ast) :
>
>     [x for x + 1 in y]
>
> All the best,
>
>
> Michael Foord
>
>
> > On 4/24/06, Michael Foord <fuzzyman at voidspace.org.uk> wrote:
> >
> >> Hello all,
> >>
> >> I'm working on a parser for part of the Python language (expressions but
> >> not statements basically). I'm using PLY to generate the parser and it's
> >> mostly done.
> >>
> >> I've hit on what looks like a fundamental ambiguity in the Python
> >> grammar which is difficult to get round with PLY; and I'm wondering
> >> *why* the grammar is defined in this way. It's possible there is a
> >> reason that I've missed, which means I need to rethink my workaround.
> >>
> >> List displays (list comprehensions) are defined as (from
> >> http://docs.python.org/ref/lists.html ) :
> >>
> >>
> >> test      ::=      and_test ( "or" and_test )* | lambda_form
> >> testlist     ::=     test ( "," test )* [ "," ]
> >> list_display     ::=     "[" [listmaker] "]"
> >> listmaker     ::=     expression ( list_for | ( "," expression )* [","] )
> >> list_iter     ::=     list_for | list_if
> >> list_for     ::=     "for" expression_list "in" testlist [list_iter]
> >> list_if     ::=     "if" test [list_iter]
> >>
> >> The problem is that list_for is defined as :
> >>
> >>     "for" expression_list "in" testlist
> >>
> >> This allows arbitrary expressions in the 'assignment' part of a list
> >> comprehension.
> >>
> >> As a result, the following is valid syntax according to the grammar :
> >>
> >>     [x for x + 1 in y]
> >>
> >> Obviously it isn't valid ! This parses to an ast, but the syntax error
> >> is thrown when you compile the resulting ast.
> >>
> >> The problem is that for the basic case of a list comprehension ( ``[x
> >> for x in y]``), ``x in y`` is a valid expression. That makes it
> >> extremely hard to disambiguate the grammar so that the ``in`` is treated
> >> correctly, and not part of an expression.
> >>
> >> My question is, why are arbitrary expressions allowed here in the
> >> grammar ? As far as I can tell, only identifiers (nested in parentheses
> >> or brackets) are valid here. I've got round the problem by creating a
> >> new node 'identifier_list' and just having that return the expected
> >> syntax tree (actually an expression list). This gets round the ambiguity
> >> [#]_.
> >>
> >> It worries me that there might be a valid expression allowed here that I
> >> haven't thought of. My current rules allow anything that looks like
> >> ``(a, [b, c, (d, e)], f)`` - any  nested identifier list. Would anything
> >> else be allowed ?
> >>
> >> If not, why not modify the grammar so that the compiler has less
> >> possible invalid syntax trees to work with ?
> >>
> >> (Also the grammar definition of string conversion is wrong as it states
> >> that a trailing comma is valid, which isn't the case. As far as I can
> >> tell that is necessary to allow nesting string conversions.)
> >>
> >> Fuzzyman
> >> http://www.voidspace.org.uk/python/index.shtml
> >>
> >>
> >> .. [#] If I could make precedence work in PLY I could also solve it I
> >> guess. However I can't. :-)
> >>
> >> _______________________________________________
> >> Python-Dev mailing list
> >> Python-Dev at python.org
> >> http://mail.python.org/mailman/listinfo/python-dev
> >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
> >>
> >>
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
> >
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
>


--
--Guido van Rossum (home page: http://www.python.org/~guido/)


More information about the Python-Dev mailing list