[Python-Dev] Python Grammar Ambiguity

Michael Foord fuzzyman at voidspace.org.uk
Mon Apr 24 21:21:36 CEST 2006


Guido van Rossum wrote:
> This is probably because we have a similar ambiguity in assignments:
> the grammar says something like
>
>   exprlist ('=' exprlist)*
>
> but what is actually desired is
>
>   (varlist '=')* exprlist
>
> Unfortunately the latter is not LL1 so we lie to the parser and tell
> it the first form, and then in the code generator checks that the
> actual parse tree for all exprlists except for the last conforms to
> the more limited syntax for varlist (which is only in our head). If
> not, it issues a "phase 2" SyntaxError. (One that doesn't have the
> exact column information.)
>
> We didn't have to do this for the syntax of the for-statement,
> list-comprehensions etc. because the ambiguity doesn't exist there
> (the 'for' keyword disambiguates the situation); but the current
> approach allows more code sharing in the code generator.
>
> I suggest you go ahead and write the second form for your own parser.
> Coming up with the correct rules for varlist is not hard.
>
>   
We've already done that, I was just checking there wasn't some case we'd 
missed were an expression was possible.

Thanks

Michael

> --Guido
>
> On 4/24/06, Michael Foord <michael.foord at resolversystems.com> wrote:
>   
>> (oops - should have gone to list)
>>
>> Guido van Rossum wrote:
>>     
>>> Well, yes, the syntax is supposed to be something like "for varlist in
>>> testlist". Could you report this as a doc bug (if you found this
>>> information in the docs)?
>>>
>>>       
>> I think the documentation (which does put expression_list there)
>> reflects the current state of the parser.
>>
>> First of all the grammar in SVN also has expression_list there *and* the
>> following does successfully parse to an ast (but fails when you compile
>> the ast) :
>>
>>     [x for x + 1 in y]
>>
>> All the best,
>>
>>
>> Michael Foord
>>
>>
>>     
>>> On 4/24/06, Michael Foord <fuzzyman at voidspace.org.uk> wrote:
>>>
>>>       
>>>> Hello all,
>>>>
>>>> I'm working on a parser for part of the Python language (expressions but
>>>> not statements basically). I'm using PLY to generate the parser and it's
>>>> mostly done.
>>>>
>>>> I've hit on what looks like a fundamental ambiguity in the Python
>>>> grammar which is difficult to get round with PLY; and I'm wondering
>>>> *why* the grammar is defined in this way. It's possible there is a
>>>> reason that I've missed, which means I need to rethink my workaround.
>>>>
>>>> List displays (list comprehensions) are defined as (from
>>>> http://docs.python.org/ref/lists.html ) :
>>>>
>>>>
>>>> test      ::=      and_test ( "or" and_test )* | lambda_form
>>>> testlist     ::=     test ( "," test )* [ "," ]
>>>> list_display     ::=     "[" [listmaker] "]"
>>>> listmaker     ::=     expression ( list_for | ( "," expression )* [","] )
>>>> list_iter     ::=     list_for | list_if
>>>> list_for     ::=     "for" expression_list "in" testlist [list_iter]
>>>> list_if     ::=     "if" test [list_iter]
>>>>
>>>> The problem is that list_for is defined as :
>>>>
>>>>     "for" expression_list "in" testlist
>>>>
>>>> This allows arbitrary expressions in the 'assignment' part of a list
>>>> comprehension.
>>>>
>>>> As a result, the following is valid syntax according to the grammar :
>>>>
>>>>     [x for x + 1 in y]
>>>>
>>>> Obviously it isn't valid ! This parses to an ast, but the syntax error
>>>> is thrown when you compile the resulting ast.
>>>>
>>>> The problem is that for the basic case of a list comprehension ( ``[x
>>>> for x in y]``), ``x in y`` is a valid expression. That makes it
>>>> extremely hard to disambiguate the grammar so that the ``in`` is treated
>>>> correctly, and not part of an expression.
>>>>
>>>> My question is, why are arbitrary expressions allowed here in the
>>>> grammar ? As far as I can tell, only identifiers (nested in parentheses
>>>> or brackets) are valid here. I've got round the problem by creating a
>>>> new node 'identifier_list' and just having that return the expected
>>>> syntax tree (actually an expression list). This gets round the ambiguity
>>>> [#]_.
>>>>
>>>> It worries me that there might be a valid expression allowed here that I
>>>> haven't thought of. My current rules allow anything that looks like
>>>> ``(a, [b, c, (d, e)], f)`` - any  nested identifier list. Would anything
>>>> else be allowed ?
>>>>
>>>> If not, why not modify the grammar so that the compiler has less
>>>> possible invalid syntax trees to work with ?
>>>>
>>>> (Also the grammar definition of string conversion is wrong as it states
>>>> that a trailing comma is valid, which isn't the case. As far as I can
>>>> tell that is necessary to allow nesting string conversions.)
>>>>
>>>> Fuzzyman
>>>> http://www.voidspace.org.uk/python/index.shtml
>>>>
>>>>
>>>> .. [#] If I could make precedence work in PLY I could also solve it I
>>>> guess. However I can't. :-)
>>>>
>>>> _______________________________________________
>>>> Python-Dev mailing list
>>>> Python-Dev at python.org
>>>> http://mail.python.org/mailman/listinfo/python-dev
>>>> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
>>>>
>>>>
>>>>         
>>> --
>>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>>>
>>>
>>>       
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> http://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
>>
>>     
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk
>
>   



More information about the Python-Dev mailing list