[issue41659] PEG discrepancy on 'if {x} {a}: pass'

Lysandros Nikolaou report at bugs.python.org
Sat Aug 29 10:41:50 EDT 2020


Lysandros Nikolaou <lisandrosnik at gmail.com> added the comment:

I had a look at this. I have been testing with the input `a{x}`, which exhibits the same problem.

It's actually because of an invalid_* rule. The call stack looks like this:

...
invalid_comprehension_rule(Parser * p) (/home/lysnikolaou/repos/cpython/Parser/parser.c:15065)
genexp_rule(Parser * p) (/home/lysnikolaou/repos/cpython/Parser/parser.c:11381)
primary_raw(Parser * p) (/home/lysnikolaou/repos/cpython/Parser/parser.c:10361)
primary_rule(Parser * p) (/home/lysnikolaou/repos/cpython/Parser/parser.c:10285)
await_primary_rule(Parser * p) (/home/lysnikolaou/repos/cpython/Parser/parser.c:10240)
...

The invalid_comprehension rule accepts an LBRACE as its starting token and only fails after it has consumed it. As a result, the parser fails with three tokens in the tokens array: the NAME, which is valid; the LBRACE, which is consumed by the invalid_comprehension rule; and the NAME after it, at which point the parser fails and backs out of everything. We then look at the last token we've parsed, and that's where we place the caret.
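
To make the mechanism concrete, here is a minimal toy model in Python. It is not the real parser: the class ToyParser, the helper lazy_tokens and the two-rule "grammar" are all made up for illustration, and the invalid rule is only loosely modelled on invalid_comprehension. The only thing it is meant to show is that a speculative rule which consumes the LBRACE pulls one more token into the tokens array before backtracking, and that this moves a caret placed at the last fetched token.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[{}*+]))")


def lazy_tokens(source):
    """Yield (kind, text, column) tuples on demand, ending with ENDMARKER."""
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at column {pos}")
        kind = m.lastgroup
        yield (kind, m.group(kind), m.start(kind))
        pos = m.end()
    yield ("ENDMARKER", "", len(source))


class ToyParser:
    """Hypothetical backtracking parser that fetches tokens lazily."""

    def __init__(self, source, with_invalid_rule=True):
        self._stream = lazy_tokens(source)
        self.tokens = []  # every token fetched so far, even if backed out of
        self.mark = 0     # current position inside self.tokens
        self.with_invalid_rule = with_invalid_rule

    def _peek(self):
        if self.mark == len(self.tokens):
            self.tokens.append(next(self._stream))
        return self.tokens[self.mark]

    def expect(self, kind, text=None):
        tok = self._peek()
        if tok[0] == kind and (text is None or tok[1] == text):
            self.mark += 1
            return tok
        return None

    def invalid_rule(self):
        # Loosely modelled on invalid_comprehension: accept '{', then
        # require '*'. Looking for '*' fetches one more token before failing.
        start = self.mark
        if self.expect("OP", "{") and self.expect("OP", "*"):
            raise SyntaxError("toy: unpacking not allowed here")
        self.mark = start  # back out; the fetched tokens stay in the array
        return None

    def statement(self):
        # Toy grammar: statement is a single NAME followed by ENDMARKER.
        start = self.mark
        if self.expect("NAME"):
            if self.with_invalid_rule:
                self.invalid_rule()
            if self.expect("ENDMARKER"):
                return True
        self.mark = start
        return False


def caret_column(source, with_invalid_rule=True):
    """Return the column of the last fetched token on failure, else None."""
    parser = ToyParser(source, with_invalid_rule)
    if parser.statement():
        return None
    return parser.tokens[-1][2]


print(caret_column("a{x}", with_invalid_rule=False))  # caret under '{'
print(caret_column("a{x}", with_invalid_rule=True))   # caret under 'x'
```

In this toy, the only difference between the two calls is whether the speculative rule runs, which is enough to shift the reported column from the LBRACE to the NAME after it.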

Because of invalid_comprehension, we can even go as far as making the parser show a completely different error, for example:

➜  cpython git:(master) ✗ cat a.py
a{*x for x in a}
➜  cpython git:(master) ✗ ./python a.py
  File "/home/lysnikolaou/repos/cpython/a.py", line 1
    a{*x for x in a}
      ^
SyntaxError: iterable unpacking cannot be used in comprehension

Or place the caret even further:

➜  cpython git:(master) ✗ cat a.py     
a{*x + a + b + c}
➜  cpython git:(master) ✗ ./python a.py
  File "/home/lysnikolaou/repos/cpython/a.py", line 1
    a{*x + a + b + c}
                    ^
SyntaxError: invalid syntax

There's a simple fix: add an alternative to the primary rule that parses something along the lines of `primary set` and calls RAISE_SYNTAX_ERROR_KNOWN_LOCATION there. That alternative would have to come before the genexp alternative, though, which worries me because of the performance implications. `primary` is a left-recursive rule that gets called VERY often, probably more than a few times even when parsing a single NAME.
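
As a rough illustration of what "known location" buys us, the sketch below approximates the idea outside the parser, using the standard tokenize module. The function check_name_followed_by_brace is hypothetical and much cruder than a real `primary set` alternative would be: it only looks for a non-keyword NAME immediately followed by an LBRACE (so it would not catch a case like `{x} {a}`), and it raises a SyntaxError whose location is pinned to that LBRACE rather than to whichever token happened to be fetched last.

```python
import io
import keyword
import tokenize


def check_name_followed_by_brace(source, filename="<toy>"):
    """Raise SyntaxError at the '{' when a non-keyword NAME precedes it.

    A crude, token-level stand-in for the proposed grammar alternative;
    it is an assumption for illustration, not how the parser works.
    """
    toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
    for prev, cur in zip(toks, toks[1:]):
        is_plain_name = (
            prev.type == tokenize.NAME and not keyword.iskeyword(prev.string)
        )
        if is_plain_name and cur.exact_type == tokenize.LBRACE:
            lineno, col = cur.start
            # SyntaxError offsets are 1-based, tokenize columns are 0-based.
            raise SyntaxError(
                "invalid syntax", (filename, lineno, col + 1, cur.line)
            )


for src in ("a{x}", "a{*x for x in a}", "a{*x + a + b + c}"):
    try:
        check_name_followed_by_brace(src)
    except SyntaxError as exc:
        print(src, "-> line", exc.lineno, "offset", exc.offset)
```

All three inputs report the same offset, the position of the LBRACE, regardless of what follows it. Doing the same thing inside `primary` would still carry the performance concern described above.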

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41659>
_______________________________________

