[Python-Dev] Features for Python 2.0

Guido van Rossum guido@beopen.com
Tue, 25 Jul 2000 11:12:05 -0500


> > guido wrote:
> > > This is a (minor) open issue; I'd be happy with requiring
> > > 
> > >   [(x, x*2) for x in seq]

[effbot]
> > doesn't really change anything, does it?  the (x, x*2) could
> > still be a value tuple or an assignment target.

[effbot again]
> umm.  not sure "assignment target" was the right term here; what
> I'm trying to say is that you cannot tell what you're parsing until
> you've seen the "for" keyword.
> 
> or in other words, the "n" in "when you see 'x', you have to read
> 'n' more tokens before you can tell what 'x' is" suddenly became
> much larger...
> 
> (after all, I've successfully parsed Python code with a very simple
> non-backtracking recursive-descent parser -- and I'm not sure I
> can teach that one to recognize list comprehensions...)
> 
> (and yes, I'm sure this can be expressed in [AELRS]+\(\d+\)
> notation, but I'm not gonna try ;-)

Don't worry.  Greg Ewing had no problem expressing this in Python's
own grammar, which is about as restricted as parsers come.  (It's
LL(1), which is equivalent to pure recursive descent with one
lookahead token, i.e. no backtracking.)

Here's Greg's grammar:

  atom: ... | '[' [testlist [list_iter]] ']' | ...
  list_iter: list_for | list_if
  list_for: 'for' exprlist 'in' testlist [list_iter]
  list_if: 'if' test [list_iter]

Note that before, the list syntax was '[' [testlist] ']'.  Let me
explain it in different terms:

The parser parses a series comma-separated expressions.  Previously,
it was expecting ']' as the sole possible token following this.  After
the change, 'for' is another possible following token.  This is no
problem at all for any parser that knows how to parse matching
parentheses!

If you'd rather not support [x, y for ...] because it's ambiguous (to
the human reader, not to the parser!), we can change the grammar to
something like:

  '[' test [',' testlist | list_iter] ']'

(Note that | binds less than concatenation, and [...] means an
optional part.)  To merge with range literals, we'd have to go for:

  '[' test [':' test [':' test] | ',' testlist | list_iter ] ']'

Another modification is needed to support [:x] and [:x:y], but that's
a separate path through the parser entirely:

  atom: ... | '[' range | test [ range | ',' testlist | list_iter ] ']' | ...
  range: ':' test [':' test]

--Guido van Rossum (home page: http://dinsdale.python.org/~guido/)