[Python-Dev] listcomp / par-for (summary)

Thomas Wouters thomas@xs4all.net
Wed, 12 Jul 2000 16:13:28 +0200


I'm gonna try and summarize all the opinions and suggestions in the
listcomprehensions thread-from-hell (mutt's thread-sorted view was going
bananas ;) but I'm certain I missed some suggestions and opinions, so I
apologize in advance ;)

Basically, what's been said is that list comprehensions as they are, are too
confusing. I haven't heard that many people say that, but not many people
objected to those that did say it, and plenty of alternative syntaxes were
proposed. 

The same goes for parallel-for. It isn't terribly obvious whether the
proposed syntax implements a cartesian product (as a nested for loop does)
or a cross-section. There seem to be fewer divergent thoughts about this,
though. A builtin (zip() or such) is preferred by a number of people, but
not that many alternative syntaxes came along.
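
To make the two readings concrete, here is a minimal sketch. It assumes a
zip()-style builtin that pairs items up in lockstep (the behaviour shown is
that of zip() in later Python versions, not something settled in this
thread), and the sample lists are made up for illustration.

```python
# Hypothetical sample data, not taken from the thread.
a = [1, 2, 3]
b = ['x', 'y', 'z']

# Cartesian product: what a nested for loop does.
product = []
for x in a:
    for y in b:
        product.append((x, y))

# Cross-section: walk both lists in lockstep, as a zip()-style builtin would.
lockstep = []
for x, y in zip(a, b):
    lockstep.append((x, y))

print(len(product))   # 9
print(lockstep)       # [(1, 'x'), (2, 'y'), (3, 'z')]
```

Three items in each list give nine pairs for the nested loop but only three
for the lockstep walk, which is exactly the ambiguity the syntax has to
resolve.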

As said, the currently implemented list-comprehension syntax was too
unclear, especially when nesting the 'for's and 'if's too deeply. One of the
proposals was to enforce the use of ()'s and add commas to make the 'for's
and 'if's more readable:

[(x,y) for x in a, y in b, if y > 3]

That is not going to work. Enforcing the parentheses is possible, but only
if you *always* enforce them, even if they don't create a tuple:

[([x,y]) for x in a]

(which would create a list of lists, not a list of tuples of lists.) And the
problem with enforcing it is that we have to split open a *lot* of the
Grammar, in order to disambiguate the syntax. Ain't a recursive descent
parser great ? :)
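
A small sketch of why enforced parentheses would not mean "tuple". It uses
the list-comprehension spelling found in later Python releases, and the
function names are hypothetical, chosen only for this example.

```python
def grouped(a, y):
    # Parentheses around a single expression only group it, so each
    # element is the list [x, y] itself.
    return [([x, y]) for x in a]

def one_tuples(a, y):
    # A trailing comma is what makes a tuple: each element is ([x, y],).
    return [([x, y],) for x in a]

print(grouped([1, 2], 0))      # [[1, 0], [2, 0]]
print(one_tuples([1, 2], 0))   # [([1, 0],), ([2, 0],)]
```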

Secondly, the use of commas to separate the for and if statements is not
possible. The parser will see the commas as part of the iter-list of the
previous for statement.
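
The comma problem can be seen in an ordinary for statement. This sketch uses
the ast module from current Python, which postdates this thread, so treat it
as an illustration of the same grammar rule rather than of the parser
discussed here.

```python
import ast

# In a for statement, everything after 'in' up to the colon is one
# expression list, so "a, b" is parsed as the tuple (a, b).
tree = ast.parse("for x in a, b:\n    pass\n")
loop = tree.body[0]

print(type(loop.iter).__name__)              # Tuple
print([name.id for name in loop.iter.elts])  # ['a', 'b']
```

A comma after the iterable simply extends the thing being looped over, so
the parser never gets the chance to treat it as a clause separator.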

The same goes for all other separators that already have a meaning inside a
'testlist': commas, vertical bars, ampersands, circumflexes, etc. About the
only character that already had a meaning inside Python that's usable at
this point is the semi-colon, because it's only used as a statement
separator, not part of an expression list. Of course, using a new character
that hasn't a meaning yet also works.

And the same also goes for keywords in the same place. 'and' won't work, for
instance, because it's already part of 'exprlist' and 'testlist', but
'while' does work, because it's currently only valid as a statement, not in
an expression.

So we end up with the following possible syntaxes:

[ <test> <for-or-if-stmt1> <for-or-if-stmt2> <...> ]

and

[ <test>; <for-or-if-stmt1>; <for-or-if-stmt2>; <...> ]

and perhaps the use of Python blocks and such:

[ <for-or-if-stmt1>:
	<for-or-if-stmt2>:
		<....>
			<test> ]

Though I think that is a bit of a far cry from the original proposal.
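
For comparison, a sketch of how the first (separator-less) form lines up with
the block form once the blocks are written as ordinary statements. The
comprehension uses the spelling found in later Python releases; the function
names and sample data are hypothetical.

```python
def with_listcomp(a, b):
    # Clauses read left to right, outermost loop first.
    return [(x, y) for x in a for y in b if y > 3]

def with_blocks(a, b):
    # The same clauses as nested statements, with the test innermost.
    result = []
    for x in a:
        for y in b:
            if y > 3:
                result.append((x, y))
    return result

a = [1, 2]
b = [3, 4, 5]
print(with_listcomp(a, b))                       # [(1, 4), (1, 5), (2, 4), (2, 5)]
print(with_listcomp(a, b) == with_blocks(a, b))  # True
```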

All the other proposals simply won't work without a massive re-ordering of
the Grammar or rewriting the entire parser into something more versatile
(and complicated) than a simple recursive descent parser.

And the exact same story can be told for the parallel-for loop:

for x in a and y in b:

will not work because the 'testlist' after 'in' will eat up 'a and y in b'.
Likewise for 

for x in a | y in b:
for x|y in a|b:
for x,y in a|b:

etc. The unnatural

for {x,y} in a, b:

would work, but it'd require special-casing the use of {}'s, and might
suggest 'dict-unpacking' to people.

for x;y in a;b:

probably will work, too. Actually, yes, this will work, even though I
haven't tested it... It also makes seeing how many lists you are looping over
pretty easy. Hmm, I might even like this more than the currently implemented
syntax, even if it's slightly less readable.
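
To see how 'and' and '|' get swallowed, here is a sketch that asks current
Python's ast module what a for statement ends up iterating over. The module
and its node names postdate this thread, and iter_of is a hypothetical
helper, so this only illustrates the precedence problem.

```python
import ast

def iter_of(source):
    """Return the AST node a for statement iterates over."""
    return ast.parse(source).body[0].iter

# 'a and y in b' is one boolean expression: a and (y in b).
node = iter_of("for x in a and y in b:\n    pass\n")
print(type(node).__name__, type(node.op).__name__)    # BoolOp And

# '|' binds tighter than 'in', so this iterates over (a | y) in b.
node = iter_of("for x in a | y in b:\n    pass\n")
print(type(node).__name__, type(node.left).__name__)  # Compare BinOp
```

In both cases the statement is still a single loop over one expression,
which is why these spellings cannot be given a parallel-for meaning without
reworking the Grammar.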

The range-literal thing ([:10:2] -> [0,2,4,6,8]) seems to be acceptable, I
believe ? It might 'merely' be syntactic sugar for a builtin, but it's
such pretty sugar ! :-) And it's not possible to shadow the builtin like
with range.
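
A sketch of what the literal would stand for, and of the shadowing point. The
range literal itself is only a proposal, so it is spelled here with the
builtin; list() is used because range() in current Python no longer returns
a list. The function name is hypothetical.

```python
# What the proposed literal [:10:2] would evaluate to.
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]

def shadowed():
    # A local name can hide the builtin; a literal could not be hidden.
    range = lambda *args: "not the builtin"
    return range(0, 10, 2)

print(shadowed())              # not the builtin
```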

(Though I have one possible addition to the range syntax: range tuples, like
so:

(20:10:3)

They could just generate a rangeobject, like xrange does... That could be a
fair bit of memory savings ;) But I have to admit the syntax isn't as
obvious as [19:15:-1])
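
A rough sketch of the memory argument. It uses range() from current Python,
which is lazy the way xrange is, as a stand-in for the hypothetical range
tuple; the bounds are arbitrary.

```python
import sys

lazy = range(0, 1_000_000, 3)   # computes items on demand
eager = list(lazy)              # stores every item

# The lazy object stays small no matter how long the range is.
print(sys.getsizeof(lazy) < sys.getsizeof(eager))   # True
print(lazy[2], len(lazy) == len(eager))             # 6 True
```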

If I were to tally votes, I'd say the general consensus (for as far as there
is one ;-) is roughly like this:

list comprehensions: -1/-0
parallel for: -0
range literal: +0

Correct me if I'm wrong ;)

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!