[Python-Dev] Iteration variables and list comprehensions

Wed, 30 May 2001 08:49:29 -0500 (CDT)

Tim Peters writes:
 > > Because of this, I have frequently found myself debugging the
 > > following programming error:
 > 
 > If "frequently" is "a little more than usual", then it sounds like your
 > problems in all areas are too common for us to really help you by fixing
 > this one <wink>.

I've probably been bitten by this about 5-10 times over the last few
months. I can also say that it's a real bugger to track down when it
happens.  Now while this may just be a user problem on my part (which
I can accept), I think there is a much deeper semantic problem with
the current implementation of list comprehensions.  Specifically, we
now have this really cool list construction technique that is, for all
practical purposes, an operator.  Yet, at the same time, this
"operator" has a really nasty side-effect of changing the values of
variables in the surrounding scope in a very unnatural and unexpected
way.

More generally, it's essentially the same behavior that you would get
if you wrote some code like this:

    a = expr(x,y)

and expr() went off and nuked the value of x, replacing it with
something completely different (note: I'm not talking about cases
where x might be mutable here).  Since you can write things like this

    a = [ 2*x for x in s]

it's easy to view the right hand side as being isolated in the same
way as a normal expression (where the name of the iteration variable
"x" is incidental--a throwaway if you will).

Maybe everyone else views list comprehensions as a series of
statements (the syntactic sugar for nested for-loop idea).  However,
if you look at how they can be used, it's completely different from
this.  Specifically, if I write something like this:

   a = [2*x for x in s] + [3*x for x in t]

I certainly don't conceptualize it as being literally expanded into
the following sequence of statements:

   t1 = [ ]
   for x in s:
      t1.append(2*x)
   t2 = [ ]
   for x in t:
      t2.append(3*x)
   a = t1 + t2
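
To make the surprise concrete, here is a minimal sketch of that
expansion with made-up data (the values of s and t and the initial
binding of x are assumptions, purely for illustration).  It shows the
loop form quietly rebinding a name the surrounding code was still using:

```python
s = [1, 2, 3]          # illustrative data, not from any real program
t = [10, 20]
x = "something I still need"

# The literal statement-by-statement expansion of
#   a = [2*x for x in s] + [3*x for x in t]
t1 = []
for x in s:            # rebinds x in the surrounding scope
    t1.append(2 * x)
t2 = []
for x in t:            # rebinds x again
    t2.append(3 * x)
a = t1 + t2

print(a)               # [2, 4, 6, 30, 60]
print(x)               # 20, the last element of t, not the original string
```

Nothing in the one-line form hints that x will end up holding the last
element of t.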

 > 
 > I'm not sure it's worth losing the exact correspondence with nested loops;
 > or that it's not worth it either.  Note that "the iterator variables"
 > needn't be bare names:
 > 
 > >>> class x:
 > ...     pass
 > ...
 > >>> [1 for x.i in range(3)]
 > [1, 1, 1]
 > >>> x.i
 > 2
 > >>>
 > 

Hmmm.  I didn't realize that you could even do this.  Yes, this would
definitely present a problem.  However, if list comprehensions were
modified not to assign any names in the current scope, it still
seems like this would work (in this case, "x" is already defined and
"x.i" is not creating a new name, but is setting an attribute on
something else).  Couldn't nested scopes be used to implement this
in some manner?
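
As a rough sketch of what I have in mind (the helper names _squares
and _ones are hypothetical, and a real implementation would presumably
live in the compiler rather than in user code), running the loop
inside its own function keeps a bare iteration name local, while an
attribute target such as x.i still lands on the existing object:

```python
class x:
    pass

i = 12

def _squares(seq):
    # Hypothetical stand-in for [i**2 for i in seq] run in its own scope.
    result = []
    for i in seq:              # this i is local to the function
        result.append(i ** 2)
    return result

def _ones(seq):
    # Hypothetical stand-in for [1 for x.i in seq].
    result = []
    for x.i in seq:            # x is looked up in the enclosing scope;
        result.append(1)       # only its attribute is assigned
    return result

squares = _squares(range(4))
ones = _ones(range(3))

print(squares, i)              # [0, 1, 4, 9] 12
print(ones, x.i)               # [1, 1, 1] 2
```

So the outer i survives untouched, and x.i behaves just as it does in
the example quoted above.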

 > > ...
 > > Just as an aside, I have never intentionally used the iterator
 > > variable of a list comprehension after the operation has completed.
 > 
 > Not even in a debugger, when the operation has completed via unexpected
 > exception, and you're desperate to know what the control vrbl was bound to
 > at the time of death?  Or in an exception handler?
 > 

Nope.  I don't make programming mistakes---well, other than this one,
and well, all of those other ones :-).

 > Another principled model is possible, where
 > 
 >     [f(i) for i in whatever]
 > 
 > is treated like
 > 
 >     (lambda: [f(i) for i in whatever])()
 > 
 > >>> i = 12
 > >>> (lambda: [i**2 for i in range(4)])()
 > [0, 1, 4, 9]
 > >>> i
 > 12
 > >>>
 > 
 > That's more like Haskell does it.  But the day we explain a Python construct
 > in terms of a lambda transformation is the day Guido kills all of us <wink>.

Ah yes, well this is exactly the kind of behavior that seems most
natural to me.  It's also the behavior that everyone expected when I
went around to the various Python hackers in the department and asked
them about it yesterday.

I suppose I could just write this:

  a = (lambda s: [2*i for i in s])(s)

However, that's pretty ugly.

In any case, I'm mostly just curious if anyone else has been bitten by
the problem I've described.  I would certainly love to see a fix for
it (I would even volunteer to work on a prototype implementation if
there is interest). On the other hand, if no changes are deemed
necessary, we should at least try to better emphasize this behavior in the
documentation--perhaps encouraging people to use private names.  For
example:

   a = [_i*2 for _i in t]

(although, I have to say that this just looks like a gross hack--I'd
rather not have to resort to doing this).

Cheers,

Dave