[Python-Dev] Informal educator feedback on PEP 572 (was Re: 2018 Python Language Summit coverage, last part)

Sat Jun 30 04:17:02 EDT 2018

On Wed, Jun 27, 2018 at 09:52:43PM -0700, Chris Barker wrote:

> It seems everyone agrees that scoping rules should be the same for
> generator expressions and comprehensions,

Yes. I dislike saying "comprehensions and generator expressions" over 
and over again, so I just say "comprehensions".

Principle One:

- we consider generator expressions to be a lazy comprehension;
- or perhaps comprehensions are eager generator expressions;
- either way, they behave the same in regard to scoping rules.

Principle Two:

- the scope of the loop variable stays hidden inside the 
  sub-local ("comprehension" or "implicit hidden function")
  scope;
- i.e. it does not "leak", even if you want it to.
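
A minimal sketch of Principle Two as Python 3 behaves today (the 
function and variable names here are only illustrative):

```python
def loop_variable_does_not_leak():
    i = "outer"                          # a local sharing the loop variable's name
    squares = [i * i for i in range(3)]  # this i exists only inside the comprehension
    return i, squares                    # the outer i is untouched


print(loop_variable_does_not_leak())
```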

Principle Three:

- glossing over the builtin name look-up, calling list(genexpr)
  will remain equivalent to using a list comprehension;

- similarly for set and dict comprehensions.
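
As a rough illustration of Principle Three, the sketch below builds the 
same values eagerly and lazily and compares them. The function name and 
sample data are made up for the example.

```python
def eager_and_lazy_agree():
    data = [3, 1, 2]
    # Each pair builds the same container, once eagerly and once from a genexpr.
    lists = ([n * 2 for n in data if n > 1], list(n * 2 for n in data if n > 1))
    sets = ({n % 2 for n in data}, set(n % 2 for n in data))
    dicts = ({n: n * n for n in data}, dict((n, n * n) for n in data))
    return lists, sets, dicts


for eager, lazy in eager_and_lazy_agree():
    print(eager == lazy, eager)
```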

Principle Four:

- comprehensions (and genexprs) already behave "funny" inside
  class scope; any proposal to fix class scope is beyond the,
  er, scope of this PEP and can wait for another day.
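
For readers who have not run into the "funny" class-scope behaviour, 
here is a small sketch of it. The class and attribute names are 
hypothetical, and the example assumes no global called 
class_level_limit exists.

```python
def class_scope_demo():
    class Demo:
        class_level_limit = 3
        # The outermost iterable is evaluated in the class body, so this works.
        works = [i for i in range(class_level_limit)]
        try:
            # The element expression cannot see class-level names.
            fails = [class_level_limit for _ in range(2)]
            outcome = "no error"
        except NameError:
            outcome = "NameError"

    return Demo.works, Demo.outcome


print(class_scope_demo())
```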

So far, there should be (I hope!) no disagreement with those first four 
principles. With those four principles in place, teaching and using 
comprehensions (genexprs) in the absence of assignment expressions does 
not need to change one iota.

Normal cases stay normal; weird cases mucking about with locals() inside 
the comprehension are already weird and won't change.

> So what about:
> 
> l = [x:=i for i in range(3)]
> 
> vs
> 
> g = (x:=i for i in range(3))
> 
> Is there any way to keep these consistent if the "x" is in the regular
> local scope?

Yes. That is what closures already do.

We already have such nonlocal effects in Python 3. Move the loop inside 
an inner (nested) function, and then either call it immediately to 
simulate the effect of a list comprehension, or delay calling it to 
behave more like a generator expression.
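
Spelled out by hand, that might look something like the sketch below. 
It uses only closures and nonlocal, so it runs on any Python 3; the 
function names are invented for the illustration.

```python
def eager_like_a_list_comprehension():
    x = None

    def inner():
        nonlocal x
        result = []
        for i in range(3):
            x = i              # rebinds the x of the enclosing function
            result.append(x)
        return result

    items = inner()            # called immediately, like a list comprehension
    return items, x


def lazy_like_a_generator_expression():
    x = None

    def inner():
        nonlocal x
        for i in range(3):
            x = i
            yield x

    g = inner()                # nothing has run yet
    before = x
    items = list(g)            # the assignments happen only now
    return before, items, x


print(eager_like_a_list_comprehension())
print(lazy_like_a_generator_expression())
```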

Of course the *runtime* effects depend on whether or not the generator 
expression is actually evaluated. But that's no mystery, and is 
precisely analogous to this case:

def demo():
    x = 1
    def inner():
        nonlocal x
        x = 99
    inner()  # call the inner function
    print(x)

This prints 99. But if you comment out the call to the inner function, 
it prints 1. I trust that doesn't come as a surprise.

Nor should this come as a surprise:

def demo():
    x = 1
    # assuming assignment scope is local rather than sublocal
    g = (x := i for i in (99,))
    L = list(g)
    print(x)

The value of x printed will depend on whether or not you comment out 
the call to list(g).
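
A runnable variant of the same idea, assuming Python 3.8 or later, 
where assignment expressions exist and bind in the surrounding function 
scope. The consume parameter is an addition for the example; it stands 
in for commenting the list(g) call in or out.

```python
def demo(consume):
    x = 1
    g = (x := i for i in (99,))
    if consume:
        list(g)    # running the generator performs the assignment
    return x


print(demo(True), demo(False))
```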

> Note that this thread is titled "Informal educator feedback on PEP 572".
> 
> As an educator -- this is looking harder and harder to explain to newbies...
> 
> Though easier if any assignments made in a "comprehension" don't "leak out".

Let me introduce two more principles.

Principle Five:

- all expressions are executed in the local scope.

Principle Six:

- the scope of an assignment expression variable inside a
  comprehension (genexpr) should not depend on where inside
  the comprehension it sits.

Five is, I think, so intuitive that we forget about it in the same way 
that we forget about the air we breathe. It would be surprising, even 
shocking, if two expressions in the same context were executed in 
different scopes:

    result = [x + 1, x - 2]

If the first x were local and the second was global, that would be 
disturbing. The same rule ought to apply if we include assignment 
expressions:

    result = [(x := expr) + 1, x := x - 2]

It would be disturbing if the first assignment (x := expr) executed in 
the local scope, and the second (x := x - 2) failed with NameError 
because it was executed in the global scope.

Or worse, *didn't* fail with NameError, but instead returned something 
totally unexpected.

Now bring in a comprehension:

    result = [(x := expr) + 1] + [x := x - 2 for a in (None,)]

Do you still want the x inside the comprehension to be a different x to 
the one outside the comprehension? How are you going to explain that 
UnboundLocalError to your students?

That's not actually a rhetorical question. I recognise that while 
Principle Five seems self-evidently desirable to me, you might consider 
it less important than the idea that "assignments inside comprehensions 
shouldn't leak".

I believe that these two expressions should give the same results, 
side-effects included:

    [(x := expr) + 1, x := x - 2]

    [(x := expr) + 1] + [x := x - 2 for a in (None,)]

I think that is the simplest and most intuitive behaviour, the one 
which will be the least surprising, cause the fewest unexpected 
NameErrors, and be the simplest to explain.
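
To make the comparison concrete, here is a sketch that wraps each 
expression in a function and returns both the result and the final 
value of x. It assumes Python 3.8 or later, where assignments inside a 
comprehension bind in the surrounding scope; the function names and the 
sample value 10 for expr are arbitrary.

```python
def two_element_display(expr):
    result = [(x := expr) + 1, x := x - 2]
    return result, x


def display_plus_comprehension(expr):
    result = [(x := expr) + 1] + [x := x - 2 for a in (None,)]
    return result, x


print(two_element_display(10))
print(display_plus_comprehension(10))
```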

If you still prefer the "assignments shouldn't leak" idea, consider 
this: under the current implementation of comprehensions as an implicit 
hidden function, the scope of a variable depends on *where* it is, 
violating Principle Six.

(That was the point of my introducing locals() into a previous post: to 
demonstrate that, today, right now, "comprehension scope" is a misnomer. 
Comprehensions actually execute in a hybrid of at least two scopes, the 
surrounding local scope and the sublocal hidden implicit function 
scope.)
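
One way to observe the two halves without touching locals() is to log 
when each part of a generator expression runs. This is only indirect 
evidence, and the note() helper is a hypothetical name: the outermost 
iterable is evaluated straight away in the surrounding code, while the 
element expression waits inside the hidden function.

```python
def hybrid_evaluation():
    log = []

    def note(tag, value):
        log.append(tag)        # record which part of the genexpr just ran
        return value

    g = (note("element", a) for a in note("iterable", (1, 2)))
    at_creation = list(log)    # only the outermost iterable has run so far
    list(g)                    # now the element expression runs for each item
    return at_creation, log


print(hybrid_evaluation())
```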

Let me bring in another equivalency:

    [(x := expr) + 1, x := x - 2]

    [(x := expr) + 1] + [x := x - 2 for a in (None,)]

    [(x := expr) + 1] + [a for a in (x := x - 2,)]

By Principle Six, the side-effect of assigning to x shouldn't depend on 
where inside the comprehension it is. The two comprehension expressions 
shown ought to be referring to the same "x" variable (in the same scope) 
regardless of whether that is the surrounding local scope, or a sublocal 
comprehension scope.

(In the case of it being a sublocal scope, the two comprehensions will 
raise UnboundLocalError.)
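
The sublocal case can be simulated today by writing the hidden function 
out explicitly and leaving off any nonlocal declaration. This is a 
hand-written stand-in with invented names, not the code the compiler 
generates.

```python
def sublocal_assignment_simulation():
    x = 10    # the x of the surrounding local scope, never seen by the inner code

    def hidden_function(iterable):
        result = []
        for a in iterable:
            x = x - 2          # x is local here, so it is read before being bound
            result.append(x)
        return result

    try:
        return hidden_function((None,))
    except UnboundLocalError:
        return "UnboundLocalError"


print(sublocal_assignment_simulation())
```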

But -- and this is why I raised all that hoo-ha about locals() -- 
according to the current implementation, they *don't*. This version 
would assign to x in the sublocal scope:

    # best viewed in a monospaced font
    [x := x - 2 for a in (None,)]
     ^^^^^^^^^^ this is sublocal scope

but this would assign in the surrounding local scope:

    [a for a in (x := x - 2,)]
                ^^^^^^^^^^^^^ this is local scope

I strongly believe that all three ought to be equivalent, including 
side-effects. (Remember that by Principle Two, we agree that the loop 
variable doesn't leak. The loop variable is invisible from the outside 
and doesn't count as a side-effect for this discussion.)

So here are three possibilities (assuming assignment expressions are 
permitted):

1. Nick doesn't like the idea of having to inject an implicit
   "nonlocal" into the comprehension hidden implicit function;
   if we don't, that gives us the case where the scope of
   assignment variables depends on where they are in the
   comprehension, and will sometimes leak and sometimes not.

This torpedoes Principle Six, and leaves you having to explain why 
assignment sometimes "works" inside comprehensions and sometimes gives 
UnboundLocalError.

2. If we decide that assignment inside a comprehension should always
   be sublocal, the implementation becomes more complex in order to
   bury the otherwise-local scope beneath another layer of even more
   hidden implicit functions.

That rules out some interesting (but probably not critical) uses of 
assignment expressions inside comprehensions, such as using them as a 
side-channel to sneak out debugging information.
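
As one hypothetical example of such a side-channel, the sketch below 
remembers the last value the comprehension processed. It relies on the 
assignment reaching the surrounding scope, so it assumes Python 3.8 or 
later; the names are made up.

```python
def squares_with_side_channel(values):
    last_seen = None
    # The assignment leaks last_seen out to the enclosing function.
    squares = [(last_seen := v) ** 2 for v in values]
    return squares, last_seen


print(squares_with_side_channel([1, 2, 3]))
```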

And it adds a great big screaming special case to Principle Five:

- all expressions, EXCEPT FOR THE INSIDE OF COMPREHENSIONS, are
  executed in the local scope.

3. Or we just make all assignments inside comprehensions (including
   genexprs) occur in the surrounding local scope.

Number 3 is my strong preference. It complicates the implementation a 
bit (needing to add some implicit nonlocals) but not as much as needing 
to hide the otherwise-local scope beneath another implicit function. And 
it gives by far the most consistent, useful and obvious semantics out of 
the three options.
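
Roughly speaking, the implicit nonlocal would make the hidden function 
behave like the hand-written version below. This is a sketch of the 
idea with invented names, not a description of how the compiler would 
actually do it.

```python
def with_implicit_nonlocal():
    x = 10

    def hidden_function(iterable):
        nonlocal x             # the declaration the compiler would add for us
        result = []
        for a in iterable:     # the loop variable a stays inside the hidden function
            x = x - 2
            result.append(x)
        return result

    out = hidden_function((None,))
    return out, x


print(with_implicit_nonlocal())
```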

My not-very-extensive survey on the Python-List mailing list suggests 
that, if you don't ask people explicitly about "assignment expressions", 
they already think of the inside of comprehensions as being part of the 
surrounding local scope rather than a hidden inner function. So I do not 
believe that this will be hard to teach.

These two expressions ought to give the same result with the same 
side-effect:

    [x := 1]

    [x := a for a in (1,)]

That, I strongly believe, is the intuitive behaviour to people who aren't 
immersed in the implementation details of comprehensions, as well as 
being the most useful.
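
Wrapped in functions so the side-effect can be inspected, the pair 
looks like this. The sketch assumes Python 3.8 or later, and the 
function names are only for illustration.

```python
def plain_display():
    result = [x := 1]
    return result, x


def one_item_comprehension():
    result = [x := a for a in (1,)]
    return result, x


print(plain_display() == one_item_comprehension())
```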

-- 
Steve