[Python-ideas] PEP 572: Statement-Local Name Bindings

Wed Feb 28 08:45:59 EST 2018

On Wed, Feb 28, 2018 at 10:49 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 27 February 2018 at 22:27, Chris Angelico <rosuav at gmail.com> wrote:
>> This is a suggestion that comes up periodically here or on python-dev.
>> This proposal introduces a way to bind a temporary name to the value
>> of an expression, which can then be used elsewhere in the current
>> statement.
>>
>> The nicely-rendered version will be visible here shortly:
>>
>> https://www.python.org/dev/peps/pep-0572/
>>
>> ChrisA
>>
>> PEP: 572
>> Title: Syntax for Statement-Local Name Bindings
>> Author: Chris Angelico <rosuav at gmail.com>
>> Status: Draft
>> Type: Standards Track
>> Content-Type: text/x-rst
>> Created: 28-Feb-2018
>> Python-Version: 3.8
>> Post-History: 28-Feb-2018
>
> Thanks for writing this - as you mention, this will be a useful
> document even if the proposal ultimately gets rejected.
>
> FWIW, I'm currently -1 on this, so "rejected" is what I'm expecting,
> but it's possible that subsequent discussions could refine this into
> something that people are happy with (or that I'm in the minority, of
> course :-))

TBH, I'm no more than +0.5 on the proposal being accepted, but a very
strong +1 on it being all written up in a PEP :)

>> Abstract
>> ========
>>
>> Programming is all about reusing code rather than duplicating it.  When
>> an expression needs to be used twice in quick succession but never again,
>> it is convenient to assign it to a temporary name with very small scope.
>> By permitting name bindings to exist within a single statement only, we
>> make this both convenient and safe against collisions.
>
> In the context of multi-line statements like with, or while,
> describing this as a "very small scope" is inaccurate (and maybe even
> misleading). See later for an example using def.

True. When I first wrote that paragraph, I wasn't thinking of compound
statements at all. They are, however, a powerful use of the new syntax
IMO. I'll remove the word "very" from there; "small scope" is still
the intention though, as the point of this is to be a subscope within
a (larger) function scope.

>> Rationale
>> =========
>>
>> When an expression is used multiple times in a list comprehension, there
>> are currently several suboptimal ways to spell this, and no truly good
>> ways. A statement-local name allows any expression to be temporarily
>> captured and then used multiple times.
>
> I agree with Rob Cliffe, this is a bit overstated.
>
> How about "there are currently several ways to spell this, none of
> which is universally accepted as ideal". Specifically, the point to me
> is that opinions are (strongly) divided, rather than that everyone
> agrees that it's a problem but we haven't found a good solution yet.

I can run with that wording. Thanks.

>> Syntax and semantics
>> ====================
>>
>> In any context where arbitrary Python expressions can be used, a named
>> expression can appear. This must be parenthesized for clarity, and is of
>> the form `(expr as NAME)` where `expr` is any valid Python expression,
>> and `NAME` is a simple name.
>
> I agree with requiring parentheses - although I dislike the tendency
> towards "mandatory parentheses" in recent syntax proposals :-( But "1
> as x + 1 as y" is an abomination. Conversely "sqrt((1 as x))" is a
> little annoying. So it feels like a case of the lesser of two evils to
> me, rather than an actually good idea...

The sqrt example could be changed in the future (either before or
after the PEP's acceptance). It's like a genexp - parens mandatory but
function calls are special-cased.

>> The value of such a named expression is the same as the incorporated
>> expression, with the additional side-effect that NAME is bound to that
>> value for the remainder of the current statement.
>
> While there's basically no justification for doing so, it should be
> noted that under this proposal, ((((((((1 as x) as y) as z) as w) as
> v) as u) as t) as s) is valid. Of course, "you can write confusing
> code using this" isn't an argument against a useful enhancement, but
> potential for abuse is something to be aware of. There's also
> (slightly more realistically) something like [(sqrt((b*b as bsq) +
> (4*a*c as fourac)) as root1), (sqrt(bsq - fourac) as root2)], which I
> can see someone thinking is a good idea!

Sure! Though I'm not sure what you're representing there; it looks
almost, but not quite, like the quadratic formula. If that was the
intention, I'd be curious to see the discriminant broken out, with
some kind of trap for the case where it's negative.
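
As a rough sketch of what "discriminant broken out, with a trap" could look like, here it is in today's Python with ordinary assignments rather than the proposed `(expr as NAME)` syntax. The function name `quadratic_roots` and the choice to raise ValueError are assumptions made for illustration, not anything from the PEP.

```python
import math

def quadratic_roots(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0 as a tuple."""
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    # Name the discriminant once so it can be tested and then reused.
    disc = b * b - 4 * a * c
    if disc < 0:
        # The "trap": refuse here rather than let math.sqrt fail.
        raise ValueError("negative discriminant: no real roots")
    root = math.sqrt(disc)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))

print(quadratic_roots(1, -3, 2))  # (2.0, 1.0)
```

The point of naming `disc` is that it is needed twice (once for the test, once for the square root), which is the kind of reuse a statement-local name is meant to cover.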

>> Example usage
>> =============
>>
>> These list comprehensions are all approximately equivalent::
>>
>>     # Calling the function twice
>>     stuff = [[f(x), f(x)] for x in range(5)]
>>
>>     # Helper function
>>     def pair(value): return [value, value]
>>     stuff = [pair(f(x)) for x in range(5)]
>>
>>     # Inline helper function
>>     stuff = [(lambda v: [v,v])(f(x)) for x in range(5)]
>>
>>     # Extra 'for' loop - see also Serhiy's optimization
>>     stuff = [[y, y] for x in range(5) for y in [f(x)]]
>>
>>     # Expanding the comprehension into a loop
>>     stuff = []
>>     for x in range(5):
>>         y = f(x)
>>         stuff.append([y, y])
>>
>>     # Using a statement-local name
>>     stuff = [[(f(x) as y), y] for x in range(5)]
>
> Honestly, the asymmetry in [(f(x) as y), y] makes this the *least*
> readable option to me :-( All of the other options clearly show that
> the 2 elements of the list are the same, but the statement-local name
> version requires me to stop and think to confirm that it's a list of 2
> copies of the same value.

I need some real-world examples where it's not as trivial as [y, y] so
people don't get hung up on the symmetry issue.
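
For reference, the spellings in the quoted example that are valid today can be checked against each other. This is only a sketch: `f` here is a stand-in that squares its argument and logs each call, and `call_log`, `count_calls` and `spellings` are names invented for the comparison. The statement-local version is left out because no released interpreter accepts that syntax.

```python
call_log = []

def f(x):
    """Stand-in for an expensive function; records every call."""
    call_log.append(x)
    return x * x

def pair(value):
    return [value, value]

def loop_version():
    stuff = []
    for x in range(5):
        y = f(x)
        stuff.append([y, y])
    return stuff

def count_calls(build):
    """Run build() and return (result, number of calls made to f)."""
    call_log.clear()
    result = build()
    return result, len(call_log)

spellings = {
    "twice": lambda: [[f(x), f(x)] for x in range(5)],
    "helper": lambda: [pair(f(x)) for x in range(5)],
    "lambda": lambda: [(lambda v: [v, v])(f(x)) for x in range(5)],
    "extra for": lambda: [[y, y] for x in range(5) for y in [f(x)]],
    "loop": loop_version,
}

results = {name: count_calls(build) for name, build in spellings.items()}
for name, (stuff, n) in results.items():
    print(f"{name:10} calls={n:2} {stuff}")
```

All five produce the same list; only the first calls `f` ten times instead of five, which is why they are "approximately" rather than exactly equivalent when `f` is slow or has side effects.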

>> Open questions
>> ==============
>>
>> 1. What happens if the name has already been used? `(x, (1 as x), x)`
>>    Currently, prior usage functions as if the named expression did not
>>    exist (following the usual lookup rules); the new name binding will
>>    shadow the other name from the point where it is evaluated until the
>>    end of the statement.  Is this acceptable?  Should it raise a syntax
>>    error or warning?
>
> IMO, relying on evaluation order is the only viable option, but it's
> confusing. I would immediately reject something like `(x, (1 as x),
> x)` as bad style, simply because the meaning of x at the two places it
> is used is non-obvious.
>
> I'm -1 on a warning. I'd prefer an error, but I can't see how you'd
> implement (or document) it.

Sure. For now, I'm just going to leave it as a perfectly acceptable
use of the feature; it can be rejected as poor style, but permitted by
the language.

>> 2. The current implementation [1] implements statement-local names using
>>    a special (and mostly-invisible) name mangling.  This works perfectly
>>    inside functions (including list comprehensions), but not at top
>>    level.  Is this a serious limitation?  Is it confusing?
>
> I'm strongly -1 on "works like the current implementation" as a
> definition of the behaviour. While having a proof of concept to
> clarify behaviour is great, my first question would be "how is the
> behaviour documented to work?" So what is the PEP proposing would
> happen if I did
>
> if ('.'.join((str(x) for x in sys.version_info[:2])) as ver) == '3.6':
>     # Python 3.6 specific code here
> elif sys.version_info[0] < 3:
>     print(f"Version {ver} is not supported")
>
> at the top level of a Python file? To me, that's a perfectly
> reasonable way of using the new feature to avoid exposing a binding
> for "ver" in my module...

I agree, sounds good. I'll reword this to be a limitation of the current implementation.
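
For comparison, a minimal sketch of how the same effect has to be spelled at module level today: bind `ver` normally, then delete it by hand once the if/elif is finished. The `message` variable and the final `else` branch are assumptions added so the snippet does something observable on any version.

```python
import sys

# Ordinary module-level binding; the proposal would scope this to the statement.
ver = ".".join(str(x) for x in sys.version_info[:2])
try:
    if ver == "3.6":
        message = "running the 3.6-specific path"
    elif sys.version_info[0] < 3:
        message = "Version {} is not supported".format(ver)
    else:
        message = "running the generic path"
finally:
    # Manual cleanup so that "ver" does not linger in the module namespace.
    del ver

print(message)
```

The try/finally plus `del` is the bookkeeping that a statement-local name would take care of automatically.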

>> 4. Syntactic confusion in `except` statements.  While technically
>>    unambiguous, it is potentially confusing to humans.  In Python 3.7,
>>    parenthesizing `except (Exception as e):` is illegal, and there is no
>>    reason to capture the exception type (as opposed to the exception
>>    instance, as is done by the regular syntax).  Should this be made
>>    outright illegal, to prevent confusion?  Can it be left to linters?
>
> Wait - except (Exception as e): would set e to the type Exception, and
> not capture the actual exception object?

Correct. The expression "Exception" evaluates to the type Exception,
and you can capture that. It's a WutFace moment but it's a logical
consequence of the nature of Python.

> Even though that's
> unambiguous, it's *incredibly* confusing. But saying we allow "except
> <expr-but-not-a-name-binding>" is also bad, in the sense that it's an
> annoying exception (sic) to have to include in the documentation.

Agreed. I don't want to special-case it out; this is something for
code review to catch. Fortunately, this would give fairly obvious
results - you try to do something with the exception, and you don't
actually have an exception object, you have <class 'Exception'>. It's
a little more problematic in a "with" block, because it'll often do
the same thing.
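
The difference can be illustrated in today's Python without the new syntax, by writing out what each form would bind. This is a sketch under assumptions: `Resource` is a hypothetical context manager whose `__enter__` returns something other than `self`, and `would_bind` stands for what a named expression would capture, namely the value of the expression itself.

```python
import io

class Resource:
    """Hypothetical context manager; __enter__ does not return self."""
    def __enter__(self):
        return "handle"
    def __exit__(self, exc_type, exc, tb):
        return False

# except: the expression `Exception` evaluates to the class, so that is
# what `(Exception as e)` would bind under the proposal.
would_bind = Exception
try:
    raise ValueError("boom")
except Exception as e:
    caught = e  # the regular syntax binds the raised instance

print(would_bind)             # <class 'Exception'>
print(type(caught).__name__)  # ValueError

# with: the regular syntax binds the result of __enter__().
manager = Resource()
with manager as handle:
    pass
print(handle)                 # handle
print(handle is manager)      # False

# File-like objects return self from __enter__, so both forms coincide.
buf = io.StringIO()
with buf as entered:
    same_for_file = entered is buf
print(same_for_file)          # True
```

With `Resource` the mistake shows up immediately; with file-like objects it would go unnoticed, which is the more problematic case.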

> Maybe it would be viable to say that a (top-level) expression can
> never be a name binding - after all, there's no point because the name
> will be immediately discarded. Only allow name bindings in
> subexpressions. That would disallow this case as part of that general
> rule. But I've no idea what that rule would do to the grammar (in
> particular, whether it would still be possible to parse without
> lookahead). (Actually no, this would prohibit constructs such as
> `while (f(x) as val) > 0`, which I presume you're trying to
> support[1], although you don't mention this in the rationale or
> example usage sections).
>
> [1] Based on the fact that you want the name binding to remain active
> for the enclosing *statement*, not just the enclosing *expression*.

Not sure what you mean by a "top-level expression", if it's going to
disallow 'while (f(x) as val) > 0:'. Can you elaborate?

>> 5. Similar confusion in `with` statements, with the difference that there
>>    is good reason to capture the result of an expression, and it is also
>>    very common for `__enter__` methods to return `self`.  In many cases,
>>    `with expr as name:` will do the same thing as `with (expr as name):`,
>>    adding to the confusion.
>
> This seems to imply that the name in (expr as name) when used as a top
> level expression will persist after the closing parenthesis. Is that
> intended? It's not mentioned anywhere in the PEP (that I could see).
> On re-reading, I see that you say "for the remainder of the current
> *statement*" and not (as I had misread it) the remainder of the
> current *expression*.

Yep. If you have an expression on its own, it's an "expression
statement", and the subscope will end at the newline (or the
semicolon, if you have one). Inside something larger, it'll persist.

> So multi-line statements will give it a larger
> scope? That strikes me as giving this proposal a much wider
> applicability than implied by the summary. Consider
>
>     def f(x, y=(object() as default_value)):
>         if y is default_value:
>             print("You did not supply y")
>
> That's an "obvious" use of the new feature, and I could see it very
> quickly becoming the standard way to define sentinel values. I *think*
> it's a reasonable idiom, but honestly, I'm not sure. It certainly
> feels like scope creep from the original use case, which was naming
> bits of list comprehensions.

Eeeuaghh.... okay. Now I gotta think about this one.

The 'def' statement is an executable one, to be sure; but the
statement doesn't include *running* the function, only *creating* it.
So as you construct the function, default_value has that value. Inside
the actual running of it, that name doesn't exist any more. So this
won't actually work. But you could use this to create annotations and
such, I guess...

>>> def f():
...     def g(x=(object() as default_value)) -> default_value:
...        ...
...     return g
...
>>> f().__annotations__
{'return': <object object at 0x7fe37dab2280>}
>>> import dis
>>> dis.dis(f)
  2           0 LOAD_GLOBAL              0 (object)
              2 CALL_FUNCTION            0
              4 DUP_TOP
              6 STORE_FAST               0 (default_value)
              8 BUILD_TUPLE              1
             10 LOAD_FAST                0 (default_value)
             12 LOAD_CONST               1 (('return',))
             14 BUILD_CONST_KEY_MAP      1
             16 LOAD_CONST               2 (<code object g at 0x7fe37d974ae0, file "<stdin>", line 2>)
             18 LOAD_CONST               3 ('f.<locals>.g')
             20 MAKE_FUNCTION            5
             22 STORE_FAST               1 (g)
             24 DELETE_FAST              0 (default_value)

  4          26 LOAD_FAST                1 (g)
             28 RETURN_VALUE

Disassembly of <code object g at 0x7fe37d974ae0, file "<stdin>", line 2>:
  3           0 LOAD_CONST               0 (None)
              2 RETURN_VALUE

Not terribly useful.

> Overall, it feels like the semantics of having the name bindings
> persist for the enclosing *statement* rather than the enclosing
> *expression* is a significant extension of the scope of the proposal,
> to the extent that the actual use cases that it would allow are mostly
> not mentioned in the rationale, and using it in comprehensions becomes
> merely yet another suboptimal solution to the problem of reusing
> calculated values in comprehensions!!!
>
> So IMO, either the semantics should be reduced to exposing the binding
> to just the enclosing top-level expression, or the rationale, use
> cases and examples should be significantly beefed up to reflect the
> much wider applicability of the feature (and then you'd need to
> address any questions that arise from that wider scope).

I'll add some more examples. I think the if/while usage is potentially of value.

Thanks for the feedback! Keep it coming! :)

ChrisA