[Python-ideas] PEP 572: Assignment Expressions (post #4)

Wed Apr 11 01:32:04 EDT 2018

Wholesale changes since the previous version. Statement-local name
bindings have been dropped (I'm still keeping the idea in the back of
my head; this PEP wasn't the first time I'd raised the concept), and
we're now focusing primarily on assignment expressions, but also with
consequent changes to comprehensions.

Sorry for the lengthy delays; getting a reference implementation going
took me longer than I expected or intended.

ChrisA

PEP: 572
Title: Assignment Expressions
Author: Chris Angelico <rosuav at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Feb-2018
Python-Version: 3.8
Post-History: 28-Feb-2018, 02-Mar-2018, 23-Mar-2018, 04-Apr-2018

Abstract
========

This is a proposal for creating a way to assign to names within an expression.
Additionally, the precise scope of comprehensions is adjusted, to maintain
consistency and follow expectations.

Rationale
=========

Naming the result of an expression is an important part of programming,
allowing a descriptive name to be used in place of a longer expression,
and permitting reuse.  Currently, this feature is available only in
statement form, making it unavailable in list comprehensions and other
expression contexts.  Merely introducing a way to assign as an expression
would create bizarre edge cases around comprehensions, though, and to avoid
the worst of the confusions, we change the definition of comprehensions,
causing some edge cases to be interpreted differently, but maintaining the
existing behaviour in the majority of situations.

Syntax and semantics
====================

In any context where arbitrary Python expressions can be used, a **named
expression** can appear. This can be parenthesized for clarity, and is of
the form ``(target := expr)`` where ``expr`` is any valid Python expression,
and ``target`` is any valid assignment target.

The value of such a named expression is the same as the incorporated
expression, with the additional side-effect that the target is assigned
that value.

    # Similar to the boolean 'or' but checking for None specifically
    x = "default" if (eggs := spam().ham) is None else eggs

    # Even complex expressions can be built up piece by piece
    y = ((eggs := spam()), (cheese := eggs.method()), cheese[eggs])

Differences from regular assignment statements
----------------------------------------------

An assignment statement can assign to multiple targets::

    x = y = z = 0

To do the same with assignment expressions, they must be parenthesized::

    assert 0 == (x := (y := (z := 0)))

Augmented assignment is not supported in expression form::

    >>> x +:= 1
      File "<stdin>", line 1
        x +:= 1
            ^
    SyntaxError: invalid syntax

Otherwise, the semantics of assignment are unchanged by this proposal.

Alterations to comprehensions
-----------------------------

The current behaviour of list/set/dict comprehensions and generator
expressions has some edge cases that would behave strangely if an assignment
expression were to be used. Therefore the proposed semantics are changed,
removing the current edge cases, and instead altering their behaviour *only*
in a class scope.

As of Python 3.7, the outermost iterable of any comprehension is evaluated
in the surrounding context, and then passed as an argument to the implicit
function that evaluates the comprehension.

Under this proposal, the entire body of the comprehension is evaluated in
its implicit function. Names not assigned to within the comprehension are
located in the surrounding scopes, as with normal lookups. As one special
case, a comprehension at class scope will **eagerly bind** any name which
is already defined in the class scope.

A list comprehension can be unrolled into an equivalent function. With
Python 3.7 semantics::

    numbers = [x + y for x in range(3) for y in range(4)]
    # Is approximately equivalent to
    def <listcomp>(iterator):
        result = []
        for x in iterator:
            for y in range(4):
                result.append(x + y)
        return result
    numbers = <listcomp>(iter(range(3)))

Under the new semantics, this would instead be equivalent to::

    def <listcomp>():
        result = []
        for x in range(3):
            for y in range(4):
                result.append(x + y)
        return result
    numbers = <listcomp>()

When a class scope is involved, a naive transformation into a function would
prevent name lookups (as the function would behave like a method).

    class X:
        names = ["Fred", "Barney", "Joe"]
        prefix = "> "
        prefixed_names = [prefix + name for name in names]

With Python 3.7 semantics, this will evaluate the outermost iterable at class
scope, which will succeed; but it will evaluate everything else in a function::

    class X:
        names = ["Fred", "Barney", "Joe"]
        prefix = "> "
        def <listcomp>(iterator):
            result = []
            for name in iterator:
                result.append(prefix + name)
            return result
        prefixed_names = <listcomp>(iter(names))

The name ``prefix`` is thus searched for at global scope, ignoring the class
name. Under the proposed semantics, this name will be eagerly bound, being
approximately equivalent to::

    class X:
        names = ["Fred", "Barney", "Joe"]
        prefix = "> "
        def <listcomp>(prefix=prefix):
            result = []
            for name in names:
                result.append(prefix + name)
            return result
        prefixed_names = <listcomp>()

With list comprehensions, this is unlikely to cause any confusion. With
generator expressions, this has the potential to affect behaviour, as the
eager binding means that the name could be rebound between the creation of
the genexp and the first call to ``next()``. It is, however, more closely
aligned to normal expectations.  The effect is ONLY seen with names that
are looked up from class scope; global names (eg ``range()``) will still
be late-bound as usual.

One consequence of this change is that certain bugs in genexps will not
be detected until the first call to ``next()``, where today they would be
caught upon creation of the generator. TODO: Discuss the merits and costs
of amelioration proposals.

Recommended use-cases
=====================

Simplifying list comprehensions
-------------------------------

These list comprehensions are all approximately equivalent::

    # Calling the function twice
    stuff = [[f(x), x/f(x)] for x in range(5)]

    # External helper function
    def pair(x, value): return [value, x/value]
    stuff = [pair(x, f(x)) for x in range(5)]

    # Inline helper function
    stuff = [(lambda y: [y,x/y])(f(x)) for x in range(5)]

    # Extra 'for' loop - potentially could be optimized internally
    stuff = [[y, x/y] for x in range(5) for y in [f(x)]]

    # Iterating over a genexp
    stuff = [[y, x/y] for x, y in ((x, f(x)) for x in range(5))]

    # Expanding the comprehension into a loop
    stuff = []
    for x in range(5):
        y = f(x)
        stuff.append([y, x/y])

    # Wrapping the loop in a generator function
    def g():
        for x in range(5):
            y = f(x)
            yield [y, x/y]
    stuff = list(g())

    # Using a mutable cache object (various forms possible)
    c = {}
    stuff = [[c.update(y=f(x)) or c['y'], x/c['y']] for x in range(5)]

    # Using a temporary name
    stuff = [[y := f(x), x/y] for x in range(5)]

If calling ``f(x)`` is expensive or has side effects, the clean operation of
the list comprehension gets muddled. Using a short-duration name binding
retains the simplicity; while the extra ``for`` loop does achieve this, it
does so at the cost of dividing the expression visually, putting the named
part at the end of the comprehension instead of the beginning.

Capturing condition values
--------------------------

Assignment expressions can be used to good effect in the header of
an ``if`` or ``while`` statement::

    # Current Python, not caring about function return value
    while input("> ") != "quit":
        print("You entered a command.")

    # Current Python, capturing return value - four-line loop header
    while True:
        command = input("> ");
        if command == "quit":
            break
        print("You entered:", command)

    # Proposed alternative to the above
    while (command := input("> ")) != "quit":
        print("You entered:", command)

    # Capturing regular expression match objects
    # See, for instance, Lib/pydoc.py, which uses a multiline spelling
    # of this effect
    if match := re.search(pat, text):
        print("Found:", match.group(0))

    # Reading socket data until an empty string is returned
    while data := sock.read():
        print("Received data:", data)

Particularly with the ``while`` loop, this can remove the need to have an
infinite loop, an assignment, and a condition. It also creates a smooth
parallel between a loop which simply uses a function call as its condition,
and one which uses that as its condition but also uses the actual value.

Rejected alternative proposals
==============================

Proposals broadly similar to this one have come up frequently on python-ideas.
Below are a number of alternative syntaxes, some of them specific to
comprehensions, which have been rejected in favour of the one given above.

Alternative spellings
---------------------

Broadly the same semantics as the current proposal, but spelled differently.

1. ``EXPR as NAME``, with or without parentheses::

       stuff = [[f(x) as y, x/y] for x in range(5)]

   Omitting the parentheses in this form of the proposal introduces many
   syntactic ambiguities.  Requiring them in all contexts leaves open the
   option to make them optional in specific situations where the syntax is
   unambiguous (cf generator expressions as sole parameters in function
   calls), but there is no plausible way to make them optional everywhere.

   With the parentheses, this becomes a viable option, with its own tradeoffs
   in syntactic ambiguity.  Since ``EXPR as NAME`` already has meaning in
   ``except`` and ``with`` statements (with different semantics), this would
   create unnecessary confusion or require special-casing.

2. Adorning statement-local names with a leading dot::

       stuff = [[(f(x) as .y), x/.y] for x in range(5)] # with "as"
       stuff = [[(.y := f(x)), x/.y] for x in range(5)] # with ":="

   This has the advantage that leaked usage can be readily detected, removing
   some forms of syntactic ambiguity.  However, this would be the only place
   in Python where a variable's scope is encoded into its name, making
   refactoring harder.  This syntax is quite viable, and could be promoted to
   become the current recommendation if its advantages are found to outweigh
   its cost.

3. Adding a ``where:`` to any statement to create local name bindings::

       value = x**2 + 2*x where:
           x = spam(1, 4, 7, q)

   Execution order is inverted (the indented body is performed first, followed
   by the "header").  This requires a new keyword, unless an existing keyword
   is repurposed (most likely ``with:``).  See PEP 3150 for prior discussion
   on this subject (with the proposed keyword being ``given:``).

Special-casing conditional statements
-------------------------------------

One of the most popular use-cases is ``if`` and ``while`` statements.  Instead
of a more general solution, this proposal enhances the syntax of these two
statements to add a means of capturing the compared value::

    if re.search(pat, text) as match:
        print("Found:", match.group(0))

This works beautifully if and ONLY if the desired condition is based on the
truthiness of the captured value.  It is thus effective for specific
use-cases (regex matches, socket reads that return `''` when done), and
completely useless in more complicated cases (eg where the condition is
``f(x) < 0`` and you want to capture the value of ``f(x)``).  It also has
no benefit to list comprehensions.

Advantages: No syntactic ambiguities. Disadvantages: Answers only a fraction
of possible use-cases, even in ``if``/``while`` statements.

Special-casing comprehensions
-----------------------------

Another common use-case is comprehensions (list/set/dict, and genexps). As
above, proposals have been made for comprehension-specific solutions.

1. ``where``, ``let``, or ``given``::

       stuff = [(y, x/y) where y = f(x) for x in range(5)]
       stuff = [(y, x/y) let y = f(x) for x in range(5)]
       stuff = [(y, x/y) given y = f(x) for x in range(5)]

   This brings the subexpression to a location in between the 'for' loop and
   the expression. It introduces an additional language keyword, which creates
   conflicts. Of the three, ``where`` reads the most cleanly, but also has the
   greatest potential for conflict (eg SQLAlchemy and numpy have ``where``
   methods, as does ``tkinter.dnd.Icon`` in the standard library).

2. ``with NAME = EXPR``::

       stuff = [(y, x/y) with y = f(x) for x in range(5)]

   As above, but reusing the `with` keyword. Doesn't read too badly, and needs
   no additional language keyword. Is restricted to comprehensions, though,
   and cannot as easily be transformed into "longhand" for-loop syntax. Has
   the C problem that an equals sign in an expression can now create a name
   binding, rather than performing a comparison. Would raise the question of
   why "with NAME = EXPR:" cannot be used as a statement on its own.

3. ``with EXPR as NAME``::

       stuff = [(y, x/y) with f(x) as y for x in range(5)]

   As per option 2, but using ``as`` rather than an equals sign. Aligns
   syntactically with other uses of ``as`` for name binding, but a simple
   transformation to for-loop longhand would create drastically different
   semantics; the meaning of ``with`` inside a comprehension would be
   completely different from the meaning as a stand-alone statement, while
   retaining identical syntax.

Regardless of the spelling chosen, this introduces a stark difference between
comprehensions and the equivalent unrolled long-hand form of the loop.  It is
no longer possible to unwrap the loop into statement form without reworking
any name bindings.  The only keyword that can be repurposed to this task is
``with``, thus giving it sneakily different semantics in a comprehension than
in a statement; alternatively, a new keyword is needed, with all the costs
therein.

Migration path
==============

The semantic changes to list/set/dict comprehensions, and more so to generator
expressions, may potentially require migration of code. In many cases, the
changes simply make legal what used to raise an exception, but there are some
edge cases that were previously legal and are not, and a few corner cases with
altered semantics.

Yield inside comprehensions
---------------------------

As of Python 3.7, the outermost iterable in a comprehension is permitted to
contain a 'yield' expression. If this is required, the iterable (or at least
the yield) must be explicitly elevated from the comprehension::

    # Python 3.7
    def g():
        return [x for x in [(yield 1)]]
    # With PEP 572
    def g():
        sent_item = (yield 1)
        return [x for x in [sent_item]]

This more clearly shows that it is g(), not the comprehension, which is able
to yield values (and is thus a generator function). The entire comprehension
is consistently in a single scope.

Name lookups in class scope
---------------------------

A comprehension inside a class previously was able to 'see' class members ONLY
from the outermost iterable. Other name lookups would ignore the class and
potentially locate a name at an outer scope::

    pattern = "<%d>"
    class X:
        pattern = "[%d]"
        numbers = [pattern % n for n in range(5)]

In Python 3.7, ``X.numbers`` would show angle brackets; with PEP 572, it would
show square brackets. Maintaining the current behaviour here is best done by
using distinct names for the different forms of ``pattern``, as would be the
case with functions.

Generator expression bugs can be caught later
---------------------------------------------

Certain types of bugs in genexps were previously caught more quickly. Some are
now detected only at first iteration::

    gen = (x for x in rage(10)) # NameError
    gen = (x for x in 10) # TypeError (not iterable)
    gen = (x for x in range(1/0)) # Exception raised during evaluation

This brings such generator expressions in line with a simple translation to
function form::

    def <genexp>():
        for x in rage(10):
            yield x
    gen = <genexp>() # No exception yet
    tng = next(gen) # NameError

To detect these errors more quickly, ... TODO.

Open questions
==============

Can the outermost iterable still be evaluated early?
----------------------------------------------------

As of Python 3.7, the outermost iterable in a genexp is evaluated early, and
the result passed to the implicit function as an argument.  With PEP 572, this
would no longer be the case. Can we still, somehow, evaluate it before moving
on? One possible implementation would be::

    gen = (x for x in rage(10))
    # translates to
    def <genexp>():
        iterable = iter(rage(10))
        yield None
        for x in iterable:
            yield x
    gen = <genexp>()
    next(gen)

This would pump the iterable up to just before the loop starts, evaluating
exactly as much as is evaluated outside the generator function in Py3.7.
This would result in it being possible to call ``gen.send()`` immediately,
unlike with most generators, and may incur unnecessary overhead in the
common case where the iterable is pumped immediately (perhaps as part of a
larger expression).

Frequently Raised Objections
============================

Why not just turn existing assignment into an expression?
---------------------------------------------------------

C and its derivatives define the ``=`` operator as an expression, rather than
a statement as is Python's way.  This allows assignments in more contexts,
including contexts where comparisons are more common.  The syntactic similarity
between ``if (x == y)`` and ``if (x = y)`` belies their drastically different
semantics.  Thus this proposal uses ``:=`` to clarify the distinction.

This could be used to create ugly code!
---------------------------------------

So can anything else.  This is a tool, and it is up to the programmer to use it
where it makes sense, and not use it where superior constructs can be used.

With assignment expressions, why bother with assignment statements?
-------------------------------------------------------------------

The two forms have different flexibilities.  The ``:=`` operator can be used
inside a larger expression; the ``=`` operator can be chained more
conveniently, and closely parallels the inline operations ``+=`` and friends.
The assignment statement is a clear declaration of intent: this value is to
be assigned to this target, and that's it.

Acknowledgements
================

The author wishes to thank Guido van Rossum and Nick Coghlan for their
considerable contributions to this proposal, and to members of the
core-mentorship mailing list for assistance with implementation.

References
==========

.. [1] Proof of concept / reference implementation
   (https://github.com/Rosuav/cpython/tree/assignment-expressions)

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: