[Python-Dev] The new and improved PEP 572, same great taste with 75% less complexity!

Nathaniel Smith njs at pobox.com
Wed Apr 25 02:55:16 EDT 2018


On Tue, Apr 24, 2018 at 8:31 AM, Chris Angelico <rosuav at gmail.com> wrote:
> The most notable change since last posting is that the assignment
> target is no longer as flexible as with the statement form of
> assignment, but is restricted to a simple name.
>
> Note that the reference implementation has not been updated.

I haven't read most of the discussion around this, so my apologies if
I say anything that's redundant. But since it seems like this might be
starting to converge, I just read through it for the first time, and
have a few comments.

First, though, let me say that this is a really nice document, and I
appreciate the incredible amount of work it takes to write something
like this and manage the discussions! Regardless of the final outcome
it's definitely a valuable contribution.

> Recommended use-cases
> =====================
>
> Simplifying list comprehensions
> -------------------------------
>
> A list comprehension can map and filter efficiently by capturing
> the condition::
>
>     results = [(x, y, x/y) for x in input_data if (y := f(x)) > 0]
>
> Similarly, a subexpression can be reused within the main expression, by
> giving it a name on first use::
>
>     stuff = [[y := f(x), x/y] for x in range(5)]
>
>     # There are a number of less obvious ways to spell this in current
>     # versions of Python, such as:
>
>     # Inline helper function
>     stuff = [(lambda y: [y,x/y])(f(x)) for x in range(5)]
>
>     # Extra 'for' loop - potentially could be optimized internally
>     stuff = [[y, x/y] for x in range(5) for y in [f(x)]]
>
>     # Using a mutable cache object (various forms possible)
>     c = {}
>     stuff = [[c.update(y=f(x)) or c['y'], x/c['y']] for x in range(5)]
>
> In all cases, the name is local to the comprehension; like iteration variables,
> it cannot leak out into the surrounding context.

These examples all make me very nervous. The order of execution in
comprehensions is pretty confusing to start with (right to left,
except when it's left to right!). But usually this is fine, because
comprehensions are mostly used in a functional/declarative-ish style,
where the exact order doesn't matter. (As befits their functional
language heritage.) But := is a side-effecting operator, so when you
start using it here, I suddenly have to become extremely aware of the
exact order of execution.

Concretely, I find it unnerving that two of these work, and one doesn't:

# assignment on the right of usage
results = [(x, y, x/y) for x in input_data if (y := f(x)) > 0]

# assignment on the left of usage
stuff = [[y := f(x), x/y] for x in range(5)]

# assignment on the right of usage
stuff = [[x/y, y := f(x)] for x in range(5)]
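
To make that concrete, here's a rough sketch. It assumes an interpreter
that actually implements := (the syntax later shipped in Python 3.8),
and f is just a made-up stand-in for the f() in the examples. Each
comprehension sits in its own function only so that y starts out
unbound every time.

```python
def f(x):
    # Hypothetical stand-in for the f() used in the PEP's examples.
    return x + 1


def assign_in_filter(data):
    # The 'if' clause runs before the result expression, so y is bound in time.
    return [(x, y, x / y) for x in data if (y := f(x)) > 0]


def assign_then_use(data):
    # List elements are evaluated left to right: y is bound, then used.
    return [[(y := f(x)), x / y] for x in data]


def use_then_assign(data):
    # x / y is evaluated before y has ever been bound, so this fails.
    return [[x / y, (y := f(x))] for x in data]


print(assign_in_filter(range(3)))
print(assign_then_use(range(3)))
try:
    use_then_assign(range(3))
except NameError as exc:
    print("NameError:", exc)
```

Only the position of the assignment relative to its first use changes
between the last two functions, which is exactly the part that makes me
nervous.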

I guess this isn't limited to comprehensions either – I rarely see
complex expressions with side-effects embedded in the middle, so I'm
actually a bit hazy about the exact order of operations inside Python
expressions. I could probably figure it out if necessary, but even in
simple cases like f(g(), h()), do you really know for certain off
the top of your head whether g() or h() runs first? Does the average
user? With code like f(a := g(), h(a)) this suddenly matters a lot!
But comprehensions suffer from a particularly extreme version of this,
so it worries me that they're being put forward as the first set of
motivating examples.
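
For anyone who'd rather check than guess, here's a minimal sketch that
records the call order. The names f, g, h and the calls list are
placeholders I made up for illustration; the point is just that Python
evaluates call arguments left to right, before the call itself.

```python
calls = []  # records the order in which the functions actually run


def g():
    calls.append("g")
    return 1


def h():
    calls.append("h")
    return 2


def f(a, b):
    calls.append("f")
    return a + b


result = f(g(), h())
print(calls)  # ['g', 'h', 'f']
```

So in f(a := g(), h(a)) the assignment does happen before h(a) runs, but
you have to know the rule to be sure of that.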

> Capturing condition values
> --------------------------

Note to Chris: your examples in this section have gotten their order
scrambled; you'll want to fix that :-). And I'm going to reorder them
yet again in my reply...

>     # Reading socket data until an empty string is returned
>     while data := sock.read():
>         print("Received data:", data)

I don't find this example very convincing. If it were written:

    for data in iter(sock.read, b""):
        ...

then that would make it clearer what's happening ("oh right, sock.read
uses b"" as a sentinel to indicate EOF"). And the fact that this is
needed at all is only because sockets are a low-level API with lots of
complexity inherited from BSD sockets. If this were a normal Python
API, it'd just be

    for data in sock:
        ...

(Hmm, I guess the original example is actually wrong because it should
be sock.recv, and recv takes a mandatory argument. To be fair, adding
that argument would also make the iter() version uglier, and that
argument explains why we can't support 'for data in sock'. But this is
still consistent with my argument that working directly with sockets
is always going to be a bit awkward... I don't think bits of sugar
like this are going to make any substantive difference to how easy it
is to read or write raw socket code.)
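
For what it's worth, here's roughly what the iter() version looks like
once recv gets its argument. This is only a sketch: the read_all helper
and the 4096 buffer size are made up for illustration, and
socket.socketpair() stands in for a real connection.

```python
import functools
import socket


def read_all(sock, bufsize=4096):
    # recv() returns b"" once the peer has closed the connection, so b""
    # is the sentinel that stops the iteration.
    chunks = []
    for data in iter(functools.partial(sock.recv, bufsize), b""):
        chunks.append(data)
    return b"".join(chunks)


# A connected pair of local sockets, used here instead of a network peer.
writer, reader = socket.socketpair()
writer.sendall(b"hello")
writer.close()  # closing the writer is what makes recv() return b""
received = read_all(reader)
reader.close()
print(received)
```

The functools.partial call is the extra ugliness I mentioned.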

>     # Proposed syntax
>     while (command := input("> ")) != "quit":
>         print("You entered:", command)
>
>     # Equivalent in current Python, not caring about function return value
>     while input("> ") != "quit":
>         print("You entered a command.")
>
>     # To capture the return value in current Python demands a four-line
>     # loop header.
>     while True:
>         command = input("> ");
>         if command == "quit":
>             break
>         print("You entered:", command)
>
> Particularly with the ``while`` loop, this can remove the need to have an
> infinite loop, an assignment, and a condition. It also creates a smooth
> parallel between a loop which simply uses a function call as its condition,
> and one which uses that as its condition but also uses the actual value.

I dare you to describe that first version in English :-). I would say:
"it reads the next line of input and stores it in 'command'; then
checks if it was 'quit', and if so it exits the loop; otherwise, it
prints the command". What I find interesting is that the English
clauses exactly match the statements in the "expanded" version; that
feels about right. Jamming three clauses into one line (that you have
to read from the inside out!) feels really cramped.

(Plus in a real version of this you'd have some command line parsing
to do – at least stripping off whitespace from the command, probably
tokenizing it somehow – before you could check what the command was,
and then you're back to the final version anyway.)
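
A sketch of what I have in mind, with the caveat that the parsing rules
here (strip, then split on whitespace, skip blank lines) are my own
assumptions and run_commands is a hypothetical name. It takes a
read_line callable in place of input("> ") so it can run
non-interactively.

```python
def run_commands(read_line):
    # read_line is any zero-argument callable returning the next line,
    # e.g. lambda: input("> ") in an interactive session.
    seen = []
    while True:
        line = read_line()
        words = line.strip().split()
        if not words:
            continue  # ignore blank lines
        command, *args = words
        if command == "quit":
            break
        seen.append((command, args))
    return seen


# Canned input standing in for a user typing at the prompt.
fake_input = iter(["  hello world \n", "", "quit  "]).__next__
result = run_commands(fake_input)
print(result)
```

Once there's any processing between reading the line and testing it,
there's no single expression left to hang the assignment on.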

>     # Capturing regular expression match objects
>     # See, for instance, Lib/pydoc.py, which uses a multiline spelling
>     # of this effect
>     if match := re.search(pat, text):
>         print("Found:", match.group(0))

Now this is a genuinely compelling example! re match objects are
always awkward to work with. But this feels like a very big hammer to
make re.match easier to use :-). I wonder if there's anything more
focused we could do here?

> Special-casing conditional statements
> -------------------------------------
>
> One of the most popular use-cases is ``if`` and ``while`` statements.  Instead
> of a more general solution, this proposal enhances the syntax of these two
> statements to add a means of capturing the compared value::
>
>     if re.search(pat, text) as match:
>         print("Found:", match.group(0))
>
> This works beautifully if and ONLY if the desired condition is based on the
> truthiness of the captured value.  It is thus effective for specific
> use-cases (regex matches, socket reads that return `''` when done), and
> completely useless in more complicated cases (eg where the condition is
> ``f(x) < 0`` and you want to capture the value of ``f(x)``).  It also has
> no benefit to list comprehensions.
>
> Advantages: No syntactic ambiguities. Disadvantages: Answers only a fraction
> of possible use-cases, even in ``if``/``while`` statements.

It does only cover a fraction of possible use-cases, but
interestingly, the fraction it covers includes:

- two of the three real examples given in the rationale section
- exactly the cases that *don't* force you to twist your brain into
pretzels thinking about sequential side-effecting control flow in the
middle of expressions.

However, I do think it'd be kinda confusing if we had:

if EXPR as X:
while EXPR as X:
with EXPR as X:

and the first two assign the value of EXPR to X, while the last one
does something more subtle. Or maybe it'd be fine?
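
To spell out "more subtle": with binds X to whatever EXPR's __enter__
method returns, not to the value of EXPR itself. A tiny sketch, using a
made-up Resource class:

```python
class Resource:
    # Hypothetical context manager whose __enter__ returns something
    # other than the manager object itself.
    def __enter__(self):
        return "value from __enter__"

    def __exit__(self, exc_type, exc, tb):
        return False  # don't suppress exceptions


manager = Resource()
with manager as x:
    bound = x

print(bound is manager)  # False
print(bound)             # value from __enter__
```

With the proposed "if EXPR as X", x would instead be the manager object.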

FWIW,
-n

-- 
Nathaniel J. Smith -- https://vorpus.org

