[Python-ideas] How assignment should work with generators?

Steven D'Aprano steve at pearwood.info
Mon Nov 27 08:55:05 EST 2017


On Mon, Nov 27, 2017 at 12:17:31PM +0300, Kirill Balunov wrote:
> Currently during assignment, when the target list is a comma-separated list
> of targets (*without a "starred" target*) the rule is that the object (rhs)
> must be an iterable with the same number of items as there are targets in
> the target list. That is, no check is performed on the number of targets
> present, and if something goes wrong a ValueError is raised.

That's a misleading description: ValueError is raised when the number of 
targets is different from the number of items. I consider that to be 
performing a check on the number of targets.
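
For example, both directions are checked today:

>>> x, y = [1, 2, 3]
Traceback (most recent call last):
  ...
ValueError: too many values to unpack (expected 2)
>>> x, y = [1]
Traceback (most recent call last):
  ...
ValueError: not enough values to unpack (expected 2, got 1)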


> To show this on simple example:
> 
> >>> from itertools import count, islice
> >>> it = count()
> >>> x, y = it
> >>> it
> count(3)

For everyone else who was confused by this, as I was, that's not 
actually a copy and paste from the REPL. There should be a ValueError 
raised after the x, y assignment. As given, it is confusing because it 
looks like the assignment succeeded, when in fact it didn't.
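
For the record, the actual session is:

>>> from itertools import count
>>> it = count()
>>> x, y = it
Traceback (most recent call last):
  ...
ValueError: too many values to unpack (expected 2)
>>> it
count(3)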


> Here the count was advanced two times but assignment did not happen.

Correct that the assignment did not happen, because an exception was 
raised. (Strictly, the count was advanced three times, not two: a third 
item is fetched in order to detect the surplus, hence count(3).)


> I found that in some cases it is too restrictive that the rhs must 
> have the same number of items as targets. It is proposed that if the 
> rhs is a generator or an iterator (better: some object that yields 
> values on demand), the assignment should be lazy and dependent on the 
> number of targets.

I think that's problematic. How do you know which objects yield 
values on demand? Not all lazy iterables are iterators: there are also 
lazy sequences like range.
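
For example:

>>> from collections.abc import Iterator
>>> isinstance(range(10), Iterator)
False
>>> isinstance(iter(range(10)), Iterator)
True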

But even if we decide on a simple rule like "iterator unpacking depends 
on the number of targets, all other iterables don't", I think that will 
be a bug magnet. It will mean that you can't rely on this special 
behaviour unless you surround each call with a type check:


import collections.abc

if isinstance(it, collections.abc.Iterator):
    # special case for iterators
    x, y = it
else:
    # sequences keep the old behaviour
    x, y = it[:2]



> I find this feature to be very convenient for 
> interactive use,

There are many things which would be convenient for interactive use that 
are a bad idea outside of the interactive environment. Errors which pass 
silently are one of them. Unpacking a sequence of 3 items into 2 
assignment targets should be an error, unless you explicitly limit it to 
only two items.

Sure, sometimes it would be convenient to unpack just two items out of 
some arbitrarily large iterator just by writing `x, y = it`. But 
other times that would be an error, even in the interactive interpreter.

I don't want Python trying to *guess* whether I want to unpack the 
entire iterable or just two items. Whatever tiny convenience there is 
when Python guesses correctly will be outweighed by the nuisance 
value of when it guesses wrongly.


> while it remains readable, expected, and expressed in a more compact 
> code.

I don't think it is expected behaviour. It is different from the current 
behaviour, so it will be surprising to everyone used to the current 
behaviour, annoying to those who like the current behaviour, and a 
general inconvenience to those writing code that runs under multiple 
versions of Python.

Personally, I would not expect this suggested behaviour. I would be very 
surprised, and annoyed, if a simple instruction like:

x, y = some_iterable

behaved differently for iterators and sequences.


> There are some Pros:
>     1. No overhead

No overhead compared to what?


>     2. Readable and not so verbose code
>     3. Optimized case for x,y,*z = iterator

The semantics of that are already set: the first two items are assigned 
to x and y, with all subsequent items assigned to z as a list. How will 
this change optimize this case? It still needs to run through the 
iterator to generate the list.
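
That is, today:

>>> it = iter([1, 2, 3, 4, 5])
>>> x, y, *z = it
>>> x, y, z
(1, 2, [3, 4, 5])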


>     4. Clear way to assign values partially from infinite generators.

It isn't clear at all. If I have a non-generator lazy sequence like:

# Toy example: iterable via the sequence (__getitem__) protocol,
# but not itself an iterator.
class EvenNumbers:
    def __getitem__(self, i):
        return 2*i

it = EvenNumbers()  # A lazy, infinite sequence

then `x, y = it` will keep the current behaviour and raise an exception 
(since it isn't an iterator), but `x, y = iter(it)` will use the new 
behaviour.
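
Today that failure at least happens promptly (unpacking pulls a third 
item, sees the surplus, and raises):

>>> x, y = EvenNumbers()
Traceback (most recent call last):
  ...
ValueError: too many values to unpack (expected 2)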

So in general, when I'm reading code and I see:

x, y = some_iterable

I have very little idea of which behaviour will apply. Will it be the 
special iterator behaviour that stops at two items, or the current 
sequence behaviour that raises if there are more than two items?


> Cons:
>     1. A special case of how assignment works
>     2. As with any implicit behavior, hard-to-find bugs

Right. Hard-to-find bugs beats any amount of convenience in the 
interactive interpreter. To use an analogy:

"Sure, sometimes my car suddenly shifts into reverse while I'm driving 
at 60 kph, sometimes the engine falls out when I go around the corner, 
and occasionally the brakes catch fire, but gosh the cup holder makes it 
really convenient to drink coffee while I'm stopped at traffic lights!"


> There are several cases with "undefined" behavior:
> 1. Because the items are assigned, from left to right, to the corresponding
> targets, should the rhs see side effects during assignment or not?

I don't understand what you mean by this. Surely the behaviour should be 
exactly the same as if you wrote:

x, y = islice(it, 2)


What would you do differently, and why?
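
With islice, the side effects are at least well-defined: exactly two 
items are consumed.

>>> from itertools import islice, count
>>> it = count()
>>> x, y = islice(it, 2)
>>> x, y
(0, 1)
>>> next(it)
2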


> 2. Should this work only for generators or for any iterators?

I don't understand why you are even considering singling out *only* 
generators. A generator is a particular implementation of an iterator. I 
can write:

def gen():
    yield 1; yield 2; yield 3

it = gen()

or I can write:

it = iter([1, 2, 3])

and the behaviour of `it` should be identical.
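
Both are iterators in exactly the same sense:

>>> from collections.abc import Iterator
>>> isinstance(gen(), Iterator)
True
>>> isinstance(iter([1, 2, 3]), Iterator)
True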


> 3. Is it Pythonic to distinguish what is on the rhs during assignment, or
> does that contradict duck typing (goose typing)?

I don't understand this question.


> In many cases it is possible to do this right now, but in too verbose a way:
> 
> >>> x, y = islice(gen(), 2)

I don't think that is excessively verbose.

But maybe we should consider allowing slice notation on arbitrary 
iterators:

x, y = it[:2]


I have not thought this through in any serious detail, but it seems to 
me that if the only problem here is the inconvenience of using islice(), 
we could add slicing to iterators. I think that would be better than 
having iterators and other iterables behave differently.
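
As a rough sketch of the idea (a hypothetical wrapper, not a concrete 
design), slicing an iterator could simply delegate to islice:

from itertools import count, islice

class sliceable:
    # Hypothetical wrapper giving an iterator slice support via islice.
    def __init__(self, iterable):
        self._it = iter(iterable)
    def __iter__(self):
        return self._it
    def __getitem__(self, index):
        if isinstance(index, slice):
            # Consumes only as many items as the slice asks for.
            return islice(self._it, index.start, index.stop, index.step)
        raise TypeError("only slices are supported")

it = sliceable(count())
x, y = it[:2]  # advances the underlying iterator exactly twice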

Perhaps a better idea might be special syntax to tell the interpreter 
you don't want to run the right-hand side to completion. "Explicit is 
better than implicit" -- maybe something special like:

x, y, * = iterable

will attempt to extract exactly two items from iterable, without 
advancing past the second item. And it could work the same for 
sequences, iterators, lazy sequences like range, and any other iterable.
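
Today the closest uniform spelling is islice, which already treats all 
of those alike:

from itertools import islice
x, y = islice(iterable, 2)  # lists, ranges and iterators all behave the same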

I don't love having yet another meaning for *, but that would be better 
than changing the standard behaviour of iterator unpacking.



-- 
Steve

