[Python-Dev] Generator objects and list comprehensions?

Nathaniel Smith njs at pobox.com
Wed Jan 25 18:53:08 EST 2017


On Jan 25, 2017 8:16 AM, "Joe Jevnik" <jjevnik at quantopian.com> wrote:

List, set, and dict comprehensions compile like:

# input
result = [expression for lhs in iterator_expression]

# output
def comprehension(iterator):
    out = []
    for lhs in iterator:
        out.append(expression)
    return out

result = comprehension(iter(iterator_expression))

When you make `expression` a `yield` the compiler thinks that
`comprehension` is a generator function instead of a normal function.

We can manually translate the following comprehension:

result = [(yield n) for n in (0, 1)]

def comprehension(iterator):
    out = []
    for n in iterator:
        # (yield n) as an expression is the value sent into this generator
        out.append((yield n))
    return out

result = comprehension(iter((0, 1)))


We can see this in the behavior of `send` on the resulting generator:

In [1]: g = [(yield n) for n in (0, 1)]

In [2]: next(g)
Out[2]: 0

In [3]: g.send('hello')
Out[3]: 1

In [4]: g.send('world')
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-4-82c589e0c27d> in <module>()
----> 1 g.send('world')

StopIteration: ['hello', 'world']

The `return out` gets translated into `raise StopIteration(out)` because
the code is a generator.

The bytecode for this looks like:

In [5]: %%dis
   ...: [(yield n) for n in (0, 1)]
   ...:
<module>
--------
  1           0 LOAD_CONST               0 (<code object <listcomp> at
0x7f4bae68eed0, file "<show>", line 1>)
              3 LOAD_CONST               1 ('<listcomp>')
              6 MAKE_FUNCTION            0
              9 LOAD_CONST               5 ((0, 1))
             12 GET_ITER
             13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             16 POP_TOP
             17 LOAD_CONST               4 (None)
             20 RETURN_VALUE

<module>.<listcomp>
-------------------
  1           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                13 (to 22)
              9 STORE_FAST               1 (n)
             12 LOAD_FAST                1 (n)
             15 YIELD_VALUE
             16 LIST_APPEND              2
             19 JUMP_ABSOLUTE            6
        >>   22 RETURN_VALUE


In `<module>` you can see us create the <listcomp> function, call `iter` on
`(0, 1)`, and then call `<listcomp>(iter(0, 1))`. In `<listcomp>` you can
see the loop like we had above, but unlike a normal comprehension we have a
`YIELD_VALUE` (from the `(yield n)`) in the comprehension.

The reason that this is different in Python 2 is that list comprehension,
and only list comprehensions, compile inline. Instead of creating a new
function for the loop, it is done in the scope of the comprehension. For
example:

# input
result = [expression for lhs in iterator_expression]

# output
result = []
for lhs in iterator_expression:
    result.append(lhs)

This is why `lhs` bleeds out of the comprehension. In Python 2, adding
making `expression` a `yield` causes the _calling_ function to become a
generator:

def f():
    [(yield n) for n in range(3)]
    # no return

# py2
>>> type(f())
generator

# py3
>> type(f())
NoneType

In Python 2 the following will even raise a syntax error:

In [5]: def f():
   ...:     return [(yield n) for n in range(3)]
   ...:
   ...:
  File "<ipython-input-5-3602a9999f46>", line 2
    return [(yield n) for n in range(3)]
SyntaxError: 'return' with argument inside generator


Generator expressions are a little different because the compilation
already includes a `yield`.

# input
(expression for lhs in iterator_expression)

# output
def genexpr(iterator):
    for lhs in iterator:
        yield expression

You can actually abuse this to write a cute `flatten` function:

`flatten = lambda seq: (None for sub in seq if (yield from sub) and False)`

because it translates to:

def genexpr(seq):
    for sub in seq:
        # (yield from sub) as an expression returns the last sent in value
        if (yield from sub) and False:
            # we never enter this branch
            yield None


That was a long way to explain what the problem was. I think that that
solution is to stop using `yield` in comprehensions because it is
confusing, or to make `yield` in a comprehension a syntax error.


Another option would be to restore the py2 behavior by inserting an
implicit 'yield from' whenever the embedded expression contains a yield. So
in your example where

result = [(yield n) for n in (0, 1)]

currently gets expanded to

result = comprehension(iter((0, 1)))

it could notice that the expanded function 'comprehension' is a generator,
and emit code like this instead:

result = yield from comprehension(iter((0, 1)))

At least, this would work for list/dict/set comprehensions; not so much for
generator expressions. I assume this is basically how 'await' in
comprehensions works in 3.6.

I'm not sure this is actually a good idea, given the potential for
ambiguity and backwards compatibility considerations -- I'm kind of leaning
towards the deprecate/error option on balance :-). But it's an option that
would probably be less confusing than the status quo, and would make it
easier to write the kind of straddling code the original poster was trying
to write.

-n
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20170125/08fc26c0/attachment.html>


More information about the Python-Dev mailing list