Generator question

Ian Kelly ian.g.kelly at gmail.com
Wed Mar 13 21:44:18 EDT 2019


You're basically running into this:
https://docs.python.org/3/faq/programming.html#why-do-lambdas-defined-in-a-loop-with-different-values-all-return-the-same-result

To see why, let's try disassembling your function. I'm using Python 3.5
here, but it shouldn't make much of a difference.

py> import dis
py> dis.dis(flat_genexp_cat_prod)
  2           0 BUILD_LIST               0
              3 BUILD_LIST               1
              6 STORE_FAST               1 (solutions)

  3           9 SETUP_LOOP              39 (to 51)
             12 LOAD_FAST                0 (lists)
             15 GET_ITER
        >>   16 FOR_ITER                31 (to 50)
             19 STORE_DEREF              0 (a_list)

  4          22 LOAD_CLOSURE             0 (a_list)
             25 BUILD_TUPLE              1
             28 LOAD_CONST               1 (<code object <genexpr> at 0x73f31571ac90, file "<stdin>", line 4>)
             31 LOAD_CONST               2 ('flat_genexp_cat_prod.<locals>.<genexpr>')
             34 MAKE_CLOSURE             0
             37 LOAD_FAST                1 (solutions)
             40 GET_ITER
             41 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             44 STORE_FAST               1 (solutions)
             47 JUMP_ABSOLUTE           16
        >>   50 POP_BLOCK

  5     >>   51 LOAD_FAST                1 (solutions)
             54 RETURN_VALUE

Now, take a look at the difference between the instruction at address 22
and the one at address 37:

  4          22 LOAD_CLOSURE             0 (a_list)
             37 LOAD_FAST                1 (solutions)

The value of solutions is passed directly to the generator as an argument,
which is the reason why building the generator up iteratively like this
works at all: although the nested generators are evaluated lazily, each new
generator that is constructed contains as its input a reference to the
previous generator.
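
A small sketch of that first point, using made-up names (make_gen and
source are mine, not from your code): the leftmost iterable is evaluated
when the generator expression is created, so rebinding the name afterwards
has no effect on the generator.

```python
def make_gen():
    source = [1, 2, 3]
    # `source` is the leftmost iterable, so it is evaluated right here
    # and handed to the generator as an argument.
    gen = (x * 10 for x in source)
    # Rebinding the name afterwards does not change what gen iterates over.
    source = [7, 8, 9]
    return gen

result = list(make_gen())
print(result)  # [10, 20, 30]
```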

By contrast, a_list is a closure variable. The generator does not capture
the value a_list had when the generator was created; it looks the variable
up when the generator is actually evaluated. Since the entire nested
generator structure is evaluated lazily, it doesn't get evaluated until
list() is called after the function has returned. The value of the a_list
closure at that point is the last value that was assigned to it: the list
[5, 6] from the last iteration of the for loop. This same list value then
gets used for all three nested generators.
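
The same effect can be reproduced in isolation. This is only an
illustration with hypothetical names (make_gens, late); the dummy leftmost
clause is there just to push a_list into the second for clause, as in your
function.

```python
def make_gens(lists):
    gens = []
    for a_list in lists:
        # a_list sits in the second for clause, so it is reached through
        # a closure and looked up only when the generator is consumed.
        gens.append(el for _ in [0] for el in a_list)
    return gens

# By the time the generators run, the loop has finished and a_list
# refers to the last list for every one of them.
late = [list(g) for g in make_gens([[1, 2], [3, 4], [5, 6]])]
print(late)  # [[5, 6], [5, 6], [5, 6]]
```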

So now why do solutions and a_list get treated differently like this? To
answer this, look at this paragraph about generator expressions from the
language reference:

"""
Variables used in the generator expression are evaluated lazily when the
__next__() method is called for the generator object (in the same fashion
as normal generators). However, the iterable expression in the leftmost for
clause is immediately evaluated, so that an error produced by it will be
emitted at the point where the generator expression is defined, rather than
at the point where the first value is retrieved. Subsequent for clauses and
any filter condition in the leftmost for clause cannot be evaluated in the
enclosing scope as they may depend on the values obtained from the leftmost
iterable. For example: (x*y for x in range(10) for y in range(x, x+10)).
"""

So, it's simply because the iterable expression in the leftmost for clause
is treated differently from every other value in the generator expression.
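
One possible way around it, offered as a sketch rather than the only fix:
build each generator inside a helper function so that a_list becomes a
local of that call and is never rebound. The helper name _extend is my own
invention.

```python
def _extend(solutions, a_list):
    # Each call gets its own a_list binding, so the closure inside the
    # generator expression always sees the list passed to this call.
    return (part_sol + [el] for part_sol in solutions for el in a_list)

def flat_genexp_cat_prod(lists):
    solutions = [[]]
    for a_list in lists:
        solutions = _extend(solutions, a_list)
    return solutions

fixed = list(flat_genexp_cat_prod([[1, 2], [3, 4], [5, 6]]))
print(fixed)
```

The generators are still lazy; only the binding of a_list is fixed at the
time each one is created.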

On Wed, Mar 13, 2019 at 3:49 PM Pierre Reinbold <preinbold at gmx.net> wrote:

> Dear all,
>
> I want to implement a function computing the Cartesian product of the
> elements of a list of lists, but using generator expressions. I know that
> it is already available in itertools but it is for the sake of
> understanding how things work.
>
> I already have a working recursive version, and I'm quite sure that this
> iterative version used to work (at least in some Python2.X) :
>
> def flat_genexp_cat_prod(lists):
>     solutions = [[]]
>     for a_list in lists:
>         solutions = (part_sol+[el] for part_sol in solutions for el in a_list)
>     return solutions
>
> But, with Python3.7.2, all I got is this :
>
> >>> list(flat_genexp_cat_prod([[1, 2], [3, 4], [5, 6]]))
> [[5, 5, 5], [5, 5, 6], [5, 6, 5], [5, 6, 6], [6, 5, 5], [6, 5, 6], [6, 6, 5],
> [6, 6, 6]]
>
> instead of
>
> >>> list(flat_genexp_cat_prod([[1, 2], [3, 4], [5, 6]]))
> [[1, 3, 5], [1, 3, 6], [1, 4, 5], [1, 4, 6], [2, 3, 5], [2, 3, 6], [2, 4, 5],
> [2, 4, 6]]
>
> Using a list comprehension instead of a generator expression solves the
> problem, but I can't understand why the version above fails.
>
> Even stranger, when debugging I tried to use itertools.tee to duplicate the
> solutions generators and have a look at them :
>
> from itertools import tee
>
> def flat_genexp_cat_prod(lists):
>     solutions = [[]]
>     for a_list in lists:
>         solutions, debug = tee(
>                 part_sol+[el] for part_sol in solutions for el in a_list)
>         print("DEBUG", list(debug))
>     return solutions
>
> And, that version seems to work!
>
> >>> list(flat_genexp_cat_prod([[1, 2], [3, 4], [5, 6]]))
> DEBUG [[1], [2]]
> DEBUG [[1, 3], [1, 4], [2, 3], [2, 4]]
> DEBUG [[1, 3, 5], [1, 3, 6], [1, 4, 5], [1, 4, 6], [2, 3, 5], [2, 3, 6],
> [2, 4, 5], [2, 4, 6]]
> [[1, 3, 5], [1, 3, 6], [1, 4, 5], [1, 4, 6], [2, 3, 5], [2, 3, 6], [2, 4, 5],
> [2, 4, 6]]
>
> Can you help me understand what I'm doing wrong ?
>
> Thank you in advance,
>
>
> πr


