[Python-Dev] An issue recently brought up in patch #872326(generator expression)

Guido van Rossum guido at python.org
Tue Mar 23 14:02:42 EST 2004


> Not sure that's it.  In some sense it's also arbitrary that Python
> decides in
> 
>     def f(x=DEFAULT_X_VALUE):
>         ...
> 
> to capture the binding of DEFAULT_X_VALUE (which is a free vrbl so
> far as the defn of f is concerned) at the time f is defined and
> reuse it on each call; it would have been possible to define the
> language to use whatever binding is current each time f is called.

Either choice is just as arbitrary though.
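
To make the two choices concrete, here is a minimal sketch.  The names
demo, early and late are made up for illustration; late only emulates
the hypothetical "look it up on each call" rule by using a None
sentinel, since Python itself evaluates defaults once, at def time.

```python
def demo():
    default_x_value = 1

    def early(x=default_x_value):
        # Python's actual rule: the default was evaluated when 'def' ran.
        return x

    def late(x=None):
        # Emulation of the alternative rule: look the name up on each call.
        return default_x_value if x is None else x

    default_x_value = 2  # rebind after both functions are defined
    return early(), late()


print(demo())  # (1, 2)
```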

> Outside of explicitly burying something inside a lambda body,
> there's no precedent in Python for "delaying" evaluation of
> anything in a Python expression either.  So generator expressions
> aren't like any non-lambda expressions in Python either: the time at
> which their guts get evaluated can be arbitrarily far removed from
> the time the statement holding the guts gets executed.

And that's what makes me feel uncomfortable with the binding
capturing.  Another thing that makes me uncomfortable:

When you write

    gens = []
    for var in range(10):
        gens.append(x+var for x in range(10))

the value of var is captured in each generator expression; but when
you write

    gens = []
    for self.var in range(10):
        gens.append(x+self.var for x in range(10))

(yes that is valid syntax!) the value of self is captured, and all
generator expressions will generate the same sequence.  If you object
to the "for self.var in" syntax, I'm sure I can come up with another
example -- the point is that we're not capturing *values*, we're
capturing *bindings*.  But since in most simple examples bindings and
values are synonymous (since in most simple examples all values are
immutable -- we like to use numbers and strings in examples for
simplicity), the difference may elude most folks until they're caught
in the trap -- just as with using a mutable object as a class variable
or default value.
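
The bindings-versus-values distinction can be emulated without any
special generator-expression semantics, by passing the would-be
captured name as a function argument.  This is only a sketch: Holder,
gen_with_name, gen_with_object and demo are hypothetical names, and
the helper functions stand in for the binding capture under
discussion, not for what the patch actually does.

```python
class Holder:
    var = 0


def gen_with_name(var):
    # The argument stands in for a captured binding of the name 'var'.
    return (x + var for x in range(3))


def gen_with_object(obj):
    # Only the binding of 'obj' is captured; obj.var is read while iterating.
    return (x + obj.var for x in range(3))


def demo():
    holder = Holder()
    by_name, by_attr = [], []
    for var in range(3):
        by_name.append(gen_with_name(var))
    for holder.var in range(3):  # an attribute as the loop target
        by_attr.append(gen_with_object(holder))
    return [list(g) for g in by_name], [list(g) for g in by_attr]


names, attrs = demo()
print(names)  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
print(attrs)  # [[2, 3, 4], [2, 3, 4], [2, 3, 4]]
```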

OTOH it's clear that actually capturing values would make things
worse.

All this makes me lean towards getting rid of the binding capture
feature.  That way everybody will get bitten by the late binding fair
and square the first time they try it.
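
For reference, this is what the no-capture (late binding) rule looks
like in practice.  The sketch below assumes the semantics that
released versions of Python ended up with, where free variables in a
generator expression are looked up when the generator runs; build is a
made-up name, and the range is shortened to 3 to keep the output small.

```python
def build():
    gens = []
    for var in range(3):
        # 'var' is not captured here; it is looked up when the generator runs.
        gens.append(x + var for x in range(3))
    # By now the loop has finished and var == 2 for every generator.
    return [list(g) for g in gens]


print(build())  # [[2, 3, 4], [2, 3, 4], [2, 3, 4]]
```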

> >> now we have some iterators being treated more equally
> >> than others.
> >>
> >> I'm getting an "architecture smell" here. Something is wrong
> >> somewhere, and I don't think we're tinkering in the right place to
> >> fix it properly.
> 
> When scopes and lifetimes get intricate, it's easier to say what you
> mean in Scheme (which is a good argument for not letting scopes and
> lifetimes get intricate <wink>).
> 
> [Guido]
> > I'm not disagreeing -- I was originally vehemently against the
> > idea of capturing free variables, but Tim gave an overwhelming
> > argument that whenever it made a difference that was the desired
> > semantics.
> 
> Na, and that's a big part of the problem we're having here: I didn't
> make an argument, I just trotted out non-insane examples.  They all
> turned out to suck under "delay evaluation" semantics, to such an
> extent that wholly delayed evaluation appeared to be a downright
> foolish choice.  But decisions driven *purely* by use cases can be
> mysterious.  I was hoping to see many more examples, on the chance
> that a clarifying principle would reveal itself.

Maybe your examples were insane after all. :-)

I expect that 99.9% of all use cases for generator expressions use up
all of the generated values before there's a chance of rebinding any
of the variables, like the prototypical example:

    sum(x**2 for x in range(10))

I don't recall Tim's examples, but I find it hard to believe that
there are a lot of use cases for building lists (or other containers)
containing a bunch of generator expressions that are then used up at
some later point, where the author isn't sophisticated enough to deal
with the late binding by explicitly inserting a lambda with a default
variable binding to capture the one variable in need of capture.
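
A sketch of that workaround, assuming late-binding generator
expressions: the lambda's default argument freezes the current value
of var, and calling the lambda immediately builds the generator.  The
name build_with_capture is made up for illustration.

```python
def build_with_capture():
    gens = []
    for var in range(3):
        # The default 'var=var' is evaluated now, once per loop iteration.
        make = lambda var=var: (x + var for x in range(3))
        gens.append(make())
    return [list(g) for g in gens]


print(build_with_capture())  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```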

Which reminds me, under binding-capturing semantics, most bindings
will be captured unnecessarily -- typically there's only a single
variable whose capture makes a difference, but we can't define a rule
that determines which variable this would be without adding syntax.
(Using the loop control variables of all containing for loops would be
one of the saner rules to try, but even this will sometimes capture
too much and sometimes not enough -- and the compiler can't know in
general what the lifetime of the generator expression will be.)

In summary, I'm strongly leaning towards not capturing *any* bindings,
Tim's examples be damned.

> > But I always assumed that the toplevel iterable would be different.
> 
> Jiwon's example certainly suggests that it must be.  But why <0.7 wink>?

Because (as someone already explained) it's independent of the iteration.
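
A small sketch of the difference, assuming the behaviour of released
Python, where the toplevel iterable is evaluated when the generator
expression is created and everything else is evaluated while
iterating.  The names demo, data and factor are illustrative only.

```python
def demo():
    data = [1, 2, 3]
    factor = 10
    gen = (x * factor for x in data)  # iter(data) is taken right here
    data = [100, 200]  # rebinding 'data' does not affect gen
    factor = 2         # but 'factor' is looked up during iteration
    return list(gen)


print(demo())  # [2, 4, 6]
```

Note that only rebinding the name is ignored; mutating the original
list in place before iterating would still be visible.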

--Guido van Rossum (home page: http://www.python.org/~guido/)


