[Python-ideas] free variables in generator expressions

Arnaud Delobelle arno at marooned.org.uk
Sun Dec 16 00:31:19 CET 2007


On 15 Dec 2007, at 22:41, Jan Kanis wrote:

> On Thu, 13 Dec 2007 08:08:53 +0100, Arnaud Delobelle
> <arno at marooned.org.uk> wrote:
>
>>
>> On 12 Dec 2007, at 23:41, Georg Brandl wrote:
>>
>>> Arnaud Delobelle schrieb:
>>>
>>>> Let's test this (python 2.5):
>>>>
>>>> >>> A = '12'
>>>> >>> B = 'ab'
>>>> >>> gen = (x + y for x in A for y in B)
>>>> >>> A = '34'
>>>> >>> B = 'cd'
>>>> >>> list(gen)
>>>> ['1c', '1d', '2c', '2d']
>>>>
>>>> So in the generator expression, A remains bound to the string '12'
>>>> but B gets rebound to 'cd'.  This may make the implementation of
>>>> generator expressions more straightforward, but from the point of
>>>> view of a user of the language it seems rather arbitrary. What makes
>>>> A so special as opposed to B?  Ok it belongs to the outermost loop,
>>>> but conceptually in the example above there is no outermost loop.
>>>
>>> Well, B might depend on A so it can't be evaluated in the outer
>>> context at the time the genexp "function" is called. It has to be
>>> evaluated inside the "function".
>>
>> You're right. I expressed myself badly: I was not talking about
>> evaluation but binding.  I was saying that if the name A is bound to
>> the object that A is bound to when the generator expression is
>> created, then the same should happen with B.
>>
>
> I think what Georg meant was this (I intended to send this in reply to
> your earlier mail of Thursday AM, but Georg beat me to it):
>
> The reason for not binding B when the genexp is defined is so you  
> can do this:
>
> >>> A = [[1, 2], [3, 4]]
> >>> gen = (x for b in A for x in b)
> >>> list(gen)
> [1, 2, 3, 4]
>
> Here, b can't be bound to something at generator definition time  
> because the 'something' may not exist yet. (It does actually in this  
> example, but you get the point.) So, only the first (outer loop)  
> iterable is bound immediately.
>

In your example, b is not free of course.
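One rough way to see this is to look at the code object behind the generator. This is only a sketch and it leans on CPython implementation details (the gi_code attribute and the co_* fields), so take it as an illustration rather than a definition of "free". The names local_names and outer_names are mine.

```python
# Same shape as the example quoted above.
A = [[1, 2], [3, 4]]
gen = (x for b in A for x in b)

code = gen.gi_code  # CPython-specific: the code object behind the genexp

# Names local to the generator: the loop variables (plus a hidden
# argument that carries the already-evaluated outermost iterable).
local_names = set(code.co_varnames)

# Names the generator body has to look up outside itself.
outer_names = set(code.co_names) | set(code.co_freevars)

print("local:", sorted(local_names))
print("looked up outside:", sorted(outer_names))
```

Here b and x show up as locals, so there is nothing for b to be bound to in the enclosing scope in the first place.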

> Whether a variable is rebound within the expression could of course  
> be decided at compile time, so all free variables could be bound  
> immediately. I think that would be an improvement, but it requires  
> the compiler to be a bit smarter.
>
This is what I was advocating.  As it is decided at compile time
which variables are free, it may only be a small extra step to
add a bit of code saying that they must be bound at the creation
of the generator expression.  Or, to continue with the _genexp
function mentioned in previous posts, for:

(f(x) for b in A for x in b)

to be translated as

def _genexp(f, A):
     for b in A:
         for x in b:
             yield f(x)

as A and f are free but not b and x.  Then

gen = (f(x) for b in A for x in b)

would be translated as

gen = _genexp(f, A)
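To make the difference concrete, here is a small sketch that puts the current behaviour next to the translation I have in mind, reusing the A and B strings from the first example. The explicit _genexp function is purely illustrative: I am assuming the free variables would be passed in as arguments, which is not how the compiler emits code today.

```python
A = '12'
B = 'ab'

# Current behaviour: only the outermost iterable (A) is evaluated when
# the generator expression is created; B is looked up when it runs.
current = (x + y for x in A for y in B)

# Hypothetical translation: every free variable is passed in as an
# argument, so all of them are bound at creation time.
def _genexp(A, B):
    for x in A:
        for y in B:
            yield x + y

proposed = _genexp(A, B)

# Rebind both names before consuming either generator.
A = '34'
B = 'cd'

current_result = list(current)    # ['1c', '1d', '2c', '2d']
proposed_result = list(proposed)  # ['1a', '1b', '2a', '2b']
```

With the 'frozen' version neither rebinding is visible, so A and B are treated alike.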

I imagine this wouldn't be too hard, but I am not familiar with
the specifics of python code compilation...
Moreover this behaviour ('freezing' all free variables at the
creation of the generator expression) is well defined and, I think,
easy to reason about.  I haven't yet had the time to see how
generator expressions are created, but I'd like to have a look,
although I suspect I will have to learn a lot more besides in
order to understand it.

[...]
> And, while I'm writing this:
>
> On Thu, 13 Dec 2007 00:01:42 +0100, Arnaud Delobelle
> <arno at marooned.org.uk> wrote:
>> l = [f(x, y) for x in A for y in B(x) if g(x, y)]
>> g = [f(x, y) for x in A for y in B(x) if g(x, y)]
>> <code, maybe binding A, B, f, g to new objects>
>> assert list(g) == l
>
> I suppose this should have been
>
> g = (f(x, y) for x in A for y in B(x) if g(x, y))

Yes!  Sorry about that.  In fact, I should also have called the
generator expression something other than 'g', as that is already
the name of a function (g(x, y)) :|
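With the generator given its own name, the property I was after can be checked as below. The definitions of f, g, B and A are made-up stand-ins just to have something runnable, and only f is rebound here. Under the current semantics the check fails, because f is looked up when the generator runs rather than when it is created.

```python
# Made-up stand-ins for the names used in the snippet.
def f(x, y):
    return x * 10 + y

def g(x, y):
    return (x + y) % 2 == 0

def B(x):
    return range(x)

A = [1, 2, 3]

expected = [f(x, y) for x in A for y in B(x) if g(x, y)]
gen = (f(x, y) for x in A for y in B(x) if g(x, y))

# Code that rebinds one of the free names before the generator is used.
def f(x, y):
    return 0

actual = list(gen)

# The property I would like to hold: list(gen) == expected.
invariant_holds = (actual == expected)
print(expected, actual, invariant_holds)
```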

-- 
Arnaud




