Jeremy Hylton : weblog : 2004-02-04

Problems with nested scopes and lazy evaluation

Wednesday, February 04, 2004

It's a strain to combine the new generator expressions feature and nested scopes. The standard scoping rules are inconvenient for the lazy evaluation of generator expressions, but it's very strange for a list comprehension and an apparently identical generator expression to produce different results.

There is a lively discussion in the Sourceforge patch tracker concerning generator expressions. The crux of the matter is the difference between generator expressions and list comprehensions:

>>> lst = [lambda x:x+y for y in range(3)]
>>> for f in lst:print f(0)

>>> gen = (lambda x:x+y for y in range(3))
>>> for f in gen:print f(0)

The target of the generator expression is evaluated in a copy of the environment that the gen expr was defined in. The nested target scope captures the values of names at definition time rather than run time. It will be a wart if these two expressions yield different sequences.
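Under the ordinary late-binding closure rules, the two forms really do produce different sequences. A sketch of the behavior, in modern Python 3 syntax (the semantics were still under discussion when this was written):

```python
# List comprehension: every lambda closes over the same variable y,
# which is left at its final value (2) by the time any lambda is called.
lst = [lambda x: x + y for y in range(3)]
print([f(0) for f in lst])  # [2, 2, 2]

# Generator expression: each lambda is produced lazily, and here each
# one is called while the generator is suspended, so y still has its
# current per-iteration value.
gen = (lambda x: x + y for y in range(3))
print([f(0) for f in gen])  # [0, 1, 2]
```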

Armin Rigo says he's ready to rant about problems with nested scopes. "As far as I'm concerned it is yet another reason why free variable bindings ("nested scopes") are done wrong in Python currently :-(" I wonder what the other reasons are?

In the generator expressions discussion, the question has been phrased in terms of late binding versus early binding, where late binding means the standard namespace rules and early binding means copying the value of a binding when the function is defined. (Early binding works like default argument values.)
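The default-argument idiom is the usual way to spell early binding by hand, which makes the contrast easy to see in a few lines:

```python
# Late binding: y is looked up when each lambda is called, after the
# loop has finished, so all three functions see y == 2.
late = [lambda x: x + y for y in range(3)]

# Early binding: the default value y=y copies the current value of y
# into each function at definition time.
early = [lambda x, y=y: x + y for y in range(3)]

print([f(0) for f in late])   # [2, 2, 2]
print([f(0) for f in early])  # [0, 1, 2]
```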

In the absence of rebinding of free variables, I don't think there are any cases where early or late binding makes a difference for plain nested functions. It would be unusual for a nested function to depend on changes to a free variable, because changes could only occur while the block containing the actual binding was being executed. On the other hand, Guido is now inclined to allow rebinding of free variables. I think it would make a big difference then.
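Rebinding of free variables did eventually arrive as the nonlocal statement in Python 3. Once a nested function can rebind a free variable, the difference between early and late binding becomes observable even without a loop; a sketch:

```python
def make_counter():
    count = 0
    def bump():
        nonlocal count  # rebind the free variable in the enclosing scope
        count += 1
        return count
    return bump

c = make_counter()
print(c(), c(), c())  # 1 2 3
# Early binding would have copied count == 0 into bump once, at
# definition time, so every call would return 1.
```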

Scheme doesn't seem to have the same problems with its namespaces. I suspect that lexical scoping and side-effects have problematic interactions. The functional style typical of Scheme avoids those problems in many cases. Or, set! stands out in a way that "for x in range(3)" doesn't. (More generally, I wonder how other strict languages integrate lazy features -- imperative languages in particular.)

It looks like delayed evaluation interacts very poorly with side effects. The examples discussed on the list involved a free variable in a generator expression that is rebound after the gen expr is created. I think all of the examples involve for loops, where the target of the for loop is used in the target expression. This kind of code will always be delicate.
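The delicate pattern looks like this (a minimal sketch in modern Python, where only the outermost iterable of a generator expression is evaluated eagerly, so the free variable n is looked up at consumption time):

```python
gens = []
for n in range(3):
    # n is free inside the genexp; its value is not copied, so every
    # generator sees whatever n is when it is finally consumed.
    gens.append(x * n for x in range(2))

# The loop has finished, so n == 2 for all three generators.
result = [list(g) for g in gens]
print(result)  # [[0, 2], [0, 2], [0, 2]]
```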

On python-dev, Tim Peters has posted the only real-world examples of generator expressions. He notes:

Whenever I've written a list-of-generators, or in the recent example a generator pipeline, I have found it semantically necessary, without exception so far, to capture the bindings of the variables whose bindings wouldn't otherwise be invariant across the life of the generator. [If] it turns out that this is always, or nearly almost always, the case, across future examples too, then it would just be goofy not to implement generator expressions that way ("well, yes, the implementation does do a wrong thing in every example we had, but what you're not seeing is that the explanation would have been a line longer had the implementation done a useful thing instead" <wink>).

The example Tim posted was about a pipeline of predicates on a text stream. He can use generator expressions to create a lazy filter that yields a token only if each predicate returns True. It is delicate code; the first few times I read it, it was hard to understand. I'd feel better about generator expressions if the examples were easier to read, but maybe that will just come with time.

    pipe = source
    for p in predicates:
        # add a filter over the current pipe, and call that the new pipe
        pipe = (e for e in pipe if p(e))

How would you re-write the example to use the late binding approach that I advocated on python-dev earlier?

    def extend(pipe, pred):
        return (e for e in pipe if pred(e))

    pipe = source
    for p in predicates:
        pipe = extend(pipe, p)

This seems clearer, because it avoids any free variables in the generator expression. It is a few more lines of code.
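A runnable sketch of the contrast, with two hypothetical predicates standing in for Tim's (the names is_even and is_small are my own, not from the thread):

```python
def is_even(n):
    return n % 2 == 0

def is_small(n):
    return n < 10

predicates = [is_even, is_small]

# Free-variable version: by the time the pipeline is consumed, p is
# is_small in *both* filter layers, so is_even is never applied.
pipe = iter(range(20))
for p in predicates:
    pipe = (e for e in pipe if p(e))
buggy = list(pipe)
print(buggy)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Passing the predicate as an argument freezes it into its own layer.
def extend(pipe, pred):
    return (e for e in pipe if pred(e))

pipe = iter(range(20))
for p in predicates:
    pipe = extend(pipe, p)
correct = list(pipe)
print(correct)  # [0, 2, 4, 6, 8]
```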

A related complaint, first voiced by David Beazley, is that the target of the for loop in a list comprehension should not be visible outside of the list comp expression. If the for target happens to have the same name as a local variable, you get a conflict. It's a relatively easy mistake to make, because the list comprehension is just an expression, and expressions don't bind names, right? It even looks distinct, since it's wrapped in brackets.

Guido has recently changed his mind about the issue and agrees with David that the list comprehension name bindings should not be visible to the enclosing function. It should behave as if the list comprehension were defined in a nested function.


    L = [x for x in range(3)]

would be equivalent to

    def listcomp():
        return [x for x in range(3)]

    L = listcomp()

The implementation could just use renaming and avoid the expense of a function call, although producing useful variable names in error messages would be tricky.

It's also, strictly speaking, not compatible with the current definition, which allows odd expressions like "[x for x in x]" where x is a local variable defined earlier in the code block. It could be made to work with a special renaming approach, but the semantics would sound odd: The namespace of list comp targets is separate from the namespace of list comp generator expressions.
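Python 3 later adopted this rule, and the odd case still works there, precisely because the outermost iterable is evaluated in the enclosing scope before the comprehension's own scope exists; a sketch:

```python
x = [10, 20, 30]
# The iterable x on the right is the outer list, evaluated first;
# the target x on the left lives only inside the comprehension.
L = [x for x in x]
print(L)  # [10, 20, 30]
print(x)  # [10, 20, 30] -- the target did not leak or clobber x
```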