[Tutor] lambda in a loop

Thu Nov 17 01:56:21 CET 2005

> The original solution does use a closure. The problem is that variables
> are not bound into a closure until the scope of the variable exits. That
> is why using a separate factory function works - the closure is bound
> when the factory function exits which happens each time through the
> loop. In the case of closures in a loop the closure is not bound until
> exiting the scope containing the loop and all the closures are bound to
> the same value.

[Warning: really subtle concepts ahead.  You should probably skip this if
you're a newcomer to programming, because this is not really going to make
sense at all.  *grin*

This message is also really long.  Sorry, but I haven't figured out how to
talk about this concisely yet.]

Hi Kent,

There's some confusion here.

People are making an artificial distinction between the "closure" values
built by lambda vs the function values built by 'def'.  They're the same
kind of thing.

######
>>> commands = []
>>> def sayNumber(n):
...     print n
...
>>> for i in range(5):
...     commands.append((lambda v: lambda: sayNumber(v))(i))
...
>>>
>>> for c in commands:
...     c()
...
0
1
2
3
4
######
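
To back up the "same kind of thing" claim, here is a small sketch of my own
in Python 3 syntax (the session above is Python 2).  The names make_sayer,
say, and the *_results lists are made up for this illustration, and the
functions return values instead of printing so that they're easy to compare.

```python
# Python 3 sketch; make_sayer is a hypothetical helper, not from the thread.
def say_number(n):
    return n

def make_sayer(v):
    # Does the same job as (lambda v: lambda: sayNumber(v)) above,
    # but spelled with 'def' instead of 'lambda'.
    def say():
        return say_number(v)
    return say

with_def = [make_sayer(i) for i in range(5)]
with_lambda = [(lambda v: lambda: say_number(v))(i) for i in range(5)]

def_results = [c() for c in with_def]
lambda_results = [c() for c in with_lambda]
print(def_results, lambda_results)
print(type(with_def[0]) is type(with_lambda[0]))
```

Both lists should come out the same, and both kinds of value have the same
type: a plain function.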

Kent's explanation here:

> In the case of closures in a loop the closure is not bound until exiting
> the scope containing the loop and all the closures are bound to the same
> value.

makes it sound like closures somehow twiddle their thumbs and wait till
things go out of scope before closing on their environment.  This is not
what is happening.  They capture the environment as soon as they're
constructed.

The issue, then, isn't "when" closures are constructed: it's "what":
what's in the environment when we make a closure?
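
Here's one way to poke at the "when" question, as a rough Python 3 sketch.
The names run_loop, during, and after are invented for this example.  Each
closure is called once right after it's built, and again after the loop has
finished.

```python
# Python 3 sketch; run_loop, during and after are hypothetical names.
def run_loop():
    closures = []
    during = []
    for i in range(3):
        f = lambda: i          # the closure exists from this point on
        closures.append(f)
        during.append(f())     # called right away: sees the current i
    # Called after the loop: every closure looks up the same name i,
    # which is now bound to its final value.
    after = [f() for f in closures]
    return during, after

during, after = run_loop()
print(during, after)
```

If closures really waited for a scope to exit, the calls made inside the
loop couldn't work.  They do work, and they see whatever 'i' is bound to at
the moment of the call.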

Just to make sure we all have the same conceptual model: Python's toplevel
environment can be seen as a thing that attaches names to values --- a
namespace.  Let's dive into that model and make sure we understand how it
works.

[Side note: the following is adapted from material in SICP,
http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-21.html#%_sec_3.2]

We can see the keys of that namespace by doing dir():

######
>>> dir()
['__builtins__', '__doc__', '__name__']
######

Just to be able to talk about things, let's call the global namespace "G".

Whenever we call a function, we build a new environment that's chained up
to the one we were in at the time of function construction.  This
corresponds to what people usually think of as a "stack frame".

######
>>> def x():
...     print dir()
...
>>> x()
[]
######

When we're calling x(), let's call that environment frame "X".  (I'm sorry
for doing the uppercase-lowercase thing, but it makes it easier for me to
remember what environment goes with what function!)

"X" is empty, but that doesn't prevent our x function from calling global
stuff, because "X" is chained up to "G".  Let me use some funky notation
to draw the state of the environments:

    [X | ]   ---->   [G | __builtins__=...]

When we try to access a name in X, if the lookup fails, the hunt will
continue down the chain of environments toward the global environment G.

The above diagram uses an ad-hoc bracket/pipe notation to try to show
visually what an environment frame might be.  It tries to make it clear
that G has bindings to things like __builtins__, and that the frames can
be chained up together.
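
If the bracket diagrams feel too hand-wavy, the same idea can be written
down as a toy data structure.  This is only a model in Python 3: the Frame
class and its lookup method are names I'm making up here, and this is not
how the interpreter actually stores names.

```python
# A toy model of chained environment frames (Python 3).
# Frame and lookup are illustrative names, not interpreter internals.
class Frame:
    def __init__(self, name, parent=None):
        self.name = name
        self.bindings = {}
        self.parent = parent

    def lookup(self, key):
        frame = self
        while frame is not None:
            if key in frame.bindings:
                return frame.bindings[key]
            frame = frame.parent      # keep hunting toward G
        raise NameError(key)

G = Frame("G")
G.bindings["__builtins__"] = "..."
X = Frame("X", parent=G)              # [X | ]  ---->  [G | __builtins__=...]

print(X.lookup("__builtins__"))       # not in X, found in G
```

A name that isn't bound anywhere along the chain falls off the end and
raises NameError, which matches what we see for a misspelled variable.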

Let's try a slightly different example:

######
>>> def y():
...     someNumber = 42
...     print dir()
...
>>> y()
['someNumber']
######

y() was also created at toplevel, so whenever we call y(), we'll create a
new environment "Y" that gets chained up to the toplevel environment "G".
We see that assigning local variables adds bindings to "Y", so at the end
of calling y(), right before we return, our world looks like this:

    [Y | someNumber=42]    ---->    [G | __builtins__=...]
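
A quick way to check that the binding really lands in "Y" and not in "G",
sketched in Python 3.  The name frame_y is mine; locals() and globals() are
the standard built-ins.

```python
# Python 3 sketch; frame_y is a hypothetical name for the snapshot.
def y():
    someNumber = 42
    return dict(locals())   # snapshot of Y's bindings just before returning

frame_y = y()
print(frame_y)
print("someNumber" in globals())   # the binding never reached G
```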

Let's take a look at something that touches on what people think of as a
closure:

######
>>> def z():
...     someNumber = 5
...     def inner():
...         print someNumber
...     return inner
...
######

When we create z(), we're at the toplevel environment G again, so whenever
z() gets called, it'll make a new environment frame whose parent is "G".

Ok, let's call z():

######
>>> value = z()
######

When we call z(), we create a fresh new environment frame "Z" that's
attached to G.  Our environment looks like:

    [Z |] ----> [G | __builtins__=...]

We then add the someNumber binding to "Z".  Our environment now looks
like:

    [Z | someNumber=5] ----> [G | __builtins__=...]

Then we hit the 'def inner():  ...' statement: that's a function
construction.  When we define inner(), that function will remember its
origin environment:

    [Z | someNumber=5] ----> [G | __builtins__=...]

And whenever we call that inner() function, it'll make a fresh environment
"I" attached to that origin environment.  We captured that function in
'value', so let's call that now from the toplevel:

######
>>> value()
5
######

When we call value(), as it starts up, it remembers its origin.  It builds
a new environment frame which we'll call "I", and chains it up to that
origin.  Our world will look like this:

    [I |] ---->  [Z | someNumber=5]  ---->  [G | __builtins__=...]

When we try to print someNumber, we can see it because, although that
name binding doesn't exist in "I", it does exist in "Z".  So the lookup of
someNumber succeeds, and things run happily.
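
We can actually peek at what the function remembered.  The sketch below is
Python 3 and uses the __closure__ and __code__.co_freevars attributes; the
names free_names and captured are mine.  One caveat: CPython keeps just the
cells that inner() needs rather than the whole "Z" frame, but the visible
effect matches the model.

```python
# Python 3 sketch; free_names and captured are hypothetical names.
def z():
    someNumber = 5
    def inner():
        return someNumber
    return inner

value = z()

free_names = value.__code__.co_freevars                      # names inner() borrows
captured = [cell.cell_contents for cell in value.__closure__]
print(free_names, captured, value())
```

z() has long since returned by the time we look, yet the binding is still
reachable through the function value.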

Let's make the example a little different:

######
>>> counter = 0
>>> def make_f():
...     def inner():
...         print counter
...     return inner
...
>>> f1 = make_f()
>>> counter = counter + 1
>>> f2 = make_f()
>>> counter = counter + 1
>>> f3 = make_f()
######

We've extended our global environment G with a new name 'counter'.

    [G | counter=0, __builtins__=...]

When we call make_f() three times, each call builds a unique environment
which we'll call M1, M2, and M3.

    f1's origin environment is: [M1 |] --> G
    f2's origin environment is: [M2 |] --> G
    f3's origin environment is: [M3 |] --> G

f1, f2, and f3 are functions whose environments all get their 'counter'
out of "G", so if we make a change to that, as we do with:

    counter = counter + 1

that will be seen by all of the functions:

######
>>> f1()
2
######

And it's this aliasing behavior that started this whole conversation.
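
Here is the same experiment as a Python 3 script, with all three functions
called and one extra check that they share a single "G".  The names seen
and same_globals are made up for this sketch, and inner() returns the value
instead of printing it so we can collect the results.

```python
# Python 3 sketch, meant to be run as a toplevel script.
counter = 0

def make_f():
    def inner():
        return counter     # not in inner's frame, not in make_f's: found in G
    return inner

f1 = make_f()
counter = counter + 1
f2 = make_f()
counter = counter + 1
f3 = make_f()

seen = [f1(), f2(), f3()]
# All three functions point at the very same global namespace.
same_globals = f1.__globals__ is f2.__globals__ is f3.__globals__
print(seen, same_globals)
```

It doesn't matter what 'counter' was when each function was built.  There
is only one 'counter' binding, and it's looked up at call time.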

Finally, let's cover what happens with:

######
>>> def make_f(counter):
...     def inner():
...         print counter
...     return inner
######

When we define make_f(), it's attached to our global environment 'G' as
before:

    make_f()'s origin environment is: [G |counter=2, __builtins__=...]

Let's call make_f() once.

######
>>> f1 = make_f(7)
######

When we call make_f() here, we construct an environment which we'll call
N:

    [N | counter=7] ---> [G | counter=2, __builtins__=...]

We hit the definition of inner, so it creates a function value whose
origin environment is this, and f1 is bound to that function value.  So
we'll say that:

    f1's origin environment is:
        [N | counter=7] ---> [G | counter=2, __builtins__=...]

We call f1().

######
>>> f1()
7
######

What happened here?  When we called f1() here, we first constructed an
environment that we'll call O.

    [O |] ---> [N | counter=7] ---> [G | counter=2, __builtins__=...]

Any name lookups that we do while we're calling f1() will follow this
general chain.  Since 'counter' is bound in N, that's how we get seven,
and not two, even though 'counter' is in the global environment frame.
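
To tie this back to the loop that started the thread, here's a Python 3
sketch.  f2, commands, and results are names added for this example.  Each
call to make_f() builds its own frame with its own 'counter', so functions
made in a loop no longer share a binding.

```python
# Python 3 sketch; f2, commands and results are hypothetical names.
counter = 2                # the global binding in G

def make_f(counter):       # the parameter lives in the new frame N
    def inner():
        return counter
    return inner

f1 = make_f(7)
f2 = make_f(8)             # a second call builds a second, separate frame

# One call per loop pass, so one frame per closure: no aliasing.
commands = [make_f(i) for i in range(5)]
results = [c() for c in commands]
print(f1(), f2(), counter, results)
```

The global 'counter' is untouched throughout; it's simply shadowed by the
nearer binding.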

Ok, done.  Whew.  The devil's in the details, and these are the details
that implement: "Functions remember their environments."  *grin*

Best of wishes!