why cannot assign to function call

Sat Feb 28 08:54:40 EST 2009

Ethan Furman <ethan at stoneleaf.us> writes:

> Mark Wooding wrote:
>> Here's what I think is the defining property of pass-by-value [...]:
>>
>>   The callee's parameters are /new variables/, initialized /as if by
>>   assignment/ from the values of caller's argument expressions.
>>
>> My soundbite definition for pass-by-reference is this:
>>
>>   The callee's parameters are merely /new names/ for the caller's
>>   argument variables -- as far as that makes sense.
>
> Greetings, Mark!

You posted to the whole group -- probably using a wide-reply option
while reading the mailing list.  Still, I'll give an answer here in
order to help any other readers of the list or newsgroup.  However, as
far as I'm concerned, this discussion is basically at an end and I'm not
really interested in starting it up all over again.  To that end, I've
set followups to `poster'.  I shall continue to reply to private email
which seems interested in a sensible discussion.

> I was hoping you might be able to clarify those last two sound bites
> for me -- I think I understand what you are saying, but I'm confused
> about how they relate to Python...
>
> Specifically, how is a new name (pbr) different, in Python, from a new
> name initialized as if by assignment (pbv)?  It seems to me that you
> end up with the same thing in either case (in Python, at least),
> making the distinction non-existent.

You've missed a level of indirection.  In particular, `names' aren't
things that you initialize.  They're things that you /bind/ to
variables.  The crucial difference is that, in pass-by-value, new
variables are created as an intrinsic part of the process, whereas in
pass-by-reference, new variables are not (usually) created, and instead
the formal parameter names are bound to the caller's pre-existing
argument variables.

It's worth noting that I use the terms `name' and `binding' in different
ways from most of the Python community.  This is unfortunate.  The
discrepancy is actually because the Python meanings of these words are
not the same as the meanings in the wider computer science and
mathematical communities.  For example, many Python users seem to use
`binding' to mean `assignment', which is a shame because it leaves the
concept that is usually called `binding' without a name.  So I'll stick
with the wider meanings.

A while ago, I posted an essay -- to this group -- which may help
explain the concepts:

        Message-ID: <8763k14nc6.fsf.mdw at metalzone.distorted.org.uk>
        http://groups.google.com/group/comp.lang.python/msg/f6f1a321f819d02b

On with the question.

> def func(bar):
>     bar.pop()
>
> Pass-by-reference:
>   foo = ['Ethan','Furman']
>   func(foo)			# bar = foo
>
> Pass-by-value:
>   foo = ['Python','Rocks!']
>   func(foo)			# bar is new name for foo
> 				# is this any different from above?
>
> If I have this right, in both cases foo will be reduced to a
> single-item list after func.  

You're correct.  So: we can conclude that the above test is not
sufficient to distinguish the two cases.

> Any further explanation you care to provide will be greatly
> appreciated!

This test is sufficient to distinguish:

        def test(x):
          x = 'clobbered'

        y = 'virgin'
        test(y)
        print(y)

If it prints `virgin' then you have call-by-value.  If it prints
`clobbered' then you have call-by-reference.

Let's examine the two cases, as I did in the essay I cited above.  I'll
do call-by-value first.  First, we define a function `test'.  Then, we
initialize `y'.  It's worth examining this process in detail.  The name
`y' is initially unbound, so it is implicitly bound to a fresh
variable.  Then, (a reference to) the string object 'virgin' is stored
in this variable.  We can show this diagrammatically as follows.

        y (in global env.)  ====>  [VAR]  ---> 'virgin'

(In the diagrams, ===> denotes a binding relationship, between names and
variables; and ---> denotes a reference relationship, between variables
and values.)

Next, we call the `test' function.  Call-by-value says that we must
evaluate the argument expressions.  There's only one: `y'.  The value of
a name is obtained by (a) finding which variable is bound to the name,
and (b) extracting the value from this variable.  Well, the variable is
the one we just bound, and the value stored is (the reference to) the
string 'virgin'.  So the result of evaluating the argument expressions
is simply (the reference to) that string.

The function has one parameter, `x'.  A new environment is constructed
by extending the global environment.  In this new environment, the name
`x' is bound to a fresh variable -- distinct from all others, and
especially from the variable bound to `y' -- and in that variable we
store the value of the corresponding argument expression.  Result: the
function body is executed in an environment which is like the global
environment except that `x' is bound to a fresh variable containing
'virgin'.

        y (in global env.)  ====>  [VAR]  ---> 'virgin'
                                                  ^
                                                  |
        x (in function `test') ====> [VAR] -------'

Now there's an assignment

          x = 'clobbered'

The name `x' is already bound to a variable.  So we modify that variable
so that it stores (a reference to) the string 'clobbered'.

        y (in global env.)  ====>  [VAR]  ---> 'virgin'

        x (in function `test') ====> [VAR] ---> 'clobbered'

And then the function ends.  The environment we constructed is
forgotten.  The variable bound to `x' is lost forever, since it wasn't
bound to any other name.  Since modifying that variable was the only
action carried out in the function, and the variable is now lost, there
is no externally observable effect.  When we finally print `y', we see
`virgin', because the variable bound to `y' was unchanged.

        y (in global env.)  ====>  [VAR]  ---> 'virgin'
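
The call-by-value steps above can be modelled explicitly.  This is only
a sketch under stated assumptions: the class `Var' and the function
`call_by_value' are hypothetical names invented for illustration, and a
dictionary stands in for an environment.  It is not a description of
how any real Python implementation represents bindings.

```python
class Var:
    """A storage cell (the [VAR] in the diagrams): holds one value reference."""
    def __init__(self, value=None):
        self.value = value

def call_by_value(global_env):
    # Evaluate the argument expression `y`: find its variable, extract the value.
    arg = global_env['y'].value
    # Extend the global environment (sharing its existing variables) and
    # bind `x` to a FRESH variable initialized with the argument value.
    local_env = dict(global_env)
    local_env['x'] = Var(arg)
    # Body of `test`: x = 'clobbered' modifies the variable bound to `x`.
    local_env['x'].value = 'clobbered'
    # On return, local_env and the fresh variable are simply forgotten.

global_env = {'y': Var('virgin')}
call_by_value(global_env)
print(global_env['y'].value)   # prints: virgin
```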

So much for call-by-value.  How about call-by-reference?  Well,
everything is the same until the actual call.  But then everything
changes.

Firstly, call-by-reference /doesn't/ evaluate the argument expressions.
Instead, we just note that `y' is bound to a particular variable in the
global environment.  The function has a single parameter `x'.  A new
environment is constructed by extending the global environment (again):
in this new environment, the name `x' is bound to /the same variable/
that `y' is bound to in the global environment.

        y (in global env.)  ====>  [VAR]  <===== x (in function `test')
                                     |
                                     |
                                     v
                                 'virgin'

Now we assign to `x'.  The detailed rules are the same: `x' is bound, so
we modify the variable it's bound to, so that it stores 'clobbered'.
But this time, the variable being clobbered is the /same/ variable that
`y' is bound to.  So finally, when we print `y', we see the string
'clobbered'.

        y (in global env.)  ====>  [VAR]  <===== x (in function `test')
                                     |
                                     |
                                     v
                                'clobbered'
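
The call-by-reference case can be modelled in the same spirit.  Again
this is a sketch, not a real mechanism: `Var' and `call_by_reference'
are hypothetical names invented for illustration, and a dictionary
stands in for an environment.  Python itself offers no such calling
convention, so the sharing has to be spelled out by hand.

```python
class Var:
    """A storage cell (the [VAR] in the diagrams): holds one value reference."""
    def __init__(self, value=None):
        self.value = value

def call_by_reference(global_env):
    # The argument expression is NOT evaluated.  Instead, the parameter
    # name `x` is bound to the very same variable that `y` is bound to.
    local_env = dict(global_env)
    local_env['x'] = global_env['y']
    # Body of `test`: x = 'clobbered' modifies the variable bound to `x`,
    # which is also the variable bound to `y` in the global environment.
    local_env['x'].value = 'clobbered'

global_env = {'y': Var('virgin')}
call_by_reference(global_env)
print(global_env['y'].value)   # prints: clobbered
```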

Now, let's look at your example.

> def func(bar):
>     bar.pop()
>
>   foo = ['Ethan','Furman']
>   func(foo)			# bar = foo

I won't describe this in the same excruciating detail as I did for the
one above; but I will make some observations.

In call-by-reference, the environment in which the body of `func' is
executed is constructed by extending the global environment with a
binding of the name `bar' to the same variable as is bound to `foo' in
the global environment.  There is only the one variable, so obviously
both names (considered as expressions) must evaluate to the same value.
I won't go into the details of method invocation, which in Python is
quite complicated; but `bar.pop()' mutates this value: it modifies it in
place.  So, when the function returns, `foo' can be seen to print
differently.  The value is, in some sense, the /same/ value as it was
before, in that it occupies the same storage locations, but the contents
of those storage locations have been altered.

              foo =====> [VAR] <===== bar
        (in global env)    |    (in function `func')   
                           |
                           v
                   ['Ethan', 'Furman']

In call-by-value, `func' executes in an environment constructed by
extending the global environment with a binding of `bar' to a /fresh/
variable.  This fresh variable is then initialized: we store the value
of the argument expression `foo' into it.  This value is a reference to
the list ['Ethan', 'Furman'].  (See, I've stopped parenthesizing the
reference stuff, because here it really matters.)  So, we have two
variables, bound to `foo' and `bar', but both variables refer to the
same list object.  The body of `func' doesn't modify any variables;
rather, it mutates the list object in place (as above).  Since both
variables refer to the /same/ list object, this mutation is still
observable outside of the function.

              foo =====> [VAR] ---> ['Ethan', 'Furman']
        (in global env)                     ^
                                            |
              bar =====> [VAR'] ------------'
        (in function `func')

So your example doesn't distinguish the two cases because, in both,
there's still only one value, and it is that value which is mutated.
But there is a conceptual difference, unobservable in this instance:
call-by-reference has two names bound to the same variable, while
call-by-value has two names bound to /distinct/ variables which both
refer to the same value.
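
A short sketch may make the contrast between mutating a value and
modifying a variable concrete in Python.  The function `rebind' is a
hypothetical addition for illustration; `func' is the one from the
question.

```python
def func(bar):
    bar.pop()             # mutates the shared list object in place

def rebind(bar):
    bar = ['replaced']    # modifies only the variable bound to `bar`

foo = ['Ethan', 'Furman']

func(foo)
print(foo)    # prints: ['Ethan']   (the mutation is visible to the caller)

rebind(foo)
print(foo)    # prints: ['Ethan']   (the assignment is not)
```

The mutation shows through because both variables refer to the same
list; the assignment does not, because it touches only the callee's own
variable.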

Finally, it may be instructive to remove the issue of function calling
altogether.  Consider:

        foo = ['Ethan', 'Furman']
        bar = foo
        bar.pop()

What is the final value of `foo'?

Now:

        x = 'virgin'
        y = x
        y = 'clobbered'

What is the final value of `x'?

Are you enlightened?

-- [mdw]