[Tutor] Python vs. MATLAB
Steven D'Aprano
steve at pearwood.info
Tue Dec 7 00:16:43 CET 2010
Joel Schwartz wrote:
> Chris,
>
> Can you say more about number (7) in your list? What does "pass by value"
> mean and what are the alternatives?
Oh boy, is that a can of worms... and this is going to be a long post.
You might want to go make yourself a coffee first :)
Pass by whatever (also written as "call by ...") is one of those
frustrating topics where people have added vast amounts of confusion
where no confusion need exist.
Take a variable, "x", and give it a value, say, 42. When you pass that
variable to a function, func(x), what happens?
Such a simple question, but you wouldn't believe how many angry words
have been written about it.
The problem is that there are a whole lot of answers to that question,
used by many different programming languages, but most people are only
familiar with *two*: pass by value, and pass by reference. This is
particularly strange since most popular modern languages, like Ruby,
Python and Java, don't use either of those! Nevertheless, people have
got it in their head that there are only two calling conventions, and so
they hammer the square peg of the language's actual behaviour until it
will fit one or the other of the round holes in their mind.
So there are huge flame wars about whether Python is pass by value or
pass by reference, with some people wrongly claiming that Python is
p-b-v for "simple" objects like numbers and strings and p-b-r for
"complicated" objects like lists. This is nonsense.
But that pales before the craziness of the Java community, which claims
that Java is pass by value so long as you understand that the values
being passed are references and not the value of the variable. But don't
make the mistake of thinking that makes Java pass by reference! Pass by
value-which-is-actually-a-reference is completely different from pass by
reference. Only the Java people don't call it that, they just call it
pass by value, even though Java's behaviour is different from pass by
value in common languages like C, Pascal and Visual Basic.
How is this helpful? Even if *technically* true, for some definition of
"reference" and "value", it is gobbledygook. It's as helpful as claiming
that every language ever written, without exception, is actually pass by
flipping bits. No values are actually passed anywhere, it's all just
flipping bits in memory.
100% true, and 100% useless.
Translated into Python terms, the Java argument is this:
Take a variable, call it "x". When you say x = 42, the value of x is not
actually the number 42, like naive non-Java programmers might think, but
some invisible reference to 42. Then when you call function(x), what
gets passed to the function is not 42 itself (what Pascal or C
programmers call "pass by value"), nor is it a reference to the
*variable* "x" (which would be "pass by reference"), but the invisible
reference to 42. Since this is the "true" value of x, Java is pass by value.
(The situation is made more complicated because Java actually does pass
ints like 5 by value, in the C or Pascal sense. The above description
should be understood as referring to "boxed" integers, rather than
unboxed. If this means nothing to you, be glad. All you need know is
that in Java terms, all Python integers are boxed.)
And note that in the Ruby community, they call the exact same behaviour
"pass by reference". And then folks wonder why people get confused.
All this because people insist on the false dichotomy that there are
only two argument passing conventions, pass by value and pass by
reference. But as this Wikipedia page shows, there are actually many
more than that:
http://en.wikipedia.org/wiki/Evaluation_strategy
Let's go back to my earlier question. You have a variable x = 42, and
you pass it to a function. What happens?
In Pascal, or C, the compiler keeps a table mapping variable names to
fixed memory addresses, like this:
Variable  Address
========  =======
x         10234
y         10238
z         10242
The command "x = 42" stuffs the value 42 into memory address 10234.
Then, when you call func(x), the compiler looks up memory address 10234
and copies whatever it finds (in this case, 42) into another memory
address (say, 27548), where func can see it. This is pass by value.
What this means is that the variable x *inside* the function is not the
same as the variable x *outside* the function. Inside the function, x
has the address 27548, and the command "x = x + 1" will store 43 there,
leaving the outside x at 10234 unchanged. This is normally a good thing.
The classic test of pass by value is, does the value get copied when you
pass it to a function? We can test Python to see if it copies values:
>>> def func(arg):
...     print(id(arg))
...
>>> x = 42
>>> print(id(x))
135996112
>>> func(x)
135996112
The local variable arg and the global variable x have the same ID, which
means they are the same object. This is conclusive proof that Python
does not make a copy of x to pass to the function. So Python is not pass
by value.
Pass by value is nice and fast for values like 42, which are ints and
therefore small. But what if x is (say) an array of a million numbers?
Then the compiler has to copy all one million numbers, which is
expensive, and your function will be slow.
One alternative is pass by reference: instead of copying 42 to memory
address 27548 (which is where func looks), the compiler can pass a
reference to the *variable* x. That's as simple as passing 10234
instead. The compiler then treats that as the equivalent of "See
here..." and follows that reference to get to the actual value wanted.
Because addresses are small numbers, this is fast, but it means that the
*local* variable and the *global* variable are, in fact, the same
variable. This means that func can now operate on the variable x
directly: if func uses call by reference, and func executes "x = x + 1",
then the value 43 will be written into memory address 10234.
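Here's a toy model of the two conventions, purely as a sketch. Python itself does neither of these things, so a dict stands in for the machine's memory. The addresses 10234 and 27548 are the ones used above; the names memory, incr_by_value and incr_by_reference are made up for the illustration.

```python
# Toy model: a dict stands in for the machine's memory (address -> value).
memory = {10234: 42}  # the variable x lives at address 10234


def incr_by_value(value):
    # Pass by value: the caller's value is copied into the callee's own
    # slot (27548), and only that copy is changed.
    memory[27548] = value
    memory[27548] = memory[27548] + 1


def incr_by_reference(address):
    # Pass by reference: the callee receives the address of x itself,
    # so the write lands in the caller's variable.
    memory[address] = memory[address] + 1


incr_by_value(memory[10234])
print(memory[10234])  # 42: the outer x is unchanged
print(memory[27548])  # 43: only the callee's copy changed

incr_by_reference(10234)
print(memory[10234])  # 43: the outer x itself was changed
```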
Pascal and Visual Basic (and Perl, I think) have compiler support for
pass by reference. In C, you have to fake it by hand by passing a
pointer to the value, and then doing your own re-direction. Except for
arrays, which are handled differently, to the confusion of all.
The classic test of pass by reference is to write a "swap" function --
can you swap the value of two variables *without* returning them? In
other words, something like this:
a = 1
b = 2
swap(a, b)
assert a == 2 and b == 1
In Python, you would swap two values like this:
a, b = b, a
but we want to do it inside a function. Doing this would be cheating:
a, b = swap(a, b)
because that explicitly re-assigns the variables a and b outside of the
function. To be pass by reference, the swap must be done inside the
function.
There's no way of writing a general purpose swap function like this in
Python. You can write a limited version:
def swap():
    global a, b
    a, b = b, a
but that doesn't meet the conditions of the test: swap must take the
variables to swap as arguments, rather than having them hard-coded into
the function.
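For what it's worth, here is a sketch of what happens if you try the obvious version that does take arguments. The swap happens, but only to the names inside the function:

```python
def swap(a, b):
    # This rebinds only the local names a and b inside swap.
    a, b = b, a


a = 1
b = 2
swap(a, b)
print(a, b)  # 1 2: the caller's names are untouched
```

So the "assert a == 2 and b == 1" test from earlier would fail.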
Python is not pass by value, because it doesn't make a copy of the value
before passing it to the function. And it's not pass by reference,
because it doesn't pass a reference to the variable itself: assignment
inside the function doesn't affect the outer variable, only the inner
variable (except in the limited case that you use the global statement).
So Python is neither pass by value nor pass by reference.
So what does Python actually do?
Well, to start with, Python doesn't have variables in the C or Pascal
sense. There is no table of variable:address available to the compiler.
Python's model is of *name binding*, not fixed memory addresses. So
Python keeps a global dictionary of names and *objects*:
{'x': <integer object 42>,
'y': <string object 'hello world'>,
'z': <list object [1,2,3]>,
}
The general name for this is "namespace".
(Aside: you can access the global namespace with the globals() function.
Don't mess with it unless you know what you're doing.)
Functions have access to the global namespace, but they also get their
own local namespace. You can access it with the locals() function.
(Aside: as an optimization, CPython doesn't use a real dictionary for
locals. Consequently, the dict returned by locals() is a copy, not the
real thing, and you can't modify local variables by messing with
locals(). Other Pythons may do differently.)
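A small sketch of that aside, assuming CPython (other implementations may behave differently, as noted). Writing to the dict you get back from locals() changes the dict, not the local variable:

```python
def func(arg):
    print(locals())        # {'arg': 42}
    locals()['arg'] = 99   # writes to the returned dict, not to the variable
    return arg


result = func(42)
print(result)  # 42 on CPython: the local variable was not modified
```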
So when you have this function:
def func(arg):
    # do stuff in here...
and then call func(x), Python initialises the function and creates a
local namespace containing:
{'arg': <integer object 42>}
No copy is made -- it is very fast to add the object to the namespace,
regardless of how big or small the object is. (Implementation note:
CPython does it by using pointers. Other Pythons may use different
strategies, although it's hard to think of one which would be better.)
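You can check the "no copy, regardless of size" claim with a large object. This is only a sketch; the name big is made up for the example:

```python
def func(arg):
    # Return the identity of whatever object is bound to the local name arg.
    return id(arg)


big = list(range(1000000))   # a list of a million numbers
print(func(big) == id(big))  # True: the very same object, no copy was made
```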
So the local arg and the global x share the same value: 42. But if you
do an assignment inside the function, say:
arg += 1
43 will be stored in the local namespace, leaving the global x untouched.
So far so good -- Python behaves like pass by value, when you assign to
a local variable inside the function. But we've already seen it doesn't
copy the value, so it isn't actually pass by value.
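As a quick sketch, assignment inside the function rebinds the local name only:

```python
def func(arg):
    arg += 1      # rebinds the local name arg to the object 43
    return arg


x = 42
print(func(x))  # 43
print(x)        # 42: the global name x still refers to 42
```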
This is where it gets interesting, and leads to people mistakenly
thinking that Python is sometimes pass by value and sometimes pass by
reference. Suppose you call func(z) instead, where z is a list. This
time the local namespace will be:
{'arg': <list object [1,2,3]>}
Now, instead of assigning to the name arg, suppose we modify it like so:
arg.append(1)
The list object [1,2,3] becomes [1,2,3,1]. But since the local arg and
the global z are the same object, and not copies, the change is visible
through both names. Naturally, since they are one and the same object!
So, for *immutable* objects that can't be modified in place, Python
behaves superficially like pass by value (except it doesn't copy values)
and for *mutable* objects that can be modified in place, Python behaves
superficially like pass by reference (except you can't assign to the
global variable, only the local). So Python's behaviour combines some
behaviour of both pass by value and pass by reference, while being
implemented differently from both.
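To make the difference concrete, here is a sketch with two functions (the names mutate and rebind are just for illustration). One modifies the list in place, the other assigns to the local name:

```python
def mutate(arg):
    arg.append(1)     # modifies the shared list object in place


def rebind(arg):
    arg = arg + [1]   # builds a new list and binds it to the local name only


z = [1, 2, 3]
mutate(z)
print(z)  # [1, 2, 3, 1]: the caller sees the in-place change
rebind(z)
print(z)  # [1, 2, 3, 1]: unchanged by the assignment inside rebind
```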
This strategy has been known as "pass by sharing" or "pass by object
sharing" since 1974, when it was invented by Barbara Liskov. It is the
same as what Ruby calls "pass by reference" and Java calls "pass by
value", to the confusion of all. There is no need for this confusion
except for the stubborn insistence that there are only two argument
passing strategies.
--
Steven