[Tutor] Python vs. MATLAB

Steven D'Aprano steve at pearwood.info
Tue Dec 7 00:16:43 CET 2010


Joel Schwartz wrote:
> Chris,
> 
> Can you say more about number (7) in your list? What does "pass by value"
> mean and what are the alternatives?

Oh boy, is that a can of worms... and this is going to be a long post. 
You might want to go make yourself a coffee first :)

Pass by whatever (also written as "call by ...") is one of those 
frustrating topics where people have added vast amounts of confusion 
where no confusion need exist.

Take a variable, "x", and give it a value, say, 42. When you pass that 
variable to a function, func(x), what happens?

Such a simple question, but you wouldn't believe how many angry words 
have been written about it.

The problem is that there are a whole lot of answers to that question, 
used by many different programming languages, but most people are only 
familiar with *two*: pass by value, and pass by reference. This is 
particularly strange since most popular modern languages, like Ruby, 
Python and Java, don't use either of those! Nevertheless, people have 
got it in their head that there are only two calling conventions, and so 
they hammer the square peg of the language's actual behaviour until it 
will fit one or the other of the round holes in their mind.

So there are huge flame wars about whether Python is pass by value or 
pass by reference, with some people wrongly claiming that Python is 
p-b-v for "simple" objects like numbers and strings and p-b-r for 
"complicated" objects like lists. This is nonsense.

But that pales before the craziness of the Java community, that claims 
that Java is pass by value so long as you understand that the values 
being passed are references and not the value of the variable. But don't 
make the mistake of thinking that makes Java pass by reference! Pass by 
value-which-is-actually-a-reference is completely different from pass by 
reference. Only the Java people don't call it that, they just call it 
pass by value, even though Java's behaviour is different from pass by 
value in common languages like C, Pascal and Visual Basic.

How is this helpful? Even if *technically* true, for some definition of 
"reference" and "value", it is gobbledygook. It's as helpful as claiming 
that every language ever written, without exception, is actually pass by 
flipping bits. No values are actually passed anywhere, it's all just 
flipping bits in memory.

100% true, and 100% useless.

Translated into Python terms, the Java argument is this:

Take a variable, call it "x". When you say x = 42, the value of x is not 
actually the number 42, like naive non-Java programmers might think, but 
some invisible reference to 42. Then when you call function(x), what 
gets passed to the function is not 42 itself (what Pascal or C 
programmers call "pass by value"), nor is it a reference to the 
*variable* "x" (which would be "pass by reference"), but the invisible 
reference to 42. Since this is the "true" value of x, Java is pass by value.

(The situation is made more complicated because Java actually does pass 
ints like 5 by value, in the C or Pascal sense. The above description 
should be understood as referring to "boxed" integers, rather than 
unboxed. If this means nothing to you, be glad. All you need know is 
that in Java terms, all Python integers are boxed.)

And note that in the Ruby community, they call the exact same behaviour 
"pass by reference". And then folks wonder why people get confused.

All this because people insist on the false dichotomy that there are 
only two argument passing conventions, pass by value and pass by 
reference. But as this Wikipedia page shows, there are actually many 
more than that:

http://en.wikipedia.org/wiki/Evaluation_strategy


Let's go back to my earlier question. You have a variable x = 42, and 
you pass it to a function. What happens?

In Pascal, or C, the compiler keeps a table mapping variable names to 
fixed memory addresses, like this:

Variable  Address
========  =======
x         10234
y         10238
z         10242

The command "x = 42" stuffs the value 42 into memory address 10234. 
Then, when you call func(x), the compiler looks up memory address 10234 
and copies whatever it finds (in this case, 42) into another memory 
address (say, 27548), where func can see it. This is pass by value.

What this means is that the variable x *inside* the function is not the 
same as the variable x *outside* the function. Inside the function, x 
has the address 27548, and the command "x = x + 1" will store 43 there, 
leaving the outside x at 10234 unchanged. This is normally a good thing.

The classic test of pass by value is, does the value get copied when you 
pass it to a function? We can test Python to see if it copies values:

 >>> def func(arg):
...     print(id(arg))
...
 >>> x = 42
 >>> print(id(x))
135996112
 >>> func(x)
135996112

The local variable arg and the global variable x have the same ID, which 
means they are the same object. This is conclusive proof that Python 
does not make a copy of x to pass to the function. So Python is not pass 
by value.
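
The same check can be repeated with an object that would be expensive to copy. This is only a rough sketch; the names big, report and passed_id are made up for the illustration:

```python
# A list of a million ints: copying this would be noticeably expensive.
big = list(range(1000000))

def report(arg):
    # Return the identity of whatever object the caller handed over.
    return id(arg)

passed_id = report(big)

# If Python had copied the list, the two identities would differ.
print(passed_id == id(big))
```

If the two IDs match, the function received the very same list object rather than a copy of it.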

Pass by value is nice and fast for values like 42, which are ints and 
therefore small. But what if x is (say) an array of a million numbers? 
Then the compiler has to copy all one million numbers, which is 
expensive, and your function will be slow.

One alternative is pass by reference: instead of copying 42 to memory 
address 27548 (which is where func looks), the compiler can pass a 
reference to the *variable* x. That's as simple as passing 10234 
instead. The compiler then treats that as the equivalent of "See 
here..." and follows that reference to get to the actual value wanted. 
Because addresses are small numbers, this is fast, but it means that the 
*local* variable and the *global* variable are, in fact, the same 
variable. This means that func can now operate on the variable x 
directly: if func uses call by reference, and func executes "x = x + 1", 
then the value 43 will be written into memory address 10234.

Pascal and Visual Basic (and Perl, I think) have compiler support for 
pass by reference. In C, you have to fake it by hand by passing a 
pointer to the value, and then doing your own re-direction. Except for 
arrays, which are handled differently, to the confusion of all.

The classic test of pass by reference is to write a "swap" function -- 
can you swap the value of two variables *without* returning them? In 
other words, something like this:

a = 1
b = 2
swap(a, b)
assert a == 2 and b == 1

In Python, you would swap two values like this:

a, b = b, a

but we want to do it inside a function. Doing this would be cheating:

a, b = swap(a, b)

because that explicitly re-assigns the variables a and b outside of the 
function. To be pass by reference, the swap must be done inside the 
function.

There's no way of writing a general purpose swap function like this in 
Python. You can write a limited version:

def swap():
    global a, b
    a, b = b, a

but that doesn't meet the conditions of the test: swap must take the 
variables to swap as arguments, and not hard-coded into the function.
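
For comparison, here is a sketch of what happens if you try to write swap the "proper" way, taking the variables as arguments. The parameter names first and second are just picked for the example:

```python
def swap(first, second):
    # This only rebinds the two *local* names inside the function.
    first, second = second, first

a = 1
b = 2
swap(a, b)

# The caller's names are untouched, so the classic test fails.
print(a, b)
```

The swap happens, but only inside the function's own namespace, so a is still 1 and b is still 2 afterwards.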

Python is not pass by value, because it doesn't make a copy of the value 
before passing it to the function. And it's not pass by reference, 
because it doesn't pass a reference to the variable itself: assignment 
inside the function doesn't affect the outer variable, only the inner 
variable (except in the limited case that you use the global statement). 
So Python is neither pass by value nor pass by reference.

So what does Python actually do?

Well, to start with Python doesn't have variables in the C or Pascal 
sense. There is no table of variable:address available to the compiler. 
Python's model is of *name binding*, not fixed memory addresses. So 
Python keeps a global dictionary of names and *objects*:

{'x': <integer object 42>,
  'y': <string object 'hello world'>,
  'z': <list object [1,2,3]>,
}

The general name for this is "namespace".

(Aside: you can access the global namespace with the globals() function. 
Don't mess with it unless you know what you're doing.)

Functions have access to the global namespace, but they also get their 
own local namespace. You can access it with the locals() function.

(Aside: as an optimization, CPython doesn't use a real dictionary for 
locals. Consequently, the dict returned by locals() is a copy, not the 
real thing, and you can't modify local variables by messing with 
locals(). Other Pythons may do differently.)
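
A small sketch of that aside, assuming CPython (other implementations may behave differently, as noted). The name result is only there for the demonstration:

```python
def func(arg):
    # At this point the local namespace holds just one name.
    names = sorted(locals())
    # Writing into the dict returned by locals() does not rebind
    # the real local variable in CPython.
    locals()['arg'] = 'changed'
    return names, arg

result = func(42)
print(result)
```

Under CPython this should show that the only local name at that point is arg, and that arg is still 42 despite the attempted change.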

So when you have this function:

def func(arg):
    # do stuff in here...
    pass

and then call func(x), Python initialises the function and creates a 
local namespace containing:

{'arg': <integer object 42>}

No copy is made -- it is very fast to add the object to the namespace, 
regardless of how big or small the object is. (Implementation note: 
CPython does it by using pointers. Other Pythons may use different 
strategies, although it's hard to think of one which would be better.) 
So the local arg and the global x share the same value: 42. But if you 
do an assignment inside the function, say:

arg += 1

43 will be stored in the local namespace, leaving the global x untouched.

So far so good -- Python behaves like pass by value, when you assign to 
a local variable inside the function. But we've already seen it doesn't 
copy the value, so it isn't actually pass by value.
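
Both halves of that can be checked in one go. This is just an illustrative sketch; the names id_seen and result are invented for it:

```python
x = 42

def func(arg):
    id_seen = id(arg)   # identity of the object that was passed in
    arg += 1            # rebinds the local name arg to a new object
    return arg, id_seen

result, id_seen = func(x)

print(result)            # the local name ended up bound to 43
print(x)                 # the global name is still bound to 42
print(id_seen == id(x))  # and no copy was made on the way in
```

The assignment only changes what the local name refers to, while the identity check confirms the function started out with the caller's own object.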

This is where it gets interesting, and leads to people mistakenly 
thinking that Python is sometimes pass by value and sometimes pass by 
reference. Suppose you call func(z) instead, where z is a list. This 
time the local namespace will be:

{'arg': <list object [1,2,3]>}

Now, instead of assigning to the name arg, suppose we modify it like so:

arg.append(1)

Naturally the list object [1,2,3] becomes [1,2,3,1]. But since the local 
arg and the global z are the same object, and not copies, both the local 
and the global list see the same change. Naturally, since they are one 
and the same object!
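
As a runnable sketch of that case, using the list z from the namespace shown earlier:

```python
z = [1, 2, 3]

def func(arg):
    # Modify the list object in place; no name is rebound here.
    arg.append(1)

func(z)

# The global name z refers to the same object, so it sees the change.
print(z)
```

Nothing was assigned to z anywhere, yet printing it shows the extra element, because there was only ever one list.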

So, for *immutable* objects that can't be modified in place, Python 
behaves superficially like pass by value (except it doesn't copy values) 
and for *mutable* objects that can be modified in place, Python behaves 
superficially like pass by reference (except you can't assign to the 
global variable, only the local). So Python's behaviour combines some 
behaviour of both pass by value and pass by reference, while being 
implemented differently from both.
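
The difference is really between rebinding a name and modifying an object, and it shows up even with the same list. Here is a rough sketch; the function names rebind and mutate are made up for the example:

```python
def rebind(arg):
    # Builds a brand new list and binds the local name to it.
    arg = arg + [1]
    return arg

def mutate(arg):
    # Changes the shared list object in place.
    arg.append(1)
    return arg

z = [1, 2, 3]

new_list = rebind(z)
print(z, new_list is z)    # z is unchanged, and new_list is a different object

same_list = mutate(z)
print(z, same_list is z)   # z has grown, and same_list is z itself
```

So it is not the type of the object that decides what the caller sees, but whether the function assigns to its local name or modifies the object that name refers to.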

This strategy has been known as "pass by sharing" or "pass by object 
sharing" since 1974, when it was invented by Barbara Liskov. It is the 
same as what Ruby calls "pass by reference" and Java calls "pass by 
value", to the confusion of all. There is no need for this confusion 
except for the stubborn insistence that there are only two argument 
passing strategies.



-- 
Steven


