Explanation of list reference

Sat Feb 15 00:36:12 EST 2014

On Fri, 14 Feb 2014 12:31:56 -0600, Ryan Gonzalez wrote:

> On 02/14/2014 12:08 PM, dave em wrote:
>> Hello,
>>
>> Background:  My twelve y/o son and I are still working our way through
>> Invent Your Own Computer Games with Python, 2nd Edition. (Finishing
>> the Khan Academy Javascript Tutorials is the extent of our experience.)
>>
>> He is asking a question I am having trouble answering which is how a
>> variable containing a value differs from a variable containing a list
>> or more specifically a list reference.
>>
>> I tried to explain, as best I can remember, that a variable is
>> assigned to a specific memory location with a value inside of it. 
>> Therefore, the variable is kind of self contained and if you change the
>> variable, you change the value in that specific memory location.
>>
>> However, when a variable contains a list reference, the memory location
>> of the variable points to a separate memory location that stores the
>> list.  It is also possible to have multiple variables that point to the
>> memory location of the list reference.  And all of those variables can
>> act upon the list reference.
>>
>> Question:  Is my explanation correct?  If not please set me straight :)
>>
>> And does anyone have an easier to digest explanation?
>>
>> Thanks in advance,
>> Dave
> 
> You've got it backwards. In Python, /everything/ is a reference.

What's a reference?

How is the value 23 a reference? What is it a reference to?

> The
> variable is just a "pointer" to the actual value. When you change a
> variable, you're just changing the memory location it points to.

What do memory locations have to do with Python code? When I execute 
Python code in my head, perhaps using a pencil and paper, or build a 
quantum computer (or analog clockwork device) to execute Python code, 
where are the memory locations?

I think you are conflating the *implementation* of Python's virtual 
machine in a C-like language written for a digital computer with the 
*defined behaviour* of the Python virtual machine. If you think about the 
Python execution model, there is almost nothing about memory locations in 
it. The only exception I can think of is the id() function, which in 
CPython uses the memory address of the object as the ID, and even that is 
*explicitly* described as an implementation detail and not a language 
feature. And in 
fact Jython and IronPython assign IDs to objects consecutively from 1, 
and PyPy has to go through heroic and complicated measures to ensure that 
objects have the same ID at all times.

Thinking about the implementation of Python as written for certain types 
of digital computing devices can be useful, but we must be very careful 
to avoid mixing up details at the underlying C (or Java, or Haskell, or 
Lisp, or ...) layer with questions about the Python execution model.

As soon as you mention "pointers", you're in trouble, because Python has 
no pointers. There is nothing in Python that will give you a pointer to 
an object, or dereference a pointer to get the object at the other end. 
Pointers in the sense of C or Pascal pointers to memory addresses simply 
don't have any existence in Python. Python compilers can even be written 
in languages like Java that don't have pointers. The fact that the C 
implementation of Python uses pointers internally is not very 
interesting, any more than the fact that a Python implementation running 
on a Turing Machine would use a pencil and eraser that can draw marks on 
a very long paper tape.

> Strings, ints, tuples, and floats behave differently because they're
> /immutable/. That means that they CANNOT modify themselves. That's why
> all of the string methods return a new string. It also means that, when
> you pass one to a function, a /copy/ of it is made and passed instead.

Yes, strings etc. are immutable, but no, they are not copied when you 
pass them to a function. We can be sure of this for two reasons:

(1) We can check the id() of the string from the inside and the outside 
of the function, and see that they are the same; and 

(2) We can create a HUGE string, hundreds of megabytes, and pass it to 
dozens of functions, and see no performance slowdown. It might take a 
second or five to build the initial string, and microseconds or less to 
pass it to function after function after function.
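
Check (1) can be sketched in a few lines. The function name `id_inside` 
and the size of the string are made up for illustration; nothing here 
depends on what number id() actually returns, only that it is the same 
on both sides of the call:

```python
def id_inside(arg):
    # Report the identity of the argument as seen from inside the function.
    return id(arg)

big = "x" * (10 ** 7)          # a reasonably large string
print(id_inside(big) == id(big))   # True: same object, no copy was made
```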

> So, back to the original subject. Everything is a reference. 

To really understand Python's behaviour, we need to see that there are 
two kinds of entities, names and values. Or another way to put it, 
references and objects. Or another way to put it, there's actually only 
one kind of thing in Python, that is, everything in Python is an object, 
but Python *code* can refer to objects indirectly by names and other 
references. Names aren't "things", but the things that names refer to are 
things.

Objects have a clear definition in the Python world: an object is an 
entity that has a type (e.g. a string), a set of behaviours (methods), 
and a value ("Hello World").

References can be names like `mystring`, or list items `mylist[0]`, or 
items in mappings `mydict["key"]`, or attributes `myobject.attr`, or even 
expressions `x+y*(1-z)`. References themselves aren't "things" as such 
(although in Python, *names* are implemented as string keys in 
namespaces), but a way to indirectly refer to things (values, objects).
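
A rough sketch of several kinds of reference all referring to one and 
the same object. The class `Holder` and the names `obj`, `name`, `mylist`, 
`mydict` and `myobject` are invented for the example:

```python
class Holder:
    # Hypothetical empty class, used only so we can set an attribute.
    pass

obj = ["Hello World"]        # one object

name = obj                   # a name
mylist = [obj]               # a list item
mydict = {"key": obj}        # an item in a mapping
myobject = Holder()
myobject.attr = obj          # an attribute

# Every reference leads to the very same object.
print(name is mylist[0] is mydict["key"] is myobject.attr)   # True
```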

> When you do this:
> 
> x = [1,2,3]
> x = [4,5,6]
> 
> x now points to a different memory location. 

Memory locations are irrelevant. Objects may not even have a single, well-
defined memory location. (If you think this is impossible, you're 
focusing too much on a single computer architecture.) They might use some 
sort of copy-on-write mechanism so that objects don't even exist until 
you modify them. Who knows?

Instead, we should say that x now refers to a different object.
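
A small sketch of that rebinding. The extra name `old` is added here 
only so that we can still see the first list afterwards:

```python
x = [1, 2, 3]
old = x              # a second name for the first list
x = [4, 5, 6]        # x now refers to a different object

print(old)           # [1, 2, 3]  (the first list is untouched)
print(x)             # [4, 5, 6]
print(x is old)      # False
```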

An analogy: the name "President of the United States" stopped referring 
to George Bush Jr and started referring to Barack Obama a few years back, 
but the "objects" (people) have an existence separate from the name used 
to refer to them.

(By the way, I try to avoid using the term "points to" if I can, since it 
has connotations to those familiar with C which don't apply to Python.)

> And, when you do this:
> 
> x[0] =99000
> x[0] =100
> 
> you're just changing the memory location that |x[0]| points to.

Again, I'd say that x[0] now refers to a different object.
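
This is also where the original question gets its answer. In the 
sketch below (the second name `y` is my addition), item assignment 
changes which object x[0] refers to, but x itself still refers to the 
same list, so every other name for that list sees the change:

```python
x = [4, 5, 6]
y = x                # y and x are two names for one list

x[0] = 99000         # x[0] now refers to the int 99000
x[0] = 100           # x[0] now refers to the int 100

print(y)             # [100, 5, 6]
print(x is y)        # True: still the same list object
```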

-- 
Steven