why cannot assign to function call

Tue Jan 6 22:36:50 EST 2009

Steven D'Aprano <steven at REMOVE.THIS.cybersource.com.au> wrote:

> If I wanted a reference to a list, I'd expect to *dereference* the 
> reference to get to the list. That's not what Python forces you do to: 
> you just use the list as the list object itself.

That's odd.  No, you give a reference to the list to a function, and the
function messes with the list for you.

> This isn't hard people. Stop confusing the implementation details of
> how CPython works under the hood with Python level code.

I'm not confusing anything.  This is conceptual-model stuff, not
implementation details.  (Further discussion below.)

> In Python, [1, 2, 3] is a list, not a reference to a list. 

In [1]: l = [1, 2, 3]

In [2]: l[1] = l

In [3]: l
Out[3]: [1, <Recursion on list with id=3079286092>, 3]

Now, if you're right, then l directly contains itself and some other
stuff, which is obviously absurd: it's a list, not a Tardis.  If I'm
right, then l is bound to a reference to a list which contains three
references, two of them to integers and a third to the list itself.
This doesn't seem absurd at all any more.

Variables are not the only places where sharing can occur!  Explaining
this is much harder if you don't start from the idea that all you're
doing is carting uniformly shaped references about.

Another example:

In [1]: a = [1, 2, 3]

In [2]: b = (a, a)

In [3]: b
Out[3]: ([1, 2, 3], [1, 2, 3])

In [4]: a[1] = 0

In [5]: b
Out[5]: ([1, 0, 3], [1, 0, 3])

If b is a tuple containing two copies of a, then this shouldn't have
happened.  The only satisfactory explanation is that the tuple that b
refers to actually contains two references to the same list, so when I
mutate that list, the change shows up twice.

> When you pass [1, 2, 3] to a function, the function sees the list you
> passed it. The function doesn't see "a reference to a list", it sees a
> list:
> 
> >>> def func(x):
> ...     print type(x)
> ...
> >>> func([1, 2, 3])
> <type 'list'>

It sees the reference.  `type' sees the reference.  `type' digs the type
of the object out of the reference, and returns you a reference to the
type.

> It's so easy, some people refuse to believe it could be that easy, and 
> insist on complicating matters by bring the implementation details into 
> the discussion. Just stop, please. References belong in the 
> *implementation*, nothing to do with Python level code.

I'm not getting into implementation details.  I'm presenting a mental
model.  The `they're objects: they contain other objects' model is
invalidated when you create circular or shared structures, as I've shown
above.

There are other epicyclic explanations you could invent to explain
sharing, maybe -- like keeping lists of clones, and magically updating
all the clones whenever one us mutated.  That may even be a valid
implementation for a distributed Python (with cached copies of objects
and a cache-coherency protocol and all that), but it makes a rotten
mental model.  And it still doesn't explain circularity.

Internally, Tcl uses pointers to values in its implementation.  The
common currency inside the Tcl interpreter is a Tcl_Obj *.  (It used to
be a char *, before Tcl 8.)  So Tcl could easily offer the same
semantics as Python and friends.  But there's a twist.  Tcl does
copy-on-write.  It's just impossible to make a circular value in Tcl,
and sharing doesn't happen.  For example, here's a snippet of a tclsh
session.

% set l {a b c}
a b c
% lreplace $l 1 1 $l
a {a b c} c

Tcl really /can/ be explained without talking about references.  The
existence of Tcl_Obj, and its strange dual-ported nature (it contains a
string and an internal representation, and lazily updates one from the
other, and uses the string in order to allow changes of internal
representation as necessary) really is an implementation detail, and
it's possible to have a full understanding of the behaviour of Tcl
programs without knowing about it.

This is just impossible with Python.  Reference semantics pervade the
language.

> In Python code, there are no references and no dereferencing.

You're right!  But the concept is essential in understanding the
semantics of the language.  Even though no references are explicitly
made, and no dereferencing explicitly performed, these things are done
repeatedly under the covers -- and failure to understand that will lead
to confusion.

> > Python decided that all values are passed around as and manipulated
> > through references.
> 
> Oh really? Then why can't I write a Python function that does this?
> 
> x = 1
> y = 2
> swap(x, y)
> assert x == 2
> assert y == 1

Because the function is given references to the objects.  It's not given
references to /your/ references to those objects.  Therefore it can't
modify /your/ references, only its ones.

Pedantic answer:

def swap(hunoz, hukairz):
  global x, y
  x, y = y, x

(Maybe nonlocal for Python 3.)

> You can't, because Python doesn't have references. In a language with 
> references, that's easy. Here's an untested Pascal version for swap:

[snip]

That's call-by-reference, which is a different thing.  Python, like
Lisp, Scheme, Javascript, ML, Haskell, Lua, Smalltalk, Erlang, Prolog,
Java, and indeed C, does call-by-value exclusively.  This deserves to be
called out as a display:

  Python passes references by value.

By contrast, Pascal (sometimes) passes values by reference!

The terminology is admittedly confusing, because it comes from different
places.  If you don't like me talking about Python having references,
then pretend I've been saying `pointer' instead, all the way through.  I
think `pointer' and `reference' are synonymous in this context, attempts
by Stroustrup to confuse everybody notwithstanding.  But `pointer' has
unhelpful connotations:

  * pointers are more usually values rather than strange behind-the-
    scenes things, e.g., in C;

  * pointers, probably because of C, are associated with scariness and
    non-safe-ness; and

  * talking of pointers does seem like it's getting towards
    implementation details, and I wanted to avoid that.

Anyway, call-by-value means that the function gets given copies of the
caller's arguments; call-by-reference means that the function gets told
where the caller's arguments are.  In other words call-by-reference
introduces /another/ level of indirection between variables and values.

> > (There are no locatives: references are not values,
> 
> But references *are* locatives.

No!  A locative is a /reified/ reference -- a /value/ that /is/ a
reference.  That is a locative is a reference, but a reference need not
be a locative.  (A sheep is a mammal, but not all mammals are sheep.)

The Lisp Machine had real locatives.  You dereferenced them using CAR
and RPLACA -- most unpleasant.  In modern Lisp systems they seem to have
died a death, probably because making them work with fancy things like
resizing arrays when you don't have invisible pointers is too painful.

> No no no, lists and tuples store *objects*. Such storage happens to be 
> implemented as pointers in CPython, but that's an irrelevant detail at 
> the level of Python.

Uh-uh.  There's no way that an object can store itself and still have
room left over.  The idea is just crazy.  It can't possibly fit.  (Axiom
of foundation, if you're into that funky stuff.)  If you say `no, but it
can store a reference to itself', then everything makes sense.  It's
completely uniform, and not very scary.

Lisp people draw these box-and-pointer diagrams all over the place

   +-----+-----+
   |  *  |  *------> NIL
   +--|--+-----+
      |
      v
      3

to show how the data model works.  Of course, the Lisp data model is
/exactly the same as Python's/.

After a while, the diagrams get less pedantic, and tend to show values
stashed /in/ the cons cells.  In fact, this is actually closer to most
implementations for small things like fixnums, characters and flonums,
because they're represented by stashing the actual value with some
special not-really-a-pointer tag bits -- but the program can't tell, so
this doesn't need to be part of your mental model until you start
worrying about performance hacking and why your program is consing so
much.

-- [mdw]