Finding the instance reference of an object [long and probably boring]

Fri Nov 7 10:48:19 EST 2008

On Nov 6, 2008, at 10:35 PM, Steve Holden wrote:

> That's good to hear. Your arguments are sometimes pretty good, and
> usually well made, but there's been far too much insistence on all
> sides about being right and not enough on reaching agreement about how
> Python's well-defined semantics for assignment and function calling
> should best be described.
>
> In other words, it's a classic communication problem.

That's a fair point.  I'll try to do better.

> I must say I find it strange when people try to contradict my
> assertion that Python names are references to objects, when the (no
> pun intended) reference implementation of the language uses
> "reference counting" to track how many assignments have been made.

I agree.  It seems like we should be able to take that as a given.

> So any argument that the language "doesn't have the concept of object
> reference (in the sense of e.g. C++ reference)" is simply stating the
> obvious: that Python has no way to declare reference variables. I
> would argue myself that it has no need of such a mechanism precisely
> because names are object references, and I'd like to hear
> counter-arguments.

Right.  I think of it this way: every variable is an object reference;  
no special syntax needed for it because that's the only type of  
variable there is.  (Just as with Java or .NET, when dealing with any  
class type; Python is just a little more extreme in that even simple  
things like numbers are wrapped in objects.)

Note: I tried to say "name" above instead of "variable" but I couldn't  
bring myself to do it -- "name" seems too generic to do that job.  Lots  
of things have names that are not variables: modules have names,  
classes have names, methods have names, and so do variables.  If I say  
"name," an astute listener would reasonably say "name of what" -- and  
I don't want to have to say "name of some thing in a name space which  
can be flexibly associated with an object" when the simple term  
"variable" seems to work as well.

> Well that's not true either. If I remember all the way back to my
> computational science degree I seem to remember being taught that
> there was call by *simple reference*, which is what I understand you
> to mean. Suppose I write the following in some not-quite-Python
> language:
>
> lst = ['one', 'two', 'three']
>
> index = 1
>
> def foo(item, i):
>   i = 2
>   item = "ouch"
>
> foo(lst[index], index)
> ...
> With call by simple reference, after the call I would expect the
> following conditions to be true:
>
> index == 2
> lst == ['one', 'ouch', 'three']

Yes, I guess so, though it would require that lst[index] evaluate to  
an lvalue to which the 'item' parameter could be an alias.  (With the  
second parameter, 'i', the situation is more straightforward because  
you're passing in a simple variable rather than a more complex  
expression.)

> With full call by reference, however, arguably the change to the value
> of index would induce the post-conditions
>
> index == 2
> lst == ['one', 'two', 'ouch']
>
> because the reference made by the first argument depends on the
> value of a variable mutated inside the function call.

I confess that I've never heard of "call by simple reference" or "call  
by full reference" before.  What you're describing in the second case  
sounds more like call by name to me.

But I think we can agree that neither of these behaviors describes  
Python.
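
For what it's worth, here is a minimal sketch of your example
transcribed into actual Python (the code is yours; only the two print
calls at the end are my addition).  If I have it right, neither set of
post-conditions holds, because the assignments inside foo only rebind
its local names:

```python
lst = ['one', 'two', 'three']
index = 1

def foo(item, i):
    # These assignments rebind foo's local names only; the caller's
    # variable and the list element are left untouched.
    i = 2
    item = "ouch"

foo(lst[index], index)

print(index)   # 1
print(lst)     # ['one', 'two', 'three']
```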

>> Why the resistance to these simple and basic terms that apply to
>> any OOP language?
>>
> Ideally I'd like to see this discussion concluded without resorting to
> democratic appeals. Otherwise, after all, we should all eat shit:
> sixty billion flies can't possibly be wrong.

I think I could make a good argument that the nutritional needs of  
flies are different from those of humans. On the other hand, what  
argument is there that the Python community should use its own unique  
terminology for concepts that apply equally well to other languages?   
Wouldn't communication be easier and smoother if we adopted standard  
terms for standard behavior?

>> What does "give a new name to an object" mean?  I submit that it
>> means exactly the same thing as "assigns the name to refer to the
>> object".
>
> I normally internalize "x = 3" as meaning "store a reference to the
> object 3 in the slot named x", and when I see "x" in an expression I
> understand it to be a reference to some object, and that the value
> will be used after dereferencing has taken place.

Works for me.

> I've seen various descriptions of Python's name binding behavior in
> terms of attaching Post-it notes bearing names to the objects
> referenced by the names, and I have never found them convincing. The
> reason for this is that names live in namespaces, whereas values live
> in some other universe altogether (that I normally describe as
> "object space" to beginners, though this is not a term you will come
> across in the Python literature).

Agreed.  That model implies that all names are global, and completely  
fails to explain how one object might be named "x" and a completely  
different object might also be "x" (albeit in a different namespace).   
I suppose your post-its could be color-coded by namespace, and then  
you could add additional warts and caveats and addendums to explain  
recursion, or explain why you don't have to search all objects in  
existence to find the right one every time a name is dereferenced, but  
the whole thing seems like a house of cards to me.
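
A small sketch of the point, using names I made up purely for
illustration (x, f, C, and depth_names are not from anyone's earlier
post): the same name can refer to different objects in different
namespaces, and each call of a recursive function gets a namespace of
its own.

```python
x = "module-level object"

def f():
    x = "object local to f"        # a different x, in f's namespace
    return x

class C:
    x = "object in C's namespace"  # yet another x, a class attribute

def depth_names(n):
    # Each call has its own local x; the recursive calls below do not
    # disturb this call's binding.
    x = n
    deeper = depth_names(n - 1) if n > 0 else []
    return [x] + deeper

print(f())             # object local to f
print(x)               # module-level object
print(C.x)             # object in C's namespace
print(depth_names(2))  # [2, 1, 0]
```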

> So I see the Post-it as being attached to a portion of some
> namespace, and that little fixed-size piece of object space being
> attached by a piece of string to a specific object. Of course any
> object can have many pieces of string attached, and not all of them
> come from names -- some of them come from container elements, for
> example.

Right.

>> There certainly is no difference in behavior that anyone has been
>> able to point out between what assignment does in Python, and what
>> assignment does in RB, VB.NET, Java, or C++ (in the context of object
>> pointers, of course).  If the behavior is the same, why should we
>> make up our own unique and different terminology for it?
>>
> One reason would be that in the other languages you have other choices
> as well, so you need to distinguish between them. Python is simpler,
> and so I don't see us needing the terminological complexity required
> in the other contexts you name, for a start.

OK, that's a fair argument, and I do suspect this is a big part of it  
-- when your language clearly supports passing object references and  
other types by-ref and by-val, and you can easily demonstrate the  
difference, then there is little temptation to claim that it doesn't  
do either one.  But if your language supports only one of these, and  
you have no choices about it and can't (within the language itself)  
compare and contrast that one against another, then it is easy to make  
all sorts of claims about what that one is.

But getting back to your point: is the standard terminology really  
more complex than whatever else we can come up with?

> Java messed up the whole deal by
> having different kinds of objects as a sacrifice to run-time speed,
> thereby breeding a whole generation of programmers with little clue
> about these matters, and the .NET environment also has to resort to
> "boxing" and "unboxing" from time to time. I say away with comparisons
> to such horrendously complex issues. One of the reasons for Python's
> continued march towards world domination (allow me my fantasies) is its
> consistent simplicity. Those last two words would be my candidate for
> the definition of "Pythonicity".

I'm with you there.  To me, the consistent simplicity is exactly this:  
all variables are object references, and these are always passed by  
value.

>>> - the parameters of a function are local names for the call
>>> arguments
>>
>> Agreed; they're not aliases of the call arguments.
>>
> They are actually names local to the function namespace, containing
> references to the arguments. Some of those arguments were provided as
> names, in which case the local name contains a copy of the reference
> bound to the name provided as an argument. This is, however, merely a
> degenerate case of the general instance, in which an expression is
> provided as an argument and evaluated, yielding (a reference to) an
> object which is then bound to the parameter name in the local namespace.

Quite right.
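
To make that concrete, here is a minimal sketch (the function names
identity, rebind, and mutate are hypothetical, chosen only for this
illustration).  The parameter starts out referring to the very same
object as the argument, so a mutation through it is visible to the
caller, while rebinding it is not:

```python
def identity(param):
    return param            # hands back whatever object param refers to

def rebind(param):
    param = [99]            # rebinds the local name only

def mutate(param):
    param.append(4)         # changes the object the caller also refers to

data = [1, 2, 3]
same = identity(data) is data      # True: no copy of the list was made
rebind(data)
after_rebind = list(data)          # [1, 2, 3]
mutate(data)                       # data is now [1, 2, 3, 4]

# The general case: an expression argument is evaluated first, and the
# resulting object is bound to the parameter.
expr_result = identity(data + [5])  # a new list, [1, 2, 3, 4, 5]
```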

>>>   (I guess 'pass by object' is a good name).
>>
>> Well, I'm not sure why that would be.  What you've just described is
>> called "pass by value" in every other language.
>>
> Sigh. This surely can only be true if you insist that references are
> themselves values. I hold that they are not.

Here's an example of the above, I guess.  In a language that supports
integers and doubles as simple types, stored directly in a variable,
it is an obvious generalization that in the case of an object type,
the value is a reference to an object.  (Then you can "dereference"
such a value to get to the values stored within the object.)  It is
the only simple and consistent description of such a language (which
includes Java, RB, and .NET, as well as C++ if you consider an object
pointer equivalent to a reference in more modern languages).

But Python doesn't have those simple types, so there is a temptation  
to try to skip this generalization and say that references are not  
values, but rather the values are the objects themselves (despite the  
dereferencing step that is still required to get any data out of  
them).  Well, and of course in the case of immutable objects, there is  
very little observable difference between references and values.
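
A quick sketch of that last point (the variable names are mine, for
illustration only).  With an immutable object such as an int, the only
way to "change" a variable is to rebind it, so the sharing never
shows; with a mutable list, it does:

```python
n = 1000
m = n                   # m and n refer to the same int object
shared_int = m is n     # True
m += 1                  # ints are immutable: a new int is built, m is rebound

items = [1, 2]
alias = items           # alias and items refer to the same list object
alias += [3]            # lists are mutable: the one list is extended in place

print(n, m)             # 1000 1001
print(items)            # [1, 2, 3]
print(alias is items)   # True
```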

However, it seems to me that when you start denying that the value of  
an object reference is a reference to an object, this is when you get  
led into a quagmire of contradictions.  Perhaps I'm wrong and I just  
haven't explored that path far enough, because it appears dark and  
cobwebby to my eyes.  I will try to give it a chance.

> It seems so transparent to me that the parameters are copies of the  
> references passed as arguments
> I find it difficult to understand how, or why, anyone would  
> conceptualize it differently.

Now you seem to be saying the same thing I've been saying all along.   
But this really is called "pass by value" in at least RB, VB.NET, and  
Java.  And that makes sense to me.

> OK, so above you argue quite cogently that Python uses a
> reference-passing mechanism.

Yes, of course.

> This makes your insistence in the preceding paragraph on calling it
> "pass by value" a little stubborn.

Why?  Do you really mean to insist that the RB/VB.NET example:

   Function GetAgeInDogYears(ByVal whom As Person) As Integer
     return whom.age * 7
   End Function

is not actually using a by-value parameter?  Or that it's not passing  
an object reference?

> Sigh again. You appear to want to have your cake and eat it. You are,
> in effect, saying "there are no values in Python, only references",
> completely ignoring the fact that it is semantically impossible to
> have a reference without having something to *refer to*

Well of course.  I'm pretty sure I've said repeatedly that Python  
variables refer to objects on the heap.  (Please replace "heap" with  
"object space" if you prefer.)  I'm only saying that Python variables  
don't contain any other type of value than references -- no integers  
or doubles, for example.  This is unlike the other languages under  
discussion (and may be at the root of the confusion).

> (which we in the Python world, in our usual sloppy way, often call  
> "a value").

Yes, and as long as we're agreed that this is only a sloppy shorthand,  
I'm OK with it (especially in the case of immutable objects, where the  
distinction is irrelevant).

> I suspect this may be at the root of our equally stubborn insistence
> that calling this mechanism "pass by value" is inviting
> misunderstanding. If we didn't want to eliminate misunderstanding we
> would all have stopped replying to you long ago.

Ditto right back at you.  :)  So maybe here's the trouble: since all  
Python variables are references, there is no need to distinguish  
reference types from any other types (there aren't any other types).   
So, with the distinction gone, there is a strong temptation to gloss  
over the fact that they are references at all, and try to say that the  
variables directly contain their objects.

But it seems to me that this claim quickly breaks down -- even as you  
said yourself; you need instead some mental model that shows the  
variables as pointing to (tied to via strings, associated via a lookup  
table, or whatever) the objects, which exist in object space.  In  
other words, they're references.

But continuing to attempt to gloss over that fact, when you come to  
parameter passing, you're then stuck trying to avoid describing it as  
call by value, since if you claim that what a variable contains is the  
object itself, then that doesn't fit (since clearly the object itself  
is not copied).  You also have to describe the assignment operator as  
different from all other languages, since clearly that's not copying  
the object either.

So you end up in this (to me, very strange) state where you're making  
up new terms to describe the parameter behavior, and the assignment  
behavior, which behavior is exactly the same as any other modern OOP  
language.  It makes it (again, IMHO) all seem very much more complex  
and mysterious than it really is.  And this all results inevitably  
from trying to gloss over the fact that Python variables are references.

So, while I'm trying this path on for size (and will continue to mull  
it over further), please try on this approach: boldly admit that  
they're references, and embrace that fact.  An assignment copies the  
RHS reference into the LHS variable, nothing more or less.  A  
parameter copies the argument reference into the formal parameter,  
nothing more or less.  And all this is exactly the same as in any  
other OOP language the reader is likely to know.  Isn't that simple,  
clear, and far easier to explain?
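
Here is a minimal sketch of that description, under the assumption
that identity tests with "is" and id() are an acceptable way to show a
reference being copied (param_id is a hypothetical helper, not anything
from the standard library):

```python
a = [1, 2, 3]
b = a                 # copies the reference in a into b; no new list
same = b is a         # True
b.append(4)           # one object, so the change shows through a as well
b = ['other']         # rebinds b; a still refers to the original list

def param_id(p):
    # p received a copy of the caller's reference, so it identifies
    # the same object the argument does.
    return id(p)

print(a)                       # [1, 2, 3, 4]
print(b)                       # ['other']
print(param_id(a) == id(a))    # True
```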

> Well, I started with Simula and Smalltalk back in 1973, so my
> experience may be a bit light. Sorry about that. This terminology
> wasn't made up by Python beginners, but by the people who invented
> Python.

Was it?  Has our BDFL weighed in on this terminology issue anywhere?   
So far, the only "official" words I've found related to this  
discussion are the ones plainly admitting that Python uses references  
(which some in this thread seem to want to deny, though not you Steve).

> I believe they did so on the grounds that it's easier for beginners
> to understand Python's semantics without having to reference too many
> similar in theory but confusingly different in practice other
> environments.

I wonder if that could be tested systematically.  Perhaps we could
round up 20 newbies, divide them into two groups of 10, give each one
a 1-page explanation based either on passing object references
by-value or on passing values sort-of-kind-of-by-reference, and then
check their comprehension by having them predict the output of some
code snippets.  That'd be very interesting.  It's hard for me to
believe that the glossing-over-references approach really is easier
for anybody, but maybe I'm wrong.

> I would even argue that your confusion supports this argument. Your
> understanding of Python is perfectly adequate, so get with the program
> for Pete's sake!

In my case, my understanding of Python became clear only once I  
stopped listening to all the confusing descriptions here, and realized  
that Python is no different from other OOP languages I already knew.

Best,
- Joe