Finding the instance reference of an object [long and probably boring]

Fri Nov 7 00:35:07 EST 2008

Joe Strout wrote:
> On Nov 6, 2008, at 12:44 PM, Arnaud Delobelle wrote:
> 
>> I know this thread has grown quite personal for some of its
>> participants.  I am posting in a spirit of peace and understanding :)
> 
> Thanks, I'll do the same.
> 
That's good to hear. Your arguments are sometimes pretty good, and
usually well made, but there's been far too much insistence on all sides
about being right and not enough on reaching agreement about how
Python's well-defined semantics for assignment and function calling
should best be described.

In other words, it's a classic communication problem.

>>> Um, no, I've admitted that it's a reference all along.  Indeed, that's
>>> pretty much the whole point: that variables in Python don't contain
>>> objects, but merely contain references to objects that are actually
>>> stored somewhere else (i.e. on the heap).  This is explicitly stated
>>> in the Python docs [1], yet many here seem to want to deny it.
>>
>> You refer to docs about the *implementation* of Python in C.  This is
>> irrelevant.
> 
> It's supportive.  I don't understand how/why anybody would deny that
> Python names are references -- it's all over the place, from any
> discussion of "reference counting" (necessary to understand the life
> cycle of Python objects) to understanding the basics of what "a = b"
> does.  It seems absurd to argue that Python does NOT use references.  So
> the official documentation calmly discussing Python references, with no
> caveats about it being internal implementation detail, seemed relevant.
> 
I must say I find it strange when people try to contradict my assertion
that Python names are references to objects, when the (no pun intended)
reference implementation of the language uses "reference counting" to
track how many references to each object exist.
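
A minimal sketch of that counting, assuming CPython (sys.getrefcount is a CPython detail, not a language guarantee) and a hypothetical placeholder class Thing. It shows the count following the number of references, not the number of assignment statements executed:

```python
import sys

class Thing:
    """Hypothetical placeholder class, used only for this illustration."""

obj = Thing()
before = sys.getrefcount(obj)         # includes a temporary reference held by the call itself

alias = obj                           # bind a second name to the same object
after_binding = sys.getrefcount(obj)  # one higher than 'before' in CPython

del alias                             # unbind the second name again
after_del = sys.getrefcount(obj)      # back to the starting count

print(before, after_binding, after_del)
```

The absolute numbers depend on the interpreter; only the differences are of interest here.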

>> Also, you talk about variables 'containing' something.  In Python,
>> variables don't contain anything, they're simply names for objects.
> 
Though there is an equally vociferous faction who will happily jump up
and down all day shouting "objects don't have names", a tendency I have
myself been known to indulge from time to time (but usually only when
some novitiate asks how they can find out "what the name of an object
is"). Being of the old school, I do tend to think of Python names as
being reference variables in the sense of Algol 68. Thus they are
fixed-size and frequently of limited lifetime. Since assignment (whether
by name binding or to a container element) copies the reference, and
since strong references keep objects alive, this is one way to explain
why Python doesn't suffer from C++'s dangling pointer issue.
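
A small sketch of that behavior (the names first, second and survivor are made up for the illustration): assignment copies the reference, and the object stays alive for as long as any strong reference to it remains.

```python
first = ["spam", "eggs"]
second = first               # copies the reference, not the list

second.append("ham")         # a change made through one name...
same_object = first is second
print(first)                 # ...is visible through the other

del first                    # removes one name; 'second' still keeps the list alive
survivor = second
print(survivor)
```

Deleting a name never leaves another name pointing at a destroyed object, which is the point about dangling pointers.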

> You say "names for", I say "references to".  We're saying the same thing
> (though I'm saying it with terminology that is more standard, at least
> in the wider OOP world).
> 
Naughty, naughty, there's that little "I'm right, you're wrong" thing
sneaking in again. I don't want to have to get the clue stick out here ...

>> 'Pass by value' is not relevant to Python as variables do not contain
>> anything. 'Pass by reference' is not relevant to Python as the language
>> doesn't have the concept of object reference (in the sense of e.g. C++
>> reference).

"Variables do not contain anything" seems to be a little extreme here.
They must store information of some sort, or no Python program could
ever produce a useful output. And while the concept of "object
reference" may not exist in the language, it is definitely valid for
implementers.

Interestingly, while "variable" isn't an indexed term in the (2.6)
documentation, "reference count" appears in the glossary and the
Language Reference Manual (again, no pun intended) explicitly states in
its discussion of Python's data model (vis a vis the exact meaning of
immutability) that container objects contain references to other
objects. It shortly thereafter mentions the reference-counting technique
of the CPython implementation, but does not claim it as part of the
language.

The same section also mentions "reference to 'external' resources such
as files or windows ..." and "references to other objects".
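
To make the data-model wording concrete, here is a sketch (variable names are illustrative only) of an immutable container holding references to a mutable object. The tuple never changes which objects it refers to, yet the list it refers to can still change:

```python
inner = [1, 2]
outer = (inner, inner)        # a tuple holding two references to one list

inner.append(3)               # the tuple is untouched; the list it refers to is mutated

print(outer)                  # both slots show the same, mutated list
print(outer[0] is outer[1])   # both slots refer to one object
```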

Interestingly it is also made explicit that "for immutable types,
operations that compute new values may return a reference to any
existing object with the same type and value, while for mutable objects
this is not allowed" (and if any reader disagrees that the reasons for
this are obvious their part in this thread was long since over).
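
For anyone who wants the reason spelled out, a brief sketch (names are illustrative): if two separately created mutable objects were allowed to be the same object, mutating one would silently change the other. For immutable values sharing is harmless, so the implementation is free to do it or not.

```python
a = []
b = []
lists_are_distinct = a is not b   # guaranteed: each [] evaluates to a new list
a.append(1)                       # so this cannot affect b
print(a, b)

x = "py" + "thon"
y = "python"
strings_equal = x == y            # guaranteed
strings_shared = x is y           # implementation-dependent; never rely on it
print(strings_equal)
```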

There's even a built-in type called a "weak reference".
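
A short sketch using the standard weakref module, with a hypothetical placeholder class Resource. A weak reference refers to an object without keeping it alive; the gc.collect() call is an assumption-proofing step for implementations that do not reclaim objects immediately.

```python
import gc
import weakref

class Resource:
    """Hypothetical placeholder; instances of plain classes can be weakly referenced."""

strong = Resource()
weak = weakref.ref(strong)          # refers to the object without keeping it alive

alive_before = weak() is strong     # calling the weak reference yields the object

del strong                          # drop the only strong reference
gc.collect()                        # unnecessary in CPython; other implementations may defer
alive_after = weak() is not None    # the weak reference now yields None

print(alive_before, alive_after)
```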

So any argument that the language "doesn't have the concept of object
reference (in the sense of e.g. C++ reference)" is simply stating the
obvious: that Python has no way to declare reference variables. I would
argue myself that it has no need of such a mechanism precisely because
names are object references, and I'd like to hear counter-arguments.
Consider my memory short -- I have a large dose of crotchety to go with
that if you'd like.

> 
> Both are relevant to answering simple questions, like what happens to x
> in this case:
> 
>    def foo(spam):
>       spam = 5
>    foo(x)
> 
> This is a basic and fundamental thing that a programmer of a language
> should know.  If it's call-by-reference, then x becomes 5.  If it's
> call-by-value, it does not.
> 
Well that's not true either. If I remember all the way back to my
computational science degree I seem to remember being taught that there
was call by *simple reference*, which is what I understand you to mean.
Suppose I write the following in some not-quite-Python language:

lst = ['one', 'two', 'three']

index = 1

def foo(item, i):
   i = 2
   item = "ouch"

foo(lst[index], index)

With call by simple reference, after the call I would expect the
following conditions to be true:

index == 2
lst == ['one', 'ouch', 'three']

With full call by reference, however, arguably the change to the value
of index would induce the post-conditions

index == 2
lst == ['one', 'two', 'ouch']

because the reference made by the first argument depends on the value of
a variable mutated inside the function call.
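
For comparison, the same program run as real Python gives neither set of post-conditions. This is a runnable sketch of the example above, with comments added; both assignments inside foo only rebind local names.

```python
lst = ['one', 'two', 'three']
index = 1

def foo(item, i):
    i = 2            # rebinds the local name 'i' only
    item = "ouch"    # rebinds the local name 'item' only

foo(lst[index], index)

print(index)         # 1
print(lst)           # ['one', 'two', 'three']
```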

> Why the resistance to these simple and basic terms that apply to any OOP
> language?
> 
Ideally I'd like to see this discussion concluded without resorting to
democratic appeals. Otherwise, after all, we should all eat shit: sixty
billion flies can't possibly be wrong.

>> Here lies, IMHO, the reason why you think you need Python to 'pass by
>> value'.  As you believe that variables must contain something, you think
>> that assignment is about copying the content of a variable.  Assignment
>> in Python is simply giving a new name to an object.
> 
> What does "give a new name to an object" mean?  I submit that it means
> exactly the same thing as "assigns the name to refer to the object". 

I normally internalize "x = 3" as meaning "store a reference to the
object 3 in the slot named x", and when I see "x" in an expression I
understand it to be a reference to some object, and that the value will
be used after dereferencing has taken place.

I've seen various descriptions of Python's name binding behavior in
terms of attaching Post-it notes bearing names to the objects referenced
by the names, and I have never found them convincing. The reason for
this is that names live in namespaces, whereas values live in some other
universe altogether (that I normally describe as "object space" to
beginners, though this is not a term you will come across in the Python
literature). So I see the Post-it as being attached to a portion of some
namespace, and that little fixed-size piece of object space being
attached by a piece of string to a specific object. Of course any object
can have many pieces of string attached, and not all of them come from
names -- some of them come from container elements, for example.
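
One way to picture this, as a sketch only: a namespace can be modelled as an ordinary dict, and exec can run assignments inside it. The variable names (namespace, holder) are invented for the illustration, and this is not a claim about how any particular implementation stores local variables.

```python
namespace = {}
exec("x = 3\ny = x", namespace)      # run two assignments in a fresh namespace

# exec also adds '__builtins__', so filter out dunder entries
names = sorted(k for k in namespace if not k.startswith("__"))
same_object = namespace["x"] is namespace["y"]   # two names, one object

holder = [namespace["x"]]            # a container element is another "piece of string"
from_container = holder[0] is namespace["x"]

print(names, same_object, from_container)
```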

> There certainly is no difference in behavior that anyone has been able
> to point out between what assignment does in Python, and what assignment
> does in RB, VB.NET, Java, or C++ (in the context of object pointers, of
> course).  If the behavior is the same, why should we make up our own
> unique and different terminology for it?
> 
One reason would be that in the other languages you have other choices
as well, so you need to distinguish between them. Python is simpler, and
so I don't see us needing the terminological complexity required in the
other contexts you name, for a start. Java messed up the whole deal by
having different kinds of objects as a sacrifice to run-time speed,
thereby breeding a whole generation of programmers with little clue
about these matters, and the .NET environment also has to resort to
"boxing" and "unboxing" from time to time. I say away with comparisons
to such horrendously complex issues. One of the reasons for Python's
continued march towards world domination (allow me my fantasies) is its
consistent simplicity. Those last two words would be my candidate for
the definition of "Pythonicity".

>> To understand variables (which I prefer to call 'names') and function
>> calls in Python you need simply to understand that:
>>
>>  - a variable is a name for an object
> 
> A reference to an object, got it.
> 
>>  - assignment is naming an object
> 
> Assigning the reference to the object, yes.
> 
Nope, storing the reference "against" the name (more exactly, in the
memory area associated with name, though I can hear hackles rising
throughout Pythonland as I type those words).

>>  - the parameters of a function are local names for the call arguments
> 
> Agreed; they're not aliases of the call arguments.
> 
They are actually names local to the function namespace, containing
references to the arguments. Some of those arguments were provided as
names, in which case the local name contains a copy of the reference
bound to the name provided as an argument. This is, however, merely a
degenerate case of the general instance, in which an expression is
provided as an argument and evaluated, yielding (a reference to) an
object which is then bound to the parameter name in the local namespace.
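
A sketch of both cases, using two hypothetical helper functions (mutate_then_rebind and identity). The parameter starts out referring to the caller's object, so mutation is visible outside, while rebinding the parameter affects only the local namespace. An expression argument is evaluated first and the resulting object is bound to the parameter.

```python
def mutate_then_rebind(param):
    param.append("added inside")     # acts on the object the caller also refers to
    param = ["brand new list"]       # rebinds the local name only
    return param

items = ["original"]
result = mutate_then_rebind(items)
print(items)                         # the mutation is visible to the caller
print(result)                        # the rebinding produced a different object

def identity(param):
    return param

# The expression is evaluated to a new list, which is then bound to 'param'
produced = identity(items + ["extra"])
print(produced is items)
```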

>>    (I guess 'pass by object' is a good name).
> 
> Well, I'm not sure why that would be.  What you've just described is
> called "pass by value" in every other language.
> 
Sigh. This surely can only be true if you insist that references are
themselves values. I hold that they are not. It seems so transparent to
me that the parameters are copies of the references passed as arguments
that I find it difficult to understand how, or why, anyone would
conceptualize it differently.

>> I would say that an object is passed, not a reference.
> 
> That seems to contradict the actual behavior, as well as what you said
> yourself above.  The only way I know how to interpret "an object is
> passed" is "the data of that object is copied onto the stack".  But of
> course that's not what happens.  What actually happens is what you said
> above: a name (reference) is assigned to the object.  The name is a
> reference; it is made to refer to the same thing that the argument
> (actual parameter) referred to.  This is exactly what "the reference is
> passed" means, nothing more or less.
> 
OK, so above you argue quite cogently that Python uses a
reference-passing mechanism. This makes your insistence in the preceding
paragraph on calling it "pass by value" a little stubborn.

>>> I know it seems nit-picky on the surface, but it is important.  It is
>>> the distinction that lets you answer whether:
>>>
>>> def foo(x):
>>>   x = Foo()
>>>
>>> x = Bar()
>>> foo(x)
>>>
>>> ...results in x (after the call) now referring to a Foo, or still
>>> referring to a Bar.
>>
>> You don't need this to decide.  This is what happens:
>>
>> x = Bar() # Call this new Bar object 'x'
> 
> In other words, make 'x' refer to this new object.
> 
So far so good.

>> foo(x)    # call function foo with argument the object known as 'x'
> 
> Yes.  But what does that mean?  Does the parameter within 'foo' become
> an alias of x, or a copy of it?  That's what we DO need to decide.
> 
>> # Now, in foo:
>> def foo(x):   # Call 'x' locally the object passed to foo
> 
> I think you mean here that local 'x' is made to refer to the object
> passed to foo.  Agreed.  It is NOT an alias of the actual parameter. 
> And that's what we need to know.  So it's not call-by-reference, it's
> call-by-value; the value of x (a reference to whatever object Bar()
> returned) is copied from the value of the parameter (a reference to that
> same object, of course).
> 
Sigh again. You appear to want to have your cake and eat it. You are, in
effect, saying "there are no values in Python, only references",
completely ignoring the fact that it is semantically impossible to have
a reference without having something to *refer to* (which we in the
Python world, in our usual sloppy way, often call "a value").

In Algol 68 terms what you are saying is "they are refs, not ref refs".

I suspect this may be at the root of our equally stubborn insistence
that calling this mechanism "pass by value" is inviting
misunderstanding. If we didn't want to eliminate misunderstanding we
would all have stopped replying to you long ago.

>>    x = Foo() # Call 'x' locally this new Foo object.
> 
> Make 'x' refer to the new Foo() result, yes.
> 
>> Obviously after all this, 'x' is still the name of the Bar object
>> created at the start.
> 
> Obvious only once you've determined that Python is call-by-value.  If it
> were like FORTRAN, where everything is call-by-reference, then that
> wouldn't be the case at all; the assignment within the function would
> affect the variable (or name, if you prefer) passed in.
> 
Sorry, I'd need to be a Fortran expert to decipher that or make a
judgment on its validity, so your appeal to the masses goes out of the
window.

>> To sum up: for 'pass by value' to make sense in Python you need to
>> create an unnecessarily complex model of how Python works.
> 
> I have to disagree.  The model is simple, self-consistent, and
> consistent with all other languages.  It's making up terms and trying to
> justify them and get out of the logical knots (such as your claim above
> that the object itself is passed to a method) that is unnecessarily
> complex.
> 
I have to disagree. The model is clearly based on a wrong-headed
interpretation of a fairly exact understanding of Python's semantics.

>> By letting go of 'pass by value' you can simplify your model of the
>> language
>> (keeping it correct of course) and it fits in your brain more easily.
> 
> I'm afraid I don't see that.
> 
>> Of course your own model is valid but there is a better one which is
>> easier to grasp for people without a background in C/C++ - like
>> languages.
> 
> Well, if by C/C++-like languages, you mean also Java, VB.NET, and so on,
> then maybe you're right -- perhaps my view is colored by my experience
> with those.  But alternatively, perhaps there are enough Python users
> without experience in other OOP languages, that the standard terminology
> was unfamiliar to them, and they made up their own, resulting in the
> current linguistic mess.
> 
Well, I started with Simula and Smalltalk back in 1973, so my experience
may be a bit light. Sorry about that. This terminology wasn't made up by
Python beginners, but by the people who invented Python. I believe they
did so on the grounds that it's easier for beginners to understand
Python's semantics without having to refer to too many other
environments that are similar in theory but confusingly different in
practice.

I would even argue that your confusion supports this argument. Your
understanding of Python is perfectly adequate, so get with the program
for Pete's sake!

regards
 Steve

-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/