I'm wrong or Will we fix the ducks limp?

Steven D'Aprano steve at pearwood.info
Sun Jun 5 23:52:09 EDT 2016


On Mon, 6 Jun 2016 03:42 am, Random832 wrote:

> On Sun, Jun 5, 2016, at 02:37, Steven D'Aprano wrote:
>> No they don't. You are confusing the implementation with the programming
>> model.
>> 
>> Following the assignment:
>> 
>> x = 99
>> 
>> if you print(x), do you see something like "reference 0x12345"? No.
>> 
>> Do you have to dereference that reference to get the value of x? No.
>>
>> At the Python level, the value of x is 99, not some invisible,
>> untouchable reference to 99.
> 
> Sure, but that is the value of the object referenced by x, it is not a
> property of x itself.

You need to be clear about what you are referring to.

x itself is the object 99. When we say "x + 1", we expect to get 100. x is
not some reference to 99, it is 99, just like Barack Obama is not a
reference to the President of the United States, he is the POTUS.

The *name* "x" is an entity which is bound to (i.e. a reference to) the
object 99, in some specific namespace, at some specific time. The name
itself is an English word consisting of a single letter, "x". Its
implementation in Python is likely to be a str, "x", used as a key in some
namespace (often a dict). The name itself doesn't have any numeric value,
just as the English words "Barack Obama" themselves are not a person.

There are contexts where we need to refer to names themselves in Python, but
they are comparatively rare. You will usually recognise them from context:
occasionally implicitly, but more often explicitly, by talking about "the name
x" or "x" in quotes rather than x. In code, it will usually involve eval or
exec, or sometimes a namespace lookup:

    method = getattr(self, methodname.upper())
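A minimal sketch of the same distinction, using a plain dict as a stand-in namespace (the name `ns` is only illustrative): the name is nothing more than a str key, and the object bound to it is the value stored under that key.

```python
# A plain dict standing in for a namespace (illustrative only).
ns = {}

# exec binds the name "x" inside the supplied namespace.
exec("x = 99", ns)

name = "x"            # the name itself: a one-letter str
value = ns[name]      # the object bound to that name

print(repr(name))     # 'x'
print(value + 1)      # 100

# eval looks the name up in the same namespace.
print(eval("x + 1", ns))  # 100
```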


Likewise, it is rare to refer to the words rather than the person Barack
Obama, but when you do, it is usually easy to recognise because you will
generally refer to "Barack Obama" in quotes.

We might say:

    "x" is a single-letter name and x is an int one less than 100

but you normally wouldn't say:

    x is a single-letter name and x is an int one less than 100

unless your aim is to cause confusion.

This is no different from the issue in plain English that words have
meaning: when we use a word, we normally expect it to be interpreted
according to that meaning, and only rarely as an abstract word. When we do
mean the word itself, we normally put it in quotation marks, or explicitly
state that we are referring to it as a word:

A cat is a four-legged carnivorous mammal that purrs. "Cat" is a
three-letter word; the word cat is of unknown origin.



> x = y = 999; z = int('999')
> x is y # True
> x is z # False
> 
> How would you describe the difference between x and y, and z? 

The names x and y are bound to the same object. The name z is bound to a
distinct object with equal value.


> This is not a mere implementation detail 

Of course it is. An implementation might be clever enough to recognise that
int('999') is the same as 999 and reuse the same int object. An
implementation might cache *all* ints, regardless of size, or do no caching
at all. The part of that example which is not implementation defined is
that given x = y = 999, the language requires that x and y be the same
object.
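This is easy to check. The sketch below asserts only what the language guarantees: chained assignment binds one object to both names, and int('999') compares equal to it. Whether `x is z` comes out true is deliberately left unchecked, since that depends on the implementation's caching.

```python
x = y = 999          # one object, bound to two names
z = int('999')       # an equal value, possibly a distinct object

print(x is y)        # True: required by the language
print(x == z)        # True: equal numeric value
print(x is z)        # implementation-dependent; don't rely on it
```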


> [though, the fact that for int in 
> particular I had to go higher than 99 to get this result is - I wouldn't
> have had this problem with strings,

Really?

py> a = b = "cat"; c = str("cat")
py> a is b is c
True

But again, this is an implementation detail. The caching of strings depends
on the version and implementation, and the string itself.
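If code genuinely needs a single shared string object, it can ask for one explicitly instead of relying on caching. A small sketch using Python 3's `sys.intern`, which returns one canonical object for each interned string value; the strings are built at runtime so that compile-time constants play no part:

```python
import sys

# Two equal strings built at runtime; whether they are the same
# object is an implementation detail.
a = "".join(["c", "a", "t"])
b = "".join(["c", "a", "t"])
print(a == b)    # True

# sys.intern returns the canonical object for each string value,
# so equal interned strings are always identical.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True
```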


> and I *couldn't* have had this 
> problem with lists]. The fact that two objects can have the same value
> while being different objects, and that two variables can point to the
> same object, are part of the programming model.

Right. But that's actually not relevant to the question.


[...]
> The fact that == is an operator distinct from 'is' inherently means that
> variables contain references, not objects.

No, you are mixing levels of explanation and conflating implementation and
interface. Would you insist that, because the words "Barack Obama" contain
eleven letters, the United States has two words as its president? That
Michelle Obama is married to two words rather than a man?

Of course not. At least, if you are going to make that argument, then you
are so confused that I'm not even going to discuss this with you.

Imagine a horrendously memory profligate implementation that implemented
variables using the fixed memory location model "variables are boxes". When
I write:

x = y = 999; z = 999

the interpreter creates three boxes "x", "y", "z". All three boxes need to
be big enough for an int object, which in this case is about 12 bytes.
Actually a bit bigger, as you will see: let's say 16 bytes, because the
implementation needs (say) four bytes for an object identity tag. All three
boxes get *mostly* the same content, namely the 12 byte object 999, but the
x and y boxes get the identity tag (let's say) 633988994, and the z box gets
the identity tag 633988995.

`x == y` is implemented the usual way (calling int.__eq__), while `x is y`
is implemented by testing whether the first four bytes of each box are
equal.
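The hypothetical box model can be mimicked in a few lines. Everything here (the `Box` class, its `tag` field, the `box_is` and `box_eq` helpers) is invented for illustration and is not how any real Python implementation works: each box holds its own copy of the object plus an identity tag, `is` compares tags, and `==` compares contents.

```python
import copy
import itertools

# Hypothetical source of identity tags, starting at the value used above.
_tags = itertools.count(633988994)

class Box:
    """A hypothetical fixed-size 'variable box': identity tag plus object copy."""

    def __init__(self, content, tag):
        self.tag = tag
        # Each box stores its own copy of the content, not a reference.
        # (For an immutable int, deepcopy may return the same object;
        # that doesn't matter for this illustration.)
        self.content = copy.deepcopy(content)

def box_is(a, b):
    # `x is y` in this model: compare only the identity tags.
    return a.tag == b.tag

def box_eq(a, b):
    # `x == y` in this model: compare contents the usual way (int.__eq__).
    return a.content == b.content

# x = y = 999; z = 999
shared_tag = next(_tags)      # one object for x and y...
x = Box(999, shared_tag)
y = Box(999, shared_tag)      # ...copied into two boxes with the same tag
z = Box(999, next(_tags))     # a distinct object with an equal value

print(box_eq(x, y), box_is(x, y))  # True True
print(box_eq(x, z), box_is(x, z))  # True False
print(x.tag, z.tag)                # 633988994 633988995
```

The sketch skips the expensive part: mutating an object here would mean finding and updating every box that carries the same tag.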

This would be a horribly profligate (and slow) implementation. Parameter
passing into functions would occur via copying; it would be even more
horribly slow because every mutation to an object would require a scan of
all the objects, so as to update all of the boxes. But speed and efficiency
are not part of the Python language model; they are a quality-of-implementation
issue, and besides, Moore's Law will make everything fast eventually.
(Wishful thinking.)

As inefficient and awful as this implementation is, it would still be Python,
as the interface (the programming model) is the same. (Give or take a few bugs.)


> There is, obviously, only one value in my example. 

There is nothing "obvious" about that at all. There is only one abstract
number in use, namely 999, but there are two objects. What counts as the
value? Is it the object, or the numeric value?

Before you answer, consider the value of x and y given:

x = Widget(part='123xy', colour='red')
y = x.copy()

just to make it clear that they are distinct objects. I would be unhappy
with any definition of "value" that says that x and y don't have values,
just because they are Widgets rather than strings or numbers, or the values
are undefined.
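The post does not define Widget, so here is a hypothetical stand-in, assuming a Python 3.7+ dataclass with a `copy` method. It shows two distinct objects that nonetheless both have a perfectly sensible value:

```python
from dataclasses import dataclass, replace

@dataclass
class Widget:
    """Hypothetical stand-in for the Widget class in the example."""
    part: str
    colour: str

    def copy(self):
        # dataclasses.replace with no changes builds a new, equal instance.
        return replace(self)

x = Widget(part='123xy', colour='red')
y = x.copy()

print(x is y)   # False: distinct objects
print(x == y)   # True: the generated __eq__ compares fields
print(y)        # Widget(part='123xy', colour='red')
```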

The question "what is the value of x?" is tricky, as we have to decide whether we
mean the value of the variable or the value of the object. The value of the
variable, I believe, must be identified with the object bound to that
variable. What else could it be? Surely we would want:

from decimal import Decimal
w = 999+0j
x = 999
y = 999.0
z = Decimal("999.0000")

to be distinguished: the variables aren't the same, they are bound to
objects of different types, and surely we want "the value of w" to be
understood as different from "the value of x" at least sometimes.

Those four variables are numerically equal, but they may not even compare
equal: e.g. Decimal("999.0") == 999.0 returns False in Python 2.5. In
principle, two numerically equal values might even raise TypeError on
comparison. So there is nothing *obvious* about talking about value.
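For contrast, a quick check in Python 3.2 or later, where mixed comparisons between Decimal and the built-in numeric types are numeric, shows the same numeric value carried by four objects of four types, each with its own printed form:

```python
from decimal import Decimal

w = 999+0j
x = 999
y = 999.0
z = Decimal("999.0000")

# Four different types...
print([type(v).__name__ for v in (w, x, y, z)])
# ['complex', 'int', 'float', 'Decimal']

# ...with four different printed forms...
print([str(v) for v in (w, x, y, z)])
# ['(999+0j)', '999', '999.0', '999.0000']

# ...yet numerically equal in modern Python 3.
print(x == y == w, z == x, z == y)  # True True True
```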

I think that, under usual circumstances, the best way to understand "the
value of x" is to distinguish between two cases: if x refers to the name,
normally written "the value of the variable 'x'" but sometimes implied by
the context, then the answer should be the object bound to 'x'. But if x
refers to the object, as in "the value of x", it means some type-specific
definition of value, e.g. the numeric value.

So your example has two values, which happen to be equal as they have the
same, er, numeric value. (Sometimes the same word gets used in different
senses. We need more words.)


> If variables 
> contained objects, then my example would have three objects,

Certainly not. x = y = 999 is required to bind the same object to x and y.


> none more or less distinct. They contain references, 

If your variable x were a reference, then we would expect type(x) to return
something like "Reference", but it doesn't; it returns int.

(Why do I say this? We say "x is an int" after "x = 999", and type(x)
returns int. So if "x is a reference", then type(x) ought to return
Reference. If you *don't* say "x is an int", well, frankly, I don't believe
you. See below.)

If x were a reference to 999, then we would need to dereference x to get to
the 999, but we don't. We don't write:

    print(dereference(x) + 1)


to get 1000, we write print(x+1). If x + 1 is 1000, what is the value of x?

(a) 999
(b) some invisible, intangible, untouchable reference to 999
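The same point, checked in a current Python 3 interpreter: after `x = 999`, type(x) reports int, and arithmetic works on x directly with no dereferencing step.

```python
x = 999

print(type(x))             # <class 'int'>, not some Reference type
print(x + 1)               # 1000
print(isinstance(x, int))  # True
```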


If your answer is (b), then your view of Python programming is obfuscatory
and muddled and I cannot understand how you can effectively reason about
your code. If you look at an expression like 

    x.y.z['k'] = [999]

and reason like this:

    "x is a reference to a Spam object, so x.y is a reference to a reference
    to an Eggs object, so x.y.z is a reference to a reference to a reference
    to a dict, so x.y.z['k'] is a reference to a reference to a reference 
    to a dict containing a reference to a key 'k' which maps to a reference
    to a list containing a reference to 999."


then I cannot imagine how you get any work done. I don't believe you do. I
believe you reason about the code just like I do:

    "x is a Spam object, so x.y is an Eggs object, so x.y.z is a dict, so
    x.y.z['k'] is a dict with key 'k' which maps to a list containing 999."
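A runnable version of that expression, assuming hypothetical Spam and Eggs classes (invented here purely so the attribute chain exists), reads exactly the plain way:

```python
class Eggs:
    """Hypothetical class whose instances carry a dict attribute z."""
    def __init__(self):
        self.z = {}

class Spam:
    """Hypothetical class whose instances carry an Eggs attribute y."""
    def __init__(self):
        self.y = Eggs()

x = Spam()
x.y.z['k'] = [999]

# x is a Spam, x.y is an Eggs, x.y.z is a dict,
# and x.y.z['k'] is a list containing 999.
print(type(x).__name__, type(x.y).__name__, type(x.y.z).__name__)  # Spam Eggs dict
print(x.y.z['k'])  # [999]
```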


So I'll be frank: anyone who says that "variables are references" surely
doesn't behave as if that were true. They don't program as if variables
were references, they don't reason about the code as if the variables were
references, and they certainly don't talk about code as if they were
references. They behave as if variables are the values (objects) bound to
them, just as I have argued all along.

(Except for the occasional and unusual situation where "peeking under the
bonnet" into the implementation actually is helpful -- such cases do exist,
such as a list which contains itself. Python's object model is an
abstraction, and all abstractions leak. Sometimes we cannot help but talk
about the implementation, in order to understand the leaks.)
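A minimal sketch of that particular leak: a list that is its own first element. Its repr has to detect the cycle rather than recurse forever.

```python
L = []
L.append(L)        # the list now contains itself

print(L[0] is L)   # True
print(L)           # [[...]]  (repr detects the cycle and elides it)
print(len(L))      # 1
```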


>> There is no analog to dereferencing in Python, nothing like print(x^).
> 
> I don't see where anyone said there was. I think you've inferred meaning
> to the statement "python variables contain references" that it does not
> actually have.

Of course it does, if you are talking about the Python variable model rather
than the implementation.

I'm perfectly happy for people to say that Python variables are implemented
as references to objects. That's completely unobjectionable, as it is true
for all implementations I know of, and it will likely be true for any
future implementations as well. We can be even more concrete and say that
CPython variables are implemented as pointers to objects.

But that's not the same as talking about the Python model. If you look at
Python code, and read "x = 999", and don't think of x as now being 999,
then I cannot fathom your mindset. You're thinking about code in an
obfuscatory manner that is of no use to me.

(In that case, I wonder why you don't go all the way to thinking about x as
some sequence of bits: in Python 2.7 on a 32-bit build, 96 bits.)


>> You bind values (that is, objects)
> 
> Values are not objects. x and z have the same value, and their objects
> are identical but distinct,

"Identical but distinct" is a contradiction according to the way object
identity is defined in Python.
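In Python, identity is what `is` tests: for two objects that exist at the same time, `a is b` holds exactly when `id(a) == id(b)`. A short sketch with lists (each list display creates a new list, so no caching can muddy the picture) shows that "equal but distinct" simply means two objects:

```python
a = [1, 2, 3]
b = a            # b is bound to the same list as a
c = [1, 2, 3]    # a new list with an equal value

print(a is b, id(a) == id(b))  # True True
print(a == c, a is c)          # True False

# Mutation makes the difference visible: b sees it, c does not.
a.append(4)
print(b)  # [1, 2, 3, 4]
print(c)  # [1, 2, 3]
```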


> but they are different because they point 
> (or refer, or by your weird terminology "bind") to different objects. 

"Bind" is not weird terminology. It is a standard term in widespread use in
computer science, particularly in object-oriented languages.

https://en.wikipedia.org/wiki/Name_binding

https://en.wikipedia.org/wiki/Late_binding

https://en.wikipedia.org/wiki/Binding#In_computing



-- 
Steven



