Objects in Python

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Aug 26 09:43:33 EDT 2012


On Sun, 26 Aug 2012 00:45:55 -0500, Evan Driscoll wrote:

> On 08/24/2012 05:00 AM, Steven D'Aprano wrote:
>> No. The compiler remembers the address of 'a' by keeping notes about it
>> somewhere in memory during the compilation process. When you run the
>> compiled program, there is no longer any reference to the name 'a'.
>>
>> ...
>>
>> The mapping of name:address is part of the *compilation* process -- the
>> compiler knows that variable 'x' corresponds to location 12345678, but
>> the compiled code has no concept of anything called 'x'. It only knows
>> about locations. The source code 'x = 42' is compiled into something
>> like 'store 42 into location 12345678'. (Locations may be absolute or
>> relative.)
>>
>> In languages with name bindings, the compiler doesn't need to track
>> name:address pairs. The compiled application knows about names, but not
>> about addresses. The source code 'x = 42' is compiled into something
>> like 'store 42 into the namespace using key "x"'.
>
> What you describe is sorta correct, but it's also not... you're
> describing implementations rather than the language. And while the
> language semantics certainly impose restrictions on the implementation,

I accept that languages may choose to leave the variable-model 
unspecified. I don't think they can define behaviour without implying one 
model or the other. Or at least not easily - far too much language 
behaviour is tied to the implementation to let us say "it's only 
implementation".

For example, the reason that locals() is not writable inside Python 
functions is that CPython moves away from the name binding model 
inside functions as an optimization. This function prints 1 under both 
CPython and Jython (but not IronPython):

def spam():
    x = 1
    locals()['x'] = 2
    print(x)

Binding to the local namespace does not work, because functions don't 
*actually* use a namespace, they use something closer to the C model. So 
the two models are not interchangeable and hence they aren't *just* 
implementation details, they actually do affect the semantics of the 
language.
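
A minimal sketch of that contrast, assuming CPython 3 (the names counter, 
rebind_counter and read_counter are made up for illustration). A write 
through locals() inside a function is ignored, while a write through 
globals() takes effect, because module globals really are kept in a dict:

```python
def spam():
    x = 1
    locals()['x'] = 2          # updates a dict, not the slot that holds x
    return x

counter = 1

def rebind_counter():
    globals()['counter'] = 2   # module globals really are a dict

def read_counter():
    return counter             # looked up by name each time it is called

local_result = spam()
rebind_counter()
global_result = read_counter()
print(local_result, global_result)
```

On CPython the local write is lost and the global one sticks, so this 
prints 1 2. Other implementations may differ on the first number, as noted 
above for IronPython.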

I suppose you could arrange for locals() to return a proxy dictionary 
which knew about the locations of variables. But what happens if you 
returned that proxy to the caller, which then assigned to it later after 
the function variables no longer existed?

Similarly, there are operations which are no longer allowed simply 
because of the difference between name binding and locational variables:


py> def ham():
...     from math import *
...
  File "<stdin>", line 1
SyntaxError: import * only allowed at module level


(In some older versions of Python, wildcard imports are allowed, and the 
function then falls back on a namespace instead of fixed locations. That 
is no longer the case in Python 3.2 at least.)
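
The same check can be made without an interactive session. This is only a 
sketch, assuming Python 3; the filename "<example>" is arbitrary. It asks 
compile() to build the function and captures the error message:

```python
# A function body containing a wildcard import, as source text.
source = "def ham():\n    from math import *\n"

message = None
try:
    compile(source, "<example>", "exec")
except SyntaxError as err:
    # The error is raised at compile time; the function is never run.
    message = err.msg

print(message)
```

The error is raised by the compiler before any code runs, which fits the 
explanation above: the set of local names has to be known when the function 
is compiled.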


> I think in this case the situation is closer than you acknowledge:
> 
>  From the Python side, I suspect that for most functions, you'd be able
> to create a Python implementation that behaves more like C, and
> allocates locals in a more traditional fashion.

As I discuss above, CPython and Jython actually do something like that 
inside functions. And there are observable differences in behaviour (not 
just performance) between function scope and global scope.
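
One way to see this in CPython is to compare the bytecode for the same 
assignment in a function and at module level. The sketch below assumes 
CPython 3; exact opcode names vary between releases, so it prints them 
rather than relying on a fixed list. The function name eggs is made up.

```python
import dis

def eggs():
    x = 42    # a function local: stored in a numbered slot

# The same statement compiled as module-level code.
module_code = compile("x = 42", "<module-level>", "exec")

func_ops = [ins.opname for ins in dis.get_instructions(eggs)]
module_ops = [ins.opname for ins in dis.get_instructions(module_code)]

print(eggs.__code__.co_varnames)   # local names fixed at compile time
print(module_code.co_names)        # names used as keys at runtime
print(func_ops)
print(module_ops)
```

In the function, x appears in co_varnames and is written with a STORE_FAST 
style instruction that addresses a slot by index. At module level, x 
appears in co_names and is written with STORE_NAME, which uses the name as 
a key.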

So an implementation of Python which used fixed memory addresses 
everywhere, not just in functions, would be detectably different in 
behaviour from CPython. Whether those differences would be enough to 
disqualify it from being called "Python" is a matter of opinion.

(Probably Guido's opinion is the only one that matters.)


[...]
> On the C side, imagine a function with locals x, y, and z which never
> takes the address of any of them. (You said later that "Just because the
> public interface of the language doesn't give you any way to view the
> fixed locations of variables, doesn't mean that variables cease to have
> fixed locations.")
> 
> First, C variables may not even have a memory address. They can
> disappear completely during compilation, or live in a register for their
> entire life.

Variables that don't exist at runtime don't have an address at all -- in 
a way, they aren't even a variable any more. They have a name in the 
source code, but that's all.

As for registers, they are memory addresses, of a sort. (I didn't mean to 
imply that they must live in main motherboard memory.) I call any of 
these an address:

 - in the heap at address 12345678
 - in the GPU's video memory at address 45678
 - 12th entry from the top of the stack
 - register 4


> Second, it's possible that those variables *don't* occupy a fixed
> location. If you never explicitly take an address of a variable (&x),
> then I can't think of any way that the address can be observed without
> invoking undefined behavior -- and this means the C compiler is free to
> transform it to anything that is equivalent under the C semantics.

I may have been too emphatic about the "fixed" part. A sufficiently 
clever compiler may implement its own memory manager (on top of the 
operating system's memory manager?) and relocate variables during their 
lifetime. But for my purposes, the important factor is that the compiler 
knows the address at every moment, even if that address changes from time 
to time.

In contrast, a name binding system *doesn't* know the address of a 
variable. The analogy I like is making a delivery to a hotel room. C-like 
languages say:

"Deliver this package to room 1234."

Pointer semantics are like:

"Go to room 1234 and collect an envelope; deliver this package to the 
room number inside the envelope."

On the other hand, name binding languages say:

"Go to the concierge at the front desk and ask for Mr Smith's room, wait 
until he looks it up in the register, then deliver this package to the 
room number he tells you."

Typically, you don't even have any way to store the room number for later 
use. In Python, name lookups involve calculating a hash and searching a 
dict. Once you've looked up a name once, there is no way to access the 
hash table index to bypass that process for future lookups.

It gets worse: Python has multiple namespaces that are searched.

"Go to the Excelsior Hotel and ask the concierge for Mr Smith. If Mr 
Smith isn't staying there, go across the road to the Windsor Hotel and 
ask there. If he's not there, try the Waldorf Astoria, and if he's not 
there, try the Hyperion."
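
The hotel-to-hotel search can be sketched in plain Python. This is not how 
the interpreter is implemented, only an illustration of the order of the 
search; the lookup() helper and the sample namespaces are hypothetical, and 
enclosing function scopes are left out for brevity.

```python
import builtins

def lookup(name, local_ns, global_ns):
    """Search each namespace in turn, like asking at one hotel after another."""
    for namespace in (local_ns, global_ns, vars(builtins)):
        if name in namespace:      # hash the name and probe this dict
            return namespace[name]
    raise NameError(f"name {name!r} is not defined")

local_ns = {"guest": "room 12"}
global_ns = {"guest": "room 34", "len": "shadowed"}

first = lookup("guest", local_ns, global_ns)   # found at the first hotel
second = lookup("len", local_ns, global_ns)    # global shadows the builtin
third = lookup("abs", local_ns, global_ns)     # only the builtins have it
print(first, second, third)
```

Each miss costs another dict probe, which is why a name found only in the 
builtins is the most expensive case in this sketch.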

Considering just how much work Python has to do to simply access a named 
variable, it's amazing how slow it isn't.


-- 
Steven


