What other languages use the same data model as Python?

Tue May 10 03:41:38 EDT 2011

Steven D'Aprano wrote:

> All very good, but that's not what takes place at the level of Python 
> code. It's all implementation.

Actually, you're right. What I've presented is a paper-and-pencil
implementation of the Python data model. Together with a set of
rules for manipulating the diagram under the direction of Python
code, you have a complete implementation of Python that you can
execute in your head.

And you NEED such an implementation in order to reason correctly
about Python programs under all circumstances.

I find it very difficult to imagine *any* implementation of
Python, computer-based or otherwise, that doesn't have something
equivalent to references. Whether you call them that, or pointers,
or arrows, or object bindings, or something else, the concept
needs to be there in some form.

> But surely, if a = 1234 creates a reference from a to the big 
> box 1234, then b = a should create a reference from b to the box a?
> 
>                              +-------------+
>        +---+                 |             |
>      a | --+---------------->|             |
>        +---+                 |             |
>          ^                   +-------------+
>          |
>        +-|-+
>      b | | |
>        +---+

You can't expect anyone to draw correct conclusions from the
diagram alone; you also need to explain the rules for manipulating
the diagram under control of Python code.

Here, the explanation goes something like this:

1. The right hand side of an assignment denotes a big box. The
literal 1234 in a program denotes a big box containing the integer
value 1234.

2. The left hand side of an assignment denotes a little box. The
effect of an assignment is to make the arrow from the left hand
side's little box point to the big box denoted by the right hand
side.

So the assignment a = 1234 results in

                               +-------------+
         +---+                 |             |
       a | --+---------------->|   1234      |
         +---+                 |             |
                               +-------------+

3. When a name is used on the right hand side, it denotes whichever
big box is pointed to by the arrow from its little box. So given
the above diagram, the assignment b = a results in

                               +-------------+
         +---+                 |             |
       a | --+---------------->|   1234      |
         +---+                 |             |
                               +-------------+
                                  ^
         +---+                    |
       b | --+---------------------
         +---+

Furthermore, from rule 2 alone it's evident that no assignment
can ever make an arrow lead from one little box to another
little box. Arrows can only lead from a little box to a big
box.

> That's how it works in C and Pascal (well, at least with the appropriate 
> type declarations).

Um, no, it doesn't, really. There's no way 'b = a' can give
you that in C; you would have to write 'b = &a'. And you
couldn't do it at all in standard Pascal, because there is
no equivalent to the & operator there.

> Your model is closer to what the CPython implementation 
> actually does,

I think it's close -- actually, I would say isomorphic --
to what *any conceivable* Python implementation would do in
some form.

> n = len('hello world')
> 
> What about outside len? Where's the little box 
> pointing to 'hello world'?

> So it seems your model fails to deal with sufficiently anonymous objects.

Anonymous objects are fine. You just draw a little box and
don't write any label beside it. Or you don't bother drawing
a little box at all and just draw a big box until such time
as some little box that you care about needs to point to it.

If that's a problem, then you have the same problem talking
about names bound to objects. An anonymous object obviously
doesn't have any name bound to it. So you have to admit that
objects can exist, at least temporarily, without being bound
to anything, or bound to some anonymous thing.

> Both the call to len and the call to func push their results onto the 
> stack. There's no little box pointing to the result.

If you want to model things at that level of detail, then the
stack itself is an array of little boxes inside a frame object.
And the frame object is pointed to by a little box in its
calling frame object, etc. until you get to some global little
box, that doesn't have a name in Python, but exists somewhere
and keeps the chain of active stack frames alive.

But you don't have to draw any of that if you don't want to.

> For practical reasons, there must be some sort of 
> indirection. But that's implementation and not the VM's model.

No, it's not just implementation. Indirection is needed for
*correct semantics*, not just practicality.

> There is a problem with my model of free-floating objects in space: it 
> relies on objects being able to be in two places at once,

Yes, that's the point I'm trying to make. While it might be
possible to make such a model work, it would require bizarre
contortions that actually obscure what is going on instead
of clarifying it. Trying to teach someone about Python using
a model like that would be actively harmful, and probably
vilolate several human rights conventions.

> But if you're a science fiction fan from way back, 
> then you won't have any problem with the idea that objects can be inside 
> themselves:
> 
> http://www.youtube.com/watch?v=51JtuEa_OPc

Yeah, it's fun to play around with ideas like that precisely
because they twist our brains into knots. But it's not a
good way to explain Python semantics clearly!

> Now, that's a good challenge for your model. Little boxes only point to 
> big boxes. So how do you model cycles, including lists that contain 
> themselves?

I'll answer your next question first, and come back to that.

>>      +---+      +---+
>>    a | --+----->|   |      +-------------+
>>      +---+      +---+      |             |
>>                 | --+----->|             |
>>                 +---+      |             |
>>                 |   |      +-------------+
>>                 +---+
> 
> But that's wrong! Names (little boxes) can't point to *slots in a list*,

The arrow from a is to be understood as pointing to the whole
list, not any particular little box within the list. If you want
to clarify that, you can embellish the big boxes with some kind
of header area and point to that instead.

       +---+
     a | --+----->/---\
       +---+      +---+
                  |   |
                  +---+
                  | --+----->/-------------\
                  +---+      +-------------+
                  |   |      |             |
                  +---+      |             |
                             |             |
                             +-------------+

Now, a list that "contains" itself:

                    --------
                    |      |
       +---+        V      |
     a | --+----->/---\    |
       +---+      +---+    |
                  |   |    |
                  +---+    |
                  | --+-----
                  +---+
                  |   |
                  +---+

> But I wouldn't do it like that. I'd do it like this:
> 
>       0        1        2        3        4
>     +--------+--------+--------+--------+--------+
>   a |        |        |        |        |        |
>     |        |        |        |        |        |
>     |        |        |        |        |        |
>     +--------+--------+--------+--------+--------+

But then you can't model two list items bound to the same object,
unless you invoke the two-places-at-once idea. Even then, you would
have trouble unambiguously indicating that boxes draw in two places
are mean to actually represent the *same* object as against two
different objects with equal values.

> Why are the boxes so 
> small? Just because. Why can't you carefully tease the thread of blutack 
> apart, into a bifurcated Y shaped thread? Just because.

Yes, it's probably just as good to leave it as an arbitrary rule
that a little box can only point to one big box at a time.

-- 
Greg