Why not an __assign__ method?

Carlos Alberto Reis Ribeiro cribeiro at mail.inet.com.br
Wed Apr 4 16:16:41 EDT 2001


At 15:49 02/04/01 -0400, Robin Thomas wrote:
>Best of luck once you get the source; I look forward to reading your 
>discoveries and updated proposal.

Now I'm back from the source :-) Unfortunately, I don't have MS VC 
installed, so I can't test any change. I tried my luck with two other 
options, BC++ and GCC/Cygwin, with no success so far. BTW, why aren't these 
two compilers supported? It seems this question has been asked before, but 
I could not find a conclusive answer. Some people have recently had success 
with GCC/Cygwin, but I could not reproduce it here; maybe it's something 
related to my installation. Anyway, back to the topic...

(Oops. I think I'm still off-topic here :-) But I'm not a member of the dev 
list; maybe someone can point me to the proper procedure for such 
discussions.)

Following Robin's hints, I took a look at the sources. On closer 
examination, I saw that while a partial solution may be easy, a complete 
solution really needs a better-structured approach.

At first, I thought there were two possible approaches to implementing 
__assign__:

1) Modify the parser to insert a new opcode for the assignment statement.
    (this was not what I had in mind at first)

2) Add code to the interpreter to detect "assign-like" behavior.
    No changes to the parser are needed.

To check whether (2) is possible, I tried several constructs and found two 
opcodes that could be intercepted: STORE_NAME and STORE_SUBSCR. These 
opcodes appear in the bytecode stream whenever an assignment takes place. 
However, there are two other cases that may arise and that make things much 
more difficult (in fact, for both of the proposed approaches).
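To see which opcodes these constructs produce, the standard `dis` module can list the instructions of a compiled snippet. This is a sketch against a current CPython 3, not the 2.1 interpreter discussed here: opcode names change between versions (for example, CALL_FUNCTION became CALL in 3.11), so only the presence of each opcode is printed rather than a full disassembly. The snippet is only compiled, never executed, so the names in it need not exist.

```python
import dis

# Assignment forms from the discussion; compiled only, never executed.
SOURCE = """
z = a + b + c
d[k] = v
t = (a + b + c,)
f(a + b)
"""

code = compile(SOURCE, "<example>", "exec")
opnames = [ins.opname for ins in dis.get_instructions(code)]

# Plain and subscript assignment, plus the tuple-building case.
for name in ("STORE_NAME", "STORE_SUBSCR", "BUILD_TUPLE"):
    print(name, name in opnames)

# The call opcode is version dependent: CALL_FUNCTION before 3.11, CALL afterwards.
print([n for n in opnames if n.startswith("CALL")])
```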

Let's focus on the simple case first. I found this code in ceval.c, at line 
1502 (Python 2.1b2a):

        (...)
        case STORE_NAME:
                w = GETNAMEV(oparg);
                v = POP();
                if ((x = f->f_locals) == NULL) {
                        PyErr_Format(PyExc_SystemError,
                                     "no locals found when storing %s",
                                     PyObject_REPR(w));
                        break;
                }
                err = PyDict_SetItem(x, w, v);
                Py_DECREF(v);
                break;
        (...)

In this case, my proposal is to insert a call to the __assign__ method (if 
the object defines one) immediately before the PyDict_SetItem call. The 
same can be done for STORE_SUBSCR. This seems to solve the problem, but 
unfortunately things are not so simple.
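The STORE_NAME interception can be approximated in pure Python, which helps show what the hook would see. This is a sketch under stated assumptions: `__assign__` is the hypothetical method from this thread, not a real Python protocol, and the trick relies on current CPython 3, where STORE_NAME falls back to the mapping's `__setitem__` when the locals passed to `exec` are not an exact dict (the 2.1 code quoted above always calls PyDict_SetItem, so it would not work there). `AssignHookNamespace` and `Tracked` are illustrative names.

```python
class AssignHookNamespace(dict):
    """Locals mapping that runs a hypothetical __assign__ hook before storing."""

    def __setitem__(self, name, value):
        # Look the hook up on the type, as the interpreter does for special methods.
        hook = getattr(type(value), "__assign__", None)
        if hook is not None:
            value = hook(value)  # the hypothetical hook returns the object to bind
        super().__setitem__(name, value)


class Tracked:
    def __init__(self, label):
        self.label = label
        self.assigned = False

    def __assign__(self):
        # Hypothetical hook: record that the object has been bound to a name.
        self.assigned = True
        return self


ns = AssignHookNamespace()
exec("x = Tracked('demo')\ny = 42", {"Tracked": Tracked}, ns)
print(ns["x"].assigned)  # True
print(ns["y"])           # 42
```

Objects without the hook, such as the integer bound to `y`, are stored unchanged, so the cost for ordinary values is one attribute lookup per assignment.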

THE PROBLEM...

There are other situations that make things harder, or even impossible :-(

  - BUILD_TUPLE: the construct z = (a+b+c, ) builds a tuple containing the
    intermediate object before assigning it to the name "z". In this
    case, every object being put into the tuple would need to be
    "assigned". This is NOT a good idea, for several reasons. First of
    all, it is not exactly what I originally meant. Tuples may be built
    for many reasons, even while the expression is still being
    evaluated. There are also performance concerns, because we would
    need to do this *every time* the opcode is executed, even for
    potentially large tuples.

  - CALL_FUNCTION: a similar thing happens when the result of an
    expression is passed as an argument to a function. In this case,
    things are further complicated by the LOAD_FAST/STORE_FAST opcodes,
    which access local variables directly in the frame's array of fast
    locals, bypassing the locals dictionary.
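Both problem cases can be reproduced with the same kind of hooked namespace. In this sketch (again assuming current CPython 3 semantics, with the hypothetical `__assign__` hook and illustrative names `AssignHookNamespace`, `Tracked`, and `keep`), the intermediate object never passes through STORE_NAME on its own: in the tuple case the hook only sees the tuple, and in the call case the argument is bound to a fast local inside the function.

```python
class AssignHookNamespace(dict):
    def __setitem__(self, name, value):
        # Hypothetical __assign__ hook, called only for name stores via STORE_NAME.
        hook = getattr(type(value), "__assign__", None)
        if hook is not None:
            hook(value)
        super().__setitem__(name, value)


class Tracked:
    def __init__(self):
        self.assigned = False

    def __assign__(self):
        self.assigned = True


seen = []


def keep(obj):
    # The argument is bound to a fast local; no STORE_NAME runs for it.
    seen.append(obj)


ns = AssignHookNamespace()
src = """
z = (Tracked(),)     # BUILD_TUPLE: the hook sees the tuple, not its item
keep(Tracked())      # call: the argument never passes through the namespace
"""
exec(src, {"Tracked": Tracked, "keep": keep}, ns)
print(ns["z"][0].assigned)  # False
print(seen[0].assigned)     # False
```

Making either case call the hook would mean touching every tuple build and every argument binding, which is the performance concern raised above.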


In fact, these two problems affect both approaches I was proposing. So it 
is impossible to use __assign__ in the way I first devised. However, there 
is an alternative that I have just begun exploring, and it may (or may not) 
make sense. First of all, let me restate the original intention:

THE INTENTION

My intention is to devise a way to optimize operations by avoiding the 
creation of a new object for every intermediate result. Such intermediate 
objects are created inside the methods that execute the operations and are 
returned by them. With a little knowledge of the nature of the operands 
(namely, whether at least one of the operands is an intermediate result 
valid only within the expression), the operator implementation can reuse 
that object, executing the operation in place. This is safe because it 
relies on cooperation from the operator: if the operation can't be done 
safely in place, it's up to the operator to create a new object and use 
that instead.
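A minimal sketch of this cooperation, assuming a hypothetical `temporary` flag that marks intermediate results and a hypothetical `__fix__` method (named in the proposal below) that clears it. Since no interpreter support exists, `__fix__` is called by hand here. Only the left operand is reused; handling a temporary right operand through `__radd__` would follow the same pattern. `Vec` and the `created` counter are illustrative.

```python
class Vec:
    created = 0  # counts allocations so the reuse is visible

    def __init__(self, values, temporary=False):
        Vec.created += 1
        self.values = list(values)
        # True only while the object is an intermediate result of an expression.
        self.temporary = temporary

    def __add__(self, other):
        if self.temporary:
            # Intermediate result: no name refers to it yet, so update in place.
            for i, v in enumerate(other.values):
                self.values[i] += v
            return self
        # The left operand may be shared, so build a fresh intermediate.
        return Vec((x + y for x, y in zip(self.values, other.values)), temporary=True)

    def __fix__(self):
        # Hypothetical hook: the expression is finished; stop treating it as scratch.
        self.temporary = False
        return self


a, b, c = Vec([1, 2]), Vec([10, 20]), Vec([100, 200])
z = (a + b + c).__fix__()  # called by hand; no compiler support here
print(z.values, z.temporary, Vec.created)  # [111, 222] False 4
print(a.values)  # [1, 2]
```

Without reuse, `a + b + c` would allocate two result objects; here it allocates one. The hazard is also visible: if `__fix__` were skipped, `z` would stay temporary and a later `z + c` would silently mutate it, which is exactly why the hook must run once the expression is complete.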

THE PROPOSAL

The proposal now is to call a predefined method on the object that results 
from any expression, as soon as the expression has been fully evaluated. 
The method could be named either __fix__ (the object is being "fixed" after 
the expression) or __result__ (indicating that the expression has been 
fully evaluated and the object is its result). For example, in:

z = a + b + c

... the method will be called on the resulting object right before the 
assignment takes place.

If the expression consists of the object alone, the method will not be 
called. For example, in

z = a

... it's just the assignment, no "fix" required.

This needs to be implemented by the compiler, and an extra opcode needs to 
be inserted into the bytecode stream. It must be emitted whenever a 
mathematical, logical, or sequence expression is evaluated. I'm still 
checking the details, but at least I now know a bit more about the 
scenario.
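The compiler-side change can be approximated at the source level with the standard `ast` module. This sketch substitutes an AST rewrite for the extra opcode the proposal calls for: it wraps the right-hand side of plain assignments in a call to a helper whenever that side is an operator expression, and leaves bare names such as `w = a` alone. `_fix`, `InsertFix`, `run_with_fix`, and `Counted` are illustrative names; `__fix__` is the hypothetical hook from the proposal. Augmented and annotated assignments are not handled.

```python
import ast

# Expression kinds treated as "mathematical/logical" results.
FIX_TARGETS = (ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare)


def _fix(obj):
    # Call the hypothetical __fix__ hook if the result's type defines one.
    hook = getattr(type(obj), "__fix__", None)
    return hook(obj) if hook is not None else obj


class InsertFix(ast.NodeTransformer):
    def visit_Assign(self, node):
        self.generic_visit(node)
        if isinstance(node.value, FIX_TARGETS):
            # z = a + b + c  becomes  z = _fix(a + b + c)
            node.value = ast.Call(
                func=ast.Name(id="_fix", ctx=ast.Load()),
                args=[node.value],
                keywords=[],
            )
        return node


def run_with_fix(source, namespace):
    tree = InsertFix().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace.setdefault("_fix", _fix)
    exec(compile(tree, "<fix>", "exec"), namespace)
    return namespace


class Counted:
    def __init__(self, n):
        self.n = n
        self.fixed = False

    def __add__(self, other):
        return Counted(self.n + other.n)

    def __fix__(self):
        self.fixed = True
        return self


ns = run_with_fix("z = a + b + c\nw = a",
                  {"a": Counted(1), "b": Counted(2), "c": Counted(3)})
print(ns["z"].n, ns["z"].fixed)  # 6 True
print(ns["w"].fixed)             # False
```

Doing this in the compiler proper would emit a dedicated opcode instead of a Python-level call, but the placement rule is the same: after an operator expression, never after a bare name.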


Carlos Ribeiro






More information about the Python-list mailing list