[Python-Dev] Accessing globals without dict lookup

Ka-Ping Yee ping@lfw.org
Mon, 11 Feb 2002 07:14:09 -0600 (CST)


All right -- i have attempted to diagram a slightly more interesting
example, using my interpretation of Guido's scheme.

    http://lfw.org/repo/cells.gif

    http://lfw.org/repo/cells-big.gif for a bigger image

    http://lfw.org/repo/cells.ai for the source file

The diagram is supposed to represent the state of things after
"import spam", where spam.py contains

    import eggs

    i = -2
    max = 3

    def foo(n):
        y = abs(i) + max
        return eggs.ham(y + n)

How does it look?  Guido, is it anything like what you have in mind?


A couple of observations so far:

    1.  There are going to be lots of global-cell objects.
        Perhaps they should get their own allocator and free list.

    2.  Maybe we don't have to change the module dict type.
        We could just use regular dictionaries, with the special
        case that if retrieving the value yields a cell object,
        we then do the objptr/cellptr dance to find the value.
        (The cell objects have to live outside the dictionaries
        anyway, since we don't want to lose them on a rehashing.)

    3.  Could we change the name, please?  It would really suck
        to have two kinds of things called "cell objects" in
        the Python core.

    4.  I recall Tim asked something about the cellptr-points-to-itself
        trick.  Here's what i make of it -- it saves a branch: instead of

            PyObject* cell_get(PyGlobalCell* c)
            {
                if (c->cell_objptr) return c->cell_objptr;
                if (c->cell_cellptr) return c->cell_cellptr->cell_objptr;
                return NULL;  /* unbound here and no cell to fall back on */
            }

        it's

            PyObject* cell_get(PyGlobalCell* c)
            {
                if (c->cell_objptr) return c->cell_objptr;
                return c->cell_cellptr->cell_objptr;
            }

        This makes no difference when c->cell_objptr is filled,
        but it saves one check when c->cell_objptr is NULL in
        a non-shadowed variable (e.g. after "del x").  I believe
        that's the only case in which it matters, and it seems
        fairly rare to me that a module function will attempt to
        access a variable that's been deleted from the module.

        Because the module can't know what new variables might
        be introduced into __builtin__ after the module has been
        loaded, a failed lookup must finally fall back to a lookup
        in __builtin__.  Given that, it seems like a good idea to
        set c->cell_cellptr = c when c->cell_objptr is set (for
        both shadowed and non-shadowed variables).  In my picture,
        this would change the cell that spam.max points to, so
        that it points to itself instead of __builtin__.max's cell.
        That is:

            void cell_set(PyGlobalCell* c, PyObject* v)
            {
                c->cell_objptr = v;
                c->cell_cellptr = c;  /* bound cells point to themselves */
            }

        This simplifies things further:

            PyObject* cell_get(PyGlobalCell* c)
            {
                return c->cell_cellptr->cell_objptr;
            }

        This leaves cell_get with no branches at all, which might be a
        really good thing on today's speculatively executing processors.
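
To make the self-pointing idea in point 4 concrete, here is a minimal standalone sketch. It is only an illustration under stated assumptions, not the proposed implementation: Obj stands in for PyObject, GlobalCell for PyGlobalCell, and cell_del is a hypothetical helper (the message above never spells out what "del x" does to cell_cellptr). The dictionary fallback into __builtin__ for names added later is not modelled.

```c
#include <stddef.h>

/* Stand-in for PyObject (assumption: only identity matters for this sketch). */
typedef struct Obj {
    const char *repr;
} Obj;

/* Stand-in for PyGlobalCell, using the field names from the message. */
typedef struct GlobalCell {
    Obj *cell_objptr;                 /* value bound in this namespace, or NULL */
    struct GlobalCell *cell_cellptr;  /* cell to read through: itself, or the
                                         matching __builtin__ cell */
} GlobalCell;

/* Binding a name stores the value and points the cell at itself. */
static inline void cell_set(GlobalCell *c, Obj *v)
{
    c->cell_objptr = v;
    c->cell_cellptr = c;
}

/* Hypothetical helper, not in the original message: unbinding a name
   re-targets the cell at the builtin cell of the same name, or at
   itself when no such builtin cell exists. */
static inline void cell_del(GlobalCell *c, GlobalCell *builtin_cell)
{
    c->cell_objptr = NULL;
    c->cell_cellptr = builtin_cell ? builtin_cell : c;
}

/* Branch-free read: always go through cell_cellptr.  A NULL result
   means the name is unbound in both the module and the builtin cell. */
static inline Obj *cell_get(const GlobalCell *c)
{
    return c->cell_cellptr->cell_objptr;
}
```

With this layout, a spam.max cell that has been set reads its own value, and after cell_del it reads through to the __builtin__.max cell again. The cost of the branch-free cell_get is that every set and delete has to keep cell_cellptr valid, so it must never be NULL.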

I know i'm a few messages behind on the discussion -- i'll do
some reading to catch up before i say any more.  But i hope
the diagram is somewhat helpful, anyway.


-- ?!ng