[Python-Dev] Accessing globals without dict lookup

Tim Peters tim.one@comcast.net
Mon, 11 Feb 2002 04:27:27 -0500


[Jeremy Hylton]
> Here's a brief review of the example function.
>
> def mylen(s):
>     return len(s)
>
> LOAD_BUILTIN       0 (len)
> LOAD_FAST          0 (s)
> CALL_FUNCTION      1
> RETURN_VALUE
>
> The interpreter has a dlict for all the builtins.  The details don't
> matter here.

Actually, the details are everything here <wink>.

> Let's say that len is at index 4.
>
> The function mylen has an array:
> func_builtin_index = [4]  # an index for each builtin used in mylen
>
> The entry at index 0 of func_builtin_index is the index of len in the
> interpreter's builtin dlict.  It is either initialized when the
> function is created or on first use of len.

All clear except for the referent of "It" (the subject of the preceding
sentence is "The entry at index 0", but that doesn't seem to make much sense
as a referent).
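To pin down the indexing, here's a toy Python model. It is not the proposed C code, and DLict, slot_for, and load_builtin are invented names. It assumes a dlict keeps its values in an array at slots that never move, with a sentinel playing the part of a NULL slot. It also fills in func_builtin_index eagerly, when the function is created, which is one of the two options mentioned above.

```python
# Toy model of a "dlict": values live in a list at slots that never move,
# so a name can be resolved to a slot number once and then read by index.
NULL = object()  # sentinel standing in for a C NULL (unbound slot)


class DLict:
    def __init__(self):
        self.slot = {}     # name -> slot number, fixed once assigned
        self.values = []   # slot number -> bound object, or NULL

    def slot_for(self, name):
        """Return name's slot, creating an empty (NULL) one if needed."""
        if name not in self.slot:
            self.slot[name] = len(self.values)
            self.values.append(NULL)
        return self.slot[name]

    def __setitem__(self, name, value):
        self.values[self.slot_for(name)] = value

    def __delitem__(self, name):
        # Unbinding empties the slot but keeps its number, so any
        # previously resolved index stays valid.
        i = self.slot.get(name)
        if i is None or self.values[i] is NULL:
            raise KeyError(name)
        self.values[i] = NULL


# The interpreter's builtin dlict (a few entries, so that len lands at 4).
builtin_dlict = DLict()
for name, obj in [("abs", abs), ("all", all), ("any", any),
                  ("chr", chr), ("len", len)]:
    builtin_dlict[name] = obj

# Per-function table for mylen: one entry per builtin its code uses,
# resolved here when the function object is created.
co_builtin_names = ("len",)
func_builtin_index = [builtin_dlict.slot_for(n) for n in co_builtin_names]


def load_builtin(oparg):
    """Fast path of LOAD_BUILTIN: two list indexings, no hashing."""
    x = builtin_dlict.values[func_builtin_index[oparg]]
    if x is NULL:
        raise NameError(co_builtin_names[oparg])
    return x
```

Deleting len only empties its slot, so the index cached in func_builtin_index stays valid, and a later rebinding refills the same slot.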

> (It doesn't matter for the mechanism and there's no need to decide which
> is better yet.)
>
> The module has an md_globals_dirty flag.  If it is true, then a
> global was introduced dynamically, i.e. a name binding op occurred
> that the compiler did not detect statically.

Once it becomes true, can md_globals_dirty ever become false again?

> The code object has a co_builtin_names that is like co_names except
> that it only contains the names of builtins used by LOAD_BUILTIN.
> It's there to get the correct behavior when shadowing of a builtin by
> a local occurs at runtime.
    ^^^^^

Can that happen?  Or did you mean when shadowing of a builtin by a global
occurs at runtime?  The LOAD_BUILTIN code below seems most consistent with
the "global" rewording.
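Assuming the "global" reading, here's a toy Python model of what co_builtin_names buys. Module, bind_dynamically, and load_builtin are invented for the sketch. A plain dict stands in for the module's dlict, and the dirty flag is assumed to be set by any binding the compiler didn't see.

```python
import builtins


class Module:
    """Hypothetical module object; a plain dict stands in for the dlict."""

    def __init__(self):
        self.md_globals = {}
        self.md_globals_dirty = False

    def bind_dynamically(self, name, value):
        # A binding the compiler could not see (say, another module doing
        # setattr(mod, "len", ...)), so the fast path can no longer be trusted.
        self.md_globals[name] = value
        self.md_globals_dirty = True


co_builtin_names = ("len",)   # names this code compiled to LOAD_BUILTIN
builtin_values = [len]        # what each LOAD_BUILTIN oparg resolved to


def load_builtin(module, oparg):
    if not module.md_globals_dirty:
        # Expected case: the compiler's "len is a builtin" guess still holds.
        return builtin_values[oparg]
    # Dirty module: fall back to a by-name lookup like today's LOAD_GLOBAL,
    # which is why the code object has to keep co_builtin_names around.
    name = co_builtin_names[oparg]
    if name in module.md_globals:
        return module.md_globals[name]   # a global now shadows the builtin
    try:
        return getattr(builtins, name)
    except AttributeError:
        raise NameError(name) from None
```

A dirty module that never binds len still finds the real builtin, just more slowly.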

> The frame grows a bunch of pointers --
>
>     f_module from the function (which stores it instead of func_globals)
>     f_builtin_names from the code object
>     f_builtins from the interpreter
>
> The implementation of LOAD_BUILTIN 0 is straightforward -- in pidgin C:
>
> case LOAD_BUILTIN:
>     if (f->f_module->md_globals_dirty) {
>         PyObject *w = PyTuple_GET_ITEM(f->f_builtin_names);

Presumably this is missing an ", oparg" argument.

>         ... /* rest is just like current LOAD_GLOBAL
>                except that it uses PyDLict_GetItem()
>              */
>     } else {
>         int builtin_index = f->f_builtin_index[oparg];
>         PyObject *x = f->f_builtins[builtin_index];
>         if (x == NULL)
>            raise NameError
>         Py_INCREF(x);
>         PUSH(x);
>     }

OK, that's the gritty detail I was looking for.  When it comes time to code,
note that it's better to negate the test and swap the "if" branches (a
not-taken branch is usually quicker than a taken branch, and you want to
favor the expected case).

Question:  couldn't the LOAD_BUILTIN opcode use builtin_index directly as
its argument (and so skip one level of indirection)?  We know which builtins
the interpreter supplies, and the compiler could be taught a fixed
correspondence between builtin names and little integers.  There are only
<snort> 114 keys in __builtin__.__dict__ today, so there's plenty of room in
an instruction to hold the index.  A tuple of std builtin names could also
be a C extern shared by everyone, eliminating the need for f_builtin_names.
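Here's a rough Python sketch of that idea. It assumes the standard builtins are frozen into one table shared by the compiler and the interpreter. STD_BUILTIN_NAMES, compile_name_load, and the seven-name table are hypothetical stand-ins for the real 114. The compiler emits a builtin's table position as the oparg, so the interpreter does one indexing and needs no per-function table.

```python
import builtins

# Hypothetical shared table, fixed when the interpreter is built; in C this
# would be the single extern tuple used instead of f_builtin_names.
STD_BUILTIN_NAMES = ("abs", "all", "any", "chr", "len", "max", "min")
STD_BUILTIN_INDEX = {name: i for i, name in enumerate(STD_BUILTIN_NAMES)}

# Interpreter side: the builtin values, in table order.  Setting an entry
# to NULL would model a builtin that someone has deleted.
NULL = object()
f_builtins = [getattr(builtins, name) for name in STD_BUILTIN_NAMES]


def compile_name_load(name):
    """Compiler side: emit (opcode, oparg) for a load of `name`."""
    if name in STD_BUILTIN_INDEX:
        return ("LOAD_BUILTIN", STD_BUILTIN_INDEX[name])
    # (A real compiler would emit an index into co_names here.)
    return ("LOAD_GLOBAL", name)


def load_builtin(oparg):
    """Interpreter side: a single indexing, no per-function table."""
    x = f_builtins[oparg]
    if x is NULL:
        raise NameError(STD_BUILTIN_NAMES[oparg])
    return x
```

A name the table doesn't know compiles to LOAD_GLOBAL as before.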

> The LOAD_GLOBAL opcode ends up looking basically the same, except that
> it doesn't need to check md_globals_dirty.
>
> case LOAD_GLOBAL:
>     int global_index = f->f_global_index[oparg];
>     PyObject *x = f->f_module->md_globals[global_index];
>     if (x == NULL) {
>        check for dynamically introduced builtin
>     }
>     Py_INCREF(x);
>     PUSH(x);

f_global_index wasn't mentioned before its appearance in this code block.  I
can guess what it is.  Again I wonder whether it's possible to snip a layer
of indirection (for a fixed function and fixed oparg, can
f->f_global_index[oparg] change across invocations of LOAD_GLOBAL?  I'm
guessing "no", in which case a third of the normal-case code is burning
cycles without real need).
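Here's a toy model of that question. It assumes each module-level name gets a fixed slot in md_globals at compile time, and that a code object only ever runs against the module it was compiled for. slot_of, direct_opargs, and both load functions are invented. Under those assumptions direct_opargs comes out identical to f_global_index, so the table lookup buys nothing at run time.

```python
# Hypothetical compile-time layout: every module-level name gets a fixed
# slot in md_globals (a list standing in for the module's dlict values).
NULL = object()
module_names = ("spam", "eggs", "mylen")
slot_of = {name: i for i, name in enumerate(module_names)}
md_globals = [NULL] * len(module_names)

# A function whose code loads eggs and spam, in that order.
co_names = ("eggs", "spam")
# Filled in once; for a fixed function and oparg it never changes.
f_global_index = [slot_of[name] for name in co_names]


def load_global_indirect(oparg):
    """As proposed: oparg -> f_global_index -> md_globals."""
    return md_globals[f_global_index[oparg]]


# The compiler already knows slot_of, so it could emit the slot itself as
# the oparg and skip the f_global_index load on every execution.
direct_opargs = [slot_of[name] for name in co_names]


def load_global_direct(oparg):
    """Snipped version: oparg -> md_globals."""
    return md_globals[oparg]
```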

> In the x == NULL case above, we need to take extra care for a builtin
> that the compiler didn't expect.  It's an odd case.  There is a
> global for the module named spam

The module is named spam, or the global is named spam?  I think the latter
was intended.

> that hasn't yet been assigned to in the module and there's also a
> builtin named spam that will be hidden once spam is bound in the module.

And can also be revealed again if someone reaches into the module and del's
spam again, right?
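Reading spam as the global's name, here's a toy Python model of the odd case. All names are invented, and lists and a plain dict stand in for the dlicts. The module's spam has a slot from compile time but is still NULL, so LOAD_GLOBAL falls back to the builtin. Binding spam hides the builtin, and del'ing it reveals the builtin again.

```python
NULL = object()

# Module slots assigned at compile time: spam is compiled as a global
# because the module assigns to it somewhere, but it is unbound so far.
global_slot = {"spam": 0}
md_globals = [NULL]

# The interpreter's builtins happen to include a spam too.
builtin_dict = {"spam": "builtin spam"}


def load_global(name):
    x = md_globals[global_slot[name]]
    if x is NULL:
        # Global slot unbound: fall back to the builtins, as in the
        # quoted x == NULL branch.
        try:
            x = builtin_dict[name]
        except KeyError:
            raise NameError(name) from None
    return x


def bind_global(name, value):
    md_globals[global_slot[name]] = value


def del_global(name):
    if md_globals[global_slot[name]] is NULL:
        raise NameError(name)
    md_globals[global_slot[name]] = NULL
```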

This looks fast, provided it works <wink>, and is along the lines of what I
had in mind when I first tortured Guido with the idea of dlicts way back
when.  One major correction:  you pronounce it "dee-likt".  That's a
travesty.  I picked the name dlict because it's unpronounceable in any human
language -- as befits an unthinkable idea <wink>.