[Python-Dev] Accessing globals without dict lookup

Jeremy Hylton jeremy@alum.mit.edu
Sat, 9 Feb 2002 20:13:53 -0500


Let's try an attribute of a module.  

    import math

    def mysin(x):
        return math.sin(x)

There are two variants of support for this that differ in the way they
handle math being rebound.  Say another function is:

    def yikes():
        global math
        import string as math

We can either check on each use of math.attr to see if math is
rebound, or we can require that STORE_GLOBAL marks all the math.attr
entries as invalid.  I'm not sure which is better, so I'll try to
describe both.
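
To make the rebinding problem concrete, here is a small runnable sketch of the two functions above on a current Python. The variable names before and rebound_visible are mine, added only to record what happens. It shows that once yikes() has run, mysin() sees the new binding, which is exactly what any cache of math.sin has to respect.

```python
import math


def mysin(x):
    # Looks up the global name "math", then its "sin" attribute, on every call.
    return math.sin(x)


def yikes():
    # Rebinds the module-level name "math" to the string module.
    global math
    import string as math


before = mysin(0.0)          # "math" is still the real math module here

yikes()                      # from now on the global "math" is the string module
try:
    mysin(0.0)
    rebound_visible = False
except AttributeError:       # the string module has no attribute "sin"
    rebound_visible = True
```

A cached pointer to math.sin that survived the call to yikes() would keep returning the old function, so one of the two variants has to notice the rebinding.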

Case #1: Binding operation responsible for invalidating cache.

The module has a dlict for globals that contains three entries:
[math, mysin, yikes].  Each is a PyObject *.

The module also has a global attrs cache, where each entry is
struct {
    int ce_initialized; /* just a flag */
    PyObject **ce_ref;
} cache_entry;

In the case we're considering, ce_module points to math and
ce_module_index is math's index in the globals dlict.  It's assigned
to when the module object is created and never changes.

There is one entry in the global attrs cache, for math.sin.  There's
only one entry because the compiler only found one attribute access of
a global bound by an import statement.
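
As a rough illustration of what "the compiler found one attribute access of a global bound by an import statement" could mean, the sketch below uses the standard ast module. This is an assumption made for illustration, not the compiler pass under discussion: the helper find_global_attrs is hypothetical, it handles only plain "import x" at module level, and it ignores local names that shadow the global.

```python
import ast

SOURCE = '''
import math

def mysin(x):
    return math.sin(x)
'''


def find_global_attrs(source):
    """Return (module, attr) pairs for attribute accesses on names that are
    bound by a top-level import statement."""
    tree = ast.parse(source)
    imported = set()
    for node in tree.body:                       # top-level statements only
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as c" binds "c"
                imported.add(alias.asname or alias.name.split(".")[0])
    found = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id in imported):
            key = (node.value.id, node.attr)
            if key not in found:                 # one cache entry per pair
                found.append(key)
    return found


entries = find_global_attrs(SOURCE)
```

For the example module this yields a single pair, matching the single math.sin cache entry described above.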

The function mysin(x) uses 
    LOAD_GLOBAL_ATTR  0 (math.sin).

case LOAD_GLOBAL_ATTR:
    cache_entry *e = f->f_module->md_cache[oparg];
    if (!e->ce_initialized) {
        /* lookup module and find its sin attr.
           store pointer to module dlict entry in ce_ref.
           NB: cache shared by all functions.

           if the thing we expected to be a module isn't actually
           a module, handle that case here and leave ce_initialized
           set to false.
         */
    }
    if (*e->ce_ref == NULL) {
        /* raise NameError if global module isn't bound yet.
           raise AttributeError if module is bound, but doesn't have
           attr.
         */
    }
    Py_INCREF(*e->ce_ref);
    PUSH(*e->ce_ref);
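
The opcode above can be modelled in plain Python to see how the flag and the slot pointer interact. Everything here is a simplified stand-in and not interpreter code: Slot plays the role of one PyObject * entry in a dlict (with None for NULL), FakeModule is a hypothetical module whose attributes live in such slots, and slow_lookups is a counter added only to show how often the slow path runs. The unbound-module case is raised from the slow path here for brevity.

```python
import math


class Slot:
    """One PyObject * entry in a dlict; value None plays the role of NULL."""
    def __init__(self, value=None):
        self.value = value


class FakeModule:
    """Hypothetical stand-in for a module whose attributes live in dlict slots."""
    def __init__(self, **attrs):
        self.slots = {name: Slot(obj) for name, obj in attrs.items()}


class CacheEntry:
    """Counterpart of cache_entry: ce_initialized and ce_ref."""
    def __init__(self, modname, attr):
        self.modname, self.attr = modname, attr   # known at compile time
        self.initialized = False
        self.ref = None


fake_math = FakeModule(sin=math.sin)
module_globals = {"math": Slot(fake_math)}        # the importing module's globals
cache = [CacheEntry("math", "sin")]               # entry 0: math.sin
slow_lookups = 0


def load_global_attr(oparg):
    global slow_lookups
    e = cache[oparg]
    if not e.initialized:
        slow_lookups += 1
        module = module_globals[e.modname].value
        if module is None:
            raise NameError(e.modname)
        if not isinstance(module, FakeModule):
            # Not a module after all: ordinary lookup, entry stays uninitialized.
            return getattr(module, e.attr)
        if e.attr not in module.slots:
            raise AttributeError(e.attr)
        e.ref = module.slots[e.attr]              # pointer to the slot, not the value
        e.initialized = True
    if e.ref.value is None:
        raise AttributeError(e.attr)
    return e.ref.value


first = load_global_attr(0)(0.0)      # slow path: fills in the entry
second = load_global_attr(0)(0.0)     # fast path: one flag test, one dereference

# Rebinding the attribute inside the module is seen without touching the cache,
# because the entry refers to the slot rather than to the object stored in it.
fake_math.slots["sin"].value = math.cos
third = load_global_attr(0)(0.0)
```

Only the first call takes the slow path. Assigning a new value to the module's sin slot needs no invalidation at all; it is rebinding the name math itself that the rest of this message deals with.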

To support invalidation of cache entries, we need to arrange the cache
entries in a particular order and add an auxiliary data structure that
maps from each module global to the cache entries it must invalidate.

For example, say a module uses math.sin, math.cos, and math.tan.  The
three cache entries for the math module should be stored contiguously
in the cache.

cache_entry *cache[] = { math.sin entry,
                         math.cos entry,
                         math.tan entry,
                       }

struct {
    int index;   /* first attr of this module in cache */
    int length;  /* number of attrs for this module in cache */
} invalidation_info;

There is one invalidation_info for each module that has cached
attributes.  (And only for things that the compiler determines to be
modules.)  The invalidation_info for math would be {0, 3}.  If a
STORE_GLOBAL rebinds math, it must walk through the cache and set
ce_initialized to false for each of math's cache entries, i.e. the
length entries starting at index.
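
The invalidation walk can be sketched in a few lines of Python. This is a model under assumptions, not the proposed C code: store_global is a hypothetical stand-in for the STORE_GLOBAL opcode, the os.getcwd entry is invented so that there is something that must not be invalidated, and every entry starts out initialized to keep the example short.

```python
class CacheEntry:
    def __init__(self, name):
        self.name = name
        self.initialized = True    # pretend every entry has already been filled in


# Entries for the same module are stored contiguously in the cache.
cache = [CacheEntry(n) for n in
         ("math.sin", "math.cos", "math.tan", "os.getcwd")]

# invalidation_info per global name: (index of first entry, number of entries).
invalidation_info = {"math": (0, 3), "os": (3, 1)}


def store_global(globals_dict, name, value):
    """Rebind a global and reset the cache entries that depended on the old binding."""
    globals_dict[name] = value
    info = invalidation_info.get(name)
    if info is not None:           # only names the compiler treated as modules
        index, length = info
        for e in cache[index:index + length]:
            e.initialized = False


g = {}
store_global(g, "math", "not the math module any more")
flags = [e.initialized for e in cache]
```

After the store, the three math entries are marked uninitialized and the unrelated entry is left alone, so the next LOAD_GLOBAL_ATTR for math.sin goes through the slow path again. Keeping the entries contiguous is what lets the store touch one small slice instead of scanning the whole cache.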

This isn't exactly the scheme I described in the slides, where I
suggested that the LOAD_GLOBAL_ATTR would check if the module binding
was still valid on each use.  A question from Ping pushed me back in
favor of the approach that I just described.

No time this weekend to describe that check-on-each-use scheme.

Jeremy