[Python-Dev] Proposed PEP: Optimizing Global Variable and Attribute Access

Skip Montanaro skip@pobox.com (Skip Montanaro)
Tue, 14 Aug 2001 00:57:44 -0500


Here's a first stab at a PEP for optimizing global variable and attribute
access.  Barry, when you get a chance, could you assign it a number and let
me know about any PEP-related formatting mistakes I made?  (It would be nice
if PEP 1 referenced a piece of boilerplate people could use.  While the PEP
Style section lists the headers, I wasn't sure what the Version and
Last-Modified headers were actually supposed to look like.  Also, it wasn't
clear what I was supposed to put in the Post-History: header.)

Skip

------------------------------------------------------------------------------
PEP: 
Title: Optimizing Global Variable and Attribute Access
Version: $Revision:$
Author: skip@pobox.com (Skip Montanaro)
Status: Draft
Type: Standards Track
Python-Version: 2.3
Created: 13-Aug-2001
Post-History:

Abstract

    The bindings for most global variables and attributes of other modules
    typically never change during the execution of a Python program, but
    because of Python's dynamic nature, code which accesses such global
    objects must run through a full lookup each time the object is needed.
    This PEP proposes a mechanism that allows code that accesses most global
    objects to treat them as local objects and places the burden of updating
    references on the code that changes the name bindings of such objects.

Introduction

    Consider the workhorse function sre_compile._compile.  It is the
    internal compilation function for the sre module.  It consists almost
    entirely of a loop over the elements of the pattern being compiled,
    comparing opcodes with known constant values and appending tokens to an
    outpu list.  Most of the comparisons are with constants imported from
    the sre_constants module.  This means there are lots of LOAD_GLOBAL
    bytecodes in the compiled output of this module.  Just by reading the
    code it's apparent that the author intended LITERAL, NOT_LITERAL,
    OPCODES and many other symbols to be constants.  Still, each time they
    are involved in an expression, they must be looked up anew.

    Most global accesses are actually to objects that are "almost
    constants".  This includes global variables in the current module as
    well as the attributes of other imported modules.  Since they rarely
    change, it seems reasonable to place the burden of updating references
    to such objects on the code that changes the name bindings.  If
    sre_constants.LITERAL is changed to refer to another object, perhaps it
    would be worthwhile for the code that modifies the sre_constants module
    dict to correct any active references to that object.  By doing so, in
    many cases global variables and the attributes of many objects could be
    cached as local variables.  If the bindings between the names given to
    the objects and the objects themselves changes rarely, the cost of
    keeping track of such objects should be low and the potential payoff
    fairly large.

Proposed Change

    I propose that the Python virtual machine be modified to include
    TRACK_OBJECT and UNTRACK_OBJECT opcodes.  TRACK_OBJECT would associate a
    global name or attribute of a global name with a slot in the local
    variable array and perform an initial lookup of the associated object to
    fill in the slot with a valid value.  The association it creates would
    be noted by the code responsible for changing the name-to-object binding
    to cause the associated local variable to be updated.  The
    UNTRACK_OBJECT opcode would delete any association between the name and
    the local variable slot.

Rationale

    Global variables and attributes rarely change.  For example, once a
    function imports the math module, the binding between the name "math"
    and the module it refers to aren't likely to change.  Similarly, if the
    function that uses the math module refers to its "sin" attribute, it's
    unlikely to change.  Still, every time the module wants to call the
    math.sin function, it must first execute a pair of instructions:

        LOAD_GLOBAL     math
        LOAD_ATTR       sin

    If the client module always assumed that math.sin was a local constant
    and it was the responsibility of "external forces" outside the function
    to keep the reference correct, we might have code like this:

        TRACK_OBJECT       math.sin
        ...
        LOAD_FAST          math.sin
        ...
        UNTRACK_OBJECT     math.sin

    If the LOAD_FAST was in a loop the payoff in reduced global loads and
    attribute lookups could be significant.

    This technique could, in theory, be applied to any global variable
    access or attribute lookup.  Consider this code:

        l = []
        for i in range(10):
            l.append(math.sin(i))
        return l

    Even though l is a local variable, you still pay the cost of loading
    l.append ten times in the loop.  The compiler (or an optimizer) could
    recognize that both math.sin and l.append are being called in the loop
    and decide to generate the tracked local code, avoiding it for the
    builtin range() function because it's only called once during loop
    setup.

    According to a post to python-dev by Marc-Andre Lemburg [1], LOAD_GLOBAL
    opcodes account for over 7% of all instructions executed by the Python
    virtual machine.  This can be a very expensive instruction, at least
    relative to a LOAD_FAST instruction, which is a simple array index and
    requires no extra function calls by the virtual machine.  I believe many
    LOAD_GLOBAL instructions and LOAD_GLOBAL/ LOAD_ATTR pairs could be
    converted to LOAD_FAST instructions.

    Code that uses global variables heavily often resorts to various tricks
    to avoid global variable and attribute lookup.  The aforementioned
    sre_compile._compile function caches the append method of the growing
    output list.  Many people commonly abuse functions' default argument
    feature to cache global variable lookups.  Both of these schemes are
    hackish and rarely address all the available opportunities for
    optimization.  (For example, sre_compile._compile does not cache the two
    globals that it uses most frequently: the builtin len function and the
    global OPCODES array that it imports from sre_constants.py.

Discussion

    Jeremy Hylton has an alternate proposal on the table [2].  His proposal
    seeks to create a hybrid dictionary/list object for use in global name
    lookups that would make global variable access look more like local
    variable access.  While there is no C code available to examine, the
    Python implementation given in his proposal still appears to require
    dictionary key lookup.  It doesn't appear that his proposal could
    speed local variable attribute lookup, which might be worthwhile in some
    situations.

Backwards Compatibility

    I don't believe there will be any serious issues of backward
    compatibility.  Obviously, Python bytecode that contains TRACK_OBJECT
    opcodes could not be executed by earlier versions of the interpreter,
    but breakage at the bytecode level is often assumed between versions.

Implementation

    TBD.  This is where I need help.  I believe there should be either a
    central name/location registry or the code that modifies object
    attributes should be modified, but I'm not sure the best way to go about
    this.  If you look at the code that implements the STORE_GLOBAL and
    STORE_ATTR opcodes, it seems likely that some changes will be required
    to PyDict_SetItem and PyObject_SetAttr or their String variants.
    Ideally, there'd be a fairly central place to localize these changes.
    If you begin considering tracking attributes of local variables you get
    into issues of modifying STORE_FAST as well, which could be a problem,
    since the name bindings for local variables are changed much more
    frequently.  (I think an optimizer could avoid inserting the tracking
    code for the attributes for any local variables where the variable's
    name binding changes.)

Performance

    I believe (though I have no code to prove it at this point), that
    implementing TRACK_OBJECT will generally not be much more expensive than
    a single LOAD_GLOBAL instruction or a LOAD_GLOBAL/LOAD_ATTR pair.  An
    optimizer should be able to avoid converting LOAD_GLOBAL and
    LOAD_GLOBAL/LOAD_ATTR to the new scheme unless the object access
    occurred within a loop.  Further down the line, a register-oriented
    replacement for the current Python virtual machine [3] could conceivably
    eliminate most of the LOAD_FAST instructions as well.

    The number of tracked objects should be relatively small.  All active
    frames of all active threads could conceivably be tracking objects, but
    this seems small compared to the number of functions defined in a given
    application.

References

    [1] http://mail.python.org/pipermail/python-dev/2000-July/007609.html

    [2] http://www.zope.org/Members/jeremy/CurrentAndFutureProjects/FastGlobalsPEP

    [3] http://www.musi-cal.com/~skip/python/rattlesnake20010813.tar.gz

Copyright

    This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: