[Types-sig] Low-hanging fruit: recognizing builtins

Guido van Rossum guido@CNRI.Reston.VA.US
Wed, 15 Dec 1999 12:07:30 -0500


It's always bothered me from a performance point of view that using a
built-in costs at least two dict lookups, one failing (in the modules'
globals), one succeeding (in the builtins).  This is done so that you
can define globals or locals that override the occasional builtin;
which is good since new Python versions can define new builtins, and
if you weren't allowed to override builtins this would break old code.

Here's a way that per-module analysis plus a conservative assumption
plus an addition to the PVM (Python Virtual Machine) bytecode can
remove *both* dict lookups for most uses of builtins.

Per-module analysis can easily detect that there are no global
variables named "len", say.  In this case, any expression calling
len() on some object can be transformed into a new bytecode that calls
PyObjectt_Length() on the object at the top of the stack.  Thus, a
sequence like

          LOAD_GLOBAL         0 (len)
          LOAD_FAST           0 (a)
          CALL_FUNCTION       1

can be replaced by

          LOAD_FAST           0 (a)
          UNARY_LEN

which saves one PVM roundtrip and two dictionary lookups, plus the
argument checking code inside the len() function.

There are plenty of bytecodes available.

In addition, we can now optimize common idioms involving builtins, the
most important one probably

     for i in range(n): ...

We lose a tiny bit of dynamic semantics: if some bozo replaces
__builtin__.len with something that always returns 0, this won't
affect our module.  Do we care?  I don't.  We don't have to do this
for *every* builtin; for example __import__() has as explicit
semantics that you can replace it with something else; for open() I
would guess that there must be plenty of programs that play tricks
with it.  But range()?  Or len()?  Or type()?  I don't think anybody
would care if these were less dynamic.  Note that you can still
override these easily as globals, it just has to be visible to the
global analyzer.

The per-module analysis required is major compared to what's currently
happening in compile.c (which only checks one function at a time
looking for assignments to locals) but minor compared to any serious
type inferencing.  Clearly this does nothing for (ERR), but I bet it
could speed up the typical Python program with a substantial factor...

Any takers?

--Guido van Rossum (home page: http://www.python.org/~guido/)