[Python-Dev] Re: Python 2.1 slower than 2.0

Jeremy Hylton jeremy@alum.mit.edu
Mon, 29 Jan 2001 12:39:05 -0500 (EST)


The recursion test in pybench is testing the performance of the nested
scopes changes, which must do some extra bookkeeping to reference the
recursive function in a nested scope.  To some extent, a performance
hit is a necessary consequence for nested functions with free
variables.

Nonetheless, there are two interesting things to say about this
situation.

First, there is a bug in the current implementation of nested scopes
that the benchmark tickles.  The problem is with code like this:

def outer():
    global f
    def f(x):
        if x > 0:
            return f(x - 1)

The compiler determines that f is free in f.  (It's recursive.)  If f
is free in f, in the absence of the global decl, the body of outer
must allocate fresh storage (a cell) for f each time outer is called
and add a reference to that cell to f's closure.

If f is declared global in outer, then it ought to be treated as a
global in nested scopes, too.  In general terms, a free variable
should use the binding found in the nearest enclosing scope.  If the
nearest enclosing scope has a global binding, then the reference is
global. 

If I fix this problem, the recursion benchmark shouldn't be any slower
than a normal function call.

The second interesting thing to say is that frame allocation and
dealloc is probably more expensive than it needs to be in the current
implementation.  The frame object has a new f_closure slot that holds
a tuple that is freshly allocated every time the frame is allocated.
(Unless the closure is empty, then f_closure is just NULL.)

The extra tuple allocation can probably be done away with by using the
same allocation strategy as locals & stack.  If the f_localsplus array
holds cells + frees + locals + stack, then a new frame will never
require more than a single malloc (and often not even that).

Jeremy