Is LOAD_GLOBAL really that slow?

Chris Mellon arkanes at gmail.com
Thu Aug 30 14:04:13 EDT 2007


On 8/30/07, Rhamphoryncus <rhamph at gmail.com> wrote:
> On Aug 29, 8:33 pm, Carsten Haese <cars... at uniqsys.com> wrote:
> > On Wed, 2007-08-29 at 19:23 -0600, Adam Olsen wrote:
> > There is no loop overhead here, and after subtracting the function call
> > overhead, I get 31 nanoseconds per local lookup and 63 nanoseconds per
> > global lookup, so local lookups are just about twice as fast as global
> > lookups.
> >

__builtins__ lookups are slower still: they cost an extra dict lookup on
top of the global one. Don't forget those.
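A quick timeit sketch of the three lookup tiers (the function names are mine, and the absolute numbers vary by machine and interpreter version; only the ordering matters):

```python
import timeit

GLOBAL = 42

def local_lookup():
    local = 42
    return local      # LOAD_FAST: indexed read from the frame's locals array

def global_lookup():
    return GLOBAL     # LOAD_GLOBAL: dict lookup in the module namespace

def builtin_lookup():
    return len        # misses the module dict, then falls back to __builtins__

for fn in (local_lookup, global_lookup, builtin_lookup):
    t = timeit.timeit(fn, number=100_000)
    print("%s: %.4fs" % (fn.__name__, t))
```

On a typical box the local version comes out fastest and the builtin version slowest, matching the lookup chain described above.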


> > True, whether this difference is significant does depend on how many
> > name lookups your code makes and how much else it's doing, but if you're
> > doing a lot of number crunching and not a lot of I/O, the difference
> > might be significant. Also, even if using local names is only slightly
> > faster than using globals, it's still not slower, and the resulting code
> > is still more readable and more maintainable. Using locals is a win-win
> > scenario.
>
> You get very small speed gains (assuming your code is doing anything
> significant), for a lot of effort (trying out different options,
> seeing if they're actually faster on different boxes.)  The
> readability cost is there, even if it is smaller than many of the
> other obfuscations people attempt.  If the speed gains were really
> that important you should rewrite in C, where you'd get far greater
> speed gains.
>

I've doubled the speed of a processing loop by moving global lookups
out of the loop. Rewriting in C would have taken at least a day, even
with Pyrex; localizing the lookup took about two minutes.
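The localization described here is just hoisting the lookup out of the loop before it runs. A minimal sketch (the function names and the sqrt example are my own, not the original code):

```python
import math

def distances_global(points):
    # math.sqrt costs a LOAD_GLOBAL for `math` plus an attribute
    # lookup for `sqrt` on every iteration.
    return [math.sqrt(x * x + y * y) for x, y in points]

def distances_local(points):
    sqrt = math.sqrt  # looked up once, then bound to a fast local
    return [sqrt(x * x + y * y) for x, y in points]
```

Both return the same results; the second simply pays the lookup cost once instead of once per element.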

> So it only seems worthwhile when you really, *really* need to get a
> slight speedup on your box, you don't need to get any more speedup
> than that, and C is not an option.
>

It's not a huge optimization, but it's really easy to apply if you
don't mind adding fake kwargs to your functions. Just for the heck of
it I also wrote a decorator that re-writes the bytecode so that any
global that can be resolved at function definition time is re-written
as a local (actually as a LOAD_CONST). You can see it at
http://code.google.com/p/wxpsvg/wiki/GlobalsOptimization. Disclaimer:
while I've tested it with a variety of functions and it's never broken
anything, I've never used it for anything except an intellectual
exercise. Use at your own risk.
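The "fake kwargs" idiom mentioned above binds a global or builtin as a default argument value, which is evaluated once at function definition time and thereafter read with LOAD_FAST. A sketch (the function and parameter names are illustrative):

```python
def parse_lengths(items, _len=len, _int=int):
    # _len and _int are locals bound at definition time, so the loop
    # never touches the module or __builtins__ dicts for them.
    return [_int(x) for x in items if _len(x) > 0]
```

The downside, and the reason it reads as an obfuscation, is that the extra parameters leak into the function's public signature: a caller can accidentally (or deliberately) override `_len`.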

> Fwiw, I posted this after developing yet another patch to optimize
> global lookups.  It does sometimes show an improvement on specific
> benchmarks, but overall it harms performance.  Looking into why, it
> doesn't make sense that a python dictionary lookup can have less cost
> than two simple array indexes, but there you go.  Python dictionaries
> are already damn fast.
>

I certainly believe that changing Python's internals to make
LOAD_GLOBAL itself faster is difficult, with even "obvious"
optimizations ending up slower. However, LOAD_FAST (and LOAD_CONST)
are faster than LOAD_GLOBAL and, for the reason you just stated, that
is unlikely to change.
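You can see which opcode the compiler emits for each name with the dis module (the example function is mine):

```python
import dis

GLOBAL = 1

def f():
    local = 1
    return local + GLOBAL

# The disassembly shows a LOAD_FAST for `local` (an indexed read from
# the frame's locals array) and a LOAD_GLOBAL for `GLOBAL` (a dict
# lookup in the module namespace).
dis.dis(f)
```

Exact opcode names vary a little between interpreter versions, but the fast-local versus global-dict split has been stable across them.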


