Is LOAD_GLOBAL really that slow?

Rhamphoryncus rhamph at gmail.com
Thu Aug 30 16:31:08 EDT 2007


On Aug 30, 12:04 pm, "Chris Mellon" <arka... at gmail.com> wrote:
> On 8/30/07, Rhamphoryncus <rha... at gmail.com> wrote:
>
> > On Aug 29, 8:33 pm, Carsten Haese <cars... at uniqsys.com> wrote:
> > > On Wed, 2007-08-29 at 19:23 -0600, Adam Olsen wrote:
> > > There is no loop overhead here, and after subtracting the function call
> > > overhead, I get 31 nanoseconds per local lookup and 63 nanoseconds per
> > > global lookup, so local lookups are just about twice as fast as global
> > > lookups.
>
> __builtins__ lookups are an extra dict lookup slower than just global
> variables, too. Don't forget those.

Heh right, I forgot that.  That's what my benchmark was actually
testing.
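
For anyone who wants to reproduce the comparison, here's a sketch of the
kind of test involved (not my original script; the names and iteration
counts are invented, and the loop overhead is included in all three
timings, so only the relative differences matter):

import timeit

g = len                          # a module-level global

def use_local():
    f = len                      # bind once, then LOAD_FAST in the loop
    for _ in range(1000):
        f

def use_global():
    for _ in range(1000):
        g                        # LOAD_GLOBAL, found in module globals

def use_builtin():
    for _ in range(1000):
        len                      # LOAD_GLOBAL misses globals, hits builtins

setup = "from __main__ import use_local, use_global, use_builtin"
for name in ("use_local", "use_global", "use_builtin"):
    t = timeit.Timer("%s()" % name, setup=setup).timeit(number=10000)
    print("%-12s %.3f sec" % (name, t))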


> > > True, whether this difference is significant does depend on how many
> > > name lookups your code makes and how much else it's doing, but if you're
> > > doing a lot of number crunching and not a lot of I/O, the difference
> > > might be significant. Also, even if using local names is only slightly
> > > faster than using globals, it's still not slower, and the resulting code
> > > is still more readable and more maintainable. Using locals is a win-win
> > > scenario.
>
> > You get very small speed gains (assuming your code is doing anything
> > significant), for a lot of effort (trying out different options,
> > seeing if they're actually faster on different boxes).  The
> > readability cost is there, even if it is smaller than many of the
> > other obfuscations people attempt.  If the speed gains were really
> > that important you should rewrite in C, where you'd get far greater
> > speed gains.
>
> I've doubled the speed of a processing loop by moving globals lookups
> out of the loop. Rewriting in C would have taken at least a day, even
> with Pyrex, localizing the lookup took about 2 minutes.

I'm guessing that was due to deep voodoo involving your processor's
pipeline, branch prediction, caching, etc.  I'd be interested in
seeing small examples of it though.
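
To make the request concrete, I assume the transformation is basically
hoisting the lookups out of the loop, something along these lines (a
made-up example, not Chris's actual code):

import math

def distances_slow(points):
    out = []
    for x, y in points:
        # math.sqrt costs a LOAD_GLOBAL plus an attribute lookup on
        # every iteration
        out.append(math.sqrt(x * x + y * y))
    return out

def distances_fast(points):
    # hoist the lookups; inside the loop both names are LOAD_FAST
    sqrt = math.sqrt
    out = []
    append = out.append
    for x, y in points:
        append(sqrt(x * x + y * y))
    return out

points = [(float(i), float(i + 1)) for i in range(100000)]
assert distances_slow(points) == distances_fast(points)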


> > So it only seems worthwhile when you really, *really* need to get a
> > slight speedup on your box, you don't need to get any more speedup
> > than that, and C is not an option.
>
> It's not a huge optimization, but it's really easy to write if you
> don't mind adding fake kwargs to your functions. Just for the heck of
> it I also wrote a decorator that will re-write the bytecode so that
> any global that can be looked up at function definition will be
> re-written as a local (actually with LOAD_CONST). You can see it at
> http://code.google.com/p/wxpsvg/wiki/GlobalsOptimization. Disclaimer:
> While I've tested it with a variety of functions and it's never broken
> anything, I've never actually used this for anything except an
> intellectual exercise. Use at your own risk.

Doubling the throughput while doing *real work* is definitely more
significant than a maybe-or-maybe-not doubling on a microbenchmark that
does no real work.
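
For anyone who hasn't seen the fake-kwargs version Chris mentions, it's
just binding the globals into the argument defaults at def time
(hypothetical example, names invented):

def checksum(data, _len=len, _ord=ord):
    # _len and _ord are evaluated once, when the def statement runs,
    # and become locals of the function, so every use in the loop is
    # a LOAD_FAST instead of a globals-then-builtins lookup
    total = 0
    for i in range(_len(data)):
        total = (total + _ord(data[i])) % 65536
    return total

print(checksum("hello world"))

The cost, of course, is the junk parameters in the signature, which is
exactly the readability hit I was complaining about.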

--
Adam Olsen, aka Rhamphoryncus



