Is LOAD_GLOBAL really that slow?

Thu Aug 30 16:44:59 EDT 2007

On 8/30/07, Rhamphoryncus <rhamph at gmail.com> wrote:
> On Aug 30, 12:04 pm, "Chris Mellon" <arka... at gmail.com> wrote:
> > On 8/30/07, Rhamphoryncus <rha... at gmail.com> wrote:
> >
> > > On Aug 29, 8:33 pm, Carsten Haese <cars... at uniqsys.com> wrote:
> > > > On Wed, 2007-08-29 at 19:23 -0600, Adam Olsen wrote:
> > > > There is no loop overhead here, and after subtracting the function call
> > > > overhead, I get 31 nanoseconds per local lookup and 63 nanoseconds per
> > > > global lookup, so local lookups are just about twice as fast as global
> > > > lookups.
> >
> > __builtins__ lookups are an extra dict lookup slower than just global
> > variables, too. Don't forget those.
>
> Heh right, I forgot that.  That's what my benchmark was actually
> testing.
>
>
> > > > True, whether this difference is significant does depend on how many
> > > > name lookups your code makes and how much else it's doing, but if you're
> > > > doing a lot of number crunching and not a lot of I/O, the difference
> > > > might be significant. Also, even if using local names is only slightly
> > > > faster than using globals, it's still not slower, and the resulting code
> > > > is still more readable and more maintainable. Using locals is a win-win
> > > > scenario.
> >
> > > You get very small speed gains (assuming your code is doing anything
> > > significant), for a lot of effort (trying out different options,
> > > seeing if they're actually faster on different boxes.)  The
> > > readability cost is there, even if it is smaller than many of the
> > > other obfuscations people attempt.  If the speed gains were really
> > > that important you should rewrite in C, where you'd get far greater
> > > speed gains.
> >
> > I've doubled the speed of a processing loop by moving global lookups
> > out of the loop. Rewriting in C would have taken at least a day, even
> > with Pyrex; localizing the lookup took about 2 minutes.
>
> I'm guessing that was due to deep voodoo involving your processor's
> pipeline, branch prediction, caching, etc.  I'd be interested in
> seeing small examples of it though.
>

There's certainly deep voodoo involved in the function. For example, I
used to have a test for early exit, but the branch turned out to be
slower than the 3 extra function calls it avoided. I was very surprised
to get that result at such a high level, but I tested it over and over
and it was very consistent. This was with psyco as well, so perhaps it
comes down to some characteristic of the generated assembly.

Psyco, of course, is even less work and more benefit than localizing
globals, so that's my first step for "make it faster".
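
For anyone who wants a small example of what "localizing globals" means
here, the following is a minimal sketch written for current Python 3. It
is not the processing loop discussed above; the function names and the
choice of math.sqrt are hypothetical, picked only to show the pattern.
The timings it prints will vary by machine and interpreter version, so
no numbers are claimed.

```python
import math
import timeit


def norm_global(values):
    # "math" is looked up as a global on every iteration, followed by
    # an attribute lookup for "sqrt".
    total = 0.0
    for v in values:
        total += math.sqrt(v)
    return total


def norm_local(values):
    # The lookup is done once, outside the loop; inside the loop "sqrt"
    # is an ordinary local variable.
    sqrt = math.sqrt
    total = 0.0
    for v in values:
        total += sqrt(v)
    return total


data = [float(i) for i in range(1000)]
# Both versions perform the same additions in the same order.
same_result = norm_global(data) == norm_local(data)

if __name__ == "__main__":
    for fn in (norm_global, norm_local):
        best = min(timeit.repeat(lambda: fn(data), number=100, repeat=3))
        print(f"{fn.__name__}: {best:.4f} s per 100 calls")
```

How much this helps depends on how much other work the loop body does,
which is the point being argued in this thread.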

>
> > > So it only seems worthwhile when you really, *really* need to get a
> > > slight speedup on your box, you don't need to get any more speedup
> > > than that, and C is not an option.
> >
> > It's not a huge optimization, but it's really easy to write if you
> > don't mind adding fake kwargs to your functions. Just for the heck of
> > it I also wrote a decorator that will re-write the bytecode so that
> > any global that can be looked up at function definition will be
> > re-written as a local (actually with LOAD_CONST). You can see it at
> > http://code.google.com/p/wxpsvg/wiki/GlobalsOptimization. Disclaimer:
> > While I've tested it with a variety of functions and it's never broken
> > anything, I've never actually used this for anything except an
> > intellectual exercise. Use at your own risk.
>
> Doubling the throughput while doing *real work* is definitely more
> significant than maybe-or-maybe-not-quite-doubling without any real
> work.
>

I wouldn't actually recommend that anyone use this, I just wrote it for fun.
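
For reference, the "fake kwargs" approach quoted above can be sketched
as follows. This is an illustration only, not the decorator from the
wiki page: the function names and the _len parameter are hypothetical.
It uses the dis module to show that the plain version loads "len" with
LOAD_GLOBAL, while the version with a default argument loads it as a
local. Exact opcode names differ between Python versions, so the check
only looks at the opcode name prefix.

```python
import dis


def count_global(items):
    # "len" is not a local, so it is fetched with LOAD_GLOBAL; since it
    # is a builtin, the module globals are checked first and then the
    # builtins.
    return len(items)


def count_local(items, _len=len):
    # "_len" is bound once, when the function is defined, and is a
    # local variable inside the function body.
    return _len(items)


def load_ops(func):
    """Return the names of the LOAD_* opcodes used by func."""
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("LOAD_")
    ]


global_ops = load_ops(count_global)
local_ops = load_ops(count_local)

if __name__ == "__main__":
    print("count_global:", global_ops)
    print("count_local: ", local_ops)
```

Two caveats with this pattern: the binding is frozen at definition time,
so a later rebinding of the global name is not seen, and a caller can
accidentally override the extra keyword argument.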