Is LOAD_GLOBAL really that slow?

Rhamphoryncus rhamph at gmail.com
Thu Aug 30 13:37:27 EDT 2007


On Aug 29, 8:33 pm, Carsten Haese <cars... at uniqsys.com> wrote:
> On Wed, 2007-08-29 at 19:23 -0600, Adam Olsen wrote:
> > It seems a common opinion that global access is much slower than local
> > variable access.  However, my benchmarks show a relatively small
> > difference:
>
> > ./python -m timeit -r 10 -v -s 'x = [None] * 10000
> > def foo():
> >   for i in x:
> >     list; list; list; list; list; list; list; list; list; list' 'foo()'
> > 10 loops -> 0.0989 secs
> > 100 loops -> 0.991 secs
> > raw times: 0.999 0.985 0.987 0.985 0.985 0.982 0.982 0.982 0.981 0.985
> > 100 loops, best of 10: 9.81 msec per loop
>
> > ./python -m timeit -r 10 -v -s 'x = [None] * 10000
> > def foo():
> >   mylist = list
> >   for i in x:
> >     mylist; mylist; mylist; mylist; mylist; mylist; mylist; mylist;
> > mylist; mylist' 'foo()'
> > 10 loops -> 0.0617 secs
> > 100 loops -> 0.61 secs
> > raw times: 0.603 0.582 0.582 0.583 0.581 0.583 0.58 0.583 0.584 0.582
> > 100 loops, best of 10: 5.8 msec per loop
>
> > So global access is about 70% slower than local variable access.  To
> > put that in perspective, two local variable accesses will take longer
> > than a single global variable access.
>
> > This is a very extreme benchmark though.  In practice, other overheads
> > will probably drop the difference to a few percent at most.  Not that
> > important in my book.
>
> Your comparison is flawed, because the function call and the inner for
> loop cause a measurement offset that makes the locals advantage seem
> smaller than it is. In the interest of comparing the times for just the
> local lookup versus just the global lookup, I think the following
> timings are more appropriate:

That's why I used far more name lookups, to minimize the overhead.
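For anyone who wants to reproduce this without the shell one-liners, the same comparison can be written with the timeit module directly. This is only a sketch of the pattern being benchmarked (the loop and list sizes are illustrative, and absolute numbers will differ per box):

```python
import timeit

x = [None] * 10000

def use_global():
    # Each bare reference to `list` compiles to LOAD_GLOBAL,
    # which misses the module dict and falls through to builtins.
    for i in x:
        list; list; list; list

def use_local():
    # Bind the builtin to a local name once; every later
    # reference compiles to LOAD_FAST, a plain array index
    # into the frame's local-variable slots.
    mylist = list
    for i in x:
        mylist; mylist; mylist; mylist

t_global = timeit.timeit(use_global, number=100)
t_local = timeit.timeit(use_local, number=100)
print("global:", t_global, "local:", t_local)
```

The local version is usually faster, but as argued above, the gap is small relative to the loop and call overhead that real code carries anyway.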


> $ python2.5 -mtimeit -r10 -s"y=42" -s"def f(x): pass" "f(42)"
> 1000000 loops, best of 10: 0.3 usec per loop
> $ python2.5 -mtimeit -r10 -s"y=42" -s"def f(x): x" "f(42)"
> 1000000 loops, best of 10: 0.331 usec per loop
> $ python2.5 -mtimeit -r 10 -s"y=42" -s"def f(x): y" "f(42)"
> 1000000 loops, best of 10: 0.363 usec per loop

On my box, the best results I got after several runs were 0.399,
0.447, and 0.464.  Even less difference than my original results.


> There is no loop overhead here, and after subtracting the function call
> overhead, I get 31 nanoseconds per local lookup and 63 nanoseconds per
> global lookup, so local lookups are just about twice as fast as global
> lookups.
>
> True, whether this difference is significant does depend on how many
> name lookups your code makes and how much else it's doing, but if you're
> doing a lot of number crunching and not a lot of I/O, the difference
> might be significant. Also, even if using local names is only slightly
> faster than using globals, it's still not slower, and the resulting code
> is still more readable and more maintainable. Using locals is a win-win
> scenario.
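Carsten's subtraction of the call overhead can be reproduced programmatically; this sketch uses the same setup strings as his shell commands (the resulting nanosecond figures will of course vary by machine):

```python
import timeit

n = 1_000_000

# Baseline: the cost of calling an empty function.
base = timeit.timeit("f(42)", setup="def f(x): pass", number=n)
# Same call, plus one local (LOAD_FAST) lookup in the body.
local = timeit.timeit("f(42)", setup="def f(x): x", number=n)
# Same call, plus one global (LOAD_GLOBAL) lookup in the body.
glob = timeit.timeit("f(42)", setup="y = 42\ndef f(x): y", number=n)

# Per-lookup cost after subtracting the empty-call baseline, in ns.
local_ns = (local - base) / n * 1e9
global_ns = (glob - base) / n * 1e9
print("local lookup:", local_ns, "ns; global lookup:", global_ns, "ns")
```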

You get very small speed gains (assuming your code is doing anything
significant) for a lot of effort (trying out different options and
seeing whether they're actually faster on different boxes).  The
readability cost is there, even if it is smaller than many of the
other obfuscations people attempt.  If the speed gains were really
that important, you should rewrite in C, where you'd get far greater
speed gains.

So it only seems worthwhile when you really, *really* need to get a
slight speedup on your box, you don't need to get any more speedup
than that, and C is not an option.

Fwiw, I posted this after developing yet another patch to optimize
global lookups.  It does sometimes show an improvement on specific
benchmarks, but overall it harms performance.  Looking into why, it
doesn't make sense that a Python dictionary lookup can cost less
than two simple array indexes, but there you go.  Python dictionaries
are already damn fast.
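For what it's worth, the surprising comparison (a dict lookup competing with two array indexes) can be probed from Python itself. This micro-benchmark is only suggestive, since the lookups in question really happen inside the interpreter's C code, not at the Python level:

```python
import timeit

setup = "d = {'list': list}; a = [list] * 8"

# One string-keyed dict lookup, the shape of a LOAD_GLOBAL.
t_dict = timeit.timeit("d['list']", setup=setup, number=1_000_000)
# Two plain list indexes, the shape of a two-array fast path.
t_index = timeit.timeit("a[3]; a[5]", setup=setup, number=1_000_000)

print("dict lookup:", t_dict, "two indexes:", t_index)
```

String-keyed dict lookups benefit from cached hashes and a specialized lookup routine, which is part of why they hold up so well.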

--
Adam Olsen, aka Rhamphoryncus



