how to handle cpu cache in python (or fastest way to call a function once)

Sun Aug 23 07:59:27 EDT 2015

On Sun, 23 Aug 2015 04:10 pm, Yuzhi Xu wrote:

> I find out that python's VM seems to be very unfriendly with CPU-Cache.

Possibly. More comments below.

> for example:
> *******************************************
> import time
> a = range(500)
> 
> sum(a)
> 
> for i in range(1000000): #just to create a time interval, seems this disturb cpu cache?
>     pass
> 
> 
> st = time.time()
> sum(a)
> print (time.time() - st)*1e6
> 
> *********************************************
> time:> 100us

On my machine, I get about 20-25 μs for this:

(a.py contains your code above)

[steve@ando ~]$ python2.7 a.py
21.9345092773
[steve@ando ~]$ python2.7 a.py
21.9345092773
[steve@ando ~]$ python2.7 a.py
24.0802764893
[steve@ando ~]$ python2.7 a.py
23.8418579102
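A single time.time() difference is a coarse way to measure something that takes a few microseconds. One alternative, sketched here for Python 2.7 or 3 and assuming the standard-library timeit module is acceptable, is to run sum() many times per trial and keep the fastest trial. The helper name time_sum is illustrative, not something from this thread.

```python
import timeit

# A 500-element list, as in the original test (list() so it is a list on Python 3 too).
a = list(range(500))

def time_sum(data, number=1000, repeat=5):
    """Return the best per-call time of sum(data), in microseconds.

    timeit.repeat() runs sum(data) `number` times per trial, for `repeat`
    trials. Taking the fastest trial discards trials slowed down by other
    activity on the machine.
    """
    trials = timeit.repeat(lambda: sum(data), number=number, repeat=repeat)
    return min(trials) / number * 1e6

if __name__ == "__main__":
    print("best per-call time: %.3f us" % time_sum(a))
```

Because this averages over many back-to-back calls, it reports the warmed-up cost of sum(), not the cost of the first call, which is what the original question is about. The figure also includes the small overhead of calling the lambda.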

> another case:
> *********************************************
> import time
> a = range(500)
> 
> for i in range(100000):
>     st = time.time()
>     sum(a)
>     print (time.time() - st)*1e6
> 
> *********************************************
> time:~ 20us

Running this as b.py, I get times of around 15μs, a bit faster than the
first version, but not five times faster, as you found.

[steve@ando ~]$ python2.7 b.py
[...]
15.0203704834
15.0203704834
25.0339508057
16.9277191162
20.0271606445
94.8905944824
15.9740447998
15.0203704834
15.0203704834
15.0203704834
15.0203704834
14.066696167
13.8282775879
15.0203704834
Traceback (most recent call last):
  File "b.py", line 6, in <module>
    sum(a)
KeyboardInterrupt

Above, you say:

> for i in range(1000000): #just to create a time interval, seems this disturb cpu cache?
>     pass

But remember that range() is a function, and so, yes, it may disturb the CPU
cache. What did you expect?

But I'm not sure how the CPU cache will interact with code in a high-level
language like Python. I suspect it is more likely to have something to do
with range(1000000) building an enormous list of integers: in Python 2,
range() returns a real list, while xrange() produces the numbers on demand.
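The range-versus-xrange point can be checked directly. This sketch (runnable on Python 2 or 3) compares the size of a fully built list of a million integers with that of a lazy range object; the names eager and lazy are illustrative.

```python
import sys

try:
    lazy_range = xrange  # Python 2: xrange produces numbers on demand
except NameError:
    lazy_range = range   # Python 3: range is already lazy

N = 1000000

# Build the full list, as Python 2's range(N) does.
eager = list(range(N))
lazy = lazy_range(N)

# sys.getsizeof reports only the container itself, not the int objects the
# list refers to, so the real memory cost of the list is higher still.
eager_bytes = sys.getsizeof(eager)
lazy_bytes = sys.getsizeof(lazy)
print("list: %d bytes, lazy range: %d bytes" % (eager_bytes, lazy_bytes))
```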

Here's another version:

[steve@ando ~]$ cat c.py
import time
a = range(500)
sum(a)
for i in range(1000000):
    pass
sum(a)
st = time.time()
sum(a)
print (time.time() - st)*1e6

[steve@ando ~]$ python2.7 c.py
15.9740447998

And one more:

[steve@ando ~]$ cat d.py
import time
a = range(500)
sum(a)
for i in xrange(1000000): # Use xrange instead of range
    pass
st = time.time()
sum(a)
print (time.time() - st)*1e6

[steve@ando ~]$ python2.7 d.py
22.1729278564
[steve@ando ~]$ python2.7 d.py
23.1266021729

So... on my machine, using xrange instead of range makes no difference: in
both cases, calling sum() takes about 22μs.

But calling sum() again just before the timed call (as c.py does) brings it
down to about 16μs, roughly 25% faster. (Not 80% faster, as you found.)
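To isolate the first-call effect, each call has to be timed on its own rather than averaged. A minimal sketch, assuming timeit.default_timer as the clock; per_call_times is a hypothetical helper, not from this thread:

```python
import timeit

def per_call_times(func, calls=20):
    """Time each of `calls` consecutive calls to func individually, in seconds."""
    timer = timeit.default_timer
    times = []
    for _ in range(calls):
        start = timer()
        func()
        times.append(timer() - start)
    return times

data = list(range(500))
times = per_call_times(lambda: sum(data))

# Compare the very first call with the median of the later ones.
# Whether the first call is actually slower depends on the machine;
# this only reports the numbers.
first = times[0]
rest = sorted(times[1:])
median_rest = rest[len(rest) // 2]
print("first call: %.2f us, median of later calls: %.2f us"
      % (first * 1e6, median_rest * 1e6))
```

Calls this short sit close to the timer's resolution, so any single run proves little either way.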

One last test:

[steve@ando ~]$ cat e.py
import time
a = range(500)
# Without warm-up.
st = time.time()
sum(a)
print (time.time() - st)*1e6
# Second time, with warm-up.
st = time.time()
sum(a)
print (time.time() - st)*1e6
# Add a delay.
for i in xrange(1000):
    pass
st = time.time()
sum(a)
print (time.time() - st)*1e6
st = time.time()
sum(a)
print (time.time() - st)*1e6

[steve@ando ~]$ python2.7 e.py
15.0203704834
15.0203704834
10.9672546387
10.9672546387
[steve@ando ~]$ python2.7 e.py
15.9740447998
12.8746032715
12.1593475342
10.9672546387
[steve@ando ~]$ python2.7 e.py
15.9740447998
20.0271606445
15.0203704834
15.9740447998
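The three e.py runs disagree with one another (in the last one the "warm" call is slower than the cold one), which suggests much of the difference is noise. One way to gauge that, sketched here with a hypothetical fresh_runs helper, is to repeat the cold/warm pair in several fresh interpreter processes via subprocess, so each "cold" call is always the first sum() call the script makes.

```python
import subprocess
import sys

# Child program: times one cold and one warm sum() call and prints both in
# microseconds. It runs in a fresh interpreter each time.
CHILD = (
    "import timeit\n"
    "t = timeit.default_timer\n"
    "a = list(range(500))\n"
    "s = t(); sum(a); cold = t() - s\n"
    "s = t(); sum(a); warm = t() - s\n"
    "print('%f %f' % (cold * 1e6, warm * 1e6))\n"
)

def fresh_runs(runs=5):
    """Return a list of (cold_us, warm_us) pairs, one per fresh process."""
    results = []
    for _ in range(runs):
        out = subprocess.check_output([sys.executable, "-c", CHILD])
        cold, warm = map(float, out.decode("ascii").split())
        results.append((cold, warm))
    return results

if __name__ == "__main__":
    for cold, warm in fresh_runs():
        print("cold: %8.2f us   warm: %8.2f us" % (cold, warm))
```

Comparing the spread of the cold numbers with the spread of the warm ones gives a rough idea of whether the first-call penalty is larger than the run-to-run noise.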

-- 
Steven