how to handle cpu cache in python ( or fastest way to call a function once)
Steven D'Aprano
steve at pearwood.info
Sun Aug 23 07:59:27 EDT 2015
On Sun, 23 Aug 2015 04:10 pm, Yuzhi Xu wrote:
> I find out that python's VM seems to be very unfriendly with CPU-Cache.
Possibly. More comments below.
> for example:
> *******************************************
> import time
> a = range(500)
>
> sum(a)
>
> for i in range(1000000): #just to create a time interval, seems this
> disturb cpu cache?
> pass
>
>
> st = time.time()
> sum(a)
> print (time.time() - st)*1e6
>
> *********************************************
> time:> 100us
On my machine, I get about 20-25 μs for this:
(a.py contains your code above)
[steve at ando ~]$ python2.7 a.py
21.9345092773
[steve at ando ~]$ python2.7 a.py
21.9345092773
[steve at ando ~]$ python2.7 a.py
24.0802764893
[steve at ando ~]$ python2.7 a.py
23.8418579102
> another case:
> *********************************************
> import time
> a = range(500)
>
> for i in range(100000):
> st = time.time()
> sum(a)
> print (time.time() - st)*1e6
>
> *********************************************
> time:~ 20us
Running this as b.py, I get times of around 15μs, a bit faster than the
first version, but not a factor of five times faster as you get.
[steve at ando ~]$ python2.7 b.py
[...]
15.0203704834
15.0203704834
25.0339508057
16.9277191162
20.0271606445
94.8905944824
15.9740447998
15.0203704834
15.0203704834
15.0203704834
15.0203704834
14.066696167
13.8282775879
15.0203704834
Traceback (most recent call last):
File "b.py", line 6, in <module>
sum(a)
KeyboardInterrupt
Above, you say:
> for i in range(1000000): #just to create a time interval, seems this
> disturb cpu cache?
> pass
But remember that range() is a function, and so, yes, it may disturb the CPU
cache. What did you expect?
But I'm not sure how the CPU cache will interact with code in a high-level
language like Python. I suspect that more likely, it simply has something
to do with range(1000000) building an enormous list of integers.
Here's another version:
[steve at ando ~]$ cat c.py
import time
a = range(500)
sum(a)
for i in range(1000000):
pass
sum(a)
st = time.time()
sum(a)
print (time.time() - st)*1e6
[steve at ando ~]$ python2.7 c.py
15.9740447998
And one more:
[steve at ando ~]$ cat d.py
import time
a = range(500)
sum(a)
for i in xrange(1000000): # Use xrange instead of range
pass
st = time.time()
sum(a)
print (time.time() - st)*1e6
[steve at ando ~]$ python2.7 d.py
22.1729278564
[steve at ando ~]$ python2.7 d.py
23.1266021729
So... on my machine, the difference between xrange and range makes no
difference: in both cases, calling sum() takes about 22μs.
But calling sum() twice speeds up the second call to about 16μs, or about
25% faster. (Not 80% faster, as you find.)
One last test:
[steve at ando ~]$ cat e.py
import time
a = range(500)
# Without warm-up.
st = time.time()
sum(a)
print (time.time() - st)*1e6
# Second time, with warm-up.
st = time.time()
sum(a)
print (time.time() - st)*1e6
# Add a delay.
for i in xrange(1000):
pass
st = time.time()
sum(a)
print (time.time() - st)*1e6
st = time.time()
sum(a)
print (time.time() - st)*1e6
[steve at ando ~]$ python2.7 e.py
15.0203704834
15.0203704834
10.9672546387
10.9672546387
[steve at ando ~]$ python2.7 e.py
15.9740447998
12.8746032715
12.1593475342
10.9672546387
[steve at ando ~]$ python2.7 e.py
15.9740447998
20.0271606445
15.0203704834
15.9740447998
--
Steven
More information about the Python-list
mailing list