[SciPy-User] Element-wise multiplication performance: why operations on sliced matrices are faster?

Alexander Kalinin alec.kalinin at gmail.com
Fri Feb 10 04:27:32 EST 2012


I found an interesting fact about element-wise matrix multiplication
performance. I expected that fully vectorized NumPy operations would always
be faster than loops, but that is not always true. Look at the code:

import numpy as np
import time


def calc(P):
    for i in range(30):
        P2 = P * P

# full matrix
N = 4000
P = np.random.rand(N, N)
t0 = time.time()
calc(P)
t1 = time.time()
print "full matrix  {:.5f} seconds".format(t1 - t0)

# sliced matrix
N = 2000
P = np.random.rand(N, N)
t0 = time.time()
for i in range(4):
    calc(P)
t1 = time.time()
print "sliced matrix {:.5f} seconds".format(t1 - t0)

The results are:
full matrix 2.60245 seconds
sliced matrix 1.49381 seconds
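For comparison, here is a small sketch (my addition, not part of the original
benchmark) that writes the element-wise product into a preallocated buffer
with np.multiply(..., out=...), so no large temporary is allocated on each
iteration. Timing this variant against calc() above is one way to check how
much of the cost comes from temporary allocation; N here is kept small so the
sketch runs quickly.

```python
import numpy as np
import time


def calc_out(P, out):
    # Element-wise square, written into a preallocated buffer instead
    # of allocating a fresh temporary array on every iteration.
    for i in range(30):
        np.multiply(P, P, out=out)

N = 1000  # deliberately smaller than the benchmark sizes above
P = np.random.rand(N, N)
out = np.empty_like(P)

t0 = time.time()
calc_out(P, out)
t1 = time.time()
print("preallocated output: {:.5f} seconds".format(t1 - t0))
```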


I continued studying this case and found that the performance depends on the
matrix size. See the attached plot. The x-axis is the dimension of the
matrices, the y-axis the execution time. The red line shows the full-matrix
execution times, the blue line the sliced-matrix execution times. The plot
shows that 2000 is the critical dimension at which the performance
degradation step occurs. Could you please explain this to me?
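A back-of-envelope working-set calculation (my own addition, not from the
post) shows how the two cases differ in memory footprint: each N x N float64
array occupies 8 * N**2 bytes, and during P * P both the input and the result
are live at once.

```python
# Working-set sizes for the two benchmark cases (float64 = 8 bytes/element).
def array_mib(n):
    # MiB occupied by one n x n float64 array
    return 8 * n * n / float(2**20)

for n in (2000, 4000):
    # P plus the temporary P * P are live at the same time,
    # so the working set is roughly twice one array's size.
    print("N = {}: one array = {:.1f} MiB, working set ~ {:.1f} MiB".format(
        n, array_mib(n), 2 * array_mib(n)))
```

Either way the arrays are far larger than the 3 MB L2 cache listed below, so
the step at N = 2000 presumably reflects something other than a simple
cache-fit threshold; the numbers above are only meant to size the problem.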

My configuration:
OS: Ubuntu 11.10 (oneiric)
CPU: Intel(R) Core(TM) i5 CPU M 480  @ 2.67GHz
CPUs: 4
Memory: 3746 MiB
L2 cache: 3072 KB
[Attachment: fig1.png (the plot referred to above) — <http://mail.scipy.org/pipermail/scipy-user/attachments/20120210/a712be42/attachment.png>]
