[Numpy-discussion] Matrix multiply benchmark

Albert Strasheim fullung at gmail.com
Thu May 4 14:51:06 EDT 2006


Hello all

My current work involves multiplying some rather large matrices and
vectors, so I was wondering about benchmarks for figuring out how fast
NumPy does these multiplications.

Matrix Toolkits for Java (MTJ) uses the Java Native Interface to call
through to ATLAS, MKL and friends. Some benchmark results with some
interesting graphs are here:

http://rs.cipr.uib.no/mtj/benchmark.html

There is also some Java code for measuring the number of floating point
operations per second here:

http://rs.cipr.uib.no/mtj/bench/NNIGEMM.html

I attempted to adapt this code to Python (suggestions and fixes welcome). My
attempt at benchmarking general matrix-matrix multiplication:

#!/usr/bin/env python
import time
import numpy as N
print N.__version__
print N.__config__.blas_opt_info
for n in range(50,501,10):
    A = N.random.rand(n, n)
    B = N.random.rand(n, n)
    C = N.empty_like(A)
    alpha = N.random.rand()
    beta = N.random.rand()
    if n < 100:
        r = 100
    else:
        r = 10
    # this gets the cache warmed up?
    for i in range(10):
        C[:,:] = N.dot(alpha*A, beta*B)
    t1 = time.clock()
    for i in range(r):
        C[:,:] = N.dot(alpha*A, beta*B)
    t2 = time.clock()
    s = t2 - t1
    f = 2 * (n + 1) * n * n
    mfs = (f / (s * 1000000.)) * r
    print '%d %f' % (n, mfs)

I think you might want to make r a bit larger to get more accurate results
for smaller matrices, depending on your CPU speed.
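If timer resolution is the concern, the standard-library timeit module can handle the repetition and take the best of several runs for you. A minimal sketch for a single matrix size (the size and loop counts here are arbitrary, and the flop count covers only the bare multiply, not the alpha/beta scaling):

```python
import timeit

n = 200
r = 10

# Build the arrays in the timed code's setup so allocation is excluded.
setup = (
    "import numpy as N\n"
    "n = %d\n"
    "A = N.random.rand(n, n)\n"
    "B = N.random.rand(n, n)\n" % n
)
stmt = "C = N.dot(A, B)"

# repeat() returns one total time per run of `number` iterations;
# take the minimum to reduce noise from other processes.
best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=r))

flops = 2.0 * n * n * n  # flops for one n x n matrix multiply
mflops = flops * r / (best * 1e6)
print('%d x %d: %.1f MFLOPS' % (n, n, mflops))
```

This sidesteps hand-tuning r per matrix size, at the cost of timing only the plain product.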

Is this benchmark comparable to the MTJ one? NumPy might not be performing
exactly the same operations: MTJ probably does the scaling and the
multiplication in a single call to BLAS's dgemm, whereas the code above
scales A and B separately before calling dot.
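For comparison, SciPy builds that expose the BLAS wrappers let you time the fused operation as one dgemm call. A sketch, assuming a SciPy with scipy.linalg.blas available (the dgemm signature is alpha, a, b, with optional beta and c):

```python
import numpy as N
from scipy.linalg.blas import dgemm  # assumes SciPy's BLAS wrappers are exposed

n = 100
A = N.random.rand(n, n)
B = N.random.rand(n, n)
alpha = 0.5

# One call computes alpha * A * B, closer to what MTJ hands to the BLAS
C = dgemm(alpha, A, B)

# Same result as the two-step NumPy version, without the temporary alpha*A
assert N.allclose(C, N.dot(alpha * A, B))
```

Timing this call instead of the two-step dot would make the comparison with the MTJ numbers more direct.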

By the way, is there a better way of assigning into a preallocated matrix?
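On that question: C[:,:] = N.dot(A, B) still allocates a temporary for the right-hand side before copying it into C. Later NumPy releases grew an out= argument on dot that writes the product straight into a preallocated array; a sketch assuming such a NumPy (out must be C-contiguous with the right shape and dtype):

```python
import numpy as N

n = 50
A = N.random.rand(n, n)
B = N.random.rand(n, n)
C = N.empty_like(A)

# dot's out= argument (added in later NumPy releases) writes the product
# directly into C, avoiding the temporary created by C[:,:] = N.dot(A, B)
N.dot(A, B, out=C)
assert N.allclose(C, N.dot(A, B))
```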

I eagerly await your comments and/or results.

Regards,

Albert




