[Numpy-discussion] Scimark, icc, & Core 2 Duo

Wed Apr 18 17:00:34 EDT 2007

Keith Goodman <kwgoodman at gmail.com> [2007-04-18 12:46]:
> Thanks for that. For a variety of reasons I'm sticking with atlas.
> Does the parallel flag give you a big speed increase? I imagine it
> speeds things up more for larger matrices.

Surprisingly little: adding -parallel to -fast raised the composite
score only from 785.63 to 796.33. Below are the results of running
SciMark with various icc and gcc compiler flags. The best composite
score with icc is 55% higher than the best with gcc, though there may
be flags other than -O3 that would help gcc.
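
A quick Python sketch that reproduces the 55% figure and the size of the -parallel gain. It uses only the composite scores, copied by hand from the runs below; the dictionary keys are just labels for the flag sets.

```python
# Composite SciMark2 scores (-large), copied by hand from the runs in this post.
composite = {
    "icc": 605.84,
    "icc -fast": 785.63,
    "icc -fast -parallel": 796.33,
    "icc -fast -parallel -fno-alias": 890.46,
    "icc -fast -parallel -fno-alias -funroll-loops": 901.11,
    "gcc": 323.63,
    "gcc -O3": 580.55,
}


def pct_gain(new, old):
    """Percentage by which `new` exceeds `old`."""
    return 100.0 * (new / old - 1.0)


best_icc = max(v for k, v in composite.items() if k.startswith("icc"))
best_gcc = max(v for k, v in composite.items() if k.startswith("gcc"))
icc_vs_gcc = pct_gain(best_icc, best_gcc)
parallel_gain = pct_gain(composite["icc -fast -parallel"], composite["icc -fast"])

print(f"best icc vs best gcc:      +{icc_vs_gcc:.1f}%")
print(f"-parallel on top of -fast: +{parallel_gain:.1f}%")
```

It should print +55.2% and +1.4%.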

The LINPACK benchmark that ships with MKL (optimized for Xeon, not for
Core 2 Duo) peaks at about 7 gigaflops on my Core 2 Duo overclocked to
2.93 GHz. Note that it is a different benchmark from LINPACK 1000. A
Core 2 Duo-optimized version exists for OS X.

icc with no flags set:
> icc *.c -o no_flags
> ./no_flags -large
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:          605.84
FFT             Mflops:   111.70    (N=1048576)
SOR             Mflops:   868.52    (1000 x 1000)
MonteCarlo:     Mflops:   120.37
Sparse matmult  Mflops:   853.33    (N=100000, nz=1000000)
LU              Mflops:  1075.27    (M=1000, N=1000)
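
SciMark2's composite score is simply the arithmetic mean of the five kernel Mflops rates, which is easy to confirm against the run above. A minimal Python check, with the kernel numbers copied from that output:

```python
# Kernel Mflops from the "icc with no flags" run above.
kernels = {
    "FFT": 111.70,
    "SOR": 868.52,
    "MonteCarlo": 120.37,
    "Sparse matmult": 853.33,
    "LU": 1075.27,
}


def composite_score(mflops):
    """SciMark2 composite: arithmetic mean of the five kernel Mflops rates."""
    return sum(mflops.values()) / len(mflops)


score = composite_score(kernels)
print(f"{score:.2f}")  # 605.84, matching the reported Composite Score
```

The same holds for the other runs, e.g. the plain gcc kernels average to 323.63.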

> icc -fast  *.c -o fast
> ./fast -large
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:          785.63
FFT             Mflops:   108.31    (N=1048576)
SOR             Mflops:   985.81    (1000 x 1000)
MonteCarlo:     Mflops:   848.81
Sparse matmult  Mflops:   825.81    (N=100000, nz=1000000)
LU              Mflops:  1159.42    (M=1000, N=1000)

> icc -fast  -parallel *.c -o fast_para
IPO: performing multi-file optimizations
IPO: generating object file /tmp/ipo_iccvHW42m.o
scimark2.c(63) : (col. 18) remark: LOOP WAS VECTORIZED.
kernel.c(157) : (col. 13) remark: LOOP WAS VECTORIZED.
kernel.c(212) : (col. 17) remark: LOOP WAS VECTORIZED.
> ./fast_para -large
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:          796.33
FFT             Mflops:   111.70    (N=1048576)
SOR             Mflops:  1001.91    (1000 x 1000)
MonteCarlo:     Mflops:   855.57
Sparse matmult  Mflops:   832.52    (N=100000, nz=1000000)
LU              Mflops:  1179.94    (M=1000, N=1000)

> icc -fast -parallel -fno-alias *.c -o fast_para_noali
IPO: performing multi-file optimizations
IPO: generating object file /tmp/ipo_iccLUySDv.o
scimark2.c(63) : (col. 18) remark: LOOP WAS VECTORIZED.
kernel.c(157) : (col. 13) remark: LOOP WAS VECTORIZED.
kernel.c(212) : (col. 17) remark: LOOP WAS VECTORIZED.
> ./fast_para_noali -large
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:          890.46
FFT             Mflops:   109.70    (N=1048576)
SOR             Mflops:  1488.28    (1000 x 1000)
MonteCarlo:     Mflops:   855.57
Sparse matmult  Mflops:   829.15    (N=100000, nz=1000000)
LU              Mflops:  1169.59    (M=1000, N=1000)

> icc -fast -parallel -fno-alias -funroll-loops *.c -o fast_para_noali_unr
IPO: performing multi-file optimizations
IPO: generating object file /tmp/ipo_icc2KA1ui.o
scimark2.c(63) : (col. 18) remark: LOOP WAS VECTORIZED.
kernel.c(157) : (col. 13) remark: LOOP WAS VECTORIZED.
kernel.c(212) : (col. 17) remark: LOOP WAS VECTORIZED.
> ./fast_para_noali_unr -large
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:          901.11
FFT             Mflops:   113.48    (N=1048576)
SOR             Mflops:  1510.28    (1000 x 1000)
MonteCarlo:     Mflops:   865.92
Sparse matmult  Mflops:   835.92    (N=100000, nz=1000000)
LU              Mflops:  1179.94    (M=1000, N=1000)
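
Comparing the -fast -parallel run with the same flags plus -fno-alias, kernel by kernel, shows that nearly all of the composite gain comes from SOR; the other kernels move by less than 2%. A small Python sketch of that comparison, with the numbers copied from the two runs above:

```python
# Kernel Mflops (-large) copied from the two icc runs above.
before = {"FFT": 111.70, "SOR": 1001.91, "MonteCarlo": 855.57,
          "Sparse matmult": 832.52, "LU": 1179.94}  # icc -fast -parallel
after = {"FFT": 109.70, "SOR": 1488.28, "MonteCarlo": 855.57,
         "Sparse matmult": 829.15, "LU": 1169.59}   # ... plus -fno-alias

# Percentage change per kernel, largest absolute change first.
changes = {k: 100.0 * (after[k] / before[k] - 1.0) for k in before}
for name, pct in sorted(changes.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:15s} {pct:+6.1f}%")
```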

> gcc -lm *.c -o ggc_none
> ./ggc_none -large
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:          323.63
FFT             Mflops:    83.56    (N=1048576)
SOR             Mflops:   729.97    (1000 x 1000)
MonteCarlo:     Mflops:    73.75
Sparse matmult  Mflops:   329.26    (N=100000, nz=1000000)
LU              Mflops:   401.61    (M=1000, N=1000)

> gcc -lm -O3 *.c -o ggc_O3
> ./ggc_O3 -large
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:          580.55
FFT             Mflops:   108.86    (N=1048576)
SOR             Mflops:   842.27    (1000 x 1000)
MonteCarlo:     Mflops:   115.70
Sparse matmult  Mflops:   825.81    (N=100000, nz=1000000)
LU              Mflops:  1010.10    (M=1000, N=1000)
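
Because the composite is a plain mean, the gap between the best icc run and gcc -O3 (901.11 vs 580.55) splits cleanly into per-kernel contributions of (icc - gcc)/5. A Python sketch, with numbers copied from those two runs, shows that MonteCarlo and SOR account for most of it:

```python
# Kernel Mflops (-large) copied from the best icc run and the gcc -O3 run.
icc_best = {"FFT": 113.48, "SOR": 1510.28, "MonteCarlo": 865.92,
            "Sparse matmult": 835.92, "LU": 1179.94}
gcc_o3 = {"FFT": 108.86, "SOR": 842.27, "MonteCarlo": 115.70,
          "Sparse matmult": 825.81, "LU": 1010.10}

# The composite is the mean of the kernels, so kernel k adds
# (icc_best[k] - gcc_o3[k]) / 5 points to the composite gap.
contrib = {k: (icc_best[k] - gcc_o3[k]) / len(gcc_o3) for k in gcc_o3}
gap = sum(contrib.values())

for name in sorted(contrib, key=contrib.get, reverse=True):
    ratio = icc_best[name] / gcc_o3[name]
    print(f"{name:15s} x{ratio:5.2f}  {contrib[name]:7.2f} of {gap:.2f} points")
```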

-rex