[Numpy-discussion] Scimark, icc, & Core 2 Duo
rex
rex at nosyntax.com
Wed Apr 18 17:00:34 EDT 2007
Keith Goodman <kwgoodman at gmail.com> [2007-04-18 12:46]:
> Thanks for that. For a variety of reasons I'm sticking with atlas.
> Does the parallel flag give you a big speed increase? I imagine it
> speeds things up more for larger matrices.
Surprisingly little: adding -parallel to -fast raises the composite score
only from 785.63 to 796.33 (about 1.4%). Below are the results of running
SciMark with various icc and gcc compiler flags. The best icc composite
score (901.11) is 55% higher than the best gcc score (580.55), though flags
other than -O3 might help gcc.
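As a quick check of those percentages, the composite scores from the runs below can be compared directly. This is a minimal Python sketch, not part of the original benchmark: the dictionary labels and the `pct_gain` helper are names made up here for illustration, and the numbers are copied from the SciMark output in this message.

```python
# Composite scores copied from the SciMark runs in this message.
# The labels and the pct_gain helper are illustrative, not part of SciMark.
composite = {
    "icc (no flags)": 605.84,
    "icc -fast": 785.63,
    "icc -fast -parallel": 796.33,
    "icc -fast -parallel -fno-alias": 890.46,
    "icc -fast -parallel -fno-alias -funroll-loops": 901.11,
    "gcc (no flags)": 323.63,
    "gcc -O3": 580.55,
}


def pct_gain(new, old):
    """Percentage by which `new` exceeds `old`."""
    return 100.0 * (new - old) / old


best_icc = max(v for k, v in composite.items() if k.startswith("icc"))
best_gcc = max(v for k, v in composite.items() if k.startswith("gcc"))

print(f"best icc vs best gcc: {pct_gain(best_icc, best_gcc):.1f}%")
print(
    "-parallel on top of -fast: "
    f"{pct_gain(composite['icc -fast -parallel'], composite['icc -fast']):.1f}%"
)
```

The first figure comes out at roughly 55% and the second at roughly 1.4%, which is why the effect of -parallel looks so small next to the overall compiler difference.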
The LINPACK binary that ships with MKL (optimized for Xeon, not for Core 2
Duo) peaks at about 7 gigaflops on my Core 2 Duo overclocked to 2.93 GHz
(it is a different benchmark from LINPACK 1000). There is a Core 2 Duo
optimized version for OS X.
icc with no flags set:
> icc *.c -o no_flags
> ./no_flags -large
** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov) **
** **
Using 2.00 seconds min time per kenel.
Composite Score: 605.84
FFT Mflops: 111.70 (N=1048576)
SOR Mflops: 868.52 (1000 x 1000)
MonteCarlo: Mflops: 120.37
Sparse matmult Mflops: 853.33 (N=100000, nz=1000000)
LU Mflops: 1075.27 (M=1000, N=1000)
> icc -fast *.c -o fast
> ./fast -large
** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov) **
** **
Using 2.00 seconds min time per kenel.
Composite Score: 785.63
FFT Mflops: 108.31 (N=1048576)
SOR Mflops: 985.81 (1000 x 1000)
MonteCarlo: Mflops: 848.81
Sparse matmult Mflops: 825.81 (N=100000, nz=1000000)
LU Mflops: 1159.42 (M=1000, N=1000)
> icc -fast -parallel *.c -o fast_para
IPO: performing multi-file optimizations
IPO: generating object file /tmp/ipo_iccvHW42m.o
scimark2.c(63) : (col. 18) remark: LOOP WAS VECTORIZED.
kernel.c(157) : (col. 13) remark: LOOP WAS VECTORIZED.
kernel.c(212) : (col. 17) remark: LOOP WAS VECTORIZED.
> ./fast_para -large
** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov) **
** **
Using 2.00 seconds min time per kenel.
Composite Score: 796.33
FFT Mflops: 111.70 (N=1048576)
SOR Mflops: 1001.91 (1000 x 1000)
MonteCarlo: Mflops: 855.57
Sparse matmult Mflops: 832.52 (N=100000, nz=1000000)
LU Mflops: 1179.94 (M=1000, N=1000)
> icc -fast -parallel -fno-alias *.c -o fast_para_noali
IPO: performing multi-file optimizations
IPO: generating object file /tmp/ipo_iccLUySDv.o
scimark2.c(63) : (col. 18) remark: LOOP WAS VECTORIZED.
kernel.c(157) : (col. 13) remark: LOOP WAS VECTORIZED.
kernel.c(212) : (col. 17) remark: LOOP WAS VECTORIZED.
> ./fast_para_noali -large
** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov) **
** **
Using 2.00 seconds min time per kenel.
Composite Score: 890.46
FFT Mflops: 109.70 (N=1048576)
SOR Mflops: 1488.28 (1000 x 1000)
MonteCarlo: Mflops: 855.57
Sparse matmult Mflops: 829.15 (N=100000, nz=1000000)
LU Mflops: 1169.59 (M=1000, N=1000)
> icc -fast -parallel -fno-alias -funroll-loops *.c -o fast_para_noali_unr
IPO: performing multi-file optimizations
IPO: generating object file /tmp/ipo_icc2KA1ui.o
scimark2.c(63) : (col. 18) remark: LOOP WAS VECTORIZED.
kernel.c(157) : (col. 13) remark: LOOP WAS VECTORIZED.
kernel.c(212) : (col. 17) remark: LOOP WAS VECTORIZED.
> ./fast_para_noali_unr -large
** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov) **
** **
Using 2.00 seconds min time per kenel.
Composite Score: 901.11
FFT Mflops: 113.48 (N=1048576)
SOR Mflops: 1510.28 (1000 x 1000)
MonteCarlo: Mflops: 865.92
Sparse matmult Mflops: 835.92 (N=100000, nz=1000000)
LU Mflops: 1179.94 (M=1000, N=1000)
> gcc -lm *.c -o ggc_none
> ./ggc_none -large
** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov) **
** **
Using 2.00 seconds min time per kenel.
Composite Score: 323.63
FFT Mflops: 83.56 (N=1048576)
SOR Mflops: 729.97 (1000 x 1000)
MonteCarlo: Mflops: 73.75
Sparse matmult Mflops: 329.26 (N=100000, nz=1000000)
LU Mflops: 401.61 (M=1000, N=1000)
> gcc -lm -O3 *.c -o gcc_O3
> ./gcc_O3 -large
** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo at nist.gov) **
** **
Using 2.00 seconds min time per kenel.
Composite Score: 580.55
FFT Mflops: 108.86 (N=1048576)
SOR Mflops: 842.27 (1000 x 1000)
MonteCarlo: Mflops: 115.70
Sparse matmult Mflops: 825.81 (N=100000, nz=1000000)
LU Mflops: 1010.10 (M=1000, N=1000)
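To compare runs kernel by kernel rather than by composite score, the score lines can be pulled out of the SciMark output with a small parser. This is a rough Python sketch under the assumption that the lines look like the ones pasted in this message; the `parse_scimark` function and the regular expression are illustrative names and not part of SciMark itself.

```python
import re

# Matches lines such as "FFT Mflops: 111.70 (N=1048576)",
# "MonteCarlo: Mflops: 120.37" and "Composite Score: 605.84".
# Assumption: the output format is the one shown in this message.
SCORE_LINE = re.compile(
    r"^(?P<name>.+?):?\s+(?:Mflops|Score):\s+(?P<value>[\d.]+)"
)


def parse_scimark(text):
    """Return {kernel name: value} for each score line in SciMark output."""
    scores = {}
    for line in text.splitlines():
        m = SCORE_LINE.match(line.strip())
        if m:
            scores[m.group("name")] = float(m.group("value"))
    return scores


# Output of the icc run with no flags, copied from this message.
sample = """\
Using 2.00 seconds min time per kenel.
Composite Score: 605.84
FFT Mflops: 111.70 (N=1048576)
SOR Mflops: 868.52 (1000 x 1000)
MonteCarlo: Mflops: 120.37
Sparse matmult Mflops: 853.33 (N=100000, nz=1000000)
LU Mflops: 1075.27 (M=1000, N=1000)
"""

no_flags = parse_scimark(sample)
for name, value in no_flags.items():
    print(f"{name:15s} {value:8.2f}")
```

Looking at the runs this way shows where the gains come from: in the numbers above, MonteCarlo jumps from 120.37 to 848.81 Mflops once -fast is added, and SOR goes from 1001.91 to 1488.28 Mflops once -fno-alias is added, while FFT barely moves in any of the runs.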
-rex