[Numpy-discussion] Fwd: Re: Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)
Francesc Altet
faltet at carabos.com
Sun Mar 23 08:54:41 EDT 2008
Hi,
Here are my results for an AMD Opteron machine:
gcc version 4.1.3 (SUSE Linux) | Dual Core AMD Opteron 270 @ 2 GHz
$ gcc -msse -O2 vec_bench.c -o vec_bench
$ ./vec_bench
Testing methods...
All OK
Problem size        Simple               Intrin               Inline
         100    0.0005ms (100.0%)    0.0003ms ( 48.5%)    0.0002ms ( 36.6%)
        1000    0.0030ms (100.0%)    0.0023ms ( 75.3%)    0.0015ms ( 51.2%)
       10000    0.0423ms (100.0%)    0.0387ms ( 91.5%)    0.0271ms ( 63.9%)
      100000    0.6138ms (100.0%)    0.5978ms ( 97.4%)    0.5834ms ( 95.0%)
     1000000    5.1213ms (100.0%)    5.0689ms ( 99.0%)    4.8771ms ( 95.2%)
    10000000   51.6820ms (100.0%)   51.0792ms ( 98.8%)   51.1346ms ( 98.9%)
Using gcc version 4.2.1 (SUSE Linux) | Dual Core AMD Opteron 270 @ 2 GHz
$ gcc -msse -O2 vec_bench.c -o vec_bench
$ ./vec_bench
Testing methods...
All OK
Problem size        Simple               Intrin               Inline
         100    0.0005ms (100.0%)    0.0003ms ( 49.0%)    0.0002ms ( 37.6%)
        1000    0.0030ms (100.0%)    0.0023ms ( 75.4%)    0.0016ms ( 51.5%)
       10000    0.0422ms (100.0%)    0.0387ms ( 91.7%)    0.0273ms ( 64.7%)
      100000    0.5833ms (100.0%)    0.5190ms ( 89.0%)    0.4756ms ( 81.5%)
     1000000    5.2302ms (100.0%)    4.6074ms ( 88.1%)    4.4121ms ( 84.4%)
    10000000   50.2559ms (100.0%)   48.5409ms ( 96.6%)   49.2436ms ( 98.0%)
And for my laptop, which has a Pentium 4 Mobile @ 2 GHz:
Using gcc version 4.1.3 (Ubuntu 4.1.2-16ubuntu2)
$ gcc -msse -O2 vec_bench.c -o vec_bench
$ ./vec_bench
Testing methods...
All OK
Problem size        Simple               Intrin               Inline
         100    0.0002ms (100.0%)    0.0002ms ( 88.8%)    0.0002ms (103.1%)
        1000    0.0020ms (100.0%)    0.0015ms ( 75.9%)    0.0021ms (103.5%)
       10000    0.0198ms (100.0%)    0.1507ms (761.8%)    0.0205ms (103.6%)
      100000    1.6296ms (100.0%)    1.2533ms ( 76.9%)    1.2586ms ( 77.2%)
     1000000   13.9571ms (100.0%)   12.8786ms ( 92.3%)   13.6840ms ( 98.0%)
    10000000  135.3217ms (100.0%)  128.5314ms ( 95.0%)  128.5189ms ( 95.0%)
Using gcc version 4.2.1 (Ubuntu 4.2.1-5ubuntu4)
$ gcc -msse -O2 vec_bench.c -o vec_bench
$ ./vec_bench
Testing methods...
All OK
Problem size        Simple                Intrin               Inline
         100    0.0002ms (100.0%)    0.0002ms (  90.6%)    0.0002ms (103.9%)
        1000    0.0022ms (100.0%)    0.0017ms (  75.2%)    0.0020ms ( 90.1%)
       10000    0.0181ms (100.0%)    0.2540ms (1403.8%)    0.0319ms (176.5%)
      100000    1.2600ms (100.0%)    1.2710ms ( 100.9%)    1.3510ms (107.2%)
     1000000   12.9181ms (100.0%)   12.8595ms (  99.5%)   12.9160ms (100.0%)
    10000000  128.8301ms (100.0%)  128.2373ms (  99.5%)  128.4255ms ( 99.7%)
It is curious to see a venerable Pentium 4 running this code about 2x
faster than a powerful AMD Opteron for small datasets (<10000), and at a
speed similar to that of recent Core2 processors. I suppose the
first-level cache in Pentiums is pretty fast.
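The source of vec_bench.c is not included in this message, so the following is only a rough sketch of what the Simple and Intrin columns might be comparing. It assumes the benchmarked kernel is an element-wise addition of float arrays; that assumption, and the names add_simple, add_intrin and relative_percent, are hypothetical and not taken from the benchmark. The Inline variant (presumably hand-written inline assembly) is not sketched. The relative_percent helper reflects how the parenthesised figures appear to be derived, as time relative to Simple: for example 0.0387ms / 0.0423ms gives roughly 91.5%.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics; __SSE__ is defined by gcc -msse */
#endif

/* "Simple" variant (assumed): plain scalar loop, optimization left to gcc. */
void add_simple(const float *a, const float *b, float *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* "Intrin" variant (assumed): four floats per step using SSE intrinsics.
   Without SSE support this degrades to the same scalar loop. */
void add_intrin(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        /* Unaligned loads and stores: no alignment of the arrays is assumed. */
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Remaining 0-3 elements (or every element when SSE is unavailable). */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Time of one method expressed as a percentage of the Simple time. */
double relative_percent(double t_method_ms, double t_simple_ms)
{
    return 100.0 * t_method_ms / t_simple_ms;
}
```

Compiled with gcc -msse -O2 as in the runs above, add_intrin takes the SSE path; calling both functions on the same inputs should give identical outputs, which is presumably what the "Testing methods... All OK" step checks.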
Cheers,
--
Francesc Altet
-------------------------------------------------------
--
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"