[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)
Emanuele Olivetti
emanuele at relativita.com
Sun Mar 23 04:20:28 EDT 2008
James Philbin wrote:
> OK, I've written a simple benchmark which implements an elementwise
> multiply (A=B*C) in three different ways (standard C, intrinsics, hand-
> coded assembly). On the face of things the results seem to indicate
> that the vectorization works best on medium-sized inputs. If people
> could post the results of running the benchmark on their machines
> (it takes ~1 min) along with the output of gcc --version and their chip
> model, that would be very useful.
>
> It should be compiled with: gcc -msse -O2 vec_bench.c -o vec_bench
>
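The vec_bench.c source is not reproduced in this message. A rough sketch of what the first two variants might look like, assuming single-precision float arrays (plain -msse only provides single-precision SSE arithmetic), is a scalar loop and an SSE-intrinsics loop. The function names mul_simple and mul_intrin are hypothetical, and the hand-coded inline-assembly variant is omitted:

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps */

/* Hypothetical "standard C" variant: plain scalar loop, a[i] = b[i] * c[i]. */
void mul_simple(float *a, const float *b, const float *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}

/* Hypothetical "intrinsics" variant: multiply four floats per iteration
 * with SSE, then finish the remaining 0-3 elements with scalar code.
 * Unaligned loads/stores are used so the arrays need no 16-byte alignment. */
void mul_intrin(float *a, const float *b, const float *c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(a + i, _mm_mul_ps(vb, vc));
    }
    for (; i < n; i++) /* scalar tail */
        a[i] = b[i] * c[i];
}
```

Both functions build with the command line quoted above. Processing four floats per instruction is one plausible reason the intrinsics version pulls ahead at moderate sizes; at the largest sizes the arrays no longer fit in cache, so all variants may end up limited by memory traffic instead.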
CPU: Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz
(MacBook, Intel Core 2 Duo)
gcc (GCC) 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)
(Ubuntu 7.10 "Gutsy Gibbon")
$ ./vec_bench
Testing methods...
All OK
Problem size    Simple               Intrin               Inline
         100    0.0003ms (100.0%)    0.0002ms ( 68.3%)    0.0002ms ( 75.6%)
        1000    0.0023ms (100.0%)    0.0018ms ( 76.7%)    0.0020ms ( 87.1%)
       10000    0.0361ms (100.0%)    0.0193ms ( 53.4%)    0.0338ms ( 93.7%)
      100000    0.2839ms (100.0%)    0.1351ms ( 47.6%)    0.0937ms ( 33.0%)
     1000000    4.2108ms (100.0%)    4.1234ms ( 97.9%)    4.0886ms ( 97.1%)
    10000000   45.3192ms (100.0%)   45.5359ms (100.5%)   45.3466ms (100.1%)
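In each row the percentage is a method's time divided by the Simple time for the same problem size, so values below 100% mean faster than plain C (for example 0.1351 / 0.2839 is about 47.6% for Intrin at size 100000). A small sketch that computes these ratios and formats a row in the same layout; relative_pct and format_row are hypothetical helpers, and the printf format is an assumption, not taken from vec_bench.c:

```c
#include <stdio.h>

/* Time of one method relative to the Simple (plain C) time, in percent. */
double relative_pct(double t_method_ms, double t_simple_ms)
{
    return 100.0 * t_method_ms / t_simple_ms;
}

/* Hypothetical helper: format one results row into buf.
 * t_ms[0] = Simple, t_ms[1] = Intrin, t_ms[2] = Inline (all in ms).
 * Returns the snprintf result (length the full row would need). */
int format_row(char *buf, size_t len, unsigned long n, const double t_ms[3])
{
    return snprintf(buf, len,
                    "%12lu   %7.4fms (%5.1f%%)   %7.4fms (%5.1f%%)   %7.4fms (%5.1f%%)",
                    n,
                    t_ms[0], relative_pct(t_ms[0], t_ms[0]),
                    t_ms[1], relative_pct(t_ms[1], t_ms[0]),
                    t_ms[2], relative_pct(t_ms[2], t_ms[0]));
}
```

Feeding in the rounded times from the 100000 row (0.2839, 0.1351, 0.0937) reproduces the 47.6% and 33.0% shown above.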
Note that there is some run-to-run variance in the results. Here is a second
run to give an idea of it (compare the Inline column at size 10000: 93.7% in
the first run vs. 69.6% in this one):
$ ./vec_bench
Testing methods...
All OK
Problem size    Simple               Intrin               Inline
         100    0.0003ms (100.0%)    0.0002ms ( 69.5%)    0.0002ms ( 74.1%)
        1000    0.0024ms (100.0%)    0.0018ms ( 75.9%)    0.0020ms ( 86.4%)
       10000    0.0324ms (100.0%)    0.0186ms ( 57.3%)    0.0226ms ( 69.6%)
      100000    0.2840ms (100.0%)    0.1171ms ( 41.2%)    0.0939ms ( 33.1%)
     1000000    4.4034ms (100.0%)    4.3657ms ( 99.1%)    4.0465ms ( 91.9%)
    10000000   44.4854ms (100.0%)   43.9502ms ( 98.8%)   43.6824ms ( 98.2%)
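Since the two runs differ by several percentage points in places, one common way to reduce such noise is to repeat each measurement several times and keep the fastest. A minimal sketch, assuming POSIX clock_gettime() with CLOCK_MONOTONIC is available; best_mul_time_ms and its reps/trials parameters are hypothetical and not part of vec_bench.c:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime() under strict -std modes */
#endif
#include <stddef.h>
#include <time.h>

/* Monotonic wall-clock time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Scalar a[i] = b[i] * c[i], standing in for the kernel being timed. */
static void mul_simple(float *a, const float *b, const float *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}

/* Hypothetical helper: run the kernel `reps` times per trial, repeat for
 * `trials` trials, and return the smallest per-call time in ms (or -1.0
 * if trials <= 0). Interrupts, other processes and clock-speed changes
 * mostly add time, so the minimum tends to be more stable than one run. */
double best_mul_time_ms(float *a, const float *b, const float *c,
                        size_t n, int reps, int trials)
{
    double best = -1.0;
    int t, r;
    for (t = 0; t < trials; t++) {
        double start = now_ms(), per_call;
        for (r = 0; r < reps; r++)
            mul_simple(a, b, c, n);
        per_call = (now_ms() - start) / reps;
        if (best < 0.0 || per_call < best)
            best = per_call;
    }
    return best;
}
```

On glibc versions older than 2.17 (Ubuntu 7.10 ships an older one), clock_gettime() lives in librt, so add -lrt to the link line.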
HTH
Emanuele