[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)
James Philbin
philbinj at gmail.com
Sat Mar 22 19:03:18 EDT 2008
OK, I've written a simple benchmark which implements an elementwise
multiply (A=B*C) in three different ways (standard C, intrinsics,
hand-coded assembly). On the face of things, the results seem to
indicate that the vectorization works best on medium-sized inputs. If
people could post the results of running the benchmark on their
machines (takes ~1min) along with the output of gcc --version and
their chip model, that would be very useful.
It should be compiled with: gcc -msse -O2 vec_bench.c -o vec_bench
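For anyone reading without the attachment, here is a minimal sketch of what the first two variants could look like. It is not the code from vec_bench.c: the function names (mul_simple, mul_intrin), the float element type (guessed from the -msse flag, since SSE1 only handles single precision), and the use of unaligned loads are all assumptions made for illustration. The hand-coded assembly variant is omitted.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; enabled by -msse */

/* Plain C version: one multiply per loop iteration.
 * Hypothetical name, not taken from vec_bench.c. */
void mul_simple(float *a, const float *b, const float *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}

/* SSE version: four single-precision multiplies per iteration.
 * Unaligned loads/stores are used so the arrays need no special
 * alignment; this is an assumption, the real benchmark may differ. */
void mul_intrin(float *a, const float *b, const float *c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(a + i, _mm_mul_ps(vb, vc));
    }
    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; i++)
        a[i] = b[i] * c[i];
}
```

Both functions should produce identical output for the same inputs, so a quick sanity check is to run them on the same arrays and compare element by element before timing anything.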
Here are two sets of results:
CPU: Core Duo T2500 @ 2GHz
gcc --version: gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
Problem size   Simple               Intrin               Inline
         100    0.0003ms (100.0%)    0.0002ms ( 67.7%)    0.0002ms ( 50.6%)
        1000    0.0030ms (100.0%)    0.0021ms ( 69.2%)    0.0015ms ( 50.6%)
       10000    0.0370ms (100.0%)    0.0267ms ( 72.0%)    0.0279ms ( 75.4%)
      100000    0.2258ms (100.0%)    0.1469ms ( 65.0%)    0.1273ms ( 56.4%)
     1000000    4.5690ms (100.0%)    4.4616ms ( 97.6%)    4.4185ms ( 96.7%)
    10000000   47.0022ms (100.0%)   45.4100ms ( 96.6%)   44.4437ms ( 94.6%)
CPU: Intel Xeon E5345 @ 2.33GHz
gcc --version: gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
Problem size   Simple               Intrin               Inline
         100    0.0001ms (100.0%)    0.0001ms ( 69.2%)    0.0001ms ( 77.4%)
        1000    0.0010ms (100.0%)    0.0008ms ( 78.1%)    0.0009ms ( 86.6%)
       10000    0.0108ms (100.0%)    0.0088ms ( 81.2%)    0.0086ms ( 79.6%)
      100000    0.1131ms (100.0%)    0.0897ms ( 79.3%)    0.0872ms ( 77.1%)
     1000000    5.2103ms (100.0%)    3.9153ms ( 75.1%)    3.8328ms ( 73.6%)
    10000000   54.1815ms (100.0%)   51.8286ms ( 95.7%)   51.4366ms ( 94.9%)
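The percentages in the tables appear to be each variant's time relative to the Simple column (for example, 0.1469 / 0.2258 is about 65.0% for size 100000 on the Core Duo). A rough sketch of how such numbers could be produced follows. The names time_kernel_ms, relative_percent and mul_simple are hypothetical, and clock() is just one possible timer; the attached benchmark may measure things differently.

```c
#include <stddef.h>
#include <time.h>

/* Signature shared by the kernels being compared (an assumption). */
typedef void (*kernel_fn)(float *a, const float *b, const float *c, size_t n);

/* Baseline kernel: plain elementwise multiply, A = B * C. */
void mul_simple(float *a, const float *b, const float *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}

/* Average CPU time per call in milliseconds, taken over `reps` calls.
 * clock() measures processor time, not wall-clock time. */
double time_kernel_ms(kernel_fn fn, float *a, const float *b,
                      const float *c, size_t n, int reps)
{
    int r;
    clock_t start, end;

    start = clock();
    for (r = 0; r < reps; r++)
        fn(a, b, c, n);
    end = clock();

    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / reps;
}

/* Time of one variant as a percentage of the baseline time. */
double relative_percent(double t_ms, double baseline_ms)
{
    return 100.0 * t_ms / baseline_ms;
}
```

Averaging over many repetitions matters most for the small problem sizes, where a single call takes well under a microsecond and is far below the resolution of most timers.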
James
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vec_bench.c
Type: text/x-csrc
Size: 4004 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080322/6f643ffc/attachment.c>