[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)
Francesc Altet
faltet at carabos.com
Sun Mar 23 08:41:20 EDT 2008
On Sunday, 23 March 2008, Charles R Harris wrote:
> gcc --version: gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
> cpu: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
>
> Problem size   Simple               Intrin               Inline
> 100             0.0002ms (100.0%)    0.0001ms ( 68.7%)    0.0001ms ( 74.8%)
> 1000            0.0015ms (100.0%)    0.0011ms ( 72.0%)    0.0012ms ( 80.4%)
> 10000           0.0154ms (100.0%)    0.0111ms ( 72.1%)    0.0122ms ( 79.1%)
> 100000          0.1081ms (100.0%)    0.0759ms ( 70.2%)    0.0811ms ( 75.0%)
> 1000000         2.7778ms (100.0%)    2.8172ms (101.4%)    2.7929ms (100.5%)
> 10000000       28.1577ms (100.0%)   28.7332ms (102.0%)   28.4669ms (101.1%)
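For readers who haven't seen the benchmark source: the "Simple" and "Intrin"
columns presumably compare something like a plain scalar loop against an
SSE-intrinsics loop that processes four single-precision floats per
iteration. The sketch below is only illustrative; the kernel, the names and
the alignment assumptions are mine, not taken from the actual benchmark.

#include <xmmintrin.h>  /* SSE intrinsics */

/* "Simple" variant: one float per iteration, e.g. y[i] = a*x[i] + y[i]. */
void axpy_simple(int n, float a, const float *x, float *y)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* "Intrin" variant: four floats per iteration with SSE.
 * Assumes n is a multiple of 4 and that x and y are 16-byte aligned. */
void axpy_sse(int n, float a, const float *x, float *y)
{
    __m128 va = _mm_set1_ps(a);
    int i;
    for (i = 0; i < n; i += 4) {
        __m128 vx = _mm_load_ps(x + i);
        __m128 vy = _mm_load_ps(y + i);
        _mm_store_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
}

Once the arrays no longer fit in cache, both variants end up waiting on the
same memory traffic, which would explain the ~100% figures in the last two
rows of the table.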
I'm mystified that your machine needs just ~28 ms to complete the 10-million
test, while most of the other, similar processors reported in this thread
(some faster than yours) fall pretty far short of that figure. What sort of
memory subsystem are you using?
> It looks like memory access is the bottleneck, otherwise running 4
> floats through in parallel should go a lot faster.
Yes, that's probably right. This test is mainly measuring the memory access
speed of the machine for large datasets. For small ones, my guess is that
the data already sits in the caches, so there is no need to bring it in from
main memory before doing the calculations. However, I'm rather sceptical
about whether this kind of optimization for small datasets would be very
useful in practice (read: general NumPy calculations).
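A quick back-of-the-envelope check (my own numbers, assuming the kernel
streams one float32 array in and one array out; the actual benchmark may
touch more or less data than this):

#include <stdio.h>

int main(void)
{
    double n = 10e6;               /* elements in the largest test          */
    double bytes = 2.0 * n * 4.0;  /* one read stream + one write, float32  */
    double seconds = 28.2e-3;      /* ~28.2 ms from the table above         */
    printf("implied bandwidth: %.1f GB/s\n", bytes / seconds / 1e9);
    return 0;
}

That works out to roughly 2.8 GB/s of sustained traffic, which is already in
the range a Core 2 / DDR2 system can deliver, and so consistent with the
loop being memory-bound rather than compute-bound.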
Cheers,
--
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"