[Numpy-discussion] Python ctypes and OpenMP mystery
Eric Carlson
ecarlson at eng.ua.edu
Sat Feb 12 15:19:39 EST 2011
Hello All,
I have been toying with OpenMP through f2py and ctypes. On the whole,
the results of my efforts have been very encouraging. That said, some
results are a bit perplexing.
I have written identical routines that I run both directly as a compiled C
executable and through ctypes as a shared library. I am running the
tests on a dual-Xeon Ubuntu system with 12 cores and 24 threads. The C
executable is SLIGHTLY faster than the ctypes version at lower thread
counts, but the C version eventually reaches a speedup ratio of 12+,
while the Python-invoked version caps off at 7.7, as shown below:
threads   C-speedup   Python-speedup
   1        1            1
   2        2.07         1.98
   3        3.1          2.96
   4        4.11         3.93
   5        4.97         4.75
   6        5.94         5.54
   7        6.83         6.53
   8        7.78         7.3
   9        8.68         7.68
  10        9.62         7.42
  11       10.38         7.51
  12       10.44         7.26
  13        7.19         6.04
  14        7.7          5.73
  15        8.27         6.03
  16        8.81         6.29
  17        9.37         6.55
  18        9.9          6.67
  19       10.36         6.9
  20       10.98         7.01
  21       11.45         6.97
  22       11.92         7.1
  23       12.2          7.08
These ratios are quite consistent from 100KB double arrays up to 100MB
double arrays, so I do not think this reflects Python call overhead.
There is no question the routine is memory-bandwidth constrained, and I
feel lucky to squeeze out the eventual 12+ ratio, but I am very perplexed
as to why the performance of the Python-invoked routine seems to cap off.
Does anyone have an explanation for the caps? Am I seeing some effect
from ctypes, or the Python engine, or what?
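For completeness, the thread counts above were swept one value at a time.
One common way to drive such a sweep from Python (an assumption about
methodology, not necessarily what I did) is to set OMP_NUM_THREADS and
launch a fresh process per setting, since the OpenMP runtime typically
reads that variable once at startup. The inline child process here just
echoes the value back to keep the sketch self-contained:

```python
import os
import subprocess
import sys

seen = []
for nthreads in (1, 2, 4):
    # Fresh environment per run: OMP_NUM_THREADS is read when the
    # OpenMP runtime initializes, so it must be set before launch.
    env = dict(os.environ, OMP_NUM_THREADS=str(nthreads))
    out = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ['OMP_NUM_THREADS'])"],
        env=env, capture_output=True, text=True, check=True)
    seen.append(out.stdout.strip())

print(seen)  # ['1', '2', '4']
```

In a real benchmark the inline `-c` snippet would be replaced by the
script that loads the shared library and times the kernel.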
Cheers,
Eric