[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

Sebastian Haase seb.haase at gmail.com
Tue Feb 15 10:50:27 EST 2011


Hi,
I assume that someone here could maybe help me, and I'm hoping it's
not too much off topic.
I have 2 arrays of 2d point coordinates and would like to calculate
all pairwise distances as fast as possible.
Going from Python/Numpy to a (Swigged) C extension already gave me a
55x speedup.
(.9ms vs. 50ms for arrays of length 329 and 340).
I'm using gcc on Linux.
Now I'm wondering if I could go even faster !?
My hope that the compiler might automagically do some SSE2
optimization got disappointed.
Since I have a 4 core CPU I thought OpenMP might be an option;
I never used that, and after some playing around I managed to get
(only) 50% slowdown(!) :-(

My code in short is this:
(My SWIG typemaps use obj_to_array_no_conversion() from numpy.i)
-------<Ccode> ----------
void dists2d(
		   double *a_ps, int nx1, int na,
		   double *b_ps, int nx2, int nb,
		   double *dist, int nx3, int ny3)  throw (char*)
{
  if(nx1 != 2)	throw (char*) "a must be of shape (n,2)";
  if(nx2 != 2)	throw (char*) "b must be of shape (n,2)";
  if(nx3 != nb || ny3 != na)	throw (char*) "c must be of shape (na,nb)";

  double *dist_;
  int i, j;

#pragma omp parallel private(dist_, j, i)
  {
#pragma omp for nowait
	for(i=0;i<na;i++)
	  {
		//num_threads=omp_get_num_threads();  --> 4
		dist_ = dist+i*nb;                 // dists_  is  only introduced for OpenMP
		for(j=0;j<nb;j++)
		  {
			*dist_++  = sqrt( sq(a_ps[i*nx1]   - b_ps[j*nx2]) +
							  sq(a_ps[i*nx1+1] - b_ps[j*nx2+1]) );
		  }
	  }
  }
}
-------</Ccode> ----------
There is probably a simple mistake in this code - as I said I never
used OpenMP before.
It should be not too difficult to use OpenMP correctly here
 or -  maybe better -
is there a simple SSE(2,3,4) version that might be even better than OpenMP... !?

I supposed, that I did not get the #pragma omp lines right - any idea ?
Or is it in general not possible to speed this kind of code up using OpenMP !?

Thanks,
Sebastian Haase



More information about the NumPy-Discussion mailing list