[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

Charles R Harris charlesr.harris at gmail.com
Sat Mar 22 14:07:41 EDT 2008


On Sat, Mar 22, 2008 at 11:43 AM, Neal Becker <ndbecker2 at gmail.com> wrote:

> James Philbin wrote:
>
> > Personally, I think that the time would be better spent optimizing
> > routines for single-threaded code and relying on BLAS and LAPACK
> > libraries to use multiple cores for more complex calculations. In
> > particular, doing some basic loop unrolling and SSE versions of the
> > ufuncs would be beneficial. I have some experience writing SSE code
> > using intrinsics and would be happy to give it a shot if people tell
> > me what functions I should focus on.
> >
> > James
>
> gcc keeps advancing autovectorization.  Is manual vectorization worth the
> trouble?
>
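For concreteness (this sketch is not from the thread): a hand-written SSE2 kernel of the sort James describes might look like the following, assuming contiguous double data; the function name `square_sse2` is hypothetical. Unaligned loads/stores are used so no alignment guarantee is needed.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical sketch: square each element of a contiguous double
   array, two elements per iteration with SSE2, scalar tail. */
static void
square_sse2(const double *in, double *out, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(in + i);        /* load two doubles, unaligned OK */
        _mm_storeu_pd(out + i, _mm_mul_pd(v, v)); /* v * v, store two doubles */
    }
    for (; i < n; i++)                            /* handle odd leftover element */
        out[i] = in[i] * in[i];
}
```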

The inner loop of a unary ufunc looks like

/*UFUNC_API*/
static void
PyUFunc_d_d(char **args, intp *dimensions, intp *steps, void *func)
{
    intp i;
    char *ip1 = args[0], *op = args[1];  /* byte pointers: input, output */

    /* steps[] are byte strides, so the data need not be contiguous
       (or even aligned) in memory. */
    for (i = 0; i < *dimensions; i++, ip1 += steps[0], op += steps[1]) {
        *(double *)op = ((DoubleUnaryFunc *)func)(*(double *)ip1);
    }
}


While hoisting the steps into local constants might help the compiler, it is
hard to see how it could vectorize the loop with the information available,
given that the input data may be neither aligned nor contiguous. I suppose one
could copy the data into a small local contiguous buffer and run SSE on that,
which might actually help for some operations. But it is also likely that the
function itself won't deal gracefully with vectorized data.
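The buffering idea can be sketched roughly as follows (my sketch, not code from NumPy): gather the strided doubles into a small contiguous buffer, run a kernel the compiler can vectorize, then scatter the results back out. The name `strided_square` and the buffer size are assumptions for illustration.

```c
#include <stddef.h>

#define BUFSIZE 64  /* small local buffer, in elements (arbitrary choice) */

/* Sketch: apply x -> x*x to doubles addressed through byte strides,
   by staging chunks in a contiguous stack buffer. */
static void
strided_square(char *ip, char *op, ptrdiff_t istep, ptrdiff_t ostep, size_t n)
{
    double buf[BUFSIZE];
    while (n > 0) {
        size_t chunk = n < BUFSIZE ? n : BUFSIZE, j;
        for (j = 0; j < chunk; j++, ip += istep)   /* gather strided input */
            buf[j] = *(double *)ip;
        for (j = 0; j < chunk; j++)                /* contiguous: vectorizable */
            buf[j] *= buf[j];
        for (j = 0; j < chunk; j++, op += ostep)   /* scatter to strided output */
            *(double *)op = buf[j];
        n -= chunk;
    }
}
```

Whether the gather/scatter copies pay for themselves would depend on how expensive the inner operation is relative to the memory traffic.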

Chuck
