[Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

Fri Nov 9 06:37:12 EST 2007

Am Freitag, 09. November 2007 00:16:12 schrieb Travis E. Oliphant:
> C-order is "special" in NumPy due to the history.  I agree that it
> doesn't need to be and we have taken significant steps in that
> direction.

Thanks for this clarifying statement.

> Right now, the fundamental problem is probably due to the 
> fact that the output arrays are being created as C-order arrays when the
> input is a Fortran order array.  Once that is changed then we can cause
> Fortran-order inputs to use the simplest path through the code.

That looks like a plan.  I have read about __array_wrap__ in your book, and it 
sounds exactly like what we need for our imaging library; this way we can 
ensure that the result of operations on images is again an image.  Here, it 
is crucial that the resulting image has Fortran order, too (for C/C++ 
algorithms to agree with the Python idea of the images' orientation).

> Fortran order arrays can be preserved but it takes a little extra work
> because backward compatible expectations had to be met.  See for example
> the order argument to the copy method of arrays.

What do you mean exactly (if you have something specific in mind at all)?  
Should the order be configurable for all operations, and default to "C" for 
backwards compatibility?
(BTW: copy() has no arguments yet in my numpybook, page 57.)

At last, I had a look at the relevant code for adding arrays; this is what I 
found: In core/src/umathmodule.c.src, the code templates for the relevant 
ufuncs (e.g. add) are defined, which will get expanded to e.g. DOUBLE_add.
This will get wrapped into an ufunc in 'InitOperators' using

  f = PyUFunc_FromFuncAndData(add_functions, add_data, add_signatures, 18,
                              2, 1, PyUFunc_Zero, "add",
                              "adds the arguments elementwise.", 0);

and inserted as "add" into a dictionary which is later passed to
PyArray_SetNumericOps.  This will install the ufunc in the static
NumericOps struct 'n_ops', which is used in the 'array_add' function
installed in the PyTypeObject.

Thus, it is the ufunc code (not suprisingly), which seems to call
DOUBLE_add for the inner loop:

| static void
| DOUBLE_add(char **args, intp *dimensions, intp *steps, void *func)
| {
|     register intp i;
|     intp is1=steps[0],is2=steps[1],os=steps[2], n=dimensions[0];
|     char *i1=args[0], *i2=args[1], *op=args[2];
|     for(i=0; i<n; i++, i1+=is1, i2+=is2, op+=os) {
|         *((double *)op)=*((double *)i1) + *((double *)i2);
|     }
| }

If I understood David correctly, he suggested an unstrided variant of
this inner loop, which simply uses pointer increments.  While this is
a good idea (also probably quite some work), the real thing bugging me is
that the above DOUBLE_add could (and should!) be called by the ufunc
framework in such a way that it is equally efficient for C and Fortran
arrays.  (I.e. with steps set to sizeof(double), without prior
copying/conversion.)  It could even be used with negative step sizes when the 
input and output arrays have different orders.  Disclaimer: So far, I did not 
look at the ufunc code yet, so I do not know which code paths exist.

BTW: What are the "void *func" in core/src/umathmodule.c.src arguments
for?  Shouldn't that be "void * /*func*/", since the parameters are
never used?

-- 
Ciao, /  /
     /--/
    /  / ANS