[Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

David Cournapeau cournape at gmail.com
Thu Nov 8 12:16:26 EST 2007


On Nov 9, 2007 1:55 AM, Hans Meine <meine at informatik.uni-hamburg.de> wrote:
> On Thursday, 08 November 2007 at 17:31:40, David Cournapeau wrote:
> > This is because the current implementation of at least some of the
> > operations you are talking about uses PyArray_GenericReduce and
> > other similar functions, which are really high level (they go through
> > Python callables, etc.). This is easier, because you don't have to care
> > about anything (types, etc.), but it means that the Python runtime is
> > handling everything.
>
> I suspected that after your last post, but that's really bad for pointwise
> operations on a contiguous, aligned array.  A simple transpose() should
> really not make any difference here.
It should not, but in practice it is not so easy to do. AFAIK, even
Matlab has the same problem, though with a smaller difference, and they
have far more resources than NumPy. Not to say that we cannot do
better, but it takes time.
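
Something along these lines should let you measure the difference on your
side (this is only a quick sketch; the actual timings will depend on the
NumPy version and how it was built):

import numpy as np
import timeit

a = np.random.rand(2000, 2000)   # C-contiguous (row-major) data
b = a.T                          # same data, viewed in Fortran order

# the same elementwise operation on both layouts; the data is contiguous
# and aligned in both cases, only the logical axis order differs
t_c = timeit.timeit(lambda: a + 1.0, number=50)
t_f = timeit.timeit(lambda: b + 1.0, number=50)
print("C order:       %.4f s" % t_c)
print("Fortran order: %.4f s" % t_f)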
>
> > Instead, you should use a pure C
> > implementation (by pure C, I mean a C function totally independent of
> > Python, only dealing with standard C types). This alone would already
> > yield a significant performance leap.
>
> AFAICS, it would be much more elegant and easier to implement this using C++
> templates.  We have a lot of experience with such a design from our VIGRA
> library ( http://kogs-www.informatik.uni-hamburg.de/~koethe/vigra/ ), which
> is an imaging library based on the STL concepts (and some necessary and
> convenient extensions for higher-dimensional arrays and a more flexible API).
>
> I am not very keen on writing hundreds of lines of C code for things that can
> easily be handled with C++ functors.  But I don't think that I am the first
> to propose this, and I know that C has some advantages (faster compilation;
> are there more? ;-) )
The advantages of C over C++ go well beyond compilation speed: it
is actually portable, it is simple, and it is easily callable from other
languages, three significant points where C++ falls short.
Generally, I don't see much benefit in using low-level C++ in
numerical code; I consider most numerical C++ containers a total
failure (the fact that after 20 years there is still nothing close to
a standard for even a matrix concept in C++ is quite striking to me).
The aspect of C++ that would be most useful in numpy is RAII (I
must confess that I personally don't like C++ very much, in
particular when templates are involved). This is only my opinion, and I
don't know what other people think about this.

>
> Yes, I think it does.  It probably depends on the sizes of the segments
> though.  If you have a multi-segment box-sub-range of a large dataset (3D
> volume or even only 2D), processing each contiguous "row" (column/...) at
> once within the inner loop definitely makes a difference.  I.e. as long as
> one dimension is not strided (and the data's extent in this dimension is not
> too small), it should be handled in the inner loop.  The other loops
> probably don't make a big difference.

If you have contiguous segments in subranges, then this is already
handled through the ufunc mechanism, but I don't know anything about
its actual implementation. Other people much more knowledgeable than
I am can give you more info here.
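
To give a rough picture of what you describe (this is only an illustration
in Python, not how the ufunc inner loops are actually written): a box
sub-range of a 2-d array is strided along the outer axis, but each of its
rows is still a contiguous segment, so the work can be done one whole row
at a time:

import numpy as np

data = np.random.rand(512, 512)      # large 2-d dataset, C order
box = data[100:200, 50:350]          # box sub-range: a strided view

print(box.flags['C_CONTIGUOUS'])     # False: the box as a whole is strided
print(box[0].flags['C_CONTIGUOUS'])  # True: each row is contiguous

# emulate "outer loop over the strided axis, inner loop over a contiguous
# row"; a real implementation would do this in C, row by row
out = np.empty_like(box)
for i in range(box.shape[0]):
    out[i] = box[i] * 2.0            # process one contiguous row at once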

David


