[Numpy-discussion] Funded work on Numpy: proposed improvements and request for feedback

Tue Aug 4 03:37:06 EDT 2009

Hi Chuck,

Charles R Harris wrote:
>
>
>
>     To make purely computational code available to third parties, two
>     things are
>     needed:
>
>     1. the code itself needs to make the split explicit.
>     2. there needs to be support so that reusing those functionalities
>     is as
>       painless as possible, from a build point of view (Note: this is
>     almost
>       done in the upcoming numpy 1.4.0 as long as static linking is OK).
>
>
> Ah, it itches. This is certainly a worthy goal, but are there third
> parties who have expressed an interest in this? I mean, besides trying
> to avoid duplicate bits of code in Scipy.

Actually, I think that's what interests people around the Nipy project
the most. In particular, they need to reuse lapack and random quite a
bit, and for now, they just duplicate the code, with all the problems
that brings (duplication, unreliable cross-platform behaviour, etc.).

>  
>
>
>     Splitting the code
>     ------------------
>
>     The amount of work is directly proportional to the amount of
>     functions to be
>     made available. The most obvious candidates are:
>
>     1. C99 math functions: a lot of this has already been done. In
>     particular math
>       constants, and special values support is already implemented.
>     Almost every
>       real function in numpy has a portable npy_ implementation in C.
>     2. C99-like complex support: this naturally extends the previous
>     series.  The
>       main difficulty is to support platforms without C99 complex
>     support, and the
>       corresponding C99 complex functions.
>     3. FFT code: there is no support to reuse FFT at the C level at
>     the moment.
>     4. Random: there is no support either
>     5. Linalg: idem.
>
>
> This is good. I think it should go along with code reorganization. The
> files are now broken up but I am not convinced that everything is yet
> where it should be.

Oh, definitely agreed. Another thing I would like in that spirit is to
split the numpy headers like in Python itself: ndarrayobject.h would
still pull in everything (for backward compatibility reasons), but
people could include only the few headers they need. The rationale
for me is when I work on numpy itself: it is kind of stupid that
every time I change the iterator structures, the whole numpy core has to
be rebuilt. That's quite wasteful and frustrating.
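
As a rough illustration of that layout (a sketch only: every file and
identifier name below is hypothetical, not an existing numpy header), the
three "files" are shown in one listing, separated by their include guards.
A source file that includes only the types header would not need to be
rebuilt when the iterator header changes.

```c
#include <stddef.h>

/* ---- demo_types.h (hypothetical): the bare array description ---- */
#ifndef DEMO_TYPES_H
#define DEMO_TYPES_H
typedef struct {
    void *data;              /* first element */
    int ndim;                /* number of dimensions */
    const ptrdiff_t *shape;  /* extent of each dimension */
} demo_array;

/* Total number of elements described by the array. */
static ptrdiff_t demo_array_size(const demo_array *a)
{
    ptrdiff_t n = 1;
    int i;
    for (i = 0; i < a->ndim; i++) {
        n *= a->shape[i];
    }
    return n;
}
#endif /* DEMO_TYPES_H */

/* ---- demo_iterator.h (hypothetical): depends only on demo_types.h ---- */
#ifndef DEMO_ITERATOR_H
#define DEMO_ITERATOR_H
typedef struct {
    const demo_array *array;
    ptrdiff_t index;         /* changing this struct only affects files
                                that include demo_iterator.h */
} demo_flat_iter;
#endif /* DEMO_ITERATOR_H */

/* ---- demo_arrayobject.h (hypothetical): umbrella header ----
 * Kept for backward compatibility; in a real tree it would contain only
 *     #include "demo_types.h"
 *     #include "demo_iterator.h"
 */

/* ---- client code that needs only demo_types.h ---- */
static ptrdiff_t demo_example_size(void)
{
    static const ptrdiff_t shape[3] = {2, 3, 4};
    demo_array a;
    a.data = NULL;
    a.ndim = 3;
    a.shape = shape;
    return demo_array_size(&a);
}
```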

Another rationale is to be able to compile and test a very minimal core
numpy (the array object + a few things). I don't see a py3k port being
possible in the foreseeable future without this.

>
> The complex support could be a major effort in its own right if we
> need to rewrite all the current functions. That said, it would be nice
> if the complex support was separated out like the current real
> support.  Tests to go along with it would be helpful. This also ties in
> with having build support for many platforms.

Pauli has worked on this a little, and I have actually worked on it
quite a bit myself because I need minimal support for it on 64-bit
Windows (to fake libgfortran). I have already implemented around 10 core
complex functions (cabs, cangle, creal, cimag, cexp, cpow, csqrt, clog,
ccos, csin, ctan), in such a way that native C99 complex types are used
on platforms which support them, and there is a quite thorough test
suite which tests every special value condition (negative zero, inf,
nan) as specified in the C99 standard. It still lacks tests for actual
values (!), FPU exceptions and branch cuts, as well as thorough testing
on major platforms. Quite a few other functions would also be useful
(hyperbolic trigonometric functions, for example).
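
To make the special-value testing concrete, here is a minimal sketch under
stated assumptions: the demo_ names are hypothetical and are not the npy_
API, only the struct-based fallback representation is shown (the native
C99 branch is left out), and the comparison helper is one possible way to
check results where nan must match nan and 0.0 must not match -0.0.

```c
#include <math.h>   /* isnan, signbit, NAN: classification macros only */

/* Fallback representation for platforms without C99 complex
 * (hypothetical type, for illustration). */
typedef struct {
    double real;
    double imag;
} demo_cdouble;

static demo_cdouble demo_cpack(double re, double im)
{
    demo_cdouble z;
    z.real = re;
    z.imag = im;
    return z;
}

static double demo_creal(demo_cdouble z) { return z.real; }
static double demo_cimag(demo_cdouble z) { return z.imag; }

/* Conjugate: flips the sign of the imaginary part, including the sign
 * of a zero imaginary part. */
static demo_cdouble demo_conj(demo_cdouble z)
{
    return demo_cpack(z.real, -z.imag);
}

/* Equality as a special-value test needs it: nan matches nan, and the
 * sign of zero is significant. Plain == gets both cases wrong. */
static int demo_same_double(double a, double b)
{
    if (isnan(a) || isnan(b)) {
        return isnan(a) && isnan(b);
    }
    return a == b && (!signbit(a) == !signbit(b));
}

static int demo_same_complex(demo_cdouble a, demo_cdouble b)
{
    return demo_same_double(demo_creal(a), demo_creal(b))
        && demo_same_double(demo_cimag(a), demo_cimag(b));
}
```

With such a helper, a test can state for instance that the conjugate of
1 + 0i has a negative zero imaginary part, which a plain == comparison
would not detect.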

>
>
>
>     Build support
>     -------------
>
>     Once the code itself is split, there needs to be some support so that
>     the code can be
>     reused by third-parties. The following issues need to be solved:
>
>     1. Compile the computational code into shared or static library
>     2. Once built, making the libraries available to third parties
>     (distutils
>       issues). Ideally, it should work in installed, in-place builds,
>     etc...
>       situations.
>     3. Versioning, ABI/API compatibility issues
>
>
> Trying to break out the build support itself might be useful.

What do you mean by breaking out, exactly? I have documented the support
that is already implemented:

http://docs.scipy.org/doc/numpy/reference/distutils.html#building-installable-c-libraries

> I think this needs some thought. This would essentially be a c library
> of iterator code. C++ is probably an easier language for such things
> as it handles the classes and inlining automatically. Which is to say
> if I had to deal with a lot of iterators I might choose a different
> language for implementation.

C++ is not an option for numpy (and if I had to choose a language other
than C, I would rather take D, or a language which outputs C, in the
spirit of Vala :) ). I think handling iterators in C is OK: sure, it is
a bit messy because of the lack of namespaces, templates and operator
overloading, but the increased portability and implementation simplicity
are worth it IMHO. When looking at ITK, I don't find its iterators much
more readable or easier to use than our own.
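
For illustration, a strided iterator in plain C can look like the sketch
below. It is not numpy's iterator API: all names are hypothetical, the
dimension limit is arbitrary, and a name prefix stands in for the missing
namespaces.

```c
#include <stddef.h>

#define DEMO_MAXDIMS 4  /* arbitrary limit for this sketch */

typedef struct {
    int ndim;
    ptrdiff_t shape[DEMO_MAXDIMS];    /* extent of each dimension */
    ptrdiff_t strides[DEMO_MAXDIMS];  /* byte step of each dimension */
    ptrdiff_t coord[DEMO_MAXDIMS];    /* current multi-index */
    const char *ptr;                  /* current element */
    ptrdiff_t remaining;              /* elements left to visit */
} demo_iter;

static void demo_iter_init(demo_iter *it, const void *data, int ndim,
                           const ptrdiff_t *shape, const ptrdiff_t *strides)
{
    int i;
    it->ndim = ndim;
    it->ptr = (const char *)data;
    it->remaining = 1;
    for (i = 0; i < ndim; i++) {
        it->shape[i] = shape[i];
        it->strides[i] = strides[i];
        it->coord[i] = 0;
        it->remaining *= shape[i];
    }
}

static int demo_iter_done(const demo_iter *it)
{
    return it->remaining <= 0;
}

/* Advance in C order: the last dimension varies fastest. */
static void demo_iter_next(demo_iter *it)
{
    int i;
    it->remaining--;
    for (i = it->ndim - 1; i >= 0; i--) {
        if (++it->coord[i] < it->shape[i]) {
            it->ptr += it->strides[i];
            return;
        }
        /* Wrap this dimension and carry into the next outer one. */
        it->ptr -= (it->shape[i] - 1) * it->strides[i];
        it->coord[i] = 0;
    }
}

/* Usage: src is a 2x3 row-major array of doubles; walk it as its 3x2
 * transpose without copying, writing the visited values to dst. */
static void demo_copy_transposed(const double *src, double *dst)
{
    const ptrdiff_t shape[2] = {3, 2};
    const ptrdiff_t strides[2] = {(ptrdiff_t)sizeof(double),
                                  3 * (ptrdiff_t)sizeof(double)};
    demo_iter it;
    for (demo_iter_init(&it, src, 2, shape, strides);
         !demo_iter_done(&it);
         demo_iter_next(&it)) {
        *dst++ = *(const double *)it.ptr;
    }
}
```

The state lives in an explicit struct and every operation is an ordinary
function, which is more verbose than a C++ iterator class but needs
nothing beyond a C compiler.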

I also need to think more about this after I finish reading the recent
presentation from A. Alexandrescu ("why iterators must go"). Maybe there
are some bits which could be applied to the design of numpy iterators.

> As to choosing a project, you should pick one that really interests
> you. How would you rank your own interest in these various proposals?

Well, it's not for me to decide exactly what I work on here :) I must say
that almost all of the above are things which are needed for NumPy,
things which I have thought about, and would enjoy working on. Maybe
that's masochism, but I spent so much time understanding the C code in
numpy that I actually enjoy working on it now :)

cheers,

David