[Numpy-discussion] numpy.ndarrays as C++ arrays (wrapped with boost)

Wed Sep 12 02:03:46 EDT 2007

Alexander Schmolck wrote:
>  I just saw a closely related question posted one
> week ago here (albeit mostly from a swig context).

SWIG, Boost, whatever, the issues are similar. I guess what I'd love to 
find is an array implementation that plays well with modern C++, and 
also numpy.

>  The code currently mostly just uses
> plain C double arrays passed around by pointers and I'd like to encapsulate
> this at least with something like std::vector (or maybe valarray), but I've
> been wondering whether it might not make sense to use (slightly wrapped) numpy
> ndarrays --

Well, you can go back and forth between pointers to data blocks and 
numpy arrays pretty easily. Were you thinking of doing this at the 
Python-C++ interface, or were you looking for something you could use 
throughout your code? If the latter, then I expect you don't want to use 
a Python object (unless you're using your code only from Python).

Our case is this: we want a nice array-like container for our C++ code 
that makes sense in pure C++ and also interacts well with numpy arrays. 
The code may be used in a pure C++ app, but we also want to test it, 
script it, etc. from Python.

> Also, ndarrays
> provide fairly rich functionality even at the C-API-level

Yes, the more I look into this, the more I'm impressed with numpy's design.

> but there doesn't seem to be one obvious choice, as
> there is for python. 

Though there may be more than one good choice -- did you check out 
Boost.MultiArray (boost::multi_array)? I didn't see that on your list.

> Things that would eventually come in handy, although they're not needed yet,
> are basic linear algebra and maybe two or three LAPACK-level functions (I can
> think of cholesky decomposition and SVD right now)

It would be nice to just have that (is MTL viable?), but writing 
connection code to LAPACK for a few functions is not too bad.

> I think I could get all these things (and more) from scipy
> (and kin) without too much fuss (although I haven't tried wavelet support yet)
> and it seems like picking together the same functionality from different C++
> libs would require considerably more work.

True -- do-able, but you'd have to do it!

> So my question is: might it make sense to use (a slightly wrapped)
> numpy.ndarray,

I guess what I'd like is a C++ array that is essentially an ndarray 
without the PyObject stuff -- it could then be useful in pure C++, but it 
would also be easy to go back and forth between numpy and C++.

Ideally, there'd be something that already fits that bill. I see a 
couple design issues that are key:

"View" semantics: numpy arrays have the idea of "views" of data built in 
to them -- a given array can have its own data block, or be a view 
onto another's. This is quite powerful and flexible, and can save a lot of 
data copying. The STL containers don't seem to have that concept at all. 
std::valarray has utility classes that are views of a valarray, but they 
are really only useful as temporaries - they are not full-blown valarrays.

It looks like boost::multi_array has a similar concept, though:
"""
The MultiArray concept defines an interface to hierarchically nested 
containers. It specifies operations for accessing elements, traversing 
containers, and creating views of array data.
"""
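
To make the view idea concrete, here is a minimal sketch in plain C++ of a 
non-owning strided view over a block of doubles. The names StridedView and 
view_of are hypothetical, made up for illustration; this is not 
boost::multi_array's interface or numpy's implementation, just the same 
pointer-plus-stride bookkeeping in its simplest one-dimensional form.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning strided view over a block of doubles.
// It stores a pointer into someone else's data plus a length and a step,
// so creating one never copies any elements.
struct StridedView {
    double*     data;    // first viewed element (not owned by the view)
    std::size_t size;    // number of elements visible through the view
    std::size_t stride;  // distance, in elements, between neighbours

    double& operator[](std::size_t i) const {
        assert(i < size);
        return data[i * stride];
    }

    // A view of a view: still no copy, only pointer and stride arithmetic.
    StridedView slice(std::size_t start, std::size_t count,
                      std::size_t step) const {
        assert(count == 0 || start + (count - 1) * step < size);
        return StridedView{data + start * stride, count, stride * step};
    }
};

// View over a whole std::vector. The vector must outlive the view and
// must not be resized while the view is in use.
inline StridedView view_of(std::vector<double>& v) {
    return StridedView{v.data(), v.size(), 1};
}
```

With v = {0, 1, 2, 3, 4, 5}, view_of(v).slice(1, 3, 2) sees the elements 
1, 3 and 5, and assigning through the view changes v itself. Unlike the 
valarray helper classes, a view like this is an ordinary value that can be 
stored and sliced again, but it does nothing to keep the underlying data 
alive.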

Another issue is dynamic typing. Templates provide a way to do generic 
programming, but it's only generic at the source-code level. At compile 
time, types are fixed, so you have a valarray<double>, for instance. numpy 
arrays, on the other hand, are all of a single type (ndarray), with the 
element data type specified essentially as metadata. I don't know what 
mismatch this may cause, but it's a pretty different way to structure 
things. (Side note: I used this feature once to re-type an array in place, 
using the same data block -- it was a nifty hack used to unpack an odd 
binary format.) Would it make sense to use this approach in C++? I suspect 
not -- all your computational code would have to deal with it.
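
For illustration, here is a rough sketch of what the metadata approach 
might look like in C++: one non-template class holding raw bytes plus a 
run-time type tag. DType, DynArray, from_int32 and sum are all hypothetical 
names invented for this example, and only two element types are covered. 
It assumes the usual 4-byte int32 and 8-byte double.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical run-time type tag, playing the role of numpy's dtype.
enum class DType { Int32, Float64 };

inline std::size_t item_size(DType t) {
    return t == DType::Int32 ? sizeof(std::int32_t) : sizeof(double);
}

// One array class for every element type: raw bytes plus a tag as metadata.
struct DynArray {
    std::vector<unsigned char> bytes;
    DType dtype;

    std::size_t size() const { return bytes.size() / item_size(dtype); }

    // Re-type in place: same data block, new interpretation, no copy.
    void retype(DType t) {
        assert(bytes.size() % item_size(t) == 0);
        dtype = t;
    }

    // Read element i as T; memcpy sidesteps strict-aliasing problems.
    template <typename T>
    T get(std::size_t i) const {
        assert(sizeof(T) == item_size(dtype) && i < size());
        T value;
        std::memcpy(&value, bytes.data() + i * sizeof(T), sizeof(T));
        return value;
    }
};

// Helper that copies int32 values into a freshly tagged byte buffer.
inline DynArray from_int32(const std::vector<std::int32_t>& v) {
    DynArray a{std::vector<unsigned char>(v.size() * sizeof(std::int32_t)),
               DType::Int32};
    if (!v.empty()) std::memcpy(a.bytes.data(), v.data(), a.bytes.size());
    return a;
}

// The cost: every computational routine has to branch on the tag.
inline double sum(const DynArray& a) {
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        switch (a.dtype) {
            case DType::Int32:   total += a.get<std::int32_t>(i); break;
            case DType::Float64: total += a.get<double>(i);       break;
        }
    }
    return total;
}
```

Four int32 values re-typed as Float64 become a two-element array over the 
same 16 bytes, which is the in-place trick mentioned above. The switch in 
sum() is the part that would have to be repeated, in some form, in every 
routine that touches the data.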

There is also the re-sizing issue. It's pretty handy to be able to 
re-size arrays -- but then the data pointer can change, making it nearly 
impossible to share the data. Maybe it would be helpful to have a 
pointer-to-a-pointer instead, so that the shared pointer wouldn't 
change. However, there could be ugliness with the data pointer changing 
while some other view is working with it.
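
A sketch of the pointer-to-a-pointer idea, using only the standard 
library: the owner and its views share one std::vector through a 
std::shared_ptr, and a view looks up the current data on every access 
instead of caching a raw pointer. Storage, HandleView and make_view are 
hypothetical names for this example, and this is only one possible way to 
get the indirection.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// The shared_ptr is the stable handle: the vector object it points to
// never moves, even when the vector's internal data pointer changes.
using Storage = std::shared_ptr<std::vector<double>>;

// Hypothetical view that goes through the handle on every element access.
struct HandleView {
    Storage     storage;  // shared ownership keeps the data alive
    std::size_t offset;   // first viewed element
    std::size_t size;     // number of viewed elements

    double& operator[](std::size_t i) const {
        // Checked against the *current* length: the remaining hazard is
        // a shrink behind the view's back, not a stale data pointer.
        assert(i < size && offset + i < storage->size());
        return (*storage)[offset + i];
    }
};

inline HandleView make_view(const Storage& s, std::size_t offset,
                            std::size_t size) {
    return HandleView{s, offset, size};
}
```

A view made this way still reads the right elements after the vector 
grows and reallocates. The ugliness does not go away entirely, though: a 
reference returned by operator[] is invalidated by a later resize, 
shrinking can leave a view pointing past the end, and nothing here is safe 
if another thread resizes in the middle of an access.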

> <http://thread.gmane.org/gmane.comp.python.c++/11559/focus=11560>

That does look promising -- and it uses boost::multi_array.

The more I look at boost::multi_array, the better I like it (and the more 
it looks like numpy) -- does anyone here have experience (good or bad) 
with it?

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception