[Numpy-discussion] numpy.ndarrays as C++ arrays (wrapped with boost)
Christopher Barker
Chris.Barker at noaa.gov
Wed Sep 12 02:03:46 EDT 2007
Alexander Schmolck wrote:
> I just saw a closely related question posted one
> week ago here (albeit mostly from a swig context).
SWIG, Boost, whatever, the issues are similar. I guess what I'd love to
find is an array implementation that plays well with modern C++, and
also numpy.
> The code currently mostly just uses
> plain C double arrays passed around by pointers and I'd like to encapsulate
> this at least with something like stl::vector (or maybe valarray), but I've
> been wondering whether it might not make sense to use (slightly wrapped) numpy
> ndarrays --
Well, you can go back and forth between pointers to data blocks and
numpy arrays pretty easily. Were you thinking of doing this at the
Python-C++ interface, or were you looking for something you could use
throughout your code? If the latter, then I expect you don't want to use
a Python object (unless you're using your code only from Python).
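To illustrate the "back and forth" part, here is a minimal sketch done
from the Python side with ctypes. The ctypes array is only a stand-in
for a block of doubles that C or C++ code would own; the variable names
are made up for the example.

```python
import ctypes
import numpy as np

# A C-style block of doubles, standing in for memory owned by C++ code.
n = 5
block = (ctypes.c_double * n)(0.0, 1.0, 2.0, 3.0, 4.0)
ptr = ctypes.cast(block, ctypes.POINTER(ctypes.c_double))

# Pointer -> ndarray: wraps the existing memory, no copy is made.
# The shape must be given, since a bare pointer carries no length.
arr = np.ctypeslib.as_array(ptr, shape=(n,))
arr[0] = 42.0            # writes through to the C block

# ndarray -> pointer: expose an array's data block to C code.
owned = np.arange(3, dtype=np.float64)
owned_ptr = owned.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
owned_ptr[1] = -1.0      # writes through to the ndarray
```

In both directions the two sides share one block of memory, so whoever
owns it has to keep it alive for as long as the other side uses it.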
Our case is this: we want a nice array-like container that we can use
in C++ code, one that makes sense for pure C++ and also interacts well
with numpy arrays. The code may be used in a pure C++ app, but we also
want to test it, script it, etc. from Python.
> Also, ndarrays
> provide fairly rich functionality even at the C-API-level
Yes, the more I look into this, the more I'm impressed with numpy's design.
> but there doesn't seem to be one obvious choice, as
> there is for python.
Though there may be more than one good choice -- did you check out
boost::multiarray? I didn't see that on your list.
> Things that would eventually come in handy, although they're not needed yet,
> are basic linear algebra and maybe two or three LAPACK-level functions (I can
> think of cholesky decomposition and SVD right now)
It would be nice to just have that (is MTL viable?), but writing
connection code to LAPACK for a few functions is not too bad.
> I think I could get all these things (and more) from scipy
> (and kin) with too much fuzz (although I haven't tried wavelet support yet)
> and it seems like picking together the same functionality from different C++
> libs would require considerably more work.
True -- do-able, but you'd have to do it!
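For comparison, this is roughly what the two decompositions mentioned
above look like from Python. It is only a sketch using numpy.linalg
(which calls LAPACK underneath); the matrix is made-up test data.

```python
import numpy as np

# Small made-up matrix; m.T @ m + 3*I is symmetric positive definite.
m = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
spd = m.T @ m + 3.0 * np.eye(3)

# Cholesky: lower-triangular factor with spd == chol @ chol.T
chol = np.linalg.cholesky(spd)

# Thin SVD: m == (u * s) @ vt, singular values in descending order
u, s, vt = np.linalg.svd(m, full_matrices=False)
```

Getting the same two calls from C++ means either picking a library that
wraps them or writing the LAPACK glue by hand.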
> So my question is: might it make sense to use (a slightly wrapped)
> numpy.ndarray,
I guess what I'd like is a C++ array that was essentially an ndarray
without the pyobject stuff -- it could then be useful for C++, but also
easy to go back and forth between numpy and C++.
Ideally, there'd be something that already fits that bill. I see a
couple design issues that are key:
"View" semantics: numpy arrays have the idea of "views" of data built in
to them -- a given array can have its own data block, or be a view
onto another's. This is quite powerful and flexible, and can save a lot
of data copying. The STL containers don't seem to have that concept at
all. std::valarray has utility classes that are views of a valarray, but
they are really only useful as temporaries -- they are not full-blown
valarrays. It looks like boost::multiarrays have a similar concept,
though:
"""
The MultiArray concept defines an interface to hierarchically nested
containers. It specifies operations for accessing elements, traversing
containers, and creating views of array data.
"""
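For anyone who hasn't played with numpy views, a minimal sketch of what
I mean (the array contents are arbitrary):

```python
import numpy as np

a = np.zeros((3, 4))     # a owns its data block
col = a[:, 1]            # a view onto a's block: no data is copied

col[:] = -1.0            # writing through the view changes a

# The view is a full ndarray in its own right; it just records a base
# array plus its own shape and strides instead of owning memory.
owns_a = a.flags.owndata         # True
owns_col = col.flags.owndata     # False
same_block = np.shares_memory(a, col)
```

Here col.base is a, and col steps through memory with a stride of one
full row of a (4 doubles). That bookkeeping (base, shape, strides) is
what a C++ counterpart would need to carry around.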
Another issue is dynamic typing. Templates provide a way to do generic
programming, but it's only generic at the source-code level. At compile
time, types are fixed, so you have a valarray<double>, for instance.
numpy arrays, on the other hand, are all of one type (ndarray), with the
element data type specified essentially as metadata. I don't know what
mismatch this may cause, but it's a pretty different way to structure
things. (Side note:
I used this feature once to re-type an array in place, using the same
data block -- it was a nifty hack used to unpack an odd binary format).
Would it make sense to use this approach in C++? I suspect not -- all
your computational code would have to deal with it.
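A minimal sketch of that kind of re-typing, using ndarray.view() with a
new dtype. This is just an illustration of the idea with made-up bytes,
not the actual code from the binary-format hack:

```python
import numpy as np

# Four raw bytes, as they might come out of a binary file.
raw = np.array([1, 2, 3, 4], dtype=np.uint8)

# Same data block, reinterpreted as two little-endian 16-bit unsigned
# ints: 0x0201 = 513 and 0x0403 = 1027. Only the metadata changes.
as_u16 = raw.view(np.dtype('<u2'))

# Both arrays still refer to the same memory.
as_u16[0] = 0            # zeroes the first two bytes of raw
```

Since the element type lives in the dtype object rather than in the
static type of the array, none of this needs a template or a copy.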
There is also the re-sizing issue. It's pretty handy to be able to
re-size arrays -- but then the data pointer can change, making it nearly
impossible to share the data. Maybe it would be helpful to have a
pointer-to-a-pointer instead, so that the shared pointer wouldn't
change. However, there could be ugliness if the underlying pointer
changes while some other view is working with the data.
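A rough sketch of the pointer-to-a-pointer idea, written in Python for
brevity. Block and View are hypothetical names invented for this
example; neither is part of numpy or of any library discussed here.

```python
import numpy as np

class Block:
    """Hypothetical holder: the one place that knows the current data."""
    def __init__(self, n):
        self.data = np.zeros(n)

    def resize(self, n):
        new = np.zeros(n)
        k = min(n, self.data.size)
        new[:k] = self.data[:k]   # keep the old contents that still fit
        self.data = new           # the underlying block changes here

class View:
    """Hypothetical view: stores the holder and a slice, not the block."""
    def __init__(self, block, sl):
        self.block, self.sl = block, sl

    def get(self):
        # Looked up afresh on every access, so it follows a resize.
        return self.block.data[self.sl]

blk = Block(4)
v = View(blk, slice(1, 3))
stale = blk.data[1:3]    # ordinary numpy view, bound to the old block

blk.resize(8)
blk.data[1] = 7.0        # seen through v, but not through stale
```

The indirect view follows the resize, while the ordinary view keeps
pointing at the old block. The ugliness is still there, though: shrink
the block below the slice and v.get() quietly comes back shorter.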
> <http://thread.gmane.org/gmane.comp.python.c++/11559/focus=11560>
That does look promising -- and it uses boost::multiarray.
The more I look at boost::multiarray, the better I like it (and the more
it looks like numpy) -- does anyone here have experience (good or bad)
with it?
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception