[C++-sig] Writing to numpy array: good practices?

Tue Oct 11 17:01:39 CEST 2011

On 10/11/2011 10:39 AM, Jonas Einarsson wrote:
> Dear list,
>
> First, sorry if this is a double-post, I got confused with the
> subscription. Anyhow, I seek an opinion on good practice.
>
> I'd like to write simple programs that
> 1) (In Python) allocates numpy array,
> 2) (In C/C++) fills said numpy array with data.
>
> To this end I use Boost.Python to compile an extension module. I use the
> (possibly obsolete?) boost/python/numeric.hpp to allow passing an
> ndarray to my C-functions. Then I use the numpy C API directly to
> extract a pointer to the underlying data.
>
> This seemingly works very well, and I can check for correct dimensions
> and data types, etcetera.
>
> As documentation is scarce, I ask you if this is an acceptable
> procedure? Any pitfalls nearby?

This is very much an acceptable procedure.  It is a fairly low-level
one, so you may want to be a little more careful in some respects (see
below, and take a closer look at the Numpy C-API documentation).  But
the principle is fine.

>
> Sample code: C++
>
> void fill_array(numeric::array& y)

I'd recommend just passing boost::python::object, and using
PyArray_Check() to ensure that it is indeed an array; I really don't
know how good the old numeric interface is at matching the right types.
But maybe I'm unnecessarily distrustful on that point.  Alternatively,
you could use one of the Numpy C-API conversion functions to get an
array from just about anything.

> {
> const int ndims = 2;
>
> // Get pointer to np array
> PyArrayObject* a = (PyArrayObject*)PyArray_FROM_O(y.ptr());

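PyArray_FROM_O returns a new reference, so you leak it whenever you
throw an exception after this point.  I'd suggest making "a" a
boost::python::handle<>: it releases the reference when it goes out of
scope, and if you pass it a null pointer it throws error_already_set,
which propagates the Python exception that was raised.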
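
To illustrate why an owning wrapper helps, here is a minimal sketch in
plain C++.  It does not use Boost.Python or Numpy at all: FakeObject,
acquire() and release() are hypothetical stand-ins for a Python object,
PyArray_FROM_O and Py_DECREF, and std::unique_ptr plays the role that
handle<> would play in the real code.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for a Python object: nothing but a reference count.
struct FakeObject { int refcount; };

// Stand-in for PyArray_FROM_O: returns a new reference the caller must release.
inline FakeObject* acquire(FakeObject& o) { ++o.refcount; return &o; }

// Stand-in for Py_DECREF.
inline void release(FakeObject* o) { if (o) --o->refcount; }

// Owning wrapper (the role handle<> would play): releases on destruction.
struct Releaser {
    void operator()(FakeObject* o) const { release(o); }
};
using Owned = std::unique_ptr<FakeObject, Releaser>;

// Raw pointer: the throw skips release(), so one reference is leaked.
inline void check_raw(FakeObject& o, bool ok) {
    FakeObject* a = acquire(o);
    if (!ok) throw std::runtime_error("Wrong dimension on array.");
    release(a);
}

// Owning wrapper: the reference is released on every exit path.
inline void check_owned(FakeObject& o, bool ok) {
    Owned a(acquire(o));
    if (!ok) throw std::runtime_error("Wrong dimension on array.");
}
```

Calling check_raw() with ok == false leaves the count one higher than
it started; check_owned() leaves it unchanged either way.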

You should probably use something other than PyArray_FROM_O
(PyArray_FromAny or PyArray_FROM_OTF, for instance), to ensure that the
flags on the numpy array are what you're expecting.  Both let numpy
check the data type in the same call, and PyArray_FromAny can also
check the number of dimensions (its min_depth and max_depth arguments).

> if (a == NULL) {
>                throw std::exception("Could not get NP array.");
>        }
> if (a->descr->elsize != sizeof(double))
> {
> throw std::exception("Must be double ndarray");
> }
> if (a->nd != ndims)
> {
> throw std::exception("Wrong dimension on array.");
> }
> int rows = *(a->dimensions);
> int cols = *(a->dimensions+1);
> double* data = (double*)a->data;
>
> for (int i = 0; i < rows; i++)
> {
> for (int j = 0; j < cols; j++)
> {
> *(data + i*cols + j) = really_cool_function(i,j);

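This works for most ndarrays (those that are C_CONTIGUOUS), but not for
all of them.  If you pass in an array you've called transpose() on, for
instance, it will write to the wrong elements, because the transposed
array shares the original buffer but has its strides swapped.  What you
really want to do is multiply each index by the corresponding stride.
There are macros to do this in the Numpy C-API (PyArray_GETPTR1 through
PyArray_GETPTR4; for a 2-d array, PyArray_GETPTR2(a, i, j)).  I'd
recommend you use those.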
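
The stride arithmetic can be sketched in plain C++ without any Numpy
headers.  View2D, element_ptr(), fill() and transposed() below are
hypothetical names for illustration only; the struct just mimics the
data pointer, shape and byte strides that a 2-d ndarray carries, and
element_ptr() does the same base + i*stride0 + j*stride1 computation
that the 2-d GETPTR macro does on a real array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the parts of a 2-d ndarray used here.
// Strides are in bytes, as they are in Numpy.
struct View2D {
    char* data;
    std::ptrdiff_t shape[2];
    std::ptrdiff_t strides[2];
};

// Address of element (i, j): base + i*strides[0] + j*strides[1].
inline double* element_ptr(const View2D& v, std::ptrdiff_t i, std::ptrdiff_t j) {
    assert(i >= 0 && i < v.shape[0] && j >= 0 && j < v.shape[1]);
    return reinterpret_cast<double*>(v.data + i * v.strides[0] + j * v.strides[1]);
}

// Fill every element with f(i, j), whatever the memory layout is.
template <typename F>
void fill(const View2D& v, F f) {
    for (std::ptrdiff_t i = 0; i < v.shape[0]; ++i)
        for (std::ptrdiff_t j = 0; j < v.shape[1]; ++j)
            *element_ptr(v, i, j) = f(i, j);
}

// Transposed view of the same buffer: swap shape and strides, copy nothing.
inline View2D transposed(const View2D& v) {
    return View2D{v.data,
                  {v.shape[1], v.shape[0]},
                  {v.strides[1], v.strides[0]}};
}
```

With a 2x3 C-contiguous buffer, filling transposed(view) through
element_ptr() puts element (i, j) of the 3x2 view at offset j*3 + i in
the buffer, whereas data + i*cols + j would put it at i*2 + j.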

> }
> }
> }
>

<snip>

>
>
> Simplicity is a major factor for me. I don't want a complete wrapper for
> ndarrays, I just want to compute and shuffle data to Python for further
> processing. Letting Python handle allocation and garbage collection also
> seems like a good idea.
>

In that case, this may be the best approach for you for now.  There are
also efforts underway to make the Numpy C-API available through a
boost::python interface (https://svn.boost.org/svn/boost/sandbox/numpy),
but it's not entirely stable yet.

Jim