[Numpy-discussion] Fortran order arrays to and from numpy arrays

Fri Feb 23 16:39:23 EST 2007

On 23 Feb 2007 21:19:05 +0000, Alexander Schmolck <a.schmolck at gmx.net>
wrote:
>
> Hi,
>
> I'm currently puzzling over how to best convert (column major order)
> matlab
> arrays to numpy arrays and vice versa -- I'm looking for a solution that's
> simple, general and reasonably fast -- being also applicable to Numeric
> arrays
> would be a plus (I'd like to retain Numeric compatibility for the current
> version of the code).
>
> The solution that mlabwrap (http://mlabwrap.sf.net) currently employs to
> transfer arrays between matlab and python is to have a low level C++
> extension
> (mlabraw) copying the data into a new numpy array created with
> Pyarray_SimpleNew and using a simple or nested for-loop to copy the data
> around (only rank up to 2 is currently handled), e.g.
>
>     //matlab matrix -> numpy
>     for (mwIndex lCol = 0; lCol < lCols; lCol++) {
>       double *lDst = (double *)(lRetval->data) + lCol;
>       for (mwIndex lRow = 0; lRow < lRows; lRow++, lDst += lCols) {
>         *lDst = *lPR++;}}
>
>     //numpy 2d array -> matlab
>     npy_intp lRowDelta = pStrides[0]/sizeof(T);
>     npy_intp lColDelta = pStrides[1]/sizeof(T);
>     for (npy_intp lCol=0; lCol < pCols; lCol++) {
>       T *lSrc = pSrc + lCol*lColDelta;
>       for (npy_intp lRow=0; lRow < pRows; lRow++, lSrc += lRowDelta) {
>         *pDst++ = *lSrc;}}
>
> Some simple timing results suggest 2 things:
>
> 1. There is an enormous overhead associated with calling into matlab (e.g.
>    just using mlabraw to evaluate ``sin(1);`` in matlab, without accessing
> the
>    result the result is about 1000 times slower than ``math.sin(1)``. I
> have
>    no idea where this rather huge overhead comes from, but I suspect there
>    isn't much one can do about it.
>
> 2. Despite this overhead, copying around large arrays (e.g. >=1e5
> elements) in
>    above way causes notable additional overhead. Whilst I don't think
> there's
>    a sane way to avoid copying by sharing data between numpy and matlab
> the
>    copying could likely be done better.
>
> I guess the moral from 1 might be that ctypes will be the way to go for
> future version, I would assume that the overhead associated with ctypes
> doesn't matter that much and that it will be much less painful to maintain
> than having a C or C++ extension module, especially given that the
> Mathworks
> seem fond of changing the C API every minor release. I haven't used ctypes
> yet
> so I'd be interested to know what people think.
>
> The moral from 2. might be just to memcpy the data wholesale where
> possible
> and transpose and reshape afterwards or, in numpy, to set the
> fortran-order
> flag (and query fortran-contiguousness on copying to matlab) -- I suppose
> that specifying fortran order and tranposing/reshaping are pretty much
> equivalent? So matlab to numpy would look roughly so:
>
>   a_fortran_order = memcpy_raw_matlab_data_into_numpy_array(ptr)
>   a = a_fortran_order.T.reshape(a_fortran_order.shape)
>
> I would assume that this ought be on average more efficient than always
> transforming to C-contiguous (as some of the time the returned array will
> anyway be transposed afterwards or accessed in a fashion were column major
> is
> faster; the case that several operations that profit from row-major
> storage
> are carried out on the same array seems unlikely).
>
> Unfortunately I don't see an easy way to use the same approach the other
> way
> (matlab doesn't seem to offer much on the C level to manipulate arrays),
> so
> I'd presumably need something like:
>
>   stuff_into_matlab_array(a.T.reshape(a.shape).copy())
>
> the question is how to avoid doing two copies.
>
> Any comments appreciated,

The easiest way to deal with the ordering is to use the order keyword in
numpy:

In [4]: a = array([0,1,2,3]).reshape((2,2), order='F')

In [5]: a
Out[5]:
array([[0, 2],
       [1, 3]])

You would still need to get access to something to reshape, shared memory or
something, but the key is that you don't have to reorder the elements, you
just need the correct strides and offsets to address the elements in Fortran
order. I have no idea if this works in numeric.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20070223/8c828dc3/attachment.html>