[Numpy-discussion] Calling C code that assumes SIMD aligned data.

Francesc Alted faltet at gmail.com
Fri May 6 09:01:32 EDT 2016

2016-05-05 22:10 GMT+02:00 Øystein Schønning-Johansen <oysteijo at gmail.com>:

> Thanks for your answer, Francesc. Knowing that there is no numpy solution
> saves me the work of searching for one. I've not tried the solution described
> at SO, but it looks like a real performance killer. I'd rather try to
> override malloc with glibc's malloc hooks or LD_PRELOAD tricks. Do you think
> that will do it? I'll try it and report back.

I don't think you need that much weaponry.  Just create an array with some
spare space for alignment.  Say you want a 64-byte aligned double
precision array.  Then allocate your desired array plus 64 additional
bytes (8 doubles):

In [92]: a = np.zeros(int(1e6) + 8)

In [93]: a.ctypes.data % 64
Out[93]: 16

and compute how many elements to skip to reach the next 64-byte boundary:

In [94]: shift = (64 // a.itemsize) - (a.ctypes.data % 64) // a.itemsize

In [95]: shift
Out[95]: 6

now, create a view that starts at that offset and keeps the desired length:

In [98]: b = a[shift:shift + int(1e6)]

In [99]: len(b)
Out[99]: 1000000

In [100]: b.ctypes.data % 64
Out[100]: 0

and voila, b is now aligned to 64 bytes.  As the view is a copy-free
operation, this is fast, and you only wasted 64 bytes.  Pretty cheap indeed.
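
The recipe above can be wrapped in a small helper.  What follows is only a
sketch under stated assumptions: `aligned_empty` is a hypothetical name (it
is not part of NumPy), and it over-allocates raw bytes rather than doubles
so that the same code works for any dtype.  The usage part passes the
aligned buffer to np.dot() through its `out` parameter and adds the bias in
place, because writing `np.dot(w, x) + b` would allocate a fresh result
array with no alignment guarantee.

```python
import numpy as np


def aligned_empty(shape, dtype=np.float64, alignment=64):
    """Hypothetical helper: uninitialized C-contiguous array whose data
    pointer is a multiple of `alignment` bytes."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape, dtype=np.int64)) * dtype.itemsize
    # Over-allocate so that an aligned start always exists inside the buffer.
    raw = np.empty(nbytes + alignment, dtype=np.uint8)
    # Bytes to skip to reach the next boundary (0 if already aligned).
    offset = -raw.ctypes.data % alignment
    # Slicing, view() and reshape() are all copy-free here.
    return raw[offset:offset + nbytes].view(dtype).reshape(shape)


# Usage: keep the whole computation inside the aligned buffer.
w = np.random.rand(4, 3)
x = np.random.rand(3)
b = np.random.rand(4)

out = aligned_empty(4)
np.dot(w, x, out=out)   # result is written into the aligned buffer
out += b                # in-place add, so the data pointer does not change
```

Computing the offset as `-address % alignment` gives 0 when the allocation
already happens to be aligned, so that case needs no special handling.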


> Thanks,
> -Øystein
> On Thu, May 5, 2016 at 1:55 PM, Francesc Alted <faltet at gmail.com> wrote:
>> 2016-05-05 11:38 GMT+02:00 Øystein Schønning-Johansen <oysteijo at gmail.com
>> >:
>>> Hi!
>>> I've written a little piece of numpy code that does a neural network
>>> feedforward calculation:
>>>     def feedforward(self, x):
>>>         for activation, w, b in zip(self.activations, self.weights,
>>>                                     self.biases):
>>>             x = activation(np.dot(w, x) + b)
>>> This works fine when my activation functions are in Python. However, I've
>>> wrapped the activation functions from a C implementation that requires the
>>> array to be memory aligned (due to SIMD instructions in the C
>>> implementation). So I need the operation np.dot(w, x) + b to return an
>>> ndarray whose data pointer is aligned. How can I do that? Is it
>>> possible at all?
>> Yes.  np.dot() accepts an `out` parameter where you can pass your
>> aligned array.  Testing whether NumPy is returning you an aligned
>> array is easy:
>> In [15]: x = np.arange(6).reshape(2,3)
>> In [16]: x.ctypes.data % 16
>> Out[16]: 0
>> but:
>> In [17]: x.ctypes.data % 32
>> Out[17]: 16
>> so, in this case NumPy returned a 16-byte aligned array which should be
>> enough for 128 bit SIMD (SSE family).  This kind of alignment is pretty
>> common in modern computers.  If you need 256 bit (32-byte) alignment then
>> you will need to build your container manually.  See here for an example:
>> http://stackoverflow.com/questions/9895787/memory-alignment-for-fast-fft-in-python-using-shared-arrrays
>> Francesc
>>> (BTW: the function works correctly about 20% of the time I run it;
>>> otherwise it segfaults on the SIMD instruction in the C function)
>>> Thanks,
>>> -Øystein
>> --
>> Francesc Alted
Francesc Alted
