[Numpy-discussion] numpy arrays, data allocation and SIMD alignment

Steven G. Johnson stevenj at alum.mit.edu
Sat Aug 4 23:20:31 EDT 2007


On Aug 4, 3:24 am, "Anne Archibald" <peridot.face... at gmail.com> wrote:

> It seems to me two things are needed:
>
> * A mechanism for requesting numpy arrays with buffers aligned to an
> arbitrary power-of-two size (basically just using posix_memalign or
> some horrible hack on platforms that don't have it).

Right, you might as well allow the alignment (to a power-of-two size)
to be specified at runtime, as there is really no cost to implementing
an arbitrary alignment once you have any alignment.

Although you should definitely use posix_memalign (or the older
memalign) where it is available, unfortunately it's not implemented on
all systems.  For example, MacOS X and FreeBSD don't have it, last I
checked (although in both cases their malloc is 16-byte aligned).
Microsoft VC++ has a function called _aligned_malloc which is equivalent.

However, since MinGW (www.mingw.org) didn't have an _aligned_malloc
function, I wrote one for them a few years ago and put it in the
public domain (I use MinGW to cross-compile to Windows from Linux and
need the alignment).  You are free to use it as a fallback on systems
that don't have a memalign function if you want.  It should work on
any system where sizeof(void*) is a power of two (i.e. every extant
architecture that I know of).  You can download it and its test
program from:
           ab-initio.mit.edu/~stevenj/align.c
           ab-initio.mit.edu/~stevenj/tstalign.c
It just uses malloc with a little extra padding as needed to align the
data, plus a copy of the original pointer so that you can still free
and realloc (using _aligned_free and _aligned_realloc).  It could be
made a bit more efficient, but it probably doesn't matter.
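
The padding technique described above can be sketched roughly as follows.
This is a minimal illustration and not the contents of align.c: the names
sketch_aligned_malloc and sketch_aligned_free are hypothetical, realloc
support is omitted, and it assumes sizeof(void*) is a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names: a sketch of the "malloc plus padding" idea only. */
void *sketch_aligned_malloc(size_t size, size_t alignment)
{
    /* Only nonzero power-of-two alignments are supported. */
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;
    /* Keep the slot that stores the original pointer itself aligned. */
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);
    /* Guard against overflow in the padded size. */
    if (size > SIZE_MAX - alignment - sizeof(void *))
        return NULL;

    /* Extra room: one pointer slot plus up to `alignment` bytes of padding. */
    void *raw = malloc(size + alignment + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Skip the pointer slot, then round up to the next aligned address. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    assert((aligned & (alignment - 1)) == 0);

    /* Store the original pointer just below the aligned block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

void sketch_aligned_free(void *p)
{
    /* Recover the pointer malloc returned and release it. */
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

A block obtained this way must be released with sketch_aligned_free, not
with plain free, because the address handed to the caller is generally not
the one malloc returned.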

> * A macro (in C, and some way to get the same information from python,
> perhaps just "a.ctypes.data % 16") to test for common alignment cases;
> SIMD alignment and arbitrary power-of-two alignment are probably
> sufficient.

In C this is easy: just test ((uintptr_t) pointer) % 16 == 0
(uintptr_t is declared in <stdint.h>).
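
For the arbitrary power-of-two case, the same test could be wrapped in a
small helper along these lines.  The name is_aligned is hypothetical, and
the sketch assumes the caller passes a nonzero power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of `alignment`, else 0.
   `alignment` must be a nonzero power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power of two, masking the low bits is the same as p % alignment. */
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}
```

With this, is_aligned(ptr, 16) covers the SIMD case, and it corresponds to
checking a.ctypes.data % 16 == 0 on the Python side.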

You might also consider a way to set the default alignment of numpy
arrays at runtime, rather than requesting aligned arrays
individually, e.g. so that someone could come along at a later date
to a large program and add just one function call to make all the
arrays 16-byte aligned, improving performance with SIMD libraries.
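
As a rough sketch of what such a runtime default could look like at the C
level: nothing below is an existing numpy API.  The names
set_default_alignment and alloc_array_data are hypothetical, and the sketch
assumes a platform that provides posix_memalign.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical process-wide setting; starts at pointer alignment. */
static size_t default_alignment = sizeof(void *);

/* Set the alignment used for all later allocations.
   Returns 0 on success, -1 if `alignment` is not a power of two
   that is at least sizeof(void *), as posix_memalign requires. */
int set_default_alignment(size_t alignment)
{
    if (alignment < sizeof(void *) || (alignment & (alignment - 1)) != 0)
        return -1;
    default_alignment = alignment;
    return 0;
}

/* Allocate a data buffer using the current default alignment.
   Returns NULL on failure; release the buffer with free(). */
void *alloc_array_data(size_t nbytes)
{
    void *p = NULL;
    if (posix_memalign(&p, default_alignment, nbytes) != 0)
        return NULL;
    return p;
}
```

A large program would then call set_default_alignment(16) once at startup,
and every buffer allocated afterwards through alloc_array_data would be
16-byte aligned without touching the individual call sites.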

Regards,
Steven G. Johnson



