segfault calling SSE enabled library from ctypes

Olivier Grisel olivier.grisel at ensta.org
Tue Nov 25 18:04:55 EST 2008


Replying to myself:

haypo found the origin of the problem. It apparently stems from a GCC
bug [1] (which should be fixed on x86 as of version 4.4): GCC does not
always keep the stack aligned on 16 bytes, so the "__m128 myvector"
local variable in the previous code might end up misaligned, and
aligned SSE loads and stores fault on a misaligned address. A
workaround is to realign the stack before calling the inner function,
as done here:

http://www.bitbucket.org/ogrisel/ctypes_sse/changeset/dc27626824b8/

New version of the previous C code:

<quote>

#include <stdio.h>
#include <emmintrin.h>


void wrapped_dummy_sse()
{
	// allocate an aligned vector of 128 bits
	__m128 myvector;

	printf("[dummy_sse] before calling setzero\n");
	fflush(stdout);

	// initialize it to four 32-bit floats, all set to zero
	myvector = _mm_setzero_ps();

	printf("[dummysse] after calling setzero\n");
	fflush(stdout);

	// display the content of the vector
	float* part = (float*) &myvector;
	printf("[dummysse] myvector = {%f, %f, %f, %f}\n",
			part[0], part[1], part[2], part[3]);
}

void dummy_sse(void)
{
	(void)__builtin_return_address(1); // to force call frame
	// round the stack pointer down to a multiple of 16 (32-bit x86 only)
	asm volatile ("andl $-16, %%esp" ::: "%esp");
	wrapped_dummy_sse();
}

int main()
{
	dummy_sse();
	return 0;
}

</quote>

[1] for a nice summary of the issue, see e.g.:
http://www.mail-archive.com/gcc%40gcc.gnu.org/msg33101.html

Another workaround would be to allocate myvector on the heap with an
aligned allocator, posix_memalign for instance (plain malloc does not
promise 16-byte alignment on every platform).
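
A minimal sketch of that heap-based workaround, assuming a POSIX system
that provides posix_memalign. The helper names alloc_zeroed_vector and
is_aligned_16 are hypothetical, they are not part of the code above, and
the SSE store is only compiled in when the compiler defines __SSE__:

```c
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign in <stdlib.h> */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: non-zero if p lies on a 16-byte boundary. */
int is_aligned_16(const void *p)
{
	return ((uintptr_t) p % 16) == 0;
}

/* Hypothetical helper: returns a heap buffer of four floats, aligned on
 * 16 bytes and set to zero, or NULL on failure. Release it with free(). */
float *alloc_zeroed_vector(void)
{
	void *mem = NULL;

	/* posix_memalign returns 0 on success; the alignment must be a
	 * power of two and a multiple of sizeof(void *) */
	if (posix_memalign(&mem, 16, 4 * sizeof(float)) != 0)
		return NULL;
	assert(is_aligned_16(mem));

#if defined(__SSE__)
	/* aligned store: safe whatever the stack alignment is, because
	 * the destination lives on the heap */
	_mm_store_ps((float *) mem, _mm_setzero_ps());
#else
	/* assumption: plain fallback for builds without SSE support */
	memset(mem, 0, 4 * sizeof(float));
#endif
	return (float *) mem;
}
```

The buffer returned by alloc_zeroed_vector would take the place of the
"__m128 myvector" stack variable; the caller has to free() it afterwards.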

Best,

-- 
Olivier


