[Numpy-discussion] numpy.linalg.eig memory issue with libatlas?

Johan Grönqvist johan.gronqvist at gmail.com
Wed Oct 7 03:30:22 EDT 2009


[I am resending this as the previous attempt seems to have failed]

Hello List,

I am looking at memory errors when using numpy.linalg.eig().

Short version:

I had memory errors in numpy.linalg.eig(), and valgrind gives me reason
to believe these are due to writes to invalid memory addresses in the
diagonalization routine zgeev, called by numpy.linalg.eig().

I realized that I had recently installed atlas, and now had several
lapack-like libraries, so I uninstalled atlas, and the issues seemed to
go away.

My question is: Could it be that some lapack/blas/atlas package I use is
incompatible with the numpy I use, and if so, is there a method to
diagnose this in a more reliable way?
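
One possible way to see which LAPACK/BLAS shared objects actually end up
in the Python process on Linux is to read /proc/self/maps after
importing numpy.linalg. The sketch below is only an illustration: the
helper names and the keyword list are hypothetical choices of mine, not
part of numpy, and the keyword match is a heuristic on file names.

```python
# Substrings that commonly appear in BLAS/LAPACK library file names.
# This list is an assumption for illustration, not an exhaustive one.
KEYWORDS = ("lapack", "blas", "atlas")


def loaded_linalg_libs(maps_text, keywords=KEYWORDS):
    """Return the sorted, de-duplicated file paths in a /proc/<pid>/maps
    dump whose file name contains one of the keywords."""
    paths = set()
    for line in maps_text.splitlines():
        # A file-backed mapping has six fields; the last one is the path.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1].lower()
        if path.startswith("/") and any(k in name for k in keywords):
            paths.add(path)
    return sorted(paths)


def current_process_libs():
    """Apply loaded_linalg_libs() to the running interpreter (Linux only)."""
    with open("/proc/self/maps") as fh:
        return loaded_linalg_libs(fh.read())
```

Calling current_process_libs() right after "import numpy.linalg" should
list the libraries that were really loaded, so a path under
/usr/lib/atlas/ would show that the atlas build is in use.
numpy.show_config() is a complementary check, but it reports build-time
information rather than what the dynamic linker picked at run time.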




Longer version:

The system is an up-to-date Debian testing (squeeze) installation on
amd64. My program uses numpy, matplotlib, and a module compiled using
cython.

I started getting errors from my program this week. Pdb and print
statements tell me that the errors arise around the point where I call
numpy.linalg.eig(), but not on every call. The type of error varies:
most frequently it is a segmentation fault, but sometimes a matrix
dimension mismatch, and sometimes a message related to the Python GC.

Valgrind tells me that something "impossible" happened, and that this is
probably due to invalid writes earlier during the program execution.
There seem to be two invalid writes for each program crash, and the
log looks like this (it contains only these two invalid writes):

[...]
==6508== Invalid write of size 8
==6508==    at 0x92D2597: zunmhr_ (in /usr/lib/atlas/liblapack.so.3gf.0)
==6508==    by 0x920A42B: zlaqr3_ (in /usr/lib/atlas/liblapack.so.3gf.0)
==6508==    by 0x9205D11: zlaqr0_ (in /usr/lib/atlas/liblapack.so.3gf.0)
==6508==    by 0x91B0C4D: zhseqr_ (in /usr/lib/atlas/liblapack.so.3gf.0)
==6508==    by 0x911CA15: zgeev_ (in /usr/lib/atlas/liblapack.so.3gf.0)
==6508==    by 0x881B81B: lapack_lite_zgeev (lapack_litemodule.c:590)
==6508==    by 0x4911D4: PyEval_EvalFrameEx (ceval.c:3612)
==6508==    by 0x491CE1: PyEval_EvalFrameEx (ceval.c:3698)
==6508==    by 0x4924CC: PyEval_EvalCodeEx (ceval.c:2875)
==6508==    by 0x490F17: PyEval_EvalFrameEx (ceval.c:3708)
==6508==    by 0x4924CC: PyEval_EvalCodeEx (ceval.c:2875)
==6508==    by 0x4DC991: function_call (funcobject.c:517)
==6508==  Address 0x67ab118 is not stack'd, malloc'd or (recently) free'd
==6508==
==6508== Invalid write of size 8
==6508==    at 0x92D25A8: zunmhr_ (in /usr/lib/atlas/liblapack.so.3gf.0)
==6508==    by 0x920A42B: zlaqr3_ (in /usr/lib/atlas/liblapack.so.3gf.0)
==6508==    by 0x9205D11: zlaqr0_ (in /usr/lib/atlas/liblapack.so.3gf.0)
==6508==    by 0x91B0C4D: zhseqr_ (in /usr/lib/atlas/liblapack.so.3gf.0)
==6508==    by 0x911CA15: zgeev_ (in /usr/lib/atlas/liblapack.so.3gf.0)
==6508==    by 0x881B81B: lapack_lite_zgeev (lapack_litemodule.c:590)
==6508==    by 0x4911D4: PyEval_EvalFrameEx (ceval.c:3612)
==6508==    by 0x491CE1: PyEval_EvalFrameEx (ceval.c:3698)
==6508==    by 0x4924CC: PyEval_EvalCodeEx (ceval.c:2875)
==6508==    by 0x490F17: PyEval_EvalFrameEx (ceval.c:3708)
==6508==    by 0x4924CC: PyEval_EvalCodeEx (ceval.c:2875)
==6508==    by 0x4DC991: function_call (funcobject.c:517)
==6508==  Address 0x67ab110 is not stack'd, malloc'd or (recently) free'd
[...]
valgrind: m_mallocfree.c:248 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 96, hi = 0.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.
[...]





Today I looked in my package installation logs to see what had changed
recently, and I noticed that I had recently installed atlas (Debian
package libatlas3gf-common). I uninstalled that package, and now the
same program seems to have no memory errors.

The packages I removed from the system today were
libarpack2
libfltk1.1
libftgl2
libgraphicsmagick++3
libgraphicsmagick3
libibverbs1
libopenmpi1.3
libqrupdate1
octave3.2-common
octave3.2-emacsen
libatlas3gf-base
octave3.2



My interpretation is that I had several packages available containing
the diagonalization functionality, but that they differed subtly in
their interfaces. My recent installation of atlas made numpy use (the
incompatible) atlas instead of its previous choice, and removal of atlas
restored the situation to the state of last week.

Now for the questions: Is this a reasonable hypothesis?
Is it known? Can it be investigated more precisely by comparing versions
somehow?
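
A small stand-alone test case would make it easier to compare LAPACK
installations, for example by running it under valgrind once with and
once without atlas installed. The sketch below assumes a recent numpy
(it uses numpy.random.default_rng and the @ operator); the function name
and the parameters are hypothetical. Complex input is used because that
is the case in which eig() goes through zgeev.

```python
import numpy as np


def max_eig_residual(n=20, trials=100, seed=0):
    """Return the largest relative residual ||A V - V diag(w)|| / ||A||
    seen over a number of random complex n-by-n matrices."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        w, v = np.linalg.eig(a)
        # Column j of v is the eigenvector for w[j], so a @ v should equal
        # v * w (broadcasting scales column j of v by w[j]).
        residual = np.linalg.norm(a @ v - v * w) / np.linalg.norm(a)
        worst = max(worst, float(residual))
    return worst
```

With a healthy LAPACK the returned value should be close to machine
precision. A crash or a large residual would point at the library
rather than at the calling program, although memory corruption does not
necessarily show up on every run.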



Regards

/ johan



