[Numpy-discussion] Fwd: Multi-distribution Linux wheels - please test
Julian Taylor
jtaylor.debian at googlemail.com
Tue Feb 9 14:55:28 EST 2016
On 09.02.2016 20:52, Matthew Brett wrote:
> On Mon, Feb 8, 2016 at 7:59 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Mon, Feb 8, 2016 at 6:07 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>> On Mon, Feb 8, 2016 at 6:04 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>>>> On Mon, Feb 8, 2016 at 5:26 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>>>> On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>>>>> [...]
>>>>>> I can't replicate the segfault with manylinux wheels and scipy. On
>>>>>> the other hand, I get a new test error for numpy from manylinux, scipy
>>>>>> from manylinux, like this:
>>>>>>
>>>>>> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>>>
>>>>>> ======================================================================
>>>>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>>>>> ----------------------------------------------------------------------
>>>>>> Traceback (most recent call last):
>>>>>> File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line
>>>>>> 197, in runTest
>>>>>> self.test(*self.arg)
>>>>>> File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py",
>>>>>> line 658, in eigenhproblem_general
>>>>>> assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
>>>>>> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>>>>>> line 892, in assert_array_almost_equal
>>>>>> precision=decimal)
>>>>>> File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
>>>>>> line 713, in assert_array_compare
>>>>>> raise AssertionError(msg)
>>>>>> AssertionError:
>>>>>> Arrays are not almost equal to 4 decimals
>>>>>>
>>>>>> (mismatch 100.0%)
>>>>>> x: array([ 0., 0., 0.], dtype=float32)
>>>>>> y: array([ 1., 1., 1.])
>>>>>>
>>>>>> ----------------------------------------------------------------------
>>>>>> Ran 1507 tests in 14.928s
>>>>>>
>>>>>> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>>>>>>
>>>>>> This is a very odd error, which we don't get when running over a numpy
>>>>>> installed from source, linked to ATLAS, and doesn't happen when
>>>>>> running the tests via:
>>>>>>
>>>>>> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>>>>>>
>>>>>> So, something about the copy of numpy (linked to openblas) is
>>>>>> affecting the results of scipy (also linked to openblas), and only
>>>>>> with a particular environment / test order.
>>>>>>
>>>>>> If you'd like to try and see whether y'all can do a better job of
>>>>>> debugging than me:
>>>>>>
>>>>>> # Run this script inside a docker container started with this incantation:
>>>>>> # docker run -ti --rm ubuntu:12.04 /bin/bash
>>>>>> apt-get update
>>>>>> apt-get install -y python curl
>>>>>> apt-get install libpython2.7 # this won't be necessary with next iteration of manylinux wheel builds
>>>>>> curl -LO https://bootstrap.pypa.io/get-pip.py
>>>>>> python get-pip.py
>>>>>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
>>>>>> python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>>
>>>>> I just tried this and on my laptop it completed without error.
>>>>>
>>>>> Best guess is that we're dealing with some memory corruption bug
>>>>> inside openblas, so it's getting perturbed by things like exactly what
>>>>> other calls to openblas have happened (which is different depending on
>>>>> whether numpy is linked to openblas), and which core type openblas has
>>>>> detected.
>>>>>
>>>>> On my laptop, which *doesn't* show the problem, running with
>>>>> OPENBLAS_VERBOSE=2 says "Core: Haswell".
>>>>>
>>>>> Guess the next step is checking what core type the failing machines
>>>>> use, and running valgrind... anyone have a good valgrind suppressions
>>>>> file?
>>>>
>>>> My machine (which does give the failure) gives
>>>>
>>>> Core: Core2
>>>>
>>>> with OPENBLAS_VERBOSE=2
>>>
>>> Yep, that allows me to reproduce it:
>>>
>>> root at f7153f0cc841:/# OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=Core2 python -c 'import scipy.linalg; scipy.linalg.test()'
>>> Core: Core2
>>> [...]
>>> ======================================================================
>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>> ----------------------------------------------------------------------
>>> [...]
>>>
>>> So this is indeed sounding like an OpenBLAS issue... next stop
>>> valgrind, I guess :-/
>>
>> Here's the valgrind output:
>> https://gist.github.com/njsmith/577d028e79f0a80d2797
>>
>> There's a lot of it, but no smoking guns have jumped out at me :-/
>
> Could you send me instructions on replicating the valgrind run? I'll
> run it on the actual Core2 machine...
>
> Matthew
Please also use this suppression file. It should reduce the Python noise
significantly, though it might be a bit out of date. It used to work fine
with an Ubuntu-built Python.
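
A minimal sketch of a run that uses the file, assuming the attachment is saved as valgrind-python.supp in the current directory. The file names and the two helper names are placeholders, not something from this thread, and this is not necessarily the command used for the gist above. The valgrind options themselves (--tool, --suppressions, --log-file, --gen-suppressions) are standard.

```shell
# Placeholder file names; adjust to wherever the attachment is saved.
SUPP=valgrind-python.supp
LOG=valgrind-scipy.log

# Same OpenBLAS settings as the reproduction quoted above.
export OPENBLAS_VERBOSE=2
export OPENBLAS_CORETYPE=Core2

# Hypothetical helper: run the failing scipy.linalg tests under memcheck,
# filtering known Python allocator noise through the suppression file.
run_under_valgrind() {
    valgrind --tool=memcheck \
        --suppressions="$SUPP" \
        --log-file="$LOG" \
        python -c 'import scipy.linalg; scipy.linalg.test()'
}

# Hypothetical helper: same run, but valgrind also writes a ready-made
# suppression entry into the log for every error it still reports, which
# helps when the file is out of date for the Python being tested.
log_new_suppressions() {
    valgrind --tool=memcheck \
        --suppressions="$SUPP" \
        --gen-suppressions=all \
        --log-file="$LOG" \
        python -c 'import scipy.linalg; scipy.linalg.test()'
}
```

Calling run_under_valgrind inside the docker container leaves the report in $LOG. Entries generated by log_new_suppressions carry the placeholder name <insert_a_suppression_name_here>, like the ones at the end of the attached file, and can be appended to it after review.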
-------------- next part --------------
#
# This is a valgrind suppression file that should be used when using valgrind.
#
# Here's an example of running valgrind:
#
# cd python/dist/src
# valgrind --tool=memcheck --suppressions=Misc/valgrind-python.supp \
# ./python -E -tt ./Lib/test/regrtest.py -u bsddb,network
#
# You must edit Objects/obmalloc.c and uncomment Py_USING_MEMORY_DEBUGGER
# to use the preferred suppressions with Py_ADDRESS_IN_RANGE.
#
# If you do not want to recompile Python, you can uncomment
# suppressions for PyObject_Free and PyObject_Realloc.
#
# See Misc/README.valgrind for more information.
# all tool names: Addrcheck,Memcheck,cachegrind,helgrind,massif
{
ADDRESS_IN_RANGE/Invalid read of size 4
Memcheck:Addr4
fun:Py_ADDRESS_IN_RANGE
}
{
ADDRESS_IN_RANGE/Invalid read of size 4
Memcheck:Value4
fun:Py_ADDRESS_IN_RANGE
}
{
ADDRESS_IN_RANGE/Invalid read of size 8 (x86_64 aka amd64)
Memcheck:Value8
fun:Py_ADDRESS_IN_RANGE
}
{
ADDRESS_IN_RANGE/Conditional jump or move depends on uninitialised value
Memcheck:Cond
fun:Py_ADDRESS_IN_RANGE
}
#
# Leaks (including possible leaks)
# Hmmm, I wonder if this masks some real leaks. I think it does.
# Will need to fix that.
#
{
Suppress leaking the GIL. Happens once per process, see comment in ceval.c.
Memcheck:Leak
fun:malloc
fun:PyThread_allocate_lock
fun:PyEval_InitThreads
}
{
Suppress leaking the GIL after a fork.
Memcheck:Leak
fun:malloc
fun:PyThread_allocate_lock
fun:PyEval_ReInitThreads
}
{
Suppress leaking the autoTLSkey. This looks like it shouldn't leak though.
Memcheck:Leak
fun:malloc
fun:PyThread_create_key
fun:_PyGILState_Init
fun:Py_InitializeEx
fun:Py_Main
}
{
Hmmm, is this a real leak or like the GIL?
Memcheck:Leak
fun:malloc
fun:PyThread_ReInitTLS
}
{
Handle PyMalloc confusing valgrind (possibly leaked)
Memcheck:Leak
fun:realloc
fun:_PyObject_GC_Resize
# fun:COMMENT_THIS_LINE_TO_DISABLE_LEAK_WARNING
}
{
Handle PyMalloc confusing valgrind (possibly leaked)
Memcheck:Leak
fun:malloc
fun:_PyObject_GC_New
# fun:COMMENT_THIS_LINE_TO_DISABLE_LEAK_WARNING
}
{
Handle PyMalloc confusing valgrind (possibly leaked)
Memcheck:Leak
fun:malloc
fun:_PyObject_GC_NewVar
# fun:COMMENT_THIS_LINE_TO_DISABLE_LEAK_WARNING
}
#
# Non-python specific leaks
#
{
Handle pthread issue (possibly leaked)
Memcheck:Leak
fun:calloc
fun:allocate_dtv
fun:_dl_allocate_tls_storage
fun:_dl_allocate_tls
}
{
Handle pthread issue (possibly leaked)
Memcheck:Leak
fun:memalign
fun:_dl_allocate_tls_storage
fun:_dl_allocate_tls
}
# Object Malloc/Free/Realloc stuff, very broad
{
ADDRESS_IN_RANGE/Invalid read of size 4
Memcheck:Addr4
fun:PyObject_Free*
}
{
ADDRESS_IN_RANGE/Invalid read of size 4
Memcheck:Value4
fun:PyObject_Free*
}
{
ADDRESS_IN_RANGE/Conditional jump or move depends on uninitialised value
Memcheck:Cond
fun:PyObject_Free*
}
{
ADDRESS_IN_RANGE/Invalid read of size 4
Memcheck:Addr4
fun:PyObject_Realloc*
}
{
ADDRESS_IN_RANGE/Invalid read of size 4
Memcheck:Value4
fun:PyObject_Realloc*
}
{
ADDRESS_IN_RANGE/Conditional jump or move depends on uninitialised value
Memcheck:Cond
fun:PyObject_Realloc*
}
# Object Malloc/Free/Realloc stuff for size 8
{
ADDRESS_IN_RANGE/Invalid read of size 8
Memcheck:Addr8
fun:PyObject_Free*
}
{
ADDRESS_IN_RANGE/Invalid read of size 8
Memcheck:Value8
fun:PyObject_Free*
}
{
ADDRESS_IN_RANGE/Invalid read of size 8
Memcheck:Addr8
fun:PyObject_Realloc*
}
{
ADDRESS_IN_RANGE/Invalid read of size 8
Memcheck:Value8
fun:PyObject_Realloc*
}
###
### All the suppressions below are for errors that occur within libraries
### that Python uses. The problems do not appear to be related to Python's
### use of the libraries.
###
{
Generic ubuntu ld problems
Memcheck:Addr8
obj:/lib/ld-2.4.so
obj:/lib/ld-2.4.so
obj:/lib/ld-2.4.so
obj:/lib/ld-2.4.so
}
{
Generic gentoo ld problems
Memcheck:Cond
obj:/lib/ld-2.3.4.so
obj:/lib/ld-2.3.4.so
obj:/lib/ld-2.3.4.so
obj:/lib/ld-2.3.4.so
}
{
DBM problems, see test_dbm
Memcheck:Param
write(buf)
fun:write
obj:/usr/lib/libdb1.so.2
obj:/usr/lib/libdb1.so.2
obj:/usr/lib/libdb1.so.2
obj:/usr/lib/libdb1.so.2
fun:dbm_close
}
{
DBM problems, see test_dbm
Memcheck:Value8
fun:memmove
obj:/usr/lib/libdb1.so.2
obj:/usr/lib/libdb1.so.2
obj:/usr/lib/libdb1.so.2
obj:/usr/lib/libdb1.so.2
fun:dbm_store
fun:dbm_ass_sub
}
{
DBM problems, see test_dbm
Memcheck:Cond
obj:/usr/lib/libdb1.so.2
obj:/usr/lib/libdb1.so.2
obj:/usr/lib/libdb1.so.2
fun:dbm_store
fun:dbm_ass_sub
}
{
DBM problems, see test_dbm
Memcheck:Cond
fun:memmove
obj:/usr/lib/libdb1.so.2
obj:/usr/lib/libdb1.so.2
obj:/usr/lib/libdb1.so.2
obj:/usr/lib/libdb1.so.2
fun:dbm_store
fun:dbm_ass_sub
}
{
GDBM problems, see test_gdbm
Memcheck:Param
write(buf)
fun:write
fun:gdbm_open
}
{
ZLIB problems, see test_gzip
Memcheck:Cond
obj:/lib/libz.so.1.2.3
obj:/lib/libz.so.1.2.3
fun:deflate
}
{
Avoid problems w/readline doing a putenv and leaking on exit
Memcheck:Leak
fun:malloc
fun:xmalloc
fun:sh_set_lines_and_columns
fun:_rl_get_screen_size
fun:_rl_init_terminal_io
obj:/lib/libreadline.so.4.3
fun:rl_initialize
}
###
### These occur from somewhere within the SSL, when running
### test_socket_ssl. They are too general to leave on by default.
###
###{
### somewhere in SSL stuff
### Memcheck:Cond
### fun:memset
###}
###{
### somewhere in SSL stuff
### Memcheck:Value4
### fun:memset
###}
###
###{
### somewhere in SSL stuff
### Memcheck:Cond
### fun:MD5_Update
###}
###
###{
### somewhere in SSL stuff
### Memcheck:Value4
### fun:MD5_Update
###}
#
# All of these problems come from using test_socket_ssl
#
{
from test_socket_ssl
Memcheck:Cond
fun:BN_bin2bn
}
{
from test_socket_ssl
Memcheck:Cond
fun:BN_num_bits_word
}
{
from test_socket_ssl
Memcheck:Value4
fun:BN_num_bits_word
}
{
from test_socket_ssl
Memcheck:Cond
fun:BN_mod_exp_mont_word
}
{
from test_socket_ssl
Memcheck:Cond
fun:BN_mod_exp_mont
}
{
from test_socket_ssl
Memcheck:Param
write(buf)
fun:write
obj:/usr/lib/libcrypto.so.0.9.7
}
{
from test_socket_ssl
Memcheck:Cond
fun:RSA_verify
}
{
from test_socket_ssl
Memcheck:Value4
fun:RSA_verify
}
{
from test_socket_ssl
Memcheck:Value4
fun:DES_set_key_unchecked
}
{
from test_socket_ssl
Memcheck:Value4
fun:DES_encrypt2
}
{
from test_socket_ssl
Memcheck:Cond
obj:/usr/lib/libssl.so.0.9.7
}
{
from test_socket_ssl
Memcheck:Value4
obj:/usr/lib/libssl.so.0.9.7
}
{
from test_socket_ssl
Memcheck:Cond
fun:BUF_MEM_grow_clean
}
{
from test_socket_ssl
Memcheck:Cond
fun:memcpy
fun:ssl3_read_bytes
}
{
from test_socket_ssl
Memcheck:Cond
fun:SHA1_Update
}
{
from test_socket_ssl
Memcheck:Value4
fun:SHA1_Update
}
#jtaylor added
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:PyObject_GC_Del
fun:tupledealloc.*
}
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:PyObject_GC_Del
fun:code_dealloc.*
}
{
<insert_a_suppression_name_here>
Memcheck:Cond
fun:PyObject_GC_Del
fun:code_dealloc.*
}
{
<insert_a_suppression_name_here>
Memcheck:Value8
fun:PyObject_GC_Del
fun:code_dealloc.*
}
{
<insert_a_suppression_name_here>
Memcheck:Value8
fun:PyObject_GC_Del
fun:tupledealloc.*
}
{
<insert_a_suppression_name_here>
Memcheck:Cond
fun:PyObject_GC_Del
fun:tupledealloc.*
}
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:PyObject_GC_Del
fun:dict_dealloc.*
}
{
<insert_a_suppression_name_here>
Memcheck:Cond
fun:PyObject_GC_Del
fun:dict_dealloc.*
}
{
<insert_a_suppression_name_here>
Memcheck:Value8
fun:PyObject_GC_Del
fun:dict_dealloc.*
}
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:PyObject_GC_Del
fun:collect.*
}
{
<insert_a_suppression_name_here>
Memcheck:Cond
fun:PyObject_GC_Del
fun:collect.*
}
{
<insert_a_suppression_name_here>
Memcheck:Value8
fun:PyObject_GC_Del
fun:collect.*
}
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:match_dealloc.*
fun:frame_dealloc.*
}
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:PyObject_GC_Del
fun:subtype_dealloc.*
}
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:PyObject_GC_Del
fun:frame_dealloc.*
fun:PyEval_EvalFrameEx
fun:PyEval_EvalFrameEx
fun:PyEval_EvalFrameEx
fun:PyEval_EvalFrameEx
}
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:PyObject_GC_Del
fun:PyFrame_ClearFreeList
fun:collect.*
fun:_PyObject_GC_New
}
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:PyObject_GC_Del
fun:PyFrame_ClearFreeList
fun:collect.*
}
{
<insert_a_suppression_name_here>
Memcheck:Cond
fun:PyObject_GC_Del
fun:PyFrame_ClearFreeList
fun:collect.*
}
{
<insert_a_suppression_name_here>
Memcheck:Cond
fun:PyObject_GC_Del
fun:subtype_dealloc.*
}
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:PyObject_GC_Del
fun:PyDict_Fini
fun:Py_Finalize
}
{
<insert_a_suppression_name_here>
Memcheck:Cond
fun:PyObject_GC_Del
fun:PyDict_Fini
fun:Py_Finalize
}
{
<insert_a_suppression_name_here>
Memcheck:Value8
fun:PyObject_GC_Del
fun:PyDict_Fini
fun:Py_Finalize
}
{
<insert_a_suppression_name_here>
Memcheck:Value8
fun:PyGrammar_RemoveAccelerators
fun:Py_Finalize
}
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:PyGrammar_RemoveAccelerators
fun:Py_Finalize
}
{
<insert_a_suppression_name_here>
Memcheck:Cond
fun:PyGrammar_RemoveAccelerators
fun:Py_Finalize
}