[Numpy-discussion] Fwd: Multi-distribution Linux wheels - please test

Julian Taylor jtaylor.debian at googlemail.com
Tue Feb 9 14:55:28 EST 2016


On 09.02.2016 20:52, Matthew Brett wrote:
> On Mon, Feb 8, 2016 at 7:59 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Mon, Feb 8, 2016 at 6:07 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>> On Mon, Feb 8, 2016 at 6:04 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>>>> On Mon, Feb 8, 2016 at 5:26 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>>>> On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>>>>> [...]
>>>>>> I can't replicate the segfault with manylinux wheels and scipy.  On
>>>>>> the other hand, I get a new test error for numpy from manylinux, scipy
>>>>>> from manylinux, like this:
>>>>>>
>>>>>> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>>>
>>>>>> ======================================================================
>>>>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>>>>> ----------------------------------------------------------------------
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
>>>>>>     self.test(*self.arg)
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py", line 658, in eigenhproblem_general
>>>>>>     assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 892, in assert_array_almost_equal
>>>>>>     precision=decimal)
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 713, in assert_array_compare
>>>>>>     raise AssertionError(msg)
>>>>>> AssertionError:
>>>>>> Arrays are not almost equal to 4 decimals
>>>>>>
>>>>>> (mismatch 100.0%)
>>>>>>  x: array([ 0.,  0.,  0.], dtype=float32)
>>>>>>  y: array([ 1.,  1.,  1.])
>>>>>>
>>>>>> ----------------------------------------------------------------------
>>>>>> Ran 1507 tests in 14.928s
>>>>>>
>>>>>> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>>>>>>
>>>>>> This is a very odd error, which we don't get with a numpy
>>>>>> installed from source and linked to ATLAS, and which doesn't happen
>>>>>> when running the tests via:
>>>>>>
>>>>>> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>>>>>>
>>>>>> So, something about the copy of numpy (linked to openblas) is
>>>>>> affecting the results of scipy (also linked to openblas), and only
>>>>>> with a particular environment / test order.
>>>>>>
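Some context on what the failing assertion checks. Assuming `diag2_` is the diagonal of Z^H B Z for the eigenvectors Z that `eigh(a, b)` returns for the generalized problem A z = w B z, it should be all ones: with B positive definite the eigenvectors come back B-normalized. Because z^H B z > 0 for every nonzero z when B is positive definite, an all-zero diagonal like the `x` array above is consistent with the affected eigenvector columns coming back as zeros. The test case uses dtype 'F' (complex64) and `(2, 4)` presumably selects eigenvalues 2 through 4, matching the three entries in `x`. The NumPy sketch below shows the property for a real float64 problem via a Cholesky reduction. It is an illustration only, not the LAPACK path scipy actually calls, and `generalized_eigh`, `b_norms` and the random matrices are hypothetical.

```python
import numpy as np

def generalized_eigh(a, b):
    """Solve A z = w B z for symmetric A and symmetric positive-definite B
    by reducing it to a standard symmetric eigenproblem (Cholesky reduction)."""
    l = np.linalg.cholesky(b)          # B = L L^T, L lower triangular
    l_inv = np.linalg.inv(l)
    c = l_inv @ a @ l_inv.T            # C = L^-1 A L^-T is symmetric
    w, y = np.linalg.eigh(c)           # standard problem C y = w y
    z = l_inv.T @ y                    # back-transform: z = L^-T y
    return w, z

def b_norms(z, b):
    """Diagonal of Z^H B Z; all ones when the columns of Z are B-normalized."""
    return np.diag(z.conj().T @ b @ z).real

rng = np.random.default_rng(0)
m = rng.standard_normal((6, 6))
a = (m + m.T) / 2                      # random symmetric A
n = rng.standard_normal((6, 6))
b = n @ n.T + 6 * np.eye(6)            # symmetric positive-definite B

w, z = generalized_eigh(a, b)
print(np.round(b_norms(z, b), 4))      # the property the failing test asserts
print(b_norms(np.zeros_like(z), b))    # what all-zero eigenvectors would give
```

With correct eigenvectors the first line prints ones; zeroed eigenvectors give exactly the all-zero pattern seen in the failure.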
>>>>>> If you'd like to try and see whether y'all can do a better job of
>>>>>> debugging than me:
>>>>>>
>>>>>> # Run this script inside a docker container started with this incantation:
>>>>>> # docker run -ti --rm ubuntu:12.04 /bin/bash
>>>>>> apt-get update
>>>>>> apt-get install -y python curl
>>>>>> apt-get install libpython2.7  # this won't be necessary with next iteration of manylinux wheel builds
>>>>>> curl -LO https://bootstrap.pypa.io/get-pip.py
>>>>>> python get-pip.py
>>>>>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
>>>>>> python -c 'import scipy.linalg; scipy.linalg.test()'
>>>>>
>>>>> I just tried this and on my laptop it completed without error.
>>>>>
>>>>> Best guess is that we're dealing with some memory corruption bug
>>>>> inside openblas, so it's getting perturbed by things like exactly what
>>>>> other calls to openblas have happened (which is different depending on
>>>>> whether numpy is linked to openblas), and which core type openblas has
>>>>> detected.
>>>>>
>>>>> On my laptop, which *doesn't* show the problem, running with
>>>>> OPENBLAS_VERBOSE=2 says "Core: Haswell".
>>>>>
>>>>> Guess the next step is checking what core type the failing machines
>>>>> use, and running valgrind... anyone have a good valgrind suppressions
>>>>> file?
>>>>
>>>> My machine (which does give the failure) gives
>>>>
>>>> Core: Core2
>>>>
>>>> with OPENBLAS_VERBOSE=2
>>>
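To compare machines without eyeballing the output, the detected core type can be read programmatically. A minimal Python 3 sketch (3.5+ for `subprocess.run`; the `python` argument can still point at the 2.7 interpreter under test), assuming that importing numpy and scipy.linalg is enough to load OpenBLAS and that the report has exactly the `Core: ...` form quoted in this thread. Because the report may go to stderr, both streams are captured together. `parse_core` and `detected_core` are hypothetical helper names.

```python
import os
import re
import subprocess
import sys

# Matches lines such as "Core: Core2" or "Core: Haswell", as quoted in this thread.
CORE_RE = re.compile(r"^Core:\s*(\S+)", re.MULTILINE)

def parse_core(text):
    """Return every core name OpenBLAS reported (one per loaded OpenBLAS copy)."""
    return CORE_RE.findall(text)

def detected_core(probe="import numpy, scipy.linalg", python=sys.executable):
    """Run `probe` in a child interpreter with OPENBLAS_VERBOSE=2 and report
    the core type(s) OpenBLAS selected on this machine."""
    env = dict(os.environ, OPENBLAS_VERBOSE="2")
    out = subprocess.run([python, "-c", probe], env=env,
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                         universal_newlines=True).stdout
    return parse_core(out)
```

Usage: `print(detected_core())`. If numpy and scipy each bundle their own OpenBLAS, more than one `Core:` line may appear, which is why a list is returned.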
>>> Yep, that allows me to reproduce it:
>>>
>>> root at f7153f0cc841:/# OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=Core2 python -c 'import scipy.linalg; scipy.linalg.test()'
>>> Core: Core2
>>> [...]
>>> ======================================================================
>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>> ----------------------------------------------------------------------
>>> [...]
>>>
>>> So this is indeed sounding like an OpenBLAS issue... next stop
>>> valgrind, I guess :-/
>>
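Since forcing `OPENBLAS_CORETYPE` reproduces the failure on a Haswell laptop, a sweep over several forced core types can show which kernels are affected. A rough Python 3 sketch, assuming the nose `FAIL: ...` / `ERROR: ...` header format shown above. `python -c 'import scipy.linalg; scipy.linalg.test()'` exits with status 0 even when tests fail, so the sketch greps the output instead of trusting the return code. `failed_tests`, `forced_env` and `failures_by_core` are hypothetical helpers.

```python
import os
import re
import subprocess
import sys

TEST_CMD = "import scipy.linalg; scipy.linalg.test()"
# Matches nose failure headers such as "FAIL: test_decomp.test_eigh(...)".
FAIL_RE = re.compile(r"^(?:FAIL|ERROR): (.+)$", re.MULTILINE)

def failed_tests(output):
    """Extract the names of failed or errored tests from nose's text output."""
    return FAIL_RE.findall(output)

def forced_env(core, base=None):
    """Environment that makes OpenBLAS use `core` and report what it chose."""
    env = dict(os.environ if base is None else base)
    env["OPENBLAS_VERBOSE"] = "2"
    env["OPENBLAS_CORETYPE"] = core
    return env

def failures_by_core(cores, python=sys.executable):
    """Run the scipy.linalg suite once per forced core type; map core -> failures."""
    results = {}
    for core in cores:
        out = subprocess.run([python, "-c", TEST_CMD], env=forced_env(core),
                             stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                             universal_newlines=True).stdout
        results[core] = failed_tests(out)
    return results
```

Usage: `print(failures_by_core(["Core2", "Nehalem", "Sandybridge", "Haswell"]))`. These core names are only examples; which names are accepted depends on the OpenBLAS build.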
>> Here's the valgrind output:
>>   https://gist.github.com/njsmith/577d028e79f0a80d2797
>>
>> There's a lot of it, but no smoking guns have jumped out at me :-/
> 
> Could you send me instructions for replicating the valgrind run? I'll
> run it on the actual Core2 machine...
> 
> Matthew


Please also use this suppression file; it should reduce the Python noise
significantly, but it might be a bit out of date. It used to work fine
with an Ubuntu-built Python.
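For replicating the valgrind run with this file, something along these lines should work. It is a Python 3 sketch, assuming the attachment was saved as `valgrind-python.supp` (a hypothetical name) and using only standard Memcheck options: `--tool=memcheck`, `--suppressions=`, `--error-limit=no` so valgrind keeps reporting past its default error cap, `--log-file=` to keep the report out of the test output, and optionally `--gen-suppressions=all`, which prints a ready-made suppression block (named `<insert_a_suppression_name_here>`, like the entries at the end of the attachment) after each error. `valgrind_argv` and `run_under_valgrind` are hypothetical helpers.

```python
import os
import subprocess
import sys

def valgrind_argv(supp_file, code, python=sys.executable,
                  log_file="valgrind.log", gen_suppressions=False):
    """Build a memcheck command line that runs `python -c code`
    with the given suppression file."""
    argv = [
        "valgrind",
        "--tool=memcheck",
        "--suppressions=" + supp_file,
        "--error-limit=no",              # keep reporting past the default cap
        "--log-file=" + log_file,        # keep valgrind's report separate
    ]
    if gen_suppressions:
        argv.append("--gen-suppressions=all")  # emit a suppression block per error
    return argv + [python, "-c", code]

def run_under_valgrind(supp_file, code, core=None, **kwargs):
    """Run the command, optionally forcing an OpenBLAS core type."""
    env = dict(os.environ)
    if core is not None:
        env["OPENBLAS_CORETYPE"] = core
    return subprocess.call(valgrind_argv(supp_file, code, **kwargs), env=env)
```

Usage: `run_under_valgrind("valgrind-python.supp", "import scipy.linalg; scipy.linalg.test()", core="Core2")`, then read `valgrind.log`.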
-------------- next part --------------
#
# This is a valgrind suppression file that should be used when using valgrind.
#
#  Here's an example of running valgrind:
#
#	cd python/dist/src
#	valgrind --tool=memcheck --suppressions=Misc/valgrind-python.supp \
#		./python -E -tt ./Lib/test/regrtest.py -u bsddb,network
#
# You must edit Objects/obmalloc.c and uncomment Py_USING_MEMORY_DEBUGGER
# to use the preferred suppressions with Py_ADDRESS_IN_RANGE.
#
# If you do not want to recompile Python, you can uncomment
# suppressions for PyObject_Free and PyObject_Realloc.
#
# See Misc/README.valgrind for more information.

# all tool names: Addrcheck,Memcheck,cachegrind,helgrind,massif
{
   ADDRESS_IN_RANGE/Invalid read of size 4
   Memcheck:Addr4
   fun:Py_ADDRESS_IN_RANGE
}

{
   ADDRESS_IN_RANGE/Invalid read of size 4
   Memcheck:Value4
   fun:Py_ADDRESS_IN_RANGE
}

{
   ADDRESS_IN_RANGE/Invalid read of size 8 (x86_64 aka amd64)
   Memcheck:Value8
   fun:Py_ADDRESS_IN_RANGE
}

{
   ADDRESS_IN_RANGE/Conditional jump or move depends on uninitialised value
   Memcheck:Cond
   fun:Py_ADDRESS_IN_RANGE
}

#
# Leaks (including possible leaks)
#    Hmmm, I wonder if this masks some real leaks.  I think it does.
#    Will need to fix that.
#

{
   Suppress leaking the GIL.  Happens once per process, see comment in ceval.c.
   Memcheck:Leak
   fun:malloc
   fun:PyThread_allocate_lock
   fun:PyEval_InitThreads
}

{
   Suppress leaking the GIL after a fork.
   Memcheck:Leak
   fun:malloc
   fun:PyThread_allocate_lock
   fun:PyEval_ReInitThreads
}

{
   Suppress leaking the autoTLSkey.  This looks like it shouldn't leak though.
   Memcheck:Leak
   fun:malloc
   fun:PyThread_create_key
   fun:_PyGILState_Init
   fun:Py_InitializeEx
   fun:Py_Main
}

{
   Hmmm, is this a real leak or like the GIL?
   Memcheck:Leak
   fun:malloc
   fun:PyThread_ReInitTLS
}

{
   Handle PyMalloc confusing valgrind (possibly leaked)
   Memcheck:Leak
   fun:realloc
   fun:_PyObject_GC_Resize
#  fun:COMMENT_THIS_LINE_TO_DISABLE_LEAK_WARNING
}

{
   Handle PyMalloc confusing valgrind (possibly leaked)
   Memcheck:Leak
   fun:malloc
   fun:_PyObject_GC_New
#   fun:COMMENT_THIS_LINE_TO_DISABLE_LEAK_WARNING
}

{
   Handle PyMalloc confusing valgrind (possibly leaked)
   Memcheck:Leak
   fun:malloc
   fun:_PyObject_GC_NewVar
#   fun:COMMENT_THIS_LINE_TO_DISABLE_LEAK_WARNING
}

#
# Non-python specific leaks
#

{
   Handle pthread issue (possibly leaked)
   Memcheck:Leak
   fun:calloc
   fun:allocate_dtv
   fun:_dl_allocate_tls_storage
   fun:_dl_allocate_tls
}

{
   Handle pthread issue (possibly leaked)
   Memcheck:Leak
   fun:memalign
   fun:_dl_allocate_tls_storage
   fun:_dl_allocate_tls
}

# Object Malloc/Free/Realloc stuff, very broad
{
   ADDRESS_IN_RANGE/Invalid read of size 4
   Memcheck:Addr4
   fun:PyObject_Free*
}

{
   ADDRESS_IN_RANGE/Invalid read of size 4
   Memcheck:Value4
   fun:PyObject_Free*
}

{
   ADDRESS_IN_RANGE/Conditional jump or move depends on uninitialised value
   Memcheck:Cond
   fun:PyObject_Free*
}

{
   ADDRESS_IN_RANGE/Invalid read of size 4
   Memcheck:Addr4
   fun:PyObject_Realloc*
}

{
   ADDRESS_IN_RANGE/Invalid read of size 4
   Memcheck:Value4
   fun:PyObject_Realloc*
}

{
   ADDRESS_IN_RANGE/Conditional jump or move depends on uninitialised value
   Memcheck:Cond
   fun:PyObject_Realloc*
}

# Object Malloc/Free/Realloc stuff for size 8
{
   ADDRESS_IN_RANGE/Invalid read of size 8
   Memcheck:Addr8
   fun:PyObject_Free*
}

{
   ADDRESS_IN_RANGE/Invalid read of size 8
   Memcheck:Value8
   fun:PyObject_Free*
}

{
   ADDRESS_IN_RANGE/Invalid read of size 8
   Memcheck:Addr8
   fun:PyObject_Realloc*
}

{
   ADDRESS_IN_RANGE/Invalid read of size 8
   Memcheck:Value8
   fun:PyObject_Realloc*
}


###
### All the suppressions below are for errors that occur within libraries
### that Python uses.  The problems do not appear to be related to Python's
### use of the libraries.
###

{
   Generic ubuntu ld problems
   Memcheck:Addr8
   obj:/lib/ld-2.4.so
   obj:/lib/ld-2.4.so
   obj:/lib/ld-2.4.so
   obj:/lib/ld-2.4.so
}

{
   Generic gentoo ld problems
   Memcheck:Cond
   obj:/lib/ld-2.3.4.so
   obj:/lib/ld-2.3.4.so
   obj:/lib/ld-2.3.4.so
   obj:/lib/ld-2.3.4.so
}

{
   DBM problems, see test_dbm
   Memcheck:Param
   write(buf)
   fun:write
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   fun:dbm_close
}

{
   DBM problems, see test_dbm
   Memcheck:Value8
   fun:memmove
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   fun:dbm_store
   fun:dbm_ass_sub
}

{
   DBM problems, see test_dbm
   Memcheck:Cond
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   fun:dbm_store
   fun:dbm_ass_sub
}

{
   DBM problems, see test_dbm
   Memcheck:Cond
   fun:memmove
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   obj:/usr/lib/libdb1.so.2
   fun:dbm_store
   fun:dbm_ass_sub
}

{
   GDBM problems, see test_gdbm
   Memcheck:Param
   write(buf)
   fun:write
   fun:gdbm_open

}

{
   ZLIB problems, see test_gzip
   Memcheck:Cond
   obj:/lib/libz.so.1.2.3
   obj:/lib/libz.so.1.2.3
   fun:deflate
}

{
   Avoid problems w/readline doing a putenv and leaking on exit
   Memcheck:Leak
   fun:malloc
   fun:xmalloc
   fun:sh_set_lines_and_columns
   fun:_rl_get_screen_size
   fun:_rl_init_terminal_io
   obj:/lib/libreadline.so.4.3
   fun:rl_initialize
}

###
### These occur from somewhere within the SSL, when running
###  test_socket_ssl.  They are too general to leave on by default.
###
###{
###   somewhere in SSL stuff
###   Memcheck:Cond
###   fun:memset
###}
###{
###   somewhere in SSL stuff
###   Memcheck:Value4
###   fun:memset
###}
###
###{
###   somewhere in SSL stuff
###   Memcheck:Cond
###   fun:MD5_Update
###}
###
###{
###   somewhere in SSL stuff
###   Memcheck:Value4
###   fun:MD5_Update
###}

#
# All of these problems come from using test_socket_ssl
#
{
   from test_socket_ssl
   Memcheck:Cond
   fun:BN_bin2bn
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:BN_num_bits_word
}

{
   from test_socket_ssl
   Memcheck:Value4
   fun:BN_num_bits_word
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:BN_mod_exp_mont_word
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:BN_mod_exp_mont
}

{
   from test_socket_ssl
   Memcheck:Param
   write(buf)
   fun:write
   obj:/usr/lib/libcrypto.so.0.9.7
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:RSA_verify
}

{
   from test_socket_ssl
   Memcheck:Value4
   fun:RSA_verify
}

{
   from test_socket_ssl
   Memcheck:Value4
   fun:DES_set_key_unchecked
}

{
   from test_socket_ssl
   Memcheck:Value4
   fun:DES_encrypt2
}

{
   from test_socket_ssl
   Memcheck:Cond
   obj:/usr/lib/libssl.so.0.9.7
}

{
   from test_socket_ssl
   Memcheck:Value4
   obj:/usr/lib/libssl.so.0.9.7
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:BUF_MEM_grow_clean
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:memcpy
   fun:ssl3_read_bytes
}

{
   from test_socket_ssl
   Memcheck:Cond
   fun:SHA1_Update
}

{
   from test_socket_ssl
   Memcheck:Value4
   fun:SHA1_Update
}

#jtaylor added
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:tupledealloc.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:code_dealloc.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:code_dealloc.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Value8
   fun:PyObject_GC_Del
   fun:code_dealloc.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Value8
   fun:PyObject_GC_Del
   fun:tupledealloc.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:tupledealloc.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:dict_dealloc.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:dict_dealloc.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Value8
   fun:PyObject_GC_Del
   fun:dict_dealloc.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:collect.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:collect.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Value8
   fun:PyObject_GC_Del
   fun:collect.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:match_dealloc.*
   fun:frame_dealloc.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:subtype_dealloc.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:frame_dealloc.*
   fun:PyEval_EvalFrameEx
   fun:PyEval_EvalFrameEx
   fun:PyEval_EvalFrameEx
   fun:PyEval_EvalFrameEx
}
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:PyFrame_ClearFreeList
   fun:collect.*
   fun:_PyObject_GC_New
}
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:PyFrame_ClearFreeList
   fun:collect.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:PyFrame_ClearFreeList
   fun:collect.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:subtype_dealloc.*
}
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:PyDict_Fini
   fun:Py_Finalize
}
{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyObject_GC_Del
   fun:PyDict_Fini
   fun:Py_Finalize
}
{
   <insert_a_suppression_name_here>
   Memcheck:Value8
   fun:PyObject_GC_Del
   fun:PyDict_Fini
   fun:Py_Finalize
}
{
   <insert_a_suppression_name_here>
   Memcheck:Value8
   fun:PyGrammar_RemoveAccelerators
   fun:Py_Finalize
}
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyGrammar_RemoveAccelerators
   fun:Py_Finalize
}
{
   <insert_a_suppression_name_here>
   Memcheck:Cond
   fun:PyGrammar_RemoveAccelerators
   fun:Py_Finalize
}
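As a reading aid for the attachment: each entry is a brace-delimited block holding a free-form name, a `Tool:Kind` line (e.g. `Memcheck:Leak`), an extra line for `Param` errors naming the syscall argument (e.g. `write(buf)`), and then call-stack frames, innermost first, as `fun:` (function) or `obj:` (shared object) patterns; lines starting with `#` are comments. The Python sketch below parses that layout, e.g. to count entries or search them. It is a simplified reader, not valgrind's own parser, and treats a `...` frame wildcard as an ordinary frame string. Frame patterns may contain `*` and `?`; `fnmatch.fnmatchcase` is used as an approximation (it also accepts `[...]`, which valgrind's patterns, as far as I know, do not). It shows that a pattern such as `tupledealloc.*` needs a suffix after the dot, presumably to catch compiler-generated clones such as `tupledealloc.isra.0`. `parse_suppressions` and `frame_matches` are hypothetical helpers.

```python
import fnmatch

def parse_suppressions(text):
    """Parse valgrind suppression text into dicts with keys
    'name', 'kind' (e.g. 'Memcheck:Leak'), 'extra' (Param only) and 'frames'."""
    entries, block = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank lines and comments are ignored
        if line == "{":
            block = []
        elif line == "}":
            if block is not None and len(block) >= 2:
                name, kind, rest = block[0], block[1], block[2:]
                extra = None
                if kind.endswith(":Param") and rest:
                    extra, rest = rest[0], rest[1:]   # e.g. 'write(buf)'
                entries.append({"name": name, "kind": kind,
                                "extra": extra, "frames": rest})
            block = None
        elif block is not None:
            block.append(line)
    return entries

def frame_matches(pattern, frame_name):
    """Approximate valgrind's '*'/'?' wildcard matching for one frame name."""
    return fnmatch.fnmatchcase(frame_name, pattern)

SAMPLE = """
# two entries copied from the attachment
{
   GDBM problems, see test_gdbm
   Memcheck:Param
   write(buf)
   fun:write
   fun:gdbm_open

}
{
   <insert_a_suppression_name_here>
   Memcheck:Addr4
   fun:PyObject_GC_Del
   fun:tupledealloc.*
}
"""

for entry in parse_suppressions(SAMPLE):
    print(entry["kind"], entry["extra"], entry["frames"])
```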

