[Wheel-builders] manylinux: futex lock in capnproto hangs when running manylinux wheel on Ubuntu 16.04

Vitaly Kruglikov vkruglikov at numenta.com
Mon Jul 25 14:12:45 EDT 2016


When I build a manylinux wheel for nupic.core
(https://github.com/numenta/nupic.core/pull/1001), all nupic.core and
nupic (https://github.com/numenta/nupic) tests pass on Ubuntu 14.04.
However, when I run the nupic unit tests on Ubuntu 16.04, I always get a
futex lock hang at
https://github.com/sandstorm-io/capnproto/blob/v0.5.3/c%2B%2B/src/kj/mutex.c%2B%2B#L87
(a statically-linked copy of capnproto embedded in the python
extension .so that's part of the nupic.bindings manylinux wheel built by
nupic.core).

The extension build uses shared libs: libc.so.6, libstdc++.so.6, and
libgcc_s.so.1. The wheel is built and run against Python 2.7.11. I use a
custom manylinux docker image that's created from a fork of manylinux that
replaces centos5 with centos6.8
(https://github.com/numenta/manylinux/pull/1) as suggested in
https://mail.python.org/pipermail/wheel-builders/2016-July/000175.html.
This image has been pushed to quay.io/numenta/manylinux1_x86_64_centos6.

The traceback to the hang looks like this:

(gdb) bt
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f2e042d7d77 in kj::_::Mutex::lock (this=0x42b6610, exclusivity=<optimized out>)
    at /nupic.core/build/scripts/ThirdParty/Source/CapnProto/src/kj/mutex.c++:87
#2  0x00007f2e042a658e in kj::MutexGuarded<kj::Own<capnp::SchemaLoader::Impl> >::lockExclusive (this=0x42b6610)
    at /nupic.core/build/scripts/ThirdParty/Source/CapnProto/src/kj/mutex.h:300
#3  capnp::SchemaLoader::loadNative (this=0x42b6610, nativeSchema=0x7f2e045c1f40 <capnp::schemas::s_b414112f4b6b1b45>)
    at /nupic.core/build/scripts/ThirdParty/Source/CapnProto/src/capnp/schema-loader.c++:2069
#4  0x00007f2e04074761 in capnp::SchemaLoader::loadCompiledTypeAndDependencies<NetworkProto> (this=<optimized out>)
    at /nupic.core/build/scripts/ThirdParty/Install/include/capnp/schema-loader.h:168
#5  capnp::SchemaParser::loadCompiledTypeAndDependencies<NetworkProto> (this=<optimized out>)
    at /nupic.core/build/scripts/ThirdParty/Install/include/capnp/schema-parser.h:83
#6  nupic::getBuilder<NetworkProto> (pyBuilder=0x7f2e0a0a55f0) at /nupic.core/src/nupic/py_support/PyCapnp.hpp:77
#7  0x00007f2e03fcacd3 in nupic_Network_write__SWIG_2 (self=0x3253090, pyBuilder=<optimized out>)
    at /nupic.core/build/scripts/src/nupic/bindings/engine_internalPYTHON_wrap.cxx:5287
#8  0x00007f2e03ff878f in _wrap_Network_write__SWIG_2 (nobjs=nobjs@entry=2, swig_obj=swig_obj@entry=0x7ffc743cf0d0)
    at /nupic.core/build/scripts/src/nupic/bindings/engine_internalPYTHON_wrap.cxx:27690
#9  0x00007f2e03ff8c05 in _wrap_Network_write (self=0x0, args=<optimized out>)
    at /nupic.core/build/scripts/src/nupic/bindings/engine_internalPYTHON_wrap.cxx:27812
#10 0x00000000004cb26d in PyEval_EvalFrameEx ()
#11 0x00000000004c22e5 in PyEval_EvalCodeEx ()


I am going to put in additional effort to isolate the issue to a small
code footprint from the vast body of code that it's in now. However, in
the meantime, I was hoping that someone might have run into something
similar and might share some helpful clues about the issue or possibly how
to debug it efficiently.
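
One way to make the isolation step repeatable is to run each candidate
reproduction in a child process with a timeout, so that a hang is reported
instead of blocking the session. The following is only a minimal sketch under
my own assumptions: the helper name hangs() and the timeout value are
hypothetical, and the harness itself needs Python 3 (for subprocess.run),
although the command it launches can be any interpreter, including the
2.7.11 one used for the tests.

```python
import subprocess


def hangs(cmd, timeout_s):
    """Return True if `cmd` is still running after `timeout_s` seconds.

    `cmd` is an argument list such as ["python2.7", "candidate.py"]
    (a hypothetical script name). subprocess.run kills the child when
    the timeout expires and then raises TimeoutExpired.
    """
    try:
        subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return True
    return False
```

A candidate that still triggers the lock would make hangs(cmd, 60) return
True, so code can be trimmed away step by step for as long as the result
stays True. A timeout only shows that the process did not finish in time; it
does not by itself prove that it is stuck on the same futex.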

Many thanks,
Vitaly


