[C++-sig] threading problem

manderso at cs.wisc.edu
Thu Nov 20 22:34:08 CET 2003


Hi --

I've been using Boost.Python in my research work for about 6 months
now, and I've just recently bumped into a problem with threading
during Py_Finalize.  It has me confused, and I'm not entirely sure what
path to pursue in trying to debug it.  I don't expect anyone to be
able to solve the problem from the information herein, but I'm hoping
that someone will have an intuition about what *might* be the problem,
or can suggest candidates that I should try to track down.

On one particular set of input (so only very occasionally, but
repeatably), the program hangs during interpreter shutdown while
waiting for a mutex.  From what the debugger is telling me, this
appears to happen when the 'str' object is being removed from the
global dictionary and destroyed, and the Boost.Python wrapper for it
is destroying itself.

My research software is not explicitly multi-threaded; any use of
pthreads comes directly from Boost.Python or the Python interpreter
itself.

The behavior is the same with Boost 1.30.2, and the Boost version in
CVS as of yesterday evening.  The behavior is the same with the
multi-threaded Boost.Python debug build and the non-multi-threaded
Boost.Python debug build.

My project is organized as a central shared library which has all the
main code but no wrappers, and a series of python modules which
individually wrap classes and each link against the main shared
library.

The Python script I'm executing is very simple, only about 20 lines.
The problem comes down to whether or not I create a shared_ptr to an
image (and allocate (new) the image it points to). If I do, the
problem occurs.  If I don't do this one thing, the problem does not
occur.  I am highly doubtful that this allocation is *causing* the
problem, which suggests that some complex set of side effects is
contributing to it, thus making it very difficult to isolate.

The main library links against:

  gargamel: [canute/lib] > ldd libcanute.so
        libm.so.6 => /lib/libm.so.6 (0x4009d000)
        libboost_python-gcc-mt-d-1_31.so.1.31.0 => /u/m/a/manderso/research/lib/libboost_python-gcc-mt-d-1_31.so.1.31.0 (0x400bf000)
        libgdal.so => /u/m/a/manderso/research/lib/libgdal.so (0x40180000)
        libgcc_s.so.1 => /s/gcc-3.2.1/i386_rh72/lib/libgcc_s.so.1 (0x4043d000)
        libc.so.6 => /lib/libc.so.6 (0x40446000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x4057c000)
        librt.so.1 => /lib/librt.so.1 (0x40593000)
        libstdc++.so.5 => /s/gcc-3.2.1/i386_rh72/lib/libstdc++.so.5 (0x405a5000)
        libungif.so.4 => /usr/lib/libungif.so.4 (0x40660000)
        libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0x40669000)
        libpng.so.2 => /usr/lib/libpng.so.2 (0x40688000)
        libz.so.1 => /usr/lib/libz.so.1 (0x406aa000)
        libdl.so.2 => /lib/libdl.so.2 (0x406b8000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x80000000)
        libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x406bc000)

The backtrace from the debugger is listed below, lightly annotated
with the source code that is executing in each frame at the point
where the program blocks trying to lock the mutex.

  (gdb) bt
  #0  0x403db9b6 in __sigsuspend (set=0xbfffdd80) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
  #1  0x4002bd45 in __pthread_wait_for_restart_signal (self=0x400355a0) at pthread.c:978
  #2  0x4002dbc3 in __pthread_alt_lock (lock=<incomplete type>, self=0x0) at restart.h:34
  #3  0x40029fb6 in __pthread_mutex_lock (mutex=0x818ee74) at mutex.c:120
  #4  0x40b99c52 in ~pointer_holder (this=0x818eea4) at /u/m/a/manderso/research/include/boost/detail/lwm_pthreads.hpp:71

        scoped_lock(lightweight_mutex & m): m_(m.m_)
        {
-->          pthread_mutex_lock(&m_);
        }

  #5  0x40a78192 in instance_dealloc (inst=0x818ee8c) at /scratch/src/boost_cvs/libs/python/src/object/class.cpp:282

          for (instance_holder* p = kill_me->objects, *next; p != 0; p = next)
          {
              next = p->next();
-->           p->~instance_holder();
              instance_holder::deallocate(inst, dynamic_cast<void*>(p));
          }

  #6  0x0806009f in subtype_dealloc (self=0x818ee8c) at Objects/typeobject.c:470

    	/* Call the base tp_dealloc() */
    	assert(basedealloc);
-->     basedealloc(self);


  #7  0x080dd4b9 in PyDict_SetItem (op=0x816e50c, key=0x816d8d8, value=0x80f827c) at Objects/dictobject.c:549

    	if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) {
-->    		if (dictresize(mp, mp->ma_used*2) != 0)
    			return -1;
    	}


  #8  0x080e071e in _PyModule_Clear (m=0xfffffffc) at Objects/moduleobject.c:136

    	while (PyDict_Next(d, &pos, &key, &value)) {
    		if (value != Py_None && PyString_Check(key)) {
    			char *s = PyString_AsString(key);
    			if (s[0] != '_' || strcmp(s, "__builtins__") != 0) {
    				if (Py_VerboseFlag > 1)
    				    PySys_WriteStderr("#   clear[2] %s\n", s);
-->    				PyDict_SetItem(d, key, Py_None);
    			}
    		}
    	}

  #9  0x08095b73 in PyImport_Cleanup () at Python/import.c:322
  #10 0x080a15b6 in Py_Finalize () at Python/pythonrun.c:226
  #11 0x08054030 in Py_Main (argc=0, argv=0xbfffe0e4) at Modules/main.c:375
  #12 0x08053c0b in main (argc=-4, argv=0xfffffffc) at Modules/python.c:10
  #13 0x403c9336 in __libc_start_main (main=0x8053bf0 <main>, argc=1, ubp_av=0xbfffe0e4, init=0x8052c88 <_init>, fini=0x80e54b0 <_fini>, rtld_fini=0x4000d2fc <_dl_fini>, stack_end=0xbfffe0dc) at ../sysdeps/generic/libc-start.c:129
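
For anyone who wants to check the state of that mutex directly, here is
a minimal sketch of a non-blocking probe, assuming plain POSIX threads.
The helper name probe_mutex is hypothetical (it is not part of Boost or
Python).  pthread_mutex_trylock returns EBUSY instead of blocking when
the mutex is already locked, so calling something like this on the
mutex from frame #3 shortly before the destructor runs might show
whether it was left locked or has been overwritten.  I have not
verified that either of these is what is happening here.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper, not part of Boost or Python.
// Tries to take the mutex without blocking.
// Returns 0 if the mutex was free (it is released again before returning),
// or the pthread_mutex_trylock error code otherwise (EBUSY if already locked).
int probe_mutex(pthread_mutex_t* m)
{
    assert(m != 0);
    int rc = pthread_mutex_trylock(m);
    if (rc == 0)
        pthread_mutex_unlock(m);  // leave the mutex as we found it
    return rc;
}
```

On a freshly initialised default (non-recursive) mutex this returns 0;
while the same mutex is held it returns EBUSY rather than hanging.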

Does anyone have any idea where to look next, or what might be causing
the problem?  The codebase is fairly massive, and small changes in the
input or in the execution of the program can change the behavior and
make the hang disappear.

If some information not included would be helpful, I would be happy to
post it.

Thanks for any help / feedback you can provide.

-- 
Matt C. Anderson

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
      No thank you Delmar.  A third of a gopher would only
      arouse my appetite without bedding her back down...
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-



