PyThread_acquire_lock freezes at pthread_cond_wait although lock not occupied

Thu Feb 6 16:38:44 EST 2003

Hi!

Tim Peters wrote:
> [Gernot Hillier]
>> (gdb) bt
>> #0  0x4026cea9 in sigsuspend () from /lib/libc.so.6
>> #1  0x4003bd48 in __pthread_wait_for_restart_signal () from
>>     /lib/libpthread.so.0
> 
> That's enough.  If it never gets a restart signal, it will stay there
> forever.  Whether it does get a restart signal is out of Python's hands,
> though -- that's up to the pthreads implementation.

Anything I can do to see if this signal arrives? 

Any nice gdb magic to get closer to the solution? 

But as this also seems to be a race condition (it won't happen every time,
seems to depend on the machine speed, doesn't occur on every machine, ...),
I doubt it will occur at all when running under gdb. *sigh*

>> (gdb) print *((pthread_lock*) interpreter_lock)
>> $7 = {locked = 0 '\0', lock_released = {__c_lock = {__status = 0,
>> __spinlock = 0}, __c_waiting = 0x0}, mut = {__m_reserved = 0,
>> __m_count = 0,
>> __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock =
>> 0}}}
>>
>> So the lock is indeed not locked!! I don't understand this at all.
> 
> "The lock" is ambiguous.  There's the pthreads mutex ("mut" in the above),
> and there's the Python GIL (implemented by that entire data structure). 
> As Jeremy said, it's normal for mut to be unlocked during a condvar wait.

I referred to interpreter_lock->locked.

What I wanted to say: if it blocks in pthread_cond_wait() while "locked"
is 0 at the same time, it can't be that I've forgotten a ReleaseLock()
anywhere in my source code, so it can't be my fault, can it?
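
To make clear what I mean, here is a minimal sketch of a condvar+mutex
lock of the kind discussed above. The field names follow the gdb dump; the
type and function names (sketch_lock and friends) are made up for
illustration, and this is not the actual Python source. As far as I
understand the pattern, a waiter only sleeps while "locked" is nonzero, so
a thread that sits in the wait with "locked" == 0 has missed its wake-up.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names; the fields mirror those visible in the gdb dump. */
typedef struct {
    char            locked;        /* 0 = free, 1 = held */
    pthread_cond_t  lock_released; /* signalled when the lock is released */
    pthread_mutex_t mut;           /* protects 'locked' */
} sketch_lock;

/* Returns 0 on success, -1 on failure. */
int sketch_lock_init(sketch_lock *l)
{
    l->locked = 0;
    if (pthread_mutex_init(&l->mut, NULL) != 0)
        return -1;
    if (pthread_cond_init(&l->lock_released, NULL) != 0) {
        pthread_mutex_destroy(&l->mut);
        return -1;
    }
    return 0;
}

/* Returns 1 if the lock was acquired, 0 if it was busy and waitflag == 0. */
int sketch_lock_acquire(sketch_lock *l, int waitflag)
{
    int got = 0;
    int rc = pthread_mutex_lock(&l->mut);
    assert(rc == 0);
    (void)rc;
    if (waitflag) {
        /* pthread_cond_wait() releases 'mut' while it sleeps, so a
         * debugger shows 'mut' unlocked during the wait. */
        while (l->locked)
            pthread_cond_wait(&l->lock_released, &l->mut);
    }
    if (!l->locked) {
        l->locked = 1;
        got = 1;
    }
    pthread_mutex_unlock(&l->mut);
    return got;
}

void sketch_lock_release(sketch_lock *l)
{
    pthread_mutex_lock(&l->mut);
    l->locked = 0;
    pthread_mutex_unlock(&l->mut);
    /* Wake one waiter; it re-checks 'locked' under 'mut'. */
    pthread_cond_signal(&l->lock_released);
}
```

With this shape, a forgotten release would show up as "locked" == 1 in the
dump, which is not what I see.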

> Whether the GIL is locked is really irrelevant, because your stack trace
> shows that it's in the bowels of the platform condvar wait implementation,
> presumably waiting for a signal it's never going to get.

Just want to be sure it can't be a problem of my embedding/extending code...

>> It occurs on GNU/Linux using pthreads from glibc 2.2.5. I'm using Python
>> 2.2.1 but can't see that 2.2.2 will improve this somehow...
> 
> Trying Python 2.3a1 might.  The GIL under Linux is implemented via POSIX
> semaphores in 2.3, instead of via a condvar+mutex pair.

Hmmm... Not a real solution for me, as this program must run on the stable
Python tree. But it's surely worth a try anyway. Thx for the suggestion...
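
For comparison, this is roughly how I picture a lock built on a POSIX
semaphore instead of a condvar+mutex pair. It is only a sketch under my
own assumptions: the sem_lock_* names are hypothetical and the code is not
taken from the 2.3 source.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper names; a binary semaphore used as a lock. */

/* Initial value 1 means "free". Returns 0 on success, -1 on error. */
int sem_lock_init(sem_t *s)
{
    return sem_init(s, 0, 1);
}

/* Returns 1 if acquired, 0 if busy (waitflag == 0) or on error. */
int sem_lock_acquire(sem_t *s, int waitflag)
{
    int rc;
    do {
        rc = waitflag ? sem_wait(s) : sem_trywait(s);
    } while (rc == -1 && errno == EINTR); /* retry if a signal interrupted us */
    return rc == 0;
}

void sem_lock_release(sem_t *s)
{
    int rc = sem_post(s);
    assert(rc == 0);
    (void)rc;
}
```

Here the whole lock state lives in the semaphore, so there is no separate
"locked" flag to inspect in gdb.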

> Then let me ask you an odd question:  are you using fork()?  If so, move
> heaven and earth to get rid of it.  Over a year ago a number of people
[...]

No, I'm not using fork(). Only threads, via the CommonC++ library on top
of libpthread.

> If you're not using fork(), I have no ideas other than to try a different
> OS,

I doubt that this will be an option for me (look at my mail address) ;-))

> or move to Python 2.3a1 and hope the same bug doesn't plague your
> platform semaphore implementation.

Hmmm... I'll first try to get rid of the CommonC++ library and implement
my threads myself using pthreads. I already had some very odd bugs which
I nailed down to occurring only when using the CommonC++ library and which
disappeared when I used libpthread directly.

So I'll give that a try in the next few hours/days (let's see)...

> all-oses-are-buggy-ly y'rs  - tim

:)

-- 
Ciao,

Gernot