global interpreter lock not working as it should

Tue Jul 30 17:44:08 EDT 2002

On Tuesday 30 July 2002 03:39 pm, brueckd at tbye.com wrote:
> On Tue, 30 Jul 2002, anton wilson wrote:
> > On Tuesday 30 July 2002 04:55 am, Martin v. Löwis wrote:
> > I would disagree.
> > In the Python documentation it states:
> >
> > "In order to support multi-threaded Python programs, the
> > interpreter regularly releases and reacquires the lock -- by default,
> > every ten bytecode instructions "
> >
> > What's the purpose of releasing and reacquiring the lock if no other
> > threads can run?
>
> Please consider that, because there's so many multithreaded Python
> programs that work quite well, it's rather unlikely that the threading
> implementation is outright broken. *Maybe* some improvement needs to be
> made, but from your posts it sounds more like you don't understand how
> things work at the C level, much less in Python. When the lock is released
> at the end of its regular interval, an *attempt* is made to reacquire it
> immediately, but there's no guarantee that the current thread will get it
> right away (and if another thread is already blocking on an attempt to get
> the lock then the other one will probably "win" most of the time anyway).
>

Maybe I am not being clear enough. I am concerned with a multi-threaded 
program that does not do any form of blocking on a Linux/Unix box. I DO 
expect a thread to block on the GIL every 10 byte codes. However, my results 
show that this does NOT happen. Any thread that is completely CPU bound will 
never voluntarily give up the CPU for as long as 
1) it can run
2) it has work to do

I have confirmed this by instrumenting the interpreter itself, with this 
code in ceval:

    oldt = tstate;    /* <---- my code */

    if (PyThreadState_Swap(NULL) != tstate)
        Py_FatalError("ceval: tstate mix-up");

    PyThread_release_lock(interpreter_lock);

    /* Other threads may run now */
    /* sched_yield(); */

    PyThread_acquire_lock(interpreter_lock, 1);

    if (PyThreadState_Swap(tstate) != NULL)
        Py_FatalError("ceval: orphan tstate");

    if (tstate == oldt)    /* <---- my code */
        printf("bad things have happened\n");

The great majority of the time, my print statement will be printed, meaning 
that no other thread got the GIL between the release and the reacquire.

> but the put-it-up-for-grabs-every-10-instructions functionality
> works just fine too. Consider:
>
> import threading, time
>
> COUNT = 3
> counters = [0] * COUNT
>
> def Worker(i):
>     while 1:
>         counters[i] += 1
>
> for i in range(COUNT):
>     threading.Thread(target=Worker, args=(i,)).start()
>
> while 1:
>     time.sleep(1.0)
>     print counters
>
> Here's some output:
> [162565, 176016, 165796]
> [329009, 327856, 333183]
> [497881, 496857, 498133]
> [665567, 679094, 643678]
> [810255, 845521, 811988]
> [968056, 1008142, 974790]
>
> Lo and behold, each thread is getting execution time, and nearly equal
> execution time at that!

There are several reasons why your program seems to work.
The first obvious reason is that the main thread sleeps. If you remove the 
sleep, you will see output that looks like this:

[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
[0, 0, 0]

....(100+ times in all)...

[35499, 16419, 0]
[35499, 16419, 0]
[35499, 16419, 0]
[35499, 16419, 0]
[35499, 16419, 0]
[35499, 16419, 0]
[35499, 16419, 0]
[35499, 16419, 0]
[35499, 16419, 0]

...(100+ times in all) ....

[35499, 16419, 11556]
[35499, 16419, 11556]
[35499, 16419, 11556]
[35499, 16419, 11556]
[35499, 16419, 11556]
[35499, 16419, 11556]
[35499, 16419, 11556]

.....etc ....

This proves that the GIL does not block very often, and definitely not every 
10 byte codes. Think about this for a while.
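
For anyone who wants to repeat the no-sleep experiment without an endless
loop, here is a rough sketch. It is not the program that produced the output
above: it assumes a current Python 3 interpreter, and the SAMPLES bound, the
stop flag and the run_lengths helper are hypothetical additions made so that
it terminates and so that the long runs of identical lines are easy to see.
The numbers will differ from run to run and from interpreter to interpreter.

```python
import threading

COUNT = 3        # number of worker threads, as in the quoted program
SAMPLES = 2000   # hypothetical bound so the sketch terminates


def sample_counters(count=COUNT, samples=SAMPLES):
    """Run `count` busy workers and snapshot their counters without sleeping."""
    counters = [0] * count
    stop = threading.Event()   # assumption: added so the workers can be stopped

    def worker(i):
        while not stop.is_set():
            counters[i] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()

    snapshots = []
    for _ in range(samples):
        # No sleep here: the main thread competes for the GIL like the workers.
        snapshots.append(tuple(counters))

    stop.set()
    for t in threads:
        t.join()
    return snapshots


def run_lengths(snapshots):
    """Collapse consecutive identical snapshots into (snapshot, repeats) pairs."""
    runs = []
    for snap in snapshots:
        if runs and runs[-1][0] == snap:
            runs[-1][1] += 1
        else:
            runs.append([snap, 1])
    return [(snap, n) for snap, n in runs]


# Show the first few distinct snapshots and how often each one repeated.
for snap, n in run_lengths(sample_counters())[:10]:
    print(list(snap), "x", n)
```

A long repeat count for one snapshot means the main thread kept running for
that many iterations while no worker made progress.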

I built a Python interpreter with a sched_yield in between the release and 
acquire calls, and my results looked like this with your sleep removed:

[1886, 1887, 0]
[1886, 1887, 0]
[1886, 1888, 0]
[1886, 1889, 0]
[1886, 1890, 0]
[1886, 1890, 0]
[1886, 1890, 0]
[1886, 1891, 0]
[1886, 1892, 0]
[1886, 1893, 0]
[1886, 1893, 0]

.....etc.......

Here, you will notice that there are constant changes. The GIL releasing is 
working as intended.
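
The same effect can be modelled at user level, without rebuilding the
interpreter. The sketch below is only a model, under several assumptions: an
ordinary threading.Lock stands in for the interpreter lock, the code targets
a current Python 3, and os.sched_yield is used where the platform provides it
(it falls back to doing nothing otherwise). The function name and its
parameters are made up for illustration. The interpreter's own thread
switching also affects the counts, so they are indicative at best.

```python
import os
import threading


def count_handoffs(iterations=20000, yield_between=False):
    """Two threads repeatedly release and reacquire one lock.

    Returns (same, switched): how often the next holder of the lock was the
    same thread as the previous holder, versus a different thread.
    """
    lock = threading.Lock()   # stand-in for the interpreter lock
    state = {"last": None, "same": 0, "switched": 0}

    # os.sched_yield exists only on some Unix platforms; fall back to a no-op.
    if yield_between and hasattr(os, "sched_yield"):
        pause = os.sched_yield
    else:
        pause = lambda: None

    def worker(name):
        for _ in range(iterations):
            lock.acquire()
            # The bookkeeping happens while holding the lock, so it is exact.
            if state["last"] == name:
                state["same"] += 1
            elif state["last"] is not None:
                state["switched"] += 1
            state["last"] = name
            lock.release()
            pause()           # other threads may run now

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["same"], state["switched"]


print("no yield:   same=%d switched=%d" % count_handoffs())
print("with yield: same=%d switched=%d" % count_handoffs(yield_between=True))
```

A high "same" count without the yield corresponds to the case where the
releasing thread wins the lock back before anyone else gets to run.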

This brings me to the second reason that your program seems to work.
The Linux OS gives threads time-slices, and when a time-slice is used up, 
every 150 or so milliseconds, the thread is forcibly removed from the CPU.
I presume that the reason your program seems to work is that, in the time 
between when a thread releases the GIL and when it tries to reacquire the 
GIL, it is forcibly removed from the CPU, and another thread can now run. 
This would not be a rare occurrence due to the high frequency at which the 
lock is released.

To prove this, I ran the program using SCHED_RR threads and changed the 
kernel so that round-robin threads had no timeslice. In this case I saw this 
output once per second:

[0, 0, 0]
[0, 586126, 0]
[0, 1194859, 0]
[0, 1802596, 0]
[0, 2414027, 0]

Because the thread is never forced by the OS to relinquish the CPU, the 
thread will never ever lose the GIL. If the GIL were actually working 
correctly and blocking, the second thread would not retain the CPU past 
10 byte codes.
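
The kernel change cannot be reproduced from a script, but the scheduling
policy part can at least be inspected. The sketch below assumes a current
Python 3 on a platform where the os module exposes the sched_* functions
(Linux does); the two function names and the priority value are hypothetical.
Switching to SCHED_RR typically requires elevated privileges, so the attempt
is written to fail quietly and is not run by default.

```python
import os


def describe_scheduler(pid=0):
    """Return the policy name and round-robin quantum (seconds) for `pid`.

    pid 0 means the calling process. Values are None/"unknown" where the
    platform does not provide the information.
    """
    if not hasattr(os, "sched_getscheduler"):
        return {"policy": "unknown", "rr_interval": None}
    names = {
        getattr(os, n): n
        for n in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR")
        if hasattr(os, n)
    }
    policy = os.sched_getscheduler(pid)
    try:
        interval = os.sched_rr_get_interval(pid)
    except (AttributeError, OSError):
        interval = None
    return {"policy": names.get(policy, str(policy)), "rr_interval": interval}


def try_round_robin(priority=1):
    """Attempt to switch the caller to SCHED_RR; return True on success."""
    try:
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(priority))
    except (AttributeError, OSError):
        # Not available on this platform, or not permitted for this user.
        return False
    return True


print(describe_scheduler())
```

Calling try_round_robin() before starting the worker threads, and then
comparing describe_scheduler() before and after, shows whether the policy
change actually took effect.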

So, the GIL does not block as intended, and this probably needs to be looked 
into.

Anton