Kill GIL

Nick Coghlan ncoghlan at iinet.net.au
Sun Feb 13 00:23:03 EST 2005


Mike Meyer wrote:
> Jack Diederich <jack at performancedrivers.com> writes:
> 
> 
>>From reading this
>>thread every couple months on c.l.py for the last few years it is my 
>>opinion that the number of people who think threading is the only solution
>>to their problem greatly outnumber the number of people who actually have 
>>such a problem (like, nearly all of them).
> 
> 
> Hear, hear. I find that threading typically introduces worse problems
> than it purports to solve.

In my experience, threads should mainly be used when you need asynchronous access 
to a synchronous operation. You spawn a thread to make the call; it blocks on 
the relevant API, then notifies the main thread when it's done.

Since any sane code will release the GIL before making the blocking call, this 
scales to multiple CPUs just fine.
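
For instance, a rough sketch of that idiom (the URL and the use of urllib are 
just placeholders for whatever blocking call you actually need to make):

import threading, urllib, Queue

def fetch(url, results):
    # Blocks inside the socket layer with the GIL released, so the
    # main thread keeps running in the meantime.
    data = urllib.urlopen(url).read()
    results.put((url, data))          # notify the main thread

results = Queue.Queue()
worker = threading.Thread(target=fetch,
                          args=("http://www.python.org/", results))
worker.start()

# ... the main thread gets on with other work here ...

url, data = results.get()             # blocks until the worker reports back

The Queue module handles the locking, so the main thread never touches the 
worker's data until the worker is finished with it.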

Another justification for threads is when you have a multi-CPU machine and a 
processor-intensive operation you'd like to farm off to a separate CPU. In that 
case, you can treat the long-running operation like any other synchronous call, 
and spawn a thread that releases the GIL before starting the time-consuming 
operation.
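
Something along these lines, say. The win only appears if the extension really 
does drop the GIL while it works - I believe zlib's compressor does, which is 
why it serves as the stand-in here; if not, substitute any extension call that does:

import threading, zlib

payload = "x" * 10 ** 7              # ~10MB of (very compressible) data

def crunch(data, out):
    # The real work happens inside the C library with the GIL released,
    # so a second CPU can actually be put to use.
    out.append(zlib.compress(data))

out = []
worker = threading.Thread(target=crunch, args=(payload, out))
worker.start()

# ... main thread stays responsive while the worker compresses ...

worker.join()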

The only time the GIL "gets in the way" is if the long-running operation you 
want to farm off is itself implemented in Python.
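
You can see this for yourself with a trivial (and admittedly unscientific) 
timing sketch - the pure Python loop below gains nothing from being split 
across two threads, because the threads just take turns holding the GIL:

import threading, time

def count(n):
    while n:
        n -= 1

start = time.time()
count(10 ** 7); count(10 ** 7)
print "sequential:", round(time.time() - start, 2)

threads = [threading.Thread(target=count, args=(10 ** 7,)) for i in range(2)]
start = time.time()
for t in threads: t.start()
for t in threads: t.join()
print "threaded:  ", round(time.time() - start, 2)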

However, consider this: threads run on a CPU, so if you want to run multiple 
threads concurrently, you either need multiple CPUs or a time-slicing scheduler 
that fakes it.

Here's the trick: PYTHON THREADS DO NOT RUN DIRECTLY ON THE CPU. Instead, they 
run on a Python Virtual Machine (or the JVM/CLR runtime/whatever), which then 
runs on the CPU. So, if you want to run multiple Python threads concurrently, 
you need multiple PVMs or a time-slicing scheduler. The GIL represents the latter.
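
You can even see the scheduler's knob from Python code - the current CPython 
mechanism offers to hand the GIL to another thread every N bytecode instructions:

import sys

print sys.getcheckinterval()    # 100 bytecode instructions by default
sys.setcheckinterval(1000)      # switch less often: lower switching overhead,
                                # but coarser-grained "fake" concurrency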

Now, Python *could* try to provide the ability to have multiple virtual machines 
in a single process in order to exploit multiple CPUs more effectively. I have 
no idea whether Java or the CLR work that way - my guess is that they do (or do 
something that looks the same from a programmer's POV). But then, they have 
Sun/Microsoft directly financing the development teams.

A much simpler suggestion is that if you want a new PVM, just create a new OS 
process to run another copy of the Python interpreter. The effectiveness of your 
multi-CPU utilisation will then be governed by your OS's ability to schedule 
multiple processes correctly, rather than by the PVM's ability to fake concurrent 
execution using time-sliced threads (Hint: the former is likely to be much better 
than the latter).
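
With the subprocess module that arrived in 2.4, spinning up the extra 
interpreters is only a few lines (the worker script's name and the way the work 
gets divvied up are placeholders, obviously):

import subprocess, sys

# One full interpreter (and hence one PVM) per CPU; the OS scheduler
# takes care of spreading them across the available processors.
workers = [subprocess.Popen([sys.executable, "worker.py", str(n)])
           for n in range(2)]
for w in workers:
    w.wait()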

Additionally, schemes for inter-process communication are often far more 
scalable than those for inter-thread communication, since the former generally 
can't rely on shared memory (although good implementations may utilise it for 
optimisation purposes). This means they can usually be applied to clustered 
computing rather effectively.
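
For example, here's a sketch of the socket-based flavour (the matching worker 
process isn't shown, and the host, port and job format are all placeholders). 
Nothing in it cares whether the worker is another process on the same box or a 
machine on the other side of the room:

import socket, pickle

def recv_all(sock):
    # Read until the worker closes its end of the connection.
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return "".join(chunks)

def send_job(host, port, job):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))
    sock.sendall(pickle.dumps(job))
    sock.shutdown(socket.SHUT_WR)         # "that's all the input"
    result = pickle.loads(recv_all(sock))
    sock.close()
    return result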

I would *far* prefer to see effort expended on making the idiom mentioned in the 
last couple of paragraphs simple and easy to use, rather than on a misguided 
effort to "Kill the GIL".

Cheers,
Nick.

P.S. If the GIL *really* bothers you, check out Stackless Python. As I 
understand it, it does its best to avoid the C stack (and hence threads) altogether.

-- 
Nick Coghlan   |   ncoghlan at email.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://boredomandlaziness.skystorm.net


