"A Fundamental Turn Toward Concurrency in Software"

Nick Coghlan ncoghlan at iinet.net.au
Sat Jan 8 02:58:07 EST 2005


Steve Horsley wrote:
> But my understanding is that the current Python VM is single-threaded 
> internally,
> so even if the program creates multiple threads, just one core will be 
> dividing
> its time between those "threads".

Not really.

The CPython interpreter does have a thing called the 'Global Interpreter Lock' 
which synchronises access to the internals of the interpreter. If that wasn't 
there, Python threads could corrupt the data structures. In order to do anything 
useful, Python code must hold this lock, which leads to the frequent 
misapprehension that Python is 'single-threaded'.

However, the threads created by the Python threading mechanism are real OS 
threads, and the work load can be distributed between different cores.

In practice, this doesn't happen for a pure Python program, since any running 
Python code must hold the interpreter lock. The Python threads end up getting 
timesliced instead of running in parallel. Genuine concurrency with pure Python 
requires running things in separate processes (to reliably get multiple 
instances of the Python interpreter up and running).
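To illustrate running things in separate processes: here's a minimal sketch using the multiprocessing module (a standard-library addition in Python 2.6, so later than this post; the function name is just illustrative). Each worker process gets its own interpreter and its own GIL, so a CPU-bound pure Python loop genuinely runs in parallel:

```python
# Sketch: true concurrency via separate processes.
# Each process has its own interpreter and GIL.
import multiprocessing

def count_down(n):
    # CPU-bound pure-Python loop; threads would timeslice this,
    # but separate processes run it truly in parallel.
    while n > 0:
        n -= 1
    return n

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(count_down, [5000000, 5000000])
    print(results)  # [0, 0]
```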

Python threads are mainly intended to help deal with 'slow' I/O operations like 
disk and network access - the C code that implements those operations *releases* 
the GIL before making the slow call, allowing other Python threads to run while 
waiting for the I/O call to complete. This behaviour means threading can give 
*big* performance benefits even on single-CPU machines, and is likely to be the 
biggest source of performance improvements from threading.
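A quick sketch of that effect (time.sleep() stands in for a slow I/O call here - its C implementation releases the GIL while waiting, just like the disk and network calls do):

```python
# Sketch: several threads waiting on 'slow' calls concurrently.
# Four 0.2s waits complete in roughly 0.2s total, not 0.8s,
# because the C-level sleep releases the GIL.
import threading
import time

def wait_a_bit():
    time.sleep(0.2)  # GIL is released for the duration of the sleep

start = time.time()
threads = [threading.Thread(target=wait_a_bit) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print("elapsed: %.2f seconds" % elapsed)  # roughly 0.2, not 0.8
```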

However, on multi-processor machines, it is also handy if a CPU-intensive 
operation can be handled on one core, while another core keeps running Python code.

Again, this is handled by the relevant extension releasing the GIL before 
performing its CPU-intensive operations and reacquiring the GIL when it is done.

So Python's concurrency is built in a couple of layers:

Python-level concurrency:
   Multiple processes for true concurrency
   Time-sliced concurrency within a process (based on the GIL)

C-level concurrency:
   True concurrency if GIL is released when not needed

In some cases, problems with multi-threading are caused by invocation of 
extensions which don't correctly release the GIL, effectively preventing *any* 
other Python threads from running (since the executing extension never releases it).

As an example, I frequently use SWIG to access hardware APIs from Python. My 
standard 'exception translator' (which SWIG automatically places around every 
call to the extension) now looks something like:

%exception {
   Py_BEGIN_ALLOW_THREADS
   try {
     $action
   } catch (...) {
     Py_BLOCK_THREADS
     SWIG_exception(SWIG_RuntimeError, "Unexpected exception");
   }
   Py_END_ALLOW_THREADS
}

The above means that every call into my extension releases the GIL 
automatically, and reacquires it when returning to Python. I usually don't call 
the Python C API from the extension, but if I did, I would need to reacquire the 
GIL with PyGILState_Ensure() before doing so.

Without those threading API calls in place, operations which access the hardware 
always block the entire program, even if the Python program is multi-threaded.

See here for some more info on Python's threading:
http://www.python.org/doc/2.4/api/threads.html

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at email.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://boredomandlaziness.skystorm.net


