[Python-ideas] Remove GIL with CAS instructions?

Sturla Molden sturla at molden.no
Tue Oct 20 23:24:57 CEST 2009


The GIL is particularly evil on multiprocessor systems. Not just because 
it prevents parallel excution of Python code, but thread switches are 
defunct. This is because a thread that periodically releases and 
re-acquires the GIL (at checkintervals), more or less always wins. On a 
multiprocessor system, the thread it will continue to run its processor 
and grab the GIL back before a sleeping thread wakes up. The cooperative 
multithreading intended by using checkintervals only works properly on 
single-processor systems. Python threads only work properly on 
single-processor computers. On computers with multiple CPUs, one must 
set the affinity of the Python process is set to one CPU only for proper 
cooperative thread scheduling. This behaviour of the GIL is a PITA, but 
often overlooked - all debates seems to be about parallel scalability, 
though far less important. Today, most people have multicore desktop 
computers (e.g. mine have four cores). The Linux kernel got rid of the 
BKL a long time ago. It's time Python does the same.

The GIL also has its virtues. When calling a C extension, it is 
serialized by default unless the GIL is released.    

An attempt was made (many years ago) to replace the GIL with 
fine-grained locking. It slowed down serial code. That should come as no 
surprise, as OS mutexes and semaphores are expensive.

There are lock-free data structures of any conceivable sort. There are 
multithreaded garbage collectors in Java and .NET, that don't need a 
global lock. I've heard claims that removal of the GIL would require 
Python to use a garbage collector instead of reference counts. That is 
not true. Lock-free data structures and garbage collectors have all one 
thing in common: they use a compare-and-exchange (CAS) instruction 
present in most modern processors, e.g. CMPXCHG on x86. In fact, CAS is 
used to implement OS mutexes (sleeplocks) and the less expensive 
spinlocks. Without a CAS instruction for the platform, there would be no 
GIL. All platforms that has a working GIL has a working CAS.

Python could use the CAS instruction for lock-free management of 
reference counts. All built-in types (int, float, str, list, tuple, 
dict, set, etc) could be made lock-free using CAS. That would allow the 
interpreter to execute bytecode without holding the GIL. Only calls to C 
extensions would be serialized with the GIL.

Schematically, this is how refcounts could be managed without needing a 
lock:


/*
 * Returns 1 if an exhange was made, and (*addr == old) before
 * the exchange, otherwise returns 0.
 *
 * uses opcode CMPXCHG on x86
 *
 */
extern int Py_refcnt_CAS(Py_ssize_t *addr, Py_ssize_t old, Py_ssize_t 
new); 


inline void Py_DECREF(PyObject *obj)
{
    register Py_ssize_t refcnt;
    do {
        refcnt = obj->refcnt;
    } while (!Py_refcnt_CAS(&obj->ob_refcnt, refcnt, refcnt-1));        
    refcnt = obj->refcnt;
    if (refcnt == 0) {
        if(Py_refcnt_CAS(&obj->ob_refcnt, 0, -1))
        {   
            /* deallocate the object */
        }
    }
}


inline void Py_INCREF(PyObject *obj)
{
    register Py_ssize_t refcnt;
    refcnt = obj->refcnt;
    if (refcnt >= 0) {
        do {
            if((refcnt = obj->refcnt)<0) break;
        } while (!Py_refcnt_CAS(&obj->ob_refcnt, refcnt, refcnt+1));
    }
}
  




Regards,
Sturla Molden










More information about the Python-ideas mailing list