Parallelization on multi-CPU hardware?

Wed Oct 6 09:30:09 EDT 2004

Alan Kennedy <alanmk at hotmail.com> wrote:
>  [Andreas Kostyrka]
>  > So basically you either get a really huge number of locks (one per
>  > object) with enough potential for conflicts, deadlocks and all the
>  > other stuff to make it real slow down the ceval.
>  >
>  > One could use less granularity, and lock say the class of the object
>  > involved, but that wouldn't help that much either.
>  >
>  > So basically the GIL is a design decision that makes sense, perhaps it
>  > shouldn't be just called the GIL, call it the "very large locking
>  > granularity design decision".
> 
>  Reading the above, one might be tempted to conclude that presence of the 
>  GIL in the cpython VM is actually a benefit, and perhaps that all 
>  language interpreters should have one!

Interestingly, the Linux kernel has been through a similar evolution.

In Linux 2.0, symmetric multiprocessing (SMP) support was brought in.
This was done using something called the Big Kernel Lock (BKL). Sounds
familiar, doesn't it? This worked fine, but it had the problem that
only one CPU could be executing in the kernel at once, which cost
performance in certain situations. This is the same situation Python
is in now: only one thread can be executing in the Python core at once.
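A minimal Python sketch of the "one big lock" idea, assuming a hypothetical Counter class (this is not kernel or CPython code). Every object shares a single lock, so threads working on completely unrelated objects still take turns, just as CPUs took turns in the BKL-era kernel.

```python
import threading

# One lock shared by every object: an analogue of the BKL (or the GIL).
BIG_LOCK = threading.Lock()


class Counter:
    """Hypothetical object whose updates are guarded by the global lock."""

    def __init__(self):
        self.value = 0

    def increment(self):
        # All Counters serialize on the same lock, so two threads updating
        # *different* counters still cannot run this section at the same time.
        with BIG_LOCK:
            self.value += 1


def worker(counter, n):
    for _ in range(n):
        counter.increment()


counters = [Counter() for _ in range(4)]
threads = [threading.Thread(target=worker, args=(c, 10_000)) for c in counters]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([c.value for c in counters])  # [10000, 10000, 10000, 10000]
```

The result is correct, but the design gives up all parallelism between unrelated objects in exchange for simplicity: there is only one lock to reason about and no possibility of lock-ordering deadlocks.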

However, the Linux kernel has evolved to replace the BKL with a series
of smaller locks. This has been a gradual process from 2.2 to 2.6. The
BKL is still there, but it's only used by legacy code in 2.6.
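A minimal Python sketch of the fine-grained alternative, assuming a hypothetical Account class with one lock per object. Operations that touch two objects must take two locks, which is where the deadlock risk raised earlier in the thread comes from. A common remedy, shown here, is to always acquire locks in one fixed global order (here ordered by `id()`, an arbitrary choice for illustration). Note that in CPython the GIL still serializes the bytecode itself, so this illustrates the locking structure rather than a speed-up.

```python
import threading


class Account:
    """Hypothetical account guarded by its own lock (fine-grained locking)."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    if src is dst:
        return  # nothing to do; also avoids taking the same Lock twice
    # Take both locks in a fixed order so two opposite transfers cannot each
    # hold one lock while waiting forever for the other.
    first, second = sorted((src, dst), key=id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def shuffle(x, y, n):
    for _ in range(n):
        transfer(x, y, 1)


a, b, c = Account(1000), Account(1000), Account(1000)
threads = [
    threading.Thread(target=shuffle, args=(a, b, 500)),
    threading.Thread(target=shuffle, args=(b, a, 300)),  # opposite direction
    threading.Thread(target=shuffle, args=(c, a, 200)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(a.balance, b.balance, c.balance)  # 1000 1200 800
```

Transfers between disjoint pairs of accounts never contend for the same lock, which is the payoff of finer granularity; the cost is one lock per object plus the discipline of a global lock order.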

Yes, fine-grained locking does have an overhead. More locks mean more
memory and more time wasted locking and unlocking them. The Linux
kernel has got round this in several ways:

1) The locks aren't compiled in at all for a uniprocessor kernel (for
Python this would mean a non-threading build, probably not an option).

2) The Linux kernel has many different forms of locking, both
heavyweight and lightweight.

3) Care has been taken to align the locks properly on cache-line
boundaries. This costs memory, though.

4) The Linux kernel now has Read-Copy-Update (RCU), which lets readers
proceed without taking any locks at all.
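A rough Python sketch of the RCU idea, assuming a hypothetical RcuConfig class (this is not the kernel's RCU API). Readers grab the current snapshot with a single attribute read and never lock; writers copy the snapshot, modify the copy, and publish it by rebinding one reference. It relies on the assumption that a single attribute read or rebind is atomic in CPython, and it lets Python's garbage collector stand in for the kernel's grace-period mechanism for freeing old versions.

```python
import threading
from types import MappingProxyType


class RcuConfig:
    """Hypothetical RCU-flavoured, read-mostly key/value store."""

    def __init__(self, initial):
        # Published snapshots are read-only views, so readers can't mutate them.
        self._snapshot = MappingProxyType(dict(initial))
        # Writers still serialize among themselves; readers never touch this.
        self._writer_lock = threading.Lock()

    def read(self):
        # Lock-free read: one reference load (assumed atomic in CPython).
        return self._snapshot

    def update(self, key, value):
        with self._writer_lock:
            new = dict(self._snapshot)              # copy
            new[key] = value                        # update the copy
            self._snapshot = MappingProxyType(new)  # publish in one step
        # Old snapshots are freed once no reader still holds a reference.


cfg = RcuConfig({"threads": 4})
old = cfg.read()            # a reader holding the previous version
cfg.update("threads", 8)
print(old["threads"], cfg.read()["threads"])  # 4 8
```

A reader that grabbed the old snapshot keeps a consistent view while the writer publishes a new one, which is why the read side needs no lock; the trade-off is a full copy on every write, so this only pays off for data that is read far more often than it is written.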

Linux is aiming at the future when everyone has 4 or 8
cores/hyperthreads, and I think that is the right decision. Fine-grained
locking will come to Python one day, I'm sure.

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick