Parallelization on multi-CPU hardware?

Wed Oct 6 07:38:17 EDT 2004

[Andreas Kostyrka]
 > So basically you either get a really huge number of locks (one per
 > object) with enough potential for conflicts, deadlocks and all the
 > other stuff to make it real slow down the ceval.
 >
 > One could use less granularity, and lock say the class of the object
 > involved, but that wouldn't help that much either.
 >
 > So basically the GIL is a design decision that makes sense, perhaps it
 > shouldn't be just called the GIL, call it the "very large locking
 > granularity design decision".

Reading the above, one might be tempted to conclude that the presence of
the GIL in the cpython VM is actually a benefit, and perhaps that all
language interpreters should have one!

But the reality is that few other VMs have chosen to employ such a
"very large locking granularity design decision". If one were to believe 
the arguments about the problems of fine-grained locking, one might be 
tempted to conclude that other VMs such as the JVM and the CLR are 
incapable of competing with the cpython VM, in terms of performance. But 
Jim Hugunin's pre-release figures for IronPython performance indicate 
that it can, in some cases, outperform cpython while running on just a 
single processor, let alone when multiple processors are available. And 
that is for an "interpreter running on an interpreter".

CPython's GIL does give a *small* performance benefit, but only when 
there is a single execution pipeline. Once there is more than one 
pipeline, it degrades performance.
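
To make that claim a little more concrete, here is a minimal sketch (not a
rigorous benchmark) that runs the same pure-python loop first sequentially
and then in threads. The names count_down, run_sequential and run_threaded
are invented for illustration. On a CPython build with a GIL, I would expect
the threaded run to be no faster than the sequential one, and possibly
slower on multi-CPU hardware; actual timings depend on the interpreter
version and the machine, so none are quoted here.

```python
import threading
import time


def count_down(n):
    # Pure-python, CPU-bound loop: the thread running it needs the GIL
    # for every bytecode it executes.
    while n > 0:
        n -= 1
    return n


def run_sequential(n, workers):
    # Run the loop `workers` times, one after another, in the calling thread.
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, workers):
    # Run the same total amount of work, but in `workers` threads.
    threads = [
        threading.Thread(target=count_down, args=(n,))
        for _ in range(workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, workers = 2_000_000, 2
    print("sequential: %.3fs" % run_sequential(n, workers))
    print("threaded:   %.3fs" % run_threaded(n, workers))
```

Raising workers to match the number of processors in the machine is the
interesting experiment: the work is trivially parallel, so any failure to
speed up comes from the interpreter rather than from the problem.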

As I've already stated, I believe the benefits and trade-offs of the GIL 
are arguable either way when there is a small number of processors 
involved, e.g. 2 or fewer. But if chip makers are already producing chips
with 2 execution pipelines, then you can be sure it won't be too long
before they are shipping units with 4, 8, 16, 32, etc, execution 
pipelines. As this number increases, the GIL will increasingly become a 
restrictive bottleneck.

In contrast, jython, ironpython, etc, will continue to benefit from the
enormous and massively-resourced optimisation efforts going into the JVM 
and CLR respectively (e.g. JIT compilation), all without a single change 
to the python code or the code interpreting it.

Lastly, as the number of execution pipelines in CPUs grows, what will
happen if/when they start talking back and forth to each other,
transputer-style, instead of executing mostly isolated and in parallel 
as they do today? Transputers[1] had the lovely concept of high-speed 
hardware communication channels cross-linking all CPUs in the "array". 
The reason this model never really took off back in the 80's was that
there were no familiar high-level language models for exploiting it, 
besides Occam[2].

IMHO, new python concepts such as generators are precisely the right
high-level concepts for enabling transputer-style, fine-granularity,
inter-execution-pipeline, on-demand, "pull" comms. This would put python
at an enormous conceptual advantage compared to other mainstream
languages, which generally don't have generator-style concepts. What a
shame that the GIL would allow only one such CPU at a time to
actually run python code.
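
As a rough illustration of the "pull" style I mean, here is a small sketch
of a generator pipeline in which nothing is computed until the consumer asks
for it. The names numbers, squares and demo are invented for illustration,
and the log list exists only to record the order of events. It says nothing
about how such stages might be mapped onto hardware channels; it only shows
the on-demand flow of values between stages.

```python
from itertools import islice


def numbers(log):
    # Unbounded source stage: produces the next integer only when asked.
    n = 0
    while True:
        log.append(("produce", n))
        yield n
        n += 1


def squares(source, log):
    # Middle stage: pulls one value from upstream per value requested.
    for value in source:
        log.append(("square", value))
        yield value * value


def demo():
    log = []
    pipeline = squares(numbers(log), log)
    # The consumer drives everything: asking for three results causes
    # exactly three values to be produced and squared, no more.
    first_three = list(islice(pipeline, 3))
    return first_three, log


if __name__ == "__main__":
    results, events = demo()
    print(results)
    for event in events:
        print(event)
```

Each stage runs only in response to a request from the stage downstream of
it, which is the property that makes the model look like a good fit for
channel-connected processors.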

regards,

-- 
alan kennedy
------------------------------------------------------
email alan:              http://xhaus.com/contact/alan

[1] Transputer architecture
http://www.cmpe.boun.edu.tr/courses/cmpe511/fall2003/transputer.ppt

[2] Occam programming language
http://encyclopedia.thefreedictionary.com/Occam%20programming%20language