[Tutor] threading mind set

Mon May 14 08:12:02 CEST 2012

On Mon, 2012-05-14 at 10:31 +1000, Steven D'Aprano wrote:
[...]
> No hard compared to what?

Compared to sequential programming.

[...]
> My argument is that once you move beyond the one-operation-after-another 
> programming model, almost any parallel processing problem is harder than the 
> equivalent sequential version, inherently due to the parallelism. Except 
> perhaps for "embarrassingly parallel" problems, parallelism adds complexity 
> even if your framework abstracts away most of the tedious detail like semaphores.
> 
> http://en.wikipedia.org/wiki/Embarrassingly_parallel
> 
> Once you move beyond sequential execution, you have to think about issues that 
> don't apply to sequential programs: how to divide the task up between 
> processes/threads/actors/whatever, how to manage their synchronization, 
> resource starvation (e.g. deadlocks, livelocks), etc.

Actor systems, dataflow systems and CSP (Communicating Sequential
Processes) do not guarantee freedom from deadlock or livelock, but the
whole "processes communicating by passing messages, not by sharing data"
approach makes it hugely easier to reason about what is happening.

Moreover, if, as with CSP, your actor or dataflow system enforces
sequential actors/operators, then it gets even better.

The secret to parallel processing (in general, there are always
exceptions/corner cases) is to write sequential bits that then
communicate using queues or channels.

No semaphores. No locks. No monitors. These are tools for operating
systems folk and for folk creating actor, dataflow and CSP queues and
channels.
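
As a minimal sketch of "sequential bits that communicate using queues",
here is one worker thread fed through a queue, using only the standard
library. The names squarer and run_squarer, and the use of None as an
end-of-work marker, are my own assumptions for illustration. Note that
queue.Queue does its locking internally, which is rather the point: the
application code never touches a lock.

```python
import queue
import threading


def squarer(inbox, outbox):
    """A purely sequential worker: take a message, act on it, send a result."""
    while True:
        item = inbox.get()
        if item is None:          # None is the end-of-work marker assumed here
            outbox.put(None)      # pass the marker on so the reader can stop
            break
        outbox.put(item * item)


def run_squarer(values):
    """Feed values to the worker through a queue and collect the replies."""
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=squarer, args=(inbox, outbox))
    worker.start()

    for value in values:
        inbox.put(value)
    inbox.put(None)

    results = []
    while True:
        reply = outbox.get()
        if reply is None:
            break
        results.append(reply)

    worker.join()
    return results


if __name__ == "__main__":
    print(run_squarer([1, 2, 3]))
```

The only shared objects are the two queues; everything inside squarer can
be read and tested as ordinary sequential code.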

> We have linear minds and it doesn't take that many real-time parallel tasks to 
> overwhelm the human brain. I'm not saying that people can't reason in 
> parallel, because we clearly can and do, but it's inherently harder than 
> sequential reasoning.

I think if you delve into the psychology of it, our minds are far from
linear. Certainly at the electro-chemical level the brain is a massively
parallel machine.

Over the last 50 years, we have enshrined single processor, single
memory into our entire thinking about computing and programming. Our
education systems enforce sequential programming for all but the final
parallel programming option. The main reason for parallel programming
being labelled hard is that we have the wrong tools for reasoning about
it. This is the beauty of the 1960s/1970s models of actors, dataflow and
CSP: you deconstruct the problem into small bits, each of which is
sequential and comprehensible, then the overall behaviour of the system
is an emergent property of the interaction between these small
subsystems.

Instead of trying to reason about all the communications system-wide,
we just worry about what happens within a small subsystem.
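
To make the decomposition idea concrete, here is a rough sketch of a tiny
dataflow pipeline: each operator is a small sequential loop that knows
only its input and output queue, and the overall behaviour comes from
chaining them. The names operator_loop, run_pipeline and DONE are
hypothetical, chosen for this example only.

```python
import queue
import threading

DONE = object()  # end-of-stream marker (an assumption of this sketch)


def operator_loop(func, source, sink):
    """One dataflow operator: sequential, and aware only of its two queues."""
    while True:
        item = source.get()
        if item is DONE:
            sink.put(DONE)        # propagate end-of-stream downstream
            return
        sink.put(func(item))


def run_pipeline(funcs, values):
    """Chain one operator per function and push values through the chain."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=operator_loop,
                         args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for thread in threads:
        thread.start()

    for value in values:
        queues[0].put(value)
    queues[0].put(DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)

    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3]))
```

To understand any one operator you only need to read operator_loop and
the function it applies; nothing in it depends on how many other
operators exist.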

The hard part is the decomposition. But then the hard part of software
has always been the algorithm.

You highlight "embarrassingly parallel", which is the simplest
decomposition possible: straight scatter/gather, aka map/reduce. More
often than not this is handled by a façade such as "parallel reduce".
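
A "parallel reduce" façade of this kind might look like the sketch below,
built on concurrent.futures from the standard library (Python 3.2+). The
function name parallel_reduce and its parameters are my own invention,
not an existing API. I use a thread pool to keep the example short; under
CPython the GIL means threads will not speed up CPU-bound work, and
ProcessPoolExecutor, which offers the same map method, would be the thing
to try instead (it needs picklable functions).

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def parallel_reduce(map_func, reduce_func, items, initial, max_workers=4):
    """Scatter map_func over items, then gather with reduce_func.

    The caller sees one sequential-looking call; the pool is hidden.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        mapped = pool.map(map_func, items)            # scatter
        return reduce(reduce_func, mapped, initial)   # gather


if __name__ == "__main__":
    # Sum of the squares of 0..9.
    print(parallel_reduce(lambda x: x * x, operator.add, range(10), 0))
```

The caller never sees a thread, a queue or a lock, which is what makes
this case "embarrassing".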

It is perhaps worth noting that "Big Data" is moving to dataflow
processing in a "Big Way" :-) Data mining and the like have been
revolutionized by a changed perception of algorithms and of how to
decompose problems.

[...]
> Python doesn't have a GIL. Some Python implementations do, most obviously 
> CPython, the reference implementation. But Jython and IronPython don't. If the 
> GIL is a problem for your program, consider running it on Jython or IronPython.

It is true that Python doesn't have a GIL, thanks for the correction.
CPython and (until recently) PyPy have a GIL. The PyPy folk are
experimenting with software transactional memory (STM) in the
interpreter to be able to remove the GIL. To date things are looking
very positive. PyPy will rock :-)
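
Since the GIL belongs to the implementation rather than the language, it
can be handy to check which implementation a program is actually running
on. A small sketch using the standard platform module follows; the
function name describe_runtime is just for illustration.

```python
import platform


def describe_runtime():
    """Return the implementation name and version of the running interpreter."""
    # python_implementation() returns names such as 'CPython', 'PyPy',
    # 'Jython' or 'IronPython'.
    implementation = platform.python_implementation()
    return "%s %s" % (implementation, platform.python_version())


if __name__ == "__main__":
    print(describe_runtime())
```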

Although Guido has said (EuroPython 2010) that he is happy to continue
with the GIL in CPython, there are subversive elements (notably the PyPy
folk) who are trying to show that STM will work with CPython as well.

Jython is sadly lagging behind in terms of versions of Python supported
and is increasingly becoming irrelevant -- unless someone does something
soon. Groovy, JRuby and Clojure are the dynamic languages of choice on
the JVM.

IronPython is an interesting option except that there is all the FUD
about use of the CLR and having to pay extortion^H^H^H^H^H^H^H^H^H
licensing money to Microsoft. Also Microsoft ceasing to fund IronPython
(and IronRuby) is a clear indicator that Microsoft have no intention of
supporting use of Python on CLR. Thus it could end up in the same state
as Jython.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder at ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel at winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder