[pypy-dev] STM

Sun Jan 8 21:24:22 CET 2012

Hi Armin:

________________________________
 From: Armin Rigo <arigo at tunes.org>
To: PyPy Developer Mailing List <pypy-dev at python.org> 
Sent: Wednesday, January 4, 2012 6:30 PM
Subject: [pypy-dev] STM

>While this is obvious in the case of the CPython interpreter, I'd
>argue that it is a useful point of view in general.  The paper
>mentioned above says that the current approach to multithreading is
>completely non-deterministic, 

....

A silly question? What is the paper you are discussing? 

> The solution that we really need for
>CPython requires a different point of view: a generally-STM program,
>in which we carefully add here and there a "yield", i.e. a point at
>which it's ok to end the current transaction and start the next one.

...

>(Interestingly, this run() function is similar to stackless.run().
>The exact relationships between stackless and this model are yet to
>investigate, but it seems that one point of view on what I'm proposing
>might be essentially "use stackless, remove some determinism from the
>already-unclear tasklet switching order, and add STM under the hood".)

Perhaps this is off-topic, in  non-trivial stackless programme using cooperative scheduling, 
unless you explicitly impose a logical ordering, it is difficult to predict when a particular tasklet 
will  run. The round-robin doesn't mean much. This is because in non-trivial stackless programmes,  
like most non-trivial multi-threaded programmes, coroutines are responding to external events that 
arrive with some degree of randomness. To cop a line from the creator of Erlang,"the world is concurrent."
This is also why I didn't quite get the extreme measures to inject non-determinism in the Newsqueak
interpreter so the programmer can't second guess the scheduler.

I will be the first to admit that I have a half-baked knowledge of STM. And I have just started looking
at the C++ Transactional language specification. And I have a sketchy knowledge of CPython 
interpreter. So I have a lot to learn and probably many misconceptions. 

I can sort of see the Stackless angle: in the simplest case a tasklet that does not share data with other 
tasklets  (in this case, it can yield without problems), communications only via channels, or uses set_atomic() looks like a coarse-grained transaction of sorts.....

Based on the little I know about stuff like transactional synchronisation order, in a Stackless 
programme, I can see a beginTransaction like statement acting like a natural yield/schedule() point. Marking transactions boundaries is useful. If another transaction is occuring, the tasklet/thread suspends. Otherwise the tasklet/thread starts its transaction until it finishes. 

One may be able to model this with stackless.py .....

If I understand what you are proposing, the problem I see with yielding in the coarse-grained tasklet-as-transaction, is that you probably need more machinery to ensure ACID properties, since you are now getting thread interweaving and potential race conditions.

>and add STM under the hood".

When I did my join pattern prototype, I read the paper "Scalable Join Patterns." A conversation with the authors lead to reading stuff on Concurrent/Parallel ML and "Transactional Events." The common strategy in all those projects was to use STM related technologies under the hood to support a higher level concurrency construct while reducing the performance hit.

Cheers,
Andrew
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pypy-dev/attachments/20120108/ddca46ee/attachment.html>