[Python-ideas] easy thread-safety [was: fork]

Wed Aug 19 04:09:35 CEST 2015

On Aug 18, 2015, at 13:32, Sven R. Kunze <srkunze at mail.de> wrote:
> 
>> On 18.08.2015 19:27, Chris Angelico wrote:
>>> On Wed, Aug 19, 2015 at 3:17 AM, Sven R. Kunze <srkunze at mail.de> wrote:
>>>> On 18.08.2015 18:55, Chris Angelico wrote:
>>>> The notion of "a completely separate heap for each thread" is talking
>>>> about part B - you need a completely separate pile of objects.
>>> 
>>> Right. However, only if needed. As long as threads only read from a common
>>> variable, there is no need to interfere.
>> Sure, but as soon as you change something, you have to
>> thread-local-ify it. So I suppose what you would have is three
>> separate pools of objects:
>> 
>> 1) Thread-specific objects, which are referenced only from one thread.
>> These can be read and written easily and cheaply.
>> 2) Global objects which have never been changed in any way since
>> threading began. These can be read easily, but if written, must be
>> transformed into...
>> 3) Thread-local objects, which exist for all threads, but are
>> different. The id() of such an object depends on which thread is
>> asking.
>> 
>> Conceptually, all three types behave the same way - changes are
>> visible only within the thread that made them. But the implementation
>> could have these magic "instanced" objects for only those ones which
>> have actually been changed, and save a whole lot of memory for the
>> others.
> 
> Indeed. I think that is a sensible approach here. Speaking of an implementation though, I don't know where I would start when looking at CPython.
> 
> Thinking more about id(). Consider a complex object like an instance of a class. Is it really necessary to deep copy it? It seems to me that we actually just need to hide the atomic/immutable values (e.g. strings, integers etc.) of that object.

Why wouldn't hiding the mutable members be just as necessary? In your example, if I can replace Y.x, isn't that even worse than replacing Y.x.a?

> The object itself can remain the same.

What does it mean for an object to be "the same" if it potentially holds different values in different threads?

> # first thread
> class X:
>    a = 0
> class Y:
>    x = X
> 
> # thread spawned by first thread
> Y.x.a = 3  # should leave id(X) and id(Y) alone

OK, but does the second thread see 0 or 3? If the former, then these aren't shared objects at all. If the latter, then that's how things already work.
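To make the "that's how things already work" case concrete, here is a minimal sketch of the quoted example run on ordinary threads. The names ids_before, seen_by_first_thread and ids_unchanged are only for illustration. Threads share one heap today, so the spawning thread sees the write, and id(X) and id(Y) never change.

```python
import threading

class X:
    a = 0

class Y:
    x = X

# Remember the identities before any other thread touches the objects.
ids_before = (id(X), id(Y))

def child():
    # Runs in the spawned thread; mutates the object both threads share.
    Y.x.a = 3

t = threading.Thread(target=child)
t.start()
t.join()

# Back in the first thread: the write is visible, the identities are unchanged.
seen_by_first_thread = X.a
ids_unchanged = (id(X), id(Y)) == ids_before
print(seen_by_first_thread, ids_unchanged)  # 3 True
```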

> Maybe, that example is too simple, but I cannot think of an issue here. As long as the current thread is the only one being able to change the values of its variables, all is fine.

No. If other threads can see those changes, it's still a problem. They can see things happening out of order, see objects in inconsistent intermediate states, etc.--all the problems caused by races are still there.
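A small sketch of what "inconsistent intermediate states" can look like even when only one thread ever writes. The Transfer class and the two Event objects are made up for this illustration; the events just force, deterministically, an interleaving that the scheduler could otherwise produce by chance.

```python
import threading

class Transfer:
    """Two balances whose sum is supposed to stay at 100."""
    def __init__(self):
        self.checking = 100
        self.savings = 0

acct = Transfer()
halfway = threading.Event()          # set once the writer is mid-update
reader_finished = threading.Event()  # set once the reader has looked
observed_totals = []

def writer():
    # The only thread that ever modifies acct.
    acct.checking -= 100
    halfway.set()            # the invariant is broken at this point
    reader_finished.wait()   # simulate being preempted here
    acct.savings += 100      # the invariant is restored

def reader():
    halfway.wait()
    # Read-only access, yet it observes a state that should never exist.
    observed_totals.append(acct.checking + acct.savings)
    reader_finished.set()

threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_total = acct.checking + acct.savings
print(observed_totals, final_total)  # [0] 100
```

The reader never writes anything, and the final state is fine, but it still saw a total of 0.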

>>> I agree, behavior-wise, processes behave almost as desired (relevant data is
>>> copied over and there are no shared variables).
>>> 
>>> However, the CPU/memory/communication footprint of a new process
>>> (using spawn) is enormous compared to a thread. So, threading still has its
>>> merits (IMHO).
>> So really, you're asking for process semantics, with some
>> optimizations to take advantage of the fact that most of the processes
>> are going to be just reading and not writing. That may well be
>> possible, using something like the above three-way-split, but I'd want
>> the opinion of someone who's actually implemented something like this
>> - from me, it's just "hey, wouldn't this be cool".
> 
> If you put it this way, maybe yes. I also look forward to more feedback on this.

Have you looked into the subinterpreters project, the PyParallel project, or the PyPy-STM project, all of which, as I mentioned earlier, are possible ways of getting some of the advantages of process semantics without all of the performance costs? (Although none of them are exactly that, of course.)

> To me, a process/thread or any other concurrency solution is basically a function that I can call but that runs in the background. Later, when I am ready, I can collect its result. In the meantime, the main thread continues. (Again) to me, that is the only sensible way to approach concurrency. When I recall the details of locks, semaphores etc. and compare them to what real-world applications really need... You can create huge tables of all the possible cases that might happen just in order to find out that you missed an important one.

Yes, that is the problem that makes multithreading hard in the first place (except in pure functional languages). If the same value is visible in two threads, and can be changed by either of those threads, you have to start thinking either about lock discipline, or about ordering atomic operations; either way, things get very complicated very fast.

A compromise solution is to allow local mutable objects, but not allow them to be shared between threads; instead, you provide a way to (deep-)copy them between threads, and/or to (destructively) move them between threads. You can do that syntactically, as with the channel operators used by Erlang and the languages it's inspired, or you can do it purely at a semantic level, as with Python's multiprocessing library; the effect is the same: process semantics, or message-passing semantics, or whatever you want to call it gives you the advantages of immutable threading in a language with mutability.
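As a rough sketch of the copy-between-threads idea done purely at the semantic level: the CopyingChannel class below is a hypothetical helper, not an existing library API. It wraps queue.Queue and deep-copies on send, so the receiver never holds a reference to the sender's object. Nothing in Python enforces that threads communicate only this way; it is a convention.

```python
import copy
import queue
import threading

class CopyingChannel:
    """Hypothetical helper: a queue that deep-copies every message on send."""
    def __init__(self):
        self._queue = queue.Queue()

    def send(self, obj):
        # The receiver gets its own copy, never the sender's object.
        self._queue.put(copy.deepcopy(obj))

    def receive(self):
        return self._queue.get()

def worker(inbox, outbox):
    data = inbox.receive()
    data["items"].append("added by worker")  # mutates the worker's copy only
    outbox.send(data)

to_worker = CopyingChannel()
from_worker = CopyingChannel()
original = {"items": ["from main"]}

t = threading.Thread(target=worker, args=(to_worker, from_worker))
t.start()
to_worker.send(original)
reply = from_worker.receive()
t.join()

print(original)  # {'items': ['from main']}
print(reply)     # {'items': ['from main', 'added by worker']}
```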

> Even worse, as soon as you change something about your program, you are doomed to redo the complete case analysis, find a dead/live-lock-free solution and so forth. It's a time sink; costly and dangerous from a company's point of view.

This is an argument for companies to share as little mutable state as possible across threads. If you don't have any shared state at all, you don't need locks or other synchronization mechanisms at all. If you only have very limited and specific shared state, you have very limited and hopefully simple locking, which is a lot easier to keep track of.
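For the "very limited and specific shared state" case, a minimal sketch: one counter, one lock, and every access goes through the same two methods. HitCounter is an invented name for illustration. With only one lock there is no lock ordering to get wrong.

```python
import threading

class HitCounter:
    """The only shared mutable state in the program, guarded by one lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def increment(self):
        with self._lock:
            self._count += 1

    def value(self):
        with self._lock:
            return self._count

hits = HitCounter()

def worker(n):
    # Everything else the worker touches is local to its own thread.
    for _ in range(n):
        hits.increment()

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = hits.value()
print(total)  # 40000
```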

And you can already do this today, using multiprocessing. It's an obvious and explicit way to ask for process semantics. If you're not using it, you have to explain why you can't use it, and why you think rebuilding the same semantics on top of threads would solve your problem.
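A minimal sketch of what that looks like, assuming a POSIX system where the "fork" start method is available. (With the "spawn" start method the same idea works, but the process-creating code has to sit under an `if __name__ == "__main__":` guard.) The names settings, child and run_demo are made up for this example. The child's write lands in its own copy of the module state, and the only thing that comes back is what it sends explicitly.

```python
import multiprocessing as mp

# Module-level state; each process ends up with its own copy of it.
settings = {"value": 0}

def child(conn):
    settings["value"] += 1        # changes the child's private copy only
    conn.send(settings["value"])  # results travel back as explicit messages
    conn.close()

def run_demo():
    # Assumption: "fork" is available (POSIX); it is requested explicitly here.
    ctx = mp.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=child, args=(child_conn,))
    proc.start()
    value_in_child = parent_conn.recv()
    proc.join()
    # Returns (what the child saw, what the parent still sees).
    return value_in_child, settings["value"]

if __name__ == "__main__":
    print(run_demo())  # (1, 0)
```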

There are possible answers to that. Some projects need a better "unsafe escape hatch" for sharing than either raw shared memory or proxy-manager protocols can provide; for some, there may be a specific performance bottleneck that could in theory be avoided, but that the current design makes impossible to avoid in practice; etc. None of these are very common, but they do exist. If you're running into a specific one, we should be looking for ways to characterize and then solve that specific problem, not trying to rebuild what we already have and hope that this time the problem doesn't come up.