[Python-ideas] easy thread-safety [was: fork]

Sven R. Kunze srkunze at mail.de
Tue Aug 18 18:28:17 CEST 2015


On 12.08.2015 05:33, Andrew Barnert wrote:
> On Aug 11, 2015, at 07:33, Sven R. Kunze <srkunze at mail.de> wrote:
>> On 05-Aug-2015 16:30:27 +0200, abarnert at yahoo.com wrote:
>>
>>> What does that even mean? How would you not allow races? If you let people throw arbitrary tasks at a thread pool, with no restriction on mutable shared state, you've allowed races.
>> Let me answer this in a more implicit way.
>>
>> Why do we need to mark global variables as such?
>> I think the answer is clear: to mark side-effects (quoting the docs).
>>
>> Why are all variables thread-shared by default?
>> I don't know, maybe for efficiency reasons, but that hardly applies to Python in the first place.
> First, are you suggesting that your idea doesn't make sense unless Python is first modified to not have shared variables? In that case, it doesn't seem like a very useful proposal, because it applies to some different language that isn't Python. And applying it to Python instead means you're still inviting race conditions. Pointing out that in a different language those races wouldn't exist is not really an answer to that.
I don't understand. In order to propose a change, one needs to imagine 
how things could be if they were different. I don't see how this 
relates to 'if you change Python, it wouldn't be Python anymore'. You 
certainly agree that Python 2.7 is not Python 3.5 and that Python 3.7 
will also be different (if it exists at all). So, what's your point here?
> Second, the reason for the design is that that's what threads mean, by definition: things that are like processes except that they share the same heap and other global state.
Where do you get this kind of definition from?
>   What's the point of a proposal that lets people select between threads and processes if its threads aren't actually processes?
Where did I say that?

My point was:

1) processes are fine (more or less)
2) threads aren't, because they are hard to manage, so let's make them 
easier
> Finally, just making variables thread-local wouldn't help. You'd need a completely separate heap for each thread;
So?

At this point, talking about internal implementation hardly seems 
relevant. I am not exactly sure what you mean by heap here, but I could 
imagine something more like an overlay approach. As long as I only read 
the original variable, we are fine. But setting it would require me to 
store the thread-local value somewhere else.
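
To make the overlay idea a bit more concrete, here is a minimal sketch. 
It is only an illustration, not a proposed implementation: the class 
name Overlay and the variable names are made up, and it leans on the 
existing threading.local for the per-thread storage. Reads fall through 
to the original value; a write is stored for the current thread only.

```python
import threading

_MISSING = object()


class Overlay:
    """Hypothetical overlay namespace (illustration only).

    Reads fall through to the original values; writes are stored
    per thread and never touch the original.
    """

    def __init__(self, **base):
        # Use object.__setattr__ to bypass our own __setattr__ below.
        object.__setattr__(self, "_base", dict(base))
        object.__setattr__(self, "_local", threading.local())

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        value = getattr(self._local, name, _MISSING)
        if value is not _MISSING:
            return value          # this thread has written its own value
        try:
            return self._base[name]   # otherwise read the original
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Writes land in thread-local storage, not in the original.
        setattr(self._local, name, value)


shared = Overlay(counter=0)
results = [None] * 3


def worker(index):
    shared.counter = index + 1        # visible to this thread only
    results[index] = shared.counter   # each thread writes its own slot


threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)          # each worker saw its own value
print(shared.counter)   # the main thread still sees the original
```

Note that this only covers rebinding a name. If the original value is a 
mutable object such as a list, a thread that reads it and then modifies 
it in place still changes it for everybody.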

I am uncertain why you are so averse to making threading easier to 
handle and to maintain. If 'easier' bothers you, let's call it 
'code works more reliably', 'code is more readable', 'code has fewer 
side effects', 'code produces fewer races'.

I am not talking about 100%. I am talking about 80% fewer places in your 
code where you need to worry about thread-safety. That leaves 20% of 
places where you really need to.


Btw. the stdlib would also benefit from this, in order to provide 
thread-safe modules out of the box. Not every maintainer needs to 
re-implement the desired thread-safety from scratch over and over again.

> otherwise, just passing a list to another thread means it can modify your values.

That just depends on what you want here. I would rather see Python 
assuming thread-safe behavior by default, whereas the programmer can 
actively choose a more flexible/dangerous model if needed for some small 
areas.

> [...implementation...] what happens when you do x[0]=y if x is local and y shared, or vice-versa?

Now, we are talking.

A) As soon as a single variable (x and/or y) is shared, all expressions 
using/writing such variables are basically unsafe. It's dangerous: you 
might need some locks and so forth to get it running properly. You might 
need extra thought to handle some weird corner cases and so forth.

B) If all variables of an expression are local, everything is fine. No 
additional work needed.


I regard case B) as the common case where you DON'T want others to mess 
around with your variables and you don't have to do anything about it.
Case A) is more like the data communication channel where threads could 
communicate with each other, aggregate results in a common list, and so 
forth. I can only imagine this taking place at the end of the threading 
part of a program where the results need to be propagated back to the 
MainThread.
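
As a rough sketch of that split with today's threading module (the 
function and variable names are made up for illustration): everything 
inside the worker is case B) and needs no protection, and only the final 
append to the common list is case A) and takes a lock.

```python
import threading

results = []                      # shared between threads: case A)
results_lock = threading.Lock()   # protects the shared list


def worker(numbers):
    total = 0                     # purely local: case B), no lock needed
    for n in numbers:
        total += n * n
    # Only the hand-over of the result touches shared state.
    with results_lock:
        results.append(total)


chunks = [[1, 2], [3, 4], [5, 6]]
threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Back in the MainThread: the order of arrival is not deterministic,
# so sort before looking at the aggregated results.
print(sorted(results))
```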

> Of course there's no reason that you couldn't implement something that's basically a process at the abstract level, but implemented with threads at the OS level. And that could make both explicit shared memory and IPC simpler at least under the covers, and more efficient. And it could lead to a way to eliminate the GIL. And there could be other benefits as well. That's why people are exploring things like the recent subinterpreters thread, PyParallel, PyPy+STM, etc.

Yes, transactional memory would basically be the term that covers 
that. A thread basically gets a snapshot of the world right from the 
start and after it finishes, the variables get merged back. However, I 
am unsure whether I would want that for all variables ("shared vs local" 
also exists here, and I would prefer an explicit way to declare it).
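
A very simplified sketch of "snapshot at the start, merge at the end" 
with an explicit declaration of what is shared could look like this. It 
is not real transactional memory: there is no conflict detection or 
retry, the last merge simply wins, and run_with_snapshot and the names 
in the dict are hypothetical.

```python
import copy
import threading


def run_with_snapshot(world, shared_names, task):
    """Hypothetical helper: run task on a private snapshot of world,
    then merge back only the names explicitly declared as shared."""
    snapshot = copy.deepcopy(world)       # the thread's private view
    t = threading.Thread(target=task, args=(snapshot,))
    t.start()
    t.join()
    for name in shared_names:             # explicit "shared" declaration
        world[name] = snapshot[name]      # naive merge: last writer wins


world = {"total": 0, "scratch": []}


def task(view):
    view["scratch"].append("temporary")   # stays local to the snapshot
    view["total"] += 10                   # declared shared, merged back


run_with_snapshot(world, ["total"], task)
print(world)
```

Only "total" finds its way back into the original dict; the change to 
"scratch" is dropped together with the snapshot.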

> The default is to write sequential code. You have to go out of your way to use threads. And when you do, you have to intentionally choose threads over processes or some kind of microthreads.

We have been talking about threading all along. There is no point in 
going back to sequential.

> It's only when you've chosen to use shared-memory threading as the design for your app that shared memory becomes the default.
I am not sure if I can follow here. If I look at the threading API of 
the Python standard lib, it is shared-memory. So, it is the default, 
like it or not.
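
A small example of what I mean, using only the existing threading 
module (the variable names are just for illustration): a plain 
module-level object is shared with every thread without asking, whereas 
thread-local state has to be requested explicitly via threading.local.

```python
import threading

settings = {"debug": False}        # plain global: shared by default

local_state = threading.local()    # thread-local: explicit opt-in
local_state.debug = False          # value for the main thread only


def worker():
    settings["debug"] = True       # changes what every thread sees
    local_state.debug = True       # changes this thread's value only


t = threading.Thread(target=worker)
t.start()
t.join()

print(settings["debug"])           # True: the worker's write is visible
print(local_state.debug)           # False: the main thread's value is intact
```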

> But that is the default in Python, so your proposal would make it easier for such people to write broken code without even realizing they're doing so, so it's not a good thing.

I am sorry? Because shared memory is the default in Python, my proposal 
would make it easier for such people to write broken code? We must be 
talking about different proposals. Maybe you could give an example.


Just for the record, my proposal:

1) processes are almost fine
2) threads aren't, so let's make it easier to work with them


Best,
Sven

