[Python-ideas] easy thread-safety [was: fork]

Andrew Barnert abarnert at yahoo.com
Fri Aug 21 03:31:08 CEST 2015


On Aug 20, 2015, at 16:57, Ron Adam <ron3200 at gmail.com> wrote:
> 
> On 08/20/2015 05:26 PM, Andrew Barnert via Python-ideas wrote:
>>> It may seem like it isn't needed, because you have control over
>>> what a function has access too... ie... just don't do that.  But
>>> when you have many programmers working on large projects, things
>>> can get messy.  And this helps with that, but also helps in the
>>> case of threads.
>> The only case you're helping with here is the case where the race is
>> entirely local to one function and the functions it defines--a
>> relatively uncommon case that also gets the least messy and is the
>> easiest to spot and debug.
>> 
>> Also, the "dangerous" cases are already marked today: the local
>> function has to explicitly declare the variable nonlocal or it can't
>> assign to it.
> 
> But you can mutate it, so it's not already marked today.

But your solution does nothing to stop values from being mutated, only to stop variables from being reassigned, so you haven't fixed anything.

>>>>>>>>>> LOAD_FAST 0,  reads __code__.co_varnames[0] LOAD_FAST
>>>>>>>>>> 1,  reads __code__.co_varnames[1]
>>>>>>>>>> 
>>>>>>>>>> Adding a co_mutables name list to the __code__
>>>>>>>>>> attribute, along with new bytecodes to access them,
>>>>>>>>>> it would create a way to keep private local names
>>>>>>>>>> without changing how the other bytecodes work.
>>>>>>>>>> 
>>>>>>>>>> LOAD_MUTABLE 0,  would get the first reference in
>>>>>>>>>> __code__.co_mutables.
>> As a side note, closure variables aren't accessed by LOAD_FAST and
>> SAVE_FAST from either side; that's what cellvars and freevars are
>> for. So, your details don't actually work. But it's not hard to
>> s/fast/cell/ and similar in your details and understand what you
>> mean.
> 
> Yes... thats what I mean. ;-)
> 
> 
>> But I don't see why you couldn't just implement the "mutable" keyword
>> to mean that the variable must be in names rather than cellnames or
>> freenames or varnames (raising a compile-time error if that's not
>> possible) and just continue using *_FAST on them. That would be a lot
>> simpler to implement. It's also a lot simpler to explain: declaring a
>> variable mutable means it can't participate in closures.
> 
> Sounds good to me.
> 
> 
>>>> As I said it's a partial solution.  Shared & mutable names passed
>>>> as function arguments will still need to be protected
>> The problem here is the same as in Sven's proposal: the problem is
>> shared values, not shared variables, so any solution that just tries
>> to limit shared variables is only a vanishingly tiny piece of the
>> solution.
> 
> The problem is shared mutable values.
> 
> One thing I've been wondering is what is considered a mutable object in an practical sense.  For example a tuple is not mutable, but it's items may be.  

Yes, that problem already exists for hashabllity (which, in Python, is strongly connected to immutability). Calling hash() on a tuple may still raise an exception of one of its elements is mutable.

The reason this is rarely a problem is that it's not hard to just avoid storing mutable values in tuples that are used as dict keys. But technically, the problem is there.

In your case, the same doesn't help, because it _is_ hard to avoid storing mutable values in everything in the world except specially-marked local variables.

> But it seems to me it should still be considered a mutable object if it has any mutable sub items in it.  So a test for mutability needs to test content.  It probably needs another word or description. Completely immutable?  (?)   It seems like there should already be a word for an object that has no mutable sub items or parts.  It also seems like there should be an easy way to test an object for that.

Only by recursing through the entire thing, as calling hash() does.

In a statically-typed language, it's a different story, because if you declare, or the compiler infers, that you've got, say, a 3-tuple of nothing but ints or other 3-tuples of the same type or frozensets of the same type, then any value that it holds must be recursively immutable. (Although that can get tricky in practice, of course. Imagine writing that ADT manually, or reading it.) But that doesn't help for Python.

The only obvious efficient way I can think of to solve this is in Python what I said before: having an entirely separate heap of objects, so you know that the objects in that thread can't be part of any objects in another thread. In other words, process semantics, which we already have. Maybe there are other ways to solve it, but ignoring it definitely doesn't count as solving it.

>> It doesn't do anything for mutable values passed to functions, or
>> returned or yielded, or stored on self, or stored in any other
>> object's attributes or in any container, or even in globals.
> 
> Returned mutable values shouldn't be a problem unless the function reuses the same object over again.

Sure it would:

    connections.append(connect())
    with Pool() as pool:
        for _ in range(10):
            pool.apply_async(broadcast, connections, "warning")

Now you've got 10 threads that all want to mutate the same object returned by connect(), even though it isn't stored in a variable.

> But yes it doesn't solve those cases.

Right, it only solves an uncommon case, which also happens to be the simplest to detect and debug.

>> It also doesn't prevent you from mutating them by calling a method
>> (including __setattr__ and __setitem__ and the various __i*__ methods
>> as well as explicit method calls).
> 
> Nonlocal doesn't prevent that.  Not being able to get to it does.  For small programs that aren't nested deeply it usually not a problem.  Just don't reuse names and be sure to initialize values locally.

That's the whole problem: reusing names isn't the issue, it's different parts of the program having the same values under different names (or just using them without binding them to names at all). That's where races come from in real-life programs. And most complex programs are not deeply nested (they're wide, not tall), so that isn't the problem either.

>> And, even if you managed to solve all of those problems, it still
>> wouldn't be useful, because it doesn't do anything for any case where
>> you share a member or element of the object rather than the object
>> itself--e.g., if I have a dict mapping sockets to Connection objects,
>> marking the dict unshareable doesn't protect the Connection objects
>> in any way.
> 
> That would be the part that this won't address, you would still need to either deep copy or lock all the shared mutable locations within that dict including the items and any attributes.  Not only the items passed and returned, but for all mutable values that can seen by any functions in threads.  This last part was what I was thinking could be made easier.. is it possible to reduce how much is seen, and so reduce the amount of locks.
> 
> I think it was an interesting discussion and will help me think about these things in the future, but I'm not sure it would do what I was thinking at the start.

If you want to take this idea further, look at what you could do with a two-level store. For example, in Oz (slightly oversimplifying and distorting things), names are bound to variables, and lists hold variables, and so on, while variables hold values. You can also implement this in something like C++, if you only store shared_ptr<T> values rather than storing T values directly (at least whenever T is mutable). I think you could build something around declaring the sharedness of the variables, rather than the names. Would that be sufficient without transitive static typing? I'm not sure, but it might be worth thinking through. But I don't think that would lead to anything useful for Python.

Another possibility is to look at ways to move rather than copy values. That doesn't solve everything, but a few years of experience with C++11 seems to show that it can solve many real-world problems. (Of course this assumes the existence of collections that you can only move, not copy, things into. Or maybe just using a move-to-queue API and not directly accessing the collection underneath it?) There might be something doable for Python there.


More information about the Python-ideas mailing list