Thread-safe way to add a key to a dict only if it isn't already there?

Sat Jul 7 21:04:18 EDT 2018

On Sun, 08 Jul 2018 00:00:26 +0300, Marko Rauhamaa wrote:

> Ian Kelly <ian.g.kelly at gmail.com>:
>> the leaning of the devs seems to be to refrain from documenting it and
>> instead document that *no* operations are guaranteed atomic.
> 
> I believe that to be wise. Otherwise, Python would limit its future
> implementation options.

o_O

By that logic, one should say we shouldn't document the semantics of 
operations, so we don't limit the implementation.

"What does list.sort do?"

"It currently sorts the list using a stable comparison sort, but you 
can't rely on that it, because we didn't want to limit the 
implementation."

Sorting is guaranteed to be stable. Does that limit the implementation 
options? Yes, of course it does. So does the fact that it *sorts* -- 
we're limited to implementations which *sort*, and specifically 
comparison sorts (not, for example, radix sorts).

Changing the implementation to a radix sort that only works on ints would 
break things. So would changing the implementation to something which 
empties the list of all elements. ("The empty list is always sorted.")

Stable semantics are more important than freedom to change implementation.

Whether or not an operation is thread-safe, or atomic, is part of the 
semantics of the operation. It's not merely a performance issue, and we 
ought to treat it will a bit more respect.

I'm not saying that everything needs to be documented as thread-safe or 
not (for at least one release, Python's sorting actually was stable 
before it was documented as such) but the gung-ho attitude that safety is 
just an implementation detail should be resisted.

Changing implementations from one which is thread safe to one which is 
not can break people's code, and shouldn't be done on a whim. Especially 
since such breakage could be subtle, hard to notice, harder to track 
down, and even harder still to fix.

> The only thing Python should guarantee is that the data structures stay
> "coherent" under race conditions. In other words, there cannot be a
> segmentation fault. For example, if two threads executed this code in
> parallel:
> 
>     global i
>     i = 1
>     i += 1
> 
> a legal end result could be that i contains the string "impossible".

That wouldn't be coherent. The only coherent results are that i could 
equal either 2 or 3:

Possibility 1:

    i = 1   # thread 1
    i += 1  # thread 1
    i = 1   # thread 2
    i += 1  # thread 2
    assert i == 2

Possibility 2:

    i = 1   # thread 1
    i = 1   # thread 2
    i += 1  # thread 1
    i += 1  # thread 2
    assert i == 3

Executing statements out of order is impossible, even in threaded code. 
Redefining the meaning of the integer literal 1 is impossible (ignoring 
unsupported shenanigans with ctypes). Monkey-patching the int __iadd__ 
method is impossible. So there is no coherent way to get a result of 
"impossible" from just adding 1 to 1 in any coherent implementation of 
Python.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson