Thread-safe way to add a key to a dict only if it isn't already there?

Chris Angelico rosuav at gmail.com
Sat Jul 7 21:15:17 EDT 2018


On Sun, Jul 8, 2018 at 11:04 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
>> The only thing Python should guarantee is that the data structures stay
>> "coherent" under race conditions. In other words, there cannot be a
>> segmentation fault. For example, if two threads executed this code in
>> parallel:
>>
>>     global i
>>     i = 1
>>     i += 1
>>
>> a legal end result could be that i contains the string "impossible".
>
> That wouldn't be coherent. The only coherent results are that i could
> equal either 2 or 3:
>
> Possibility 1:
>
>     i = 1   # thread 1
>     i += 1  # thread 1
>     i = 1   # thread 2
>     i += 1  # thread 2
>     assert i == 2
>
> Possibility 2:
>
>     i = 1   # thread 1
>     i = 1   # thread 2
>     i += 1  # thread 1
>     i += 1  # thread 2
>     assert i == 3
>
>
> Executing statements out of order is impossible, even in threaded code.
> Redefining the meaning of the integer literal 1 is impossible (ignoring
> unsupported shenanigans with ctypes). Monkey-patching the int __iadd__
> method is impossible. So there is no coherent way to get a result of
> "impossible" from just adding 1 to 1 in any coherent implementation of
> Python.

Python threads don't switch only between lines of code, so the actual
interaction is a bit more complicated than you say. In CPython, the
increment operation is:

  3           0 LOAD_GLOBAL              0 (i)
              2 LOAD_CONST               1 (1)
              4 INPLACE_ADD
              6 STORE_GLOBAL             0 (i)

A context switch could happen between any pair of statements. In this
particular example, the end result doesn't change - coherent results
are 2 and 3, nothing else - but in other situations, there may be
times when the separate steps might be significant. For instance, if
you replace "i += 1" with "i += i", to double the value, you'll get
this:

  3           0 LOAD_GLOBAL              0 (i)
              2 LOAD_GLOBAL              0 (i)
              4 INPLACE_ADD
              6 STORE_GLOBAL             0 (i)

and that could potentially have both of them load the initial value,
then one of them runs to completion, and then the other loads the
result - so it'll add 1 and 2 and have a result of 3, rather than 2 or
4.

But you're absolutely right that there are only a small handful of
plausible results, even with threading involved.

ChrisA



More information about the Python-list mailing list