Is there a more efficient threading lock?

Sun Feb 26 00:35:50 EST 2023

On Sun, 26 Feb 2023 at 16:16, Jon Ribbens via Python-list
<python-list at python.org> wrote:
>
> On 2023-02-25, Paul Rubin <no.email at nospam.invalid> wrote:
> > The GIL is an evil thing, but it has been around for so long that most
> > of us have gotten used to it, and some user code actually relies on it.
> > For example, with the GIL in place, a statement like "x += 1" is always
> > atomic, I believe.  But, I think it is better to not have any shared
> > mutables regardless.
>
> I think it is the case that x += 1 is atomic but foo.x += 1 is not.
> Any replacement for the GIL would have to keep the former at least,
> plus the fact that you can do hundreds of things like list.append(foo)
> which are all effectively atomic.

The GIL is most assuredly *not* an evil thing. If you think it's so
evil, go ahead and remove it, because we'll clearly be better off
without it, right?

As it turns out, most GIL-removal attempts have had a fairly nasty
negative effect on performance. The GIL is a huge performance boost.

As to what is atomic and what is not... it's complicated, as always.
Suppose that x (or foo.x) is a custom type:

class Thing:
    def __iadd__(self, other):
        print("Hi, I'm being added onto!")
        self.increment_by(other)
        return self

Then no, neither of these is atomic, although if the increment itself
is, it probably won't matter. As far as I know, the only way that it
would be at all different for x+=1 and foo.x+=1 would be if the
__iadd__ method both mutates and returns something other than self,
which is quite unusual. (Most incrementing is done by either
constructing a new object to return, or mutating the existing one, but
not a hybrid.)

Consider this:

import threading
d = {0:0, 1:0, 2:0, 3:0}
def thrd():
    for _ in range(10000):
        d[0] += 1
        d[1] += 1
        d[2] += 1
        d[3] += 1

threads = [threading.Thread(target=thrd) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(d)

Is this code guaranteed to result in 500000 in every slot in the
dictionary? What if you replace the dictionary with a four-element
list? Do you need a GIL for this, or some other sort of lock? What
exactly is it that is needed? To answer that question, let's look at
exactly what happens in the disassembly:

>>> def thrd():
...     d[0] += 1
...     d[1] += 1
...
>>> import dis
>>> dis.dis(thrd)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              0 (d)
             14 LOAD_CONST               1 (0)
             16 COPY                     2
             18 COPY                     2
             20 BINARY_SUBSCR
             30 LOAD_CONST               2 (1)
             32 BINARY_OP               13 (+=)
             36 SWAP                     3
             38 SWAP                     2
             40 STORE_SUBSCR

  3          44 LOAD_GLOBAL              0 (d)
             56 LOAD_CONST               2 (1)
             58 COPY                     2
             60 COPY                     2
             62 BINARY_SUBSCR
             72 LOAD_CONST               2 (1)
             74 BINARY_OP               13 (+=)
             78 SWAP                     3
             80 SWAP                     2
             82 STORE_SUBSCR
             86 LOAD_CONST               0 (None)
             88 RETURN_VALUE
>>>

(Your exact disassembly may differ, this was on CPython 3.12.)
Crucially, note these three instructions that occur in each block:
BINARY_SUBSCR, BINARY_OP, and STORE_SUBSCR. Those are a lookup
(retrieving the value of d[0]), the actual addition (adding one to the
value), and a store (putting the result back into d[0]). So it's
actually not guaranteed to be atomic; it would be perfectly reasonable
to interrupt that sequence and have something else do another
subscript.

Here's the equivalent with just incrementing a global:

>>> def thrd():
...     x += 1
...
>>> dis.dis(thrd)
  1           0 RESUME                   0

  2           2 LOAD_FAST_CHECK          0 (x)
              4 LOAD_CONST               1 (1)
              6 BINARY_OP               13 (+=)
             10 STORE_FAST               0 (x)
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
>>>

The exact same sequence: load, add, store. Still not atomic.

General takeaway: The GIL is a performance feature, not a magic
solution, and certainly not an evil beast that must be slain at any
cost. Attempts to remove it always have to provide equivalent
protection in some other way. But the protection you think you have
might not be what you actually have.

ChrisA