Multiple scripts versus single multi-threaded script

Thu Oct 3 14:28:32 EDT 2013

In article <mailman.684.1380819470.18130.python-list at python.org>,
 Chris Angelico <rosuav at gmail.com> wrote:

> On Fri, Oct 4, 2013 at 2:41 AM, Roy Smith <roy at panix.com> wrote:
> > The downside to threads is that all of this sharing makes them much
> > more complicated to use properly.  You have to be aware of how all the
> > threads are interacting, and mediate access to shared resources.  If you
> > do that wrong, you get memory corruption, deadlocks, and all sorts of
> > (extremely) difficult to debug problems.  A lot of the really hairy
> > problems (i.e. things like one thread continuing to use memory which
> > another thread has freed) are solved by using a high-level language like
> > Python which handles all the memory allocation for you, but you can
> > still get deadlocks and data corruption.
> 
> With CPython, you don't have any headaches like that; you have one
> very simple protection, a Global Interpreter Lock (GIL), which
> guarantees that no two threads will execute Python code
> simultaneously. No corruption, no deadlocks, no hairy problems.
> 
> ChrisA

Well, the GIL certainly eliminates a whole range of problems, but it's 
still possible to write code that deadlocks.  All that's really needed 
is for two threads to try to acquire the same two resources, in 
different orders.  I'm running the following code right now.  It appears 
to be doing a pretty good imitation of a deadlock.  Any similarity to 
current political events is purely intentional.

import threading
import time

lock1 = threading.Lock()
lock2 = threading.Lock()

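class House(threading.Thread):
    def run(self):
        print("House starting...")
        lock1.acquire()      # take lock1 first
        time.sleep(1)        # give Senate time to grab lock2
        lock2.acquire()      # blocks forever: Senate holds lock2
        print("House running")
        lock2.release()
        lock1.release()

class Senate(threading.Thread):
    def run(self):
        print("Senate starting...")
        lock2.acquire()      # take lock2 first (opposite order to House)
        time.sleep(1)        # give House time to grab lock1
        lock1.acquire()      # blocks forever: House holds lock1
        print("Senate running")
        lock1.release()
        lock2.release()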

h = House()
s = Senate()

h.start()
s.start()
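
The usual cure for this particular kind of deadlock is to have every thread take the locks in the same order.  Here is a minimal sketch of that idea, assuming Python 3.  The worker function and the results list are hypothetical names made up for the illustration; they are not part of the program above.

```python
import threading
import time

lock1 = threading.Lock()
lock2 = threading.Lock()
results = []  # hypothetical: records which workers got through


def worker(name):
    # Both workers take lock1 first and lock2 second, so neither can end
    # up holding one lock while waiting for the other.
    with lock1:
        time.sleep(0.1)
        with lock2:
            results.append(name)


threads = [threading.Thread(target=worker, args=(n,))
           for n in ("House", "Senate")]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

print(sorted(results))
```

The "with" form also releases each lock if the body raises, which the bare acquire()/release() pairs do not.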

Similarly, I can have data corruption.  I can't get memory corruption in 
the way you can get in a C/C++ program, but I can certainly have one 
thread produce data for another thread to consume, and then 
(incorrectly) continue to mutate that data after it relinquishes 
ownership.

Let's say I have a Queue.  A producer thread pushes work units onto the 
Queue and a consumer thread pulls them off the other end.  If my 
producer thread does something like:

work = {'id': 1, 'data': "The Larch"}
my_queue.put(work)
work['id'] = 3

I've got a race condition where the consumer thread may get an id of 
either 1 or 3, depending on exactly when it reads the data from its end 
of the queue (more precisely, exactly when it uses that data).
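
One way to close that race is to hand the consumer its own copy of the work unit, so that later changes by the producer cannot reach it.  A rough sketch, assuming Python 3 (where the module is spelled queue rather than Queue); the consumer function and the seen list are invented for the example:

```python
import copy
import queue
import threading

my_queue = queue.Queue()
seen = []  # hypothetical: ids observed by the consumer


def consumer():
    item = my_queue.get()
    seen.append(item['id'])
    my_queue.task_done()


work = {'id': 1, 'data': "The Larch"}
# Put a private copy on the queue; the producer keeps the original.
my_queue.put(copy.deepcopy(work))
work['id'] = 3  # only changes the producer's own dict

t = threading.Thread(target=consumer)
t.start()
t.join()

print(seen)
```

The other option is simply discipline: once an object has gone onto the queue, the producer never touches it again.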

Here's a somewhat different example of data corruption between threads:

import threading
import random
import sys

sketch = "The Dead Parrot"

class T1(threading.Thread):
    def run(self):
        # Remember the value seen at startup, then spin until it changes.
        current_sketch = sketch
        while True:
            if sketch != current_sketch:
                print("Blimey, it's changed!")
                return

class T2(threading.Thread):
    def run(self):
        global sketch
        sketches = ["Piranha Brothers",
                    "Spanish Inquisition",
                    "Lumberjack"]
        # Keep rebinding the shared global behind T1's back.
        while True:
            sketch = random.choice(sketches)

t1 = T1()
t2 = T2()
t2.daemon = True

t1.start()
t2.start()

t1.join()
sys.exit()
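
If the intent really is for one thread to notice that another has changed a shared value, it is usually cleaner to say so explicitly than to poll a global.  A small sketch of one option, again assuming Python 3, with a threading.Lock around the shared value and a threading.Event to announce the change.  The names state, changed, watcher and changer are made up for this example.

```python
import threading

state = {'sketch': "The Dead Parrot"}
state_lock = threading.Lock()
changed = threading.Event()
observed = []  # hypothetical: what the watcher saw after the change


def watcher():
    # Block until the changer announces an update, rather than spinning.
    if changed.wait(timeout=5):
        with state_lock:
            observed.append(state['sketch'])


def changer():
    # Update the shared value under the lock, then signal the watcher.
    with state_lock:
        state['sketch'] = "Lumberjack"
    changed.set()


t1 = threading.Thread(target=watcher)
t2 = threading.Thread(target=changer)
t1.start()
t2.start()
t1.join()
t2.join()

print(observed)
```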