[Tutor] Starbucks does not use two-phase commit

Todd Maynard python-tutor at toddmaynard.com
Sun Jan 22 15:52:23 CET 2006


Well Danny, now I know how I am gonna spend my Sunday....

Thanks for the great explanation and the resources.  And of course, do you think I 
could manage to get the code to break?  Of course not....  Usually I have the 
opposite problem.  Anyways, I think your explanation makes perfect sense.
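
Just to convince myself I follow the fix, here is roughly how I picture your 
Lock suggestion (quoted below) wired into the server.  This is only a sketch 
in my own words, not your code, and I took the liberty of flipping acceptNew 
inside scheduleShutdown() itself, while the lock is held, so the flag and the 
_QUIT_NICELY marker always change together:

###############################################################
from Queue import Queue
from threading import Lock

class Ticket:
    """Handed back to the client; claim() waits for the job's result."""
    def __init__(self, q):
        self.q = q

    def claim(self):
        return self.q.get()

class Server:
    _QUIT_NOW = object()
    _QUIT_NICELY = object()

    def __init__(self):
        self.queue = Queue()
        self.acceptNew = True
        self.clientLock = Lock()   # serializes schedule() / scheduleShutdown()

    def schedule(self, job):
        self.clientLock.acquire()
        try:
            if self.acceptNew:
                outputQueue = Queue()
                self.queue.put((job, outputQueue))
                return Ticket(outputQueue)
            else:
                print "Server not accepting any new requests."
                return None
        finally:
            self.clientLock.release()

    def scheduleShutdown(self):
        self.clientLock.acquire()
        try:
            # Flip the flag while holding the lock, so no client can pass
            # the acceptNew check and then enqueue behind the marker.
            self.acceptNew = False
            self.queue.put((Server._QUIT_NICELY, None))
        finally:
            self.clientLock.release()

    def _jobLoop(self):
        while True:
            (nextJob, outputQueue) = self.queue.get()
            if nextJob is Server._QUIT_NOW:
                return
            if nextJob is Server._QUIT_NICELY:
                self.queue.put((Server._QUIT_NOW, None))
            else:
                outputQueue.put(self._doJob(nextJob))

    def _doJob(self, job):
        # Placeholder so the sketch stands alone; the real work goes here.
        return job
###############################################################

If I have that right, no client can see acceptNew == True and then slip a job 
in behind the shutdown marker, because both paths have to hold clientLock 
first.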

My problem with your original code is that _jobLoop could sometimes return 
when there were still jobs (from separateCaller) left in the queue.  When 
separateCaller then tried to call ticket.claim(), the self.result = self.q.get() 
would block, causing the program to hang indefinitely.  This is what I was 
trying to prevent by using the timeout in the get() call and then handling 
the possible Empty exception. 
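
For what it's worth, the workaround in my Ticket class looked roughly like 
this (reconstructed from memory rather than pasted from my code, and the 
five-second timeout is just a number I happened to pick):

###############################################################
from Queue import Queue, Empty

class Ticket:
    def __init__(self, q):
        self.q = q
        self.result = None

    def claim(self):
        try:
            # Wait up to five seconds instead of blocking forever, in case
            # the server exited without ever processing our job.
            self.result = self.q.get(block=True, timeout=5)
        except Empty:
            print "Gave up waiting -- the server must have shut down on us."
        return self.result
###############################################################

That stops the hang, but it really just papers over the real problem: a 
ticket should never get issued after the shutdown marker in the first place.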

I am now gonna play with this some more to see if I can build a robust, clean 
coffeeshop framework, with customers placing orders with a cashier, the 
cashier passing the orders to a barista, and the barista processing the orders 
and delivering them to the customers.  The idea, of course, is that the 
customers, cashier, and baristas each run in different threads.  Then I'll 
enhance it with multiple cashiers and baristas....
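
Roughly the shape I have in mind, as a first sketch (all of the names and 
numbers below are my own placeholders, nothing from your code): customers 
hand orders to the cashier through one Queue, the cashier forwards them to 
the barista through another, and each order carries its own little result 
Queue so the drink comes back to the right customer.

###############################################################
from Queue import Queue
from threading import Thread
import time, random

orders = Queue()            # customers -> cashier
jobs = Queue()              # cashier -> barista
CLOSING_TIME = object()     # sentinel telling the staff to go home

def customer(name):
    resultQueue = Queue()                   # where my drink will show up
    orders.put((name, "latte", resultQueue))
    drink = resultQueue.get()               # wait for delivery
    print "%s got a %s" % (name, drink)

def cashier():
    while True:
        order = orders.get()
        if order is CLOSING_TIME:
            jobs.put(CLOSING_TIME)          # pass the word to the barista
            return
        jobs.put(order)

def barista():
    while True:
        job = jobs.get()
        if job is CLOSING_TIME:
            return
        name, drink, resultQueue = job
        time.sleep(random.random())         # pretend to make the drink
        resultQueue.put(drink)

if __name__ == "__main__":
    staff = [Thread(target=cashier), Thread(target=barista)]
    for t in staff:
        t.start()
    customers = [Thread(target=customer, args=("Customer %d" % i,))
                 for i in range(3)]
    for t in customers:
        t.start()
    for t in customers:
        t.join()
    orders.put(CLOSING_TIME)                # close up once everyone is served
    for t in staff:
        t.join()
###############################################################

With more than one barista I expect I'll need one CLOSING_TIME marker per 
worker (or a smarter shutdown), which sounds like a good excuse to reread 
your race condition walkthrough below.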

but first I need to put another pot of coffee on.

If you don't hear from me in a while, I've probably suffered a caffeine 
overdose.  


Thanks for the inspiration,

Todd Maynard


-- 
The tao that can be tar(1)ed
is not the entire Tao.
The path that can be specified 
is not the Full Path.

We declare the names
of all variables and functions.
Yet the Tao has no type specifier.

Dynamically binding, you realize the magic.
Statically binding, you see only the hierarchy.

Yet magic and hierarchy
arise from the same source,
and this source has a null pointer.

Reference the NULL within NULL,
it is the gateway to all wizardry.


On Sunday 22 January 2006 03:13, Danny Yoo wrote:
> On Sat, 21 Jan 2006, Todd Maynard wrote:
> > I want to thank you for ruining my plans for a relaxing Saturday
> > morning.  As a thread newbie I killed several hours playing around with
> > your code.
>
> Hi Todd,
>
> Sorry about that.  I hope you were relaxing in a cafe while playing with
> the code.
>
> > One thing I noticed is that sometimes the program would hang, which I
> > figured was the Queue code blocking in the Ticket claim function. I used
> > exception handling to deal with that situation cleanly.
>
> That's odd.  There shouldn't be anything that blocks the code.  Oh!  Did
> you make changes to the test code, or did the hanging occur in the
> original code in:
>
>     http://mail.python.org/pipermail/tutor/2006-January/044567.html
>
> I'm curious because nothing there should fundamentally block, assuming
> that _doJob() doesn't die badly with an exception.  If _doJob() dies, the
> server dies, and that's bad.  *grin*
>
> Do you mind showing what the exception handling looks like in your code?
>
> > I then decided that it wasn't very nice of Starbucks to close after
> > accepting my order without giving me my Latte, so I changed that part of
> > the code to:
>
> [code cut]
>
> > I am 99.44% sure that this is thread safe, reasoning being:
> > 	 setting the acceptNew to False and adding the QUIT_NOW happens in the
> > same thread so it is impossible for another job to get scheduled after
> > the QUIT_NOW - so no thread will end up hanging...
>
> Bad news: no.  *grin*
>
> There's a "race condition".  Let's go into some detail with this, since
> this is not obvious stuff.
>
> First, let's look at the code again --- I'll label three lines with (a),
> (b), and (c), to make it a little easier to see the race.
>
>
> ###############################################################
> def schedule(self,job):
>     if self.acceptNew == True:                           ## (a)
>         outputQueue=Queue()
>         self.queue.put((job,outputQueue))
>         return Ticket(outputQueue)
>     else:
>          print "Server not accepting any new requests."
>          return None
>
> def scheduleShutdown(self):                              ## (b)
>     self.queue.put((Server._QUIT_NICELY,None))
>
> def _jobLoop(self):
>     while True:
>         print "Looping ... "
>         (nextJob, outputQueue) = self.queue.get()
>         if nextJob is Server._QUIT_NOW:
>             return
>         if nextJob is Server._QUIT_NICELY:               ## (c)
>             self.acceptNew = False
>             self.queue.put((Server._QUIT_NOW,None))
>         else:
>             returnValue=self._doJob(nextJob)
>             outputQueue.put(returnValue)
> ##############################################################
>
>
> Let's imagine three threads, which I'll name C1, C2, and S.  C1 and C2
> will be distinct client threads, and S will be the server thread that runs
> through _jobLoop().
>
> Imagine the following scenario.  The server's online, and its work queue
> is empty.
>
>     1.  C1 calls schedule(), and reaches the line labeled (a).  At this
>         point, server.acceptNew is True, so it goes into the body of the
>         if statement.  But wait...
>
>     2.  Now we context switch to C2.  C2 calls scheduleShutdown()
>         in its entirety.  There is now a _QUIT_NICELY element in the
>         queue.  C2 is done for.
>
>     3.  Now we context switch to the server thread S.  It grabs the
>         _QUIT_NICELY, and puts a _QUIT_NOW.  Let's imagine that S
>         continues and loops again.  In the next loop through _jobLoop(),
>         it sees _QUIT_NOW and exits.   S is done for.  Muhahaha.
>
>     4.  Now we context switch back to C1 and continue with:
>
>         outputQueue = Queue()
>         self.queue.put((job,outputQueue))
>         return Ticket(outputQueue)
>
> In this scenario, poor C1 is left holding a ticket that will never cash
> out.
>
> One way to fix this problem is to make calling schedule() and
> scheduleShutdown() "atomic" in this sense: if we're calling schedule(), we
> shouldn't be able to context switch into a call to scheduleShutdown(), and
> vice versa.
>
> Our troubles started at step 2 of the above scenario, where two clients
> jostled for attention.  If we prevent that particular situation --- if we
> force all our clients to stand in line to get served --- then we'll be
> fine.  So we might look into some synchronizing tool, like a Lock object:
>
>     http://www.python.org/doc/lib/lock-objects.html
>
> Concretely, we can add an exclusive Lock object to the server's __init__:
>
>     def __init__(self):
>         self.clientLock = Lock()
>
> and make sure to acquire-and-release in any of our schedule* functions:
>
>     def schedule*(self, job):
>         self.clientLock.acquire()
>         try:
>             ...
>         finally:
>             self.clientLock.release()
>
> > However, I would sleep a little better if you could reassure me that I
> > am right, and would sleep even better if you could give me a method to
> > test this.
>
> I'm sorry; I can't provide either.  That doesn't mean that such things
> don't exist, but only that I don't know about them.  (The only formal
> training I've received on this, so far, has been a standard intro
> Operating Systems CS course.)
>
> So you might want to check with others.  The way I caught the race
> condition above was just by trying to be in a very foul mood while
> reading the code.
>
> > This kinda stuff looks tricky to test with standard unittest
> > methodology....
>
> It might be hard.  It's an active CS research topic to formally check for
> race conditions.  There may be commercial tools to build unit tests to
> hammer for race conditions, but I don't know much about them.
>
> For example, Associate Professor Cormac Flanagan does research on
> statically detecting race conditions:
>
>     http://www.soe.ucsc.edu/~cormac/
>
> And Google Scholar does show a heck of a lot of recent papers on this:
>
>     http://scholar.google.com/scholar?hl=en&lr=&q=thread+race+condition&btnG=Search
>
> so this might not be a closed issue yet.
>
>
> Hmmm... hey, Flanagan's paper looks interesting!  Ok, I'm printing it out
> now... now where's my coffee...  Weekend reading for me.
>
> Well, thanks for ruining my weekend too.  *grin*


