[Tutor] Multi-thread environments

Kent Johnson kent37 at tds.net
Thu Mar 30 14:59:30 CEST 2006


Liam Clarke wrote:
> Hi all,
> 
> I'm working in my first multi-threaded environment, and I think I
> might have just been bitten by that.
> 
> class Parser:
>     def __init__(self, Q):
>         self.Q = Q
>         self.players = {}
>         self.teams = {}
> 
>     def sendData(self):
>         if not self.players or not self.teams: return
>         self.Q.put((self.players, self.teams))
>         self.resetStats()
> 
>     def resetStats(self):
>         for key in self.players:
>             self.players[key] = 0
>         for key in self.teams:
>             self.teams[key] = 0
> 

> What I'm finding is that a lot more sets of zeroed data are being
> sent to the DAO than should occur.
> 
> If the resetStats() call is commented out, data is sent correctly. I
> need to reset the variables after each send so as to not try and
> co-ordinate state with a database, otherwise I'd be away laughing.
> 
> My speculation is that because the Queue is shared between two
> threads, one of which is looping on it, that a data write to the Queue
> may actually occur after the next method call, the resetStats()
> method, has occurred.
> 
> So, the call to Queue.put() is made, but the actual data is accessed in
> memory by the Queue after resetStats has changed it.

You're close. The call to Queue.put() is synchronous - it will finish 
before the call to resetStats() is made - but the *data* is still shared.

What is in the Queue are references to the same dicts that are also
referenced by self.players and self.teams. The actual dicts are not
copied! This is normal Python function call and assignment semantics,
but in this case it's not what you want. You have a race condition: if
the data in the Queue is processed before the call to resetStats() is
made, it will work fine; if resetStats() is called first, the consumer
will see zeroed data. Actually there are many possible failures. Since
resetStats() loops over the dicts, the consumer could be interleaving
its reads with the writes in resetStats() and see a mix of old and
zeroed values.
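
To see the sharing without any threads involved, here is a minimal
single-threaded sketch. It assumes Python 3, where the module is named
queue (it was Queue at the time of writing); the dict contents are made
up for illustration.

```python
import queue  # assumption: Python 3 name; the module was "Queue" in Python 2

q = queue.Queue()
players = {"alice": 3, "bob": 5}   # hypothetical sample data

q.put(players)                     # put() stores a reference, not a copy

# In-place reset, the same pattern as the original resetStats()
for key in players:
    players[key] = 0

received = q.get()
same_object = received is players  # the consumer gets the very same dict
```

Because received and players are one object, the consumer sees the
zeroed values even though put() completed before the reset started.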

What you need to do is copy the data, either before you put it in the 
queue or as part of the reset. I suggest rewriting resetStats() to 
create new dicts because dict.fromkeys() will do just what you want:
   def resetStats(self):
     self.players = dict.fromkeys(self.players.keys(), 0)
     self.teams = dict.fromkeys(self.teams.keys(), 0)

This way you won't change the data seen by the consumer thread.
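
The other option mentioned above, copying before the put, might look
roughly like this. It is only a sketch: it assumes Python 3 (module
queue rather than Queue) and that the dict values are plain numbers, so
a shallow copy with dict() is enough. The class body follows the code
quoted above; the sample data in the usage lines is invented.

```python
import queue  # assumption: Python 3 name; the module was "Queue" in Python 2


class Parser:
    def __init__(self, Q):
        self.Q = Q
        self.players = {}
        self.teams = {}

    def sendData(self):
        if not self.players or not self.teams:
            return
        # Put shallow copies on the queue so the consumer owns its own
        # dicts and later changes here cannot affect them.
        self.Q.put((dict(self.players), dict(self.teams)))
        self.resetStats()

    def resetStats(self):
        # Resetting in place is safe now: the queue holds copies.
        for key in self.players:
            self.players[key] = 0
        for key in self.teams:
            self.teams[key] = 0


# Usage with invented data
q = queue.Queue()
parser = Parser(q)
parser.players = {"alice": 2}
parser.teams = {"red": 1}
parser.sendData()
sent_players, sent_teams = q.get()
```

If the values were themselves mutable (lists, nested dicts), a shallow
copy would not be enough and copy.deepcopy() would be the safer choice.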

> I've spent about eight hours so far trying to debug this; I've never
> been this frustrated in a Python project before to be honest... I've
> reached my next skill level bump, so to speak.

Yes, threads can be mind-bending until you learn to spot the gotchas 
like this.

By the way you also have a race condition here:
>             if self.dump:
>                 self.parser.sendData()
>                 self.dump = False

Possibly the thread that sets self.dump will set it again between the 
time you test it and when you reset it. If the setting thread is on a 
timer and the time is long enough, it won't be a problem, but it is a 
potential bug.
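
One way to narrow that window is to clear the flag before sending
instead of after, so a request that arrives during the send is kept for
the next pass. The sketch below uses threading.Event in place of the
plain self.dump attribute; the names dump_requested, request_dump and
service_dump are hypothetical, not from your code.

```python
import threading

# Hypothetical stand-in for the self.dump attribute
dump_requested = threading.Event()


def request_dump():
    """Called by the thread that wants a dump (e.g. the timer thread)."""
    dump_requested.set()


def service_dump(send):
    """Called in the worker loop; send is the function that sends data.

    The flag is cleared *before* sending. A request made before the
    clear is served by this send; a request made after the clear leaves
    the flag set, so the next call to service_dump() handles it.
    """
    if dump_requested.is_set():
        dump_requested.clear()
        send()
        return True
    return False
```

With the original ordering (send, then reset the flag), a request that
arrived during the send would be wiped out by the reset. With this
ordering the worst case is one extra send.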

Kent


