[Tutor] Multi-thread environments
Kent Johnson
kent37 at tds.net
Thu Mar 30 14:59:30 CEST 2006
Liam Clarke wrote:
> Hi all,
>
> I'm working in my first multi-threaded environment, and I think I
> might have just been bitten by that.
>
> class Parser:
>     def __init__(self, Q):
>         self.Q = Q
>         self.players = {}
>         self.teams = {}
>
>     def sendData(self):
>         if not self.players or not self.teams: return
>         self.Q.put((self.players, self.teams))
>         self.resetStats()
>
>     def resetStats(self):
>         for key in self.players:
>             self.players[key] = 0
>         for key in self.teams:
>             self.teams[key] = 0
>
> What I'm finding is that a lot more sets of zeroed data are being
> sent to the DAO than should occur.
>
> If the resetStats() call is commented out, data is sent correctly. I
> need to reset the variables after each send so as to not try and
> co-ordinate state with a database, otherwise I'd be away laughing.
>
> My speculation is that because the Queue is shared between two
> threads, one of which is looping on it, that a data write to the Queue
> may actually occur after the next method call, the resetStats()
> method, has occurred.
>
> So, the call to Queue.put() is made, but the actual data is accessed in
> memory by the Queue after resetStats has changed it.
You're close. The call to Queue.put() is synchronous - it will finish
before the call to resetStats() is made - but the *data* is still shared.
What is in the Queue is references to the same dicts that self.players
and self.teams refer to. The actual dicts are not copied! This is
normal Python function call and assignment semantics, but in this case
it's not what you want. You have a race condition - if the data in the
Queue is processed before the call to resetStats() is made, it will work
fine; if resetStats() is called first, it will be a problem. Actually
there are many possible failures: since resetStats() loops over the
dicts, the consumer could interleave its reads with the writes in
resetStats() and see partly zeroed data.
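Here is a minimal single-threaded sketch of the sharing described above. It assumes Python 3, where the module is named queue (it was Queue in Python 2), and the sample names and scores are made up for illustration. The reset is run by hand before the read, to stand in for the unlucky thread ordering:

```python
import queue

# Hypothetical sample data, not from the original program.
players = {"alice": 3, "bob": 5}

q = queue.Queue()
q.put(players)  # put() has finished, but only a reference was stored

# Simulate resetStats() running before the consumer reads the queue.
for key in players:
    players[key] = 0

received = q.get()
same_object = received is players  # the consumer got the very same dict
```

Because received and players are one object, the consumer sees the zeroed values even though put() completed before the reset started.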
What you need to do is copy the data, either before you put it in the
queue or as part of the reset. I suggest rewriting resetStats() to
create new dicts because dict.fromkeys() will do just what you want:
    def resetStats(self):
        self.players = dict.fromkeys(self.players.keys(), 0)
        self.teams = dict.fromkeys(self.teams.keys(), 0)
This way you won't change the data seen by the consumer thread.
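To see the effect of rebinding rather than mutating, here is a small sketch of the Parser class with that change, again assuming Python 3's queue module and made-up sample data. It is an illustration of the idea, not a drop-in replacement for the real program:

```python
import queue


class Parser:
    def __init__(self, Q):
        self.Q = Q
        self.players = {}
        self.teams = {}

    def sendData(self):
        if not self.players or not self.teams:
            return
        self.Q.put((self.players, self.teams))
        self.resetStats()

    def resetStats(self):
        # Bind the attributes to brand-new dicts; the dicts already in
        # the queue are left untouched.
        self.players = dict.fromkeys(self.players, 0)
        self.teams = dict.fromkeys(self.teams, 0)


q = queue.Queue()
parser = Parser(q)
parser.players = {"alice": 3}  # hypothetical sample data
parser.teams = {"red": 7}
parser.sendData()

sent_players, sent_teams = q.get()
```

After sendData(), sent_players still holds the real counts while parser.players is a separate dict of zeros. Once the dicts are in the queue the producer never touches them again, so the consumer effectively owns them.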
> I've spent about eight hours so far trying to debug this; I've never
> been this frustrated in a Python project before to be honest... I've
> reached my next skill level bump, so to speak.
Yes, threads can be mind-bending until you learn to spot the gotchas
like this.
By the way you also have a race condition here:
>     if self.dump:
>         self.parser.sendData()
>         self.dump = False
Possibly the thread that sets self.dump will set it again between the
time you test it and when you reset it. If the setting thread is on a
timer and the time is long enough, it won't be a problem, but it is a
potential bug.
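One possible way to close that window is to make the test and the reset a single atomic step under a lock. The DumpFlag class below and its method names are hypothetical, just a sketch of the idea:

```python
import threading


class DumpFlag:
    """Hypothetical helper: a flag that is tested and cleared atomically."""

    def __init__(self):
        self._lock = threading.Lock()
        self._requested = False

    def request(self):
        # Called by the thread that wants a dump to happen.
        with self._lock:
            self._requested = True

    def take(self):
        # Read and clear in one step, so a request made afterwards
        # is kept for the next check instead of being overwritten.
        with self._lock:
            was_requested = self._requested
            self._requested = False
            return was_requested


flag = DumpFlag()
flag.request()
first = flag.take()   # the pending request is consumed here
second = flag.take()  # nothing pending any more
```

The worker would then write "if flag.take(): self.parser.sendData()". Since the flag is cleared before the send rather than after it, a request that arrives during the send stays set and is picked up on the next pass.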
Kent