collect data using threads

James Tanis jtanis at pycoder.org
Tue Jun 14 12:23:35 EDT 2005


Previously, on Jun 14, Jeremy Jones said: 

# Kent Johnson wrote:
# 
# > Peter Hansen wrote:
# >  
# > > Qiangning Hong wrote:
# > > 
# > >    
# > > > A class Collector spawns several threads to read from a serial port.
# > > > Collector.get_data() will get all the data they have read since the last
# > > > call.  Who can tell me whether my implementation is correct?
# > > >      
# > > [snip sample with a list]
# > > 
# > >    
# > > > I am not very sure about the get_data() method.  Will it cause data loss
# > > > if a thread is appending data to self.data at the same time?
# > > >      
# > > That will not work, and you will get data loss, as Jeremy points out.
# > > 
# > > Normally Python lists are safe, but your key problem (in this code) is
# > > that you are rebinding self.data to a new list!  If another thread calls
# > > on_received() just after the line "x = self.data" executes, then the new
# > > data will never be seen.
# > >    
# > 
# > Can you explain why not? self.data is still bound to the same list as x. At
# > least if the execution sequence is
# >     x = self.data
# >     self.data.append(a_piece_of_data)
# >     self.data = []
# > 
# > ISTM it should work.
# > 
# > I'm not arguing in favor of the original code, I'm just trying to understand
# > your specific failure mode.
# > 
# > Thanks,
# > Kent
# >  
# Here's the original code:
# 
# class Collector(object):
#    def __init__(self):
#        self.data = []
#        spawn_work_bees(callback=self.on_received)
# 
#    def on_received(self, a_piece_of_data):
#        """This callback is executed in work bee threads!"""
#        self.data.append(a_piece_of_data)
# 
#    def get_data(self):
#        x = self.data
#        self.data = []
#        return x
# 
# The more I look at this, the more I'm not sure whether data loss will occur.
# For me, that's good enough reason to rewrite this code.  I'd rather be clear
# and certain than clever any day.
# So, let's say you have a thread T1 which starts in ``get_data()`` and makes it as
# far as ``x = self.data``.  Then another thread T2 comes along in
# ``on_received()`` and gets as far as ``self.data.append(a_piece_of_data)``.
# ``x`` in T1's ``get_data()`` (as you pointed out) is still pointing to the list
# that T2 just appended to and T1 will return that list.  But what happens if
# you get multiple guys in ``get_data()`` and multiple guys in
# ``on_received()``?  I can't prove it, but it seems like you're going to have
# an uncertain outcome.  If you're just dealing with 2 threads, I can't see how
# that would be unsafe.  Maybe someone could come up with a use case that would
# disprove that.  But if you've got, say, 4 threads, 2 in each method....that's
# gonna get messy. 
# And, honestly, I'm trying *really* hard to come up with a scenario that would
# lose data and I can't.  Maybe someone like Peter or Aahz or some little 13
# year old in Topeka who's smarter than me can come up with something.  But I do
# know this - the more I think about whether this is unsafe or not, the more my
# head hurts.  If you have a piece of code that you have to spend that
# much time on trying to figure out if it is threadsafe or not, why would you
# leave it as is?  Maybe the rest of you are more confident in your thinking and
# programming skills than I am, but I would quickly slap a Queue in there.  If
# for nothing else than to rest from simulating in my head 1, 2, 3, 5, 10
# threads in the ``get_data()`` method while various threads are in the
# ``on_received()`` method.  Aaaagghhh.....need....motrin......
# 
# 
# Jeremy Jones
# 

I may be wrong here, but shouldn't you just use a stack, or in other
words, use the list as a stack and just pop the data off the top? I
believe there is a method pop() already supplied for you. Since
you wouldn't require a self.data = [], this should allow you to safely
remove the data you've already seen without accidentally removing data
that may have been added in the meantime.
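
A minimal sketch of that idea, not a definitive implementation. It
assumes CPython, where a single list.append() or list.pop() call is
atomic, and it replaces spawn_work_bees() (whose code was not posted)
with plain threading.Thread workers. run_demo() and worker() are
hypothetical names used only for illustration. One deviation from the
suggestion above: pop() with no argument returns the newest item first,
so the sketch uses pop(0) to hand data back in arrival order.

```python
import threading


class Collector(object):
    def __init__(self):
        self.data = []

    def on_received(self, a_piece_of_data):
        """Called from worker threads."""
        self.data.append(a_piece_of_data)

    def get_data(self):
        """Return everything received so far, oldest first."""
        collected = []
        while True:
            try:
                # self.data is never rebound; items are removed one at a time.
                # pop(0) rather than pop() keeps arrival order.
                collected.append(self.data.pop(0))
            except IndexError:
                # The list was empty at the moment of the pop, so stop here.
                return collected


def run_demo(n_threads=4, n_items=1000):
    """Hypothetical stand-in for spawn_work_bees(): plain worker threads."""
    collector = Collector()

    def worker(worker_id):
        for i in range(n_items):
            collector.on_received((worker_id, i))

    threads = [threading.Thread(target=worker, args=(k,))
               for k in range(n_threads)]
    for t in threads:
        t.start()

    seen = []
    # Drain repeatedly while the workers are still appending.
    while any(t.is_alive() for t in threads):
        seen.extend(collector.get_data())
    for t in threads:
        t.join()
    seen.extend(collector.get_data())  # pick up anything left over
    return seen
```

Catching IndexError instead of testing the length first avoids a
check-then-pop race if more than one thread calls get_data(). Note that
pop(0) shifts the whole list on every call, so for large backlogs the
Queue that Jeremy mentions is probably the better fit.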

---
James Tanis
jtanis at pycoder.org
http://pycoder.org


