collect data using threads

Jeremy Jones zanesdad at bellsouth.net
Tue Jun 14 12:15:57 EDT 2005


Kent Johnson wrote:

>Peter Hansen wrote:
>
>>Qiangning Hong wrote:
>>
>>>A class Collector, it spawns several threads to read from serial port.
>>>Collector.get_data() will get all the data they have read since last
>>>call.  Who can tell me whether my implementation correct?
>>>
>>[snip sample with a list]
>>
>>>I am not very sure about the get_data() method.  Will it cause data lose
>>>if there is a thread is appending data to self.data at the same time?
>>>
>>That will not work, and you will get data loss, as Jeremy points out.
>>
>>Normally Python lists are safe, but your key problem (in this code) is
>>that you are rebinding self.data to a new list!  If another thread calls
>>on_received() just after the line "x = self.data" executes, then the new
>>data will never be seen.
>>
>
>Can you explain why not? self.data is still bound to the same list as x. At least if the execution sequence is 
>x = self.data
>                    self.data.append(a_piece_of_data)
>self.data = []
>
>ISTM it should work.
>
>I'm not arguing in favor of the original code, I'm just trying to understand your specific failure mode.
>
>Thanks,
>Kent
>
Here's the original code:

class Collector(object):
    def __init__(self):
        self.data = []
        spawn_work_bees(callback=self.on_received)

    def on_received(self, a_piece_of_data):
        """This callback is executed in work bee threads!"""
        self.data.append(a_piece_of_data)

    def get_data(self):
        x = self.data
        self.data = []
        return x

The more I look at this, the less sure I am whether data loss will
occur.  For me, that's good enough reason to rewrite this code.  I'd
rather be clear and certain than clever any day.

So, let's say you have a thread T1 which starts in ``get_data()`` and
makes it as far as ``x = self.data``.  Then another thread T2 comes along
in ``on_received()`` and gets as far as
``self.data.append(a_piece_of_data)``.  ``x`` in T1's ``get_data()`` (as
you pointed out) is still pointing to the list that T2 just appended to,
and T1 will return that list.  But what happens if you get multiple
threads in ``get_data()`` and multiple threads in ``on_received()``?  I
can't prove it, but it seems like you're going to have an uncertain
outcome.  If you're just dealing with 2 threads, I can't see how that
would be unsafe.  Maybe someone could come up with a case that would
disprove that.  But if you've got, say, 4 threads, 2 in each
method....that's gonna get messy.
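
To make the two-thread case concrete, here is a minimal sketch that replays that one ordering by hand in a single thread.  The names T1 and T2 only label which thread would be running each step, and the ``Collector`` here is a cut-down copy without the ``spawn_work_bees()`` call, so both are assumptions made for illustration.  It shows only this interleaving and says nothing about any other ordering.

```python
class Collector(object):
    """Cut-down copy of the original class, without the worker threads."""

    def __init__(self):
        self.data = []

    def on_received(self, a_piece_of_data):
        self.data.append(a_piece_of_data)


collector = Collector()
collector.on_received("early")    # something collected before get_data()

# T1: first line of get_data()
x = collector.data

# T2: on_received() runs before T1 rebinds self.data
collector.on_received("late")

# T1: second line of get_data()
collector.data = []

print(x)                # ['early', 'late'] -- the late item is still in x
print(collector.data)   # []
```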

And, honestly, I'm trying *really* hard to come up with a scenario that
would lose data and I can't.  Maybe someone like Peter or Aahz or some
little 13 year old in Topeka who's smarter than me can come up with
something.  But I do know this - thinking about whether this is unsafe
or not is making my head hurt.  If you have a piece of code that you
have to spend that much time on trying to figure out if it is threadsafe
or not, why would you leave it as is?  Maybe the rest of you are more
confident in your thinking and programming skills than I am, but I would
quickly slap a Queue in there.  If for no other reason than to rest from
simulating in my head 1, 2, 3, 5, 10 threads in the ``get_data()``
method while various threads are in the ``on_received()`` method.
Aaaagghhh.....need....motrin......
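
For what it's worth, here is a rough sketch of what "slapping a Queue in there" might look like.  It is written for current Python, where the module is spelled ``queue`` (it was ``Queue`` when this thread was written).  ``QueueCollector`` and ``demo()`` are names made up for this example, and the plain ``threading.Thread`` workers are only a stand-in for ``spawn_work_bees()``, which isn't shown in the original post.

```python
import queue
import threading


class QueueCollector(object):
    """Hypothetical Queue-based variant of Collector."""

    def __init__(self):
        self._queue = queue.Queue()

    def on_received(self, a_piece_of_data):
        """Called from worker threads; Queue.put() does its own locking."""
        self._queue.put(a_piece_of_data)

    def get_data(self):
        """Drain whatever is queued right now and return it as a list."""
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


def demo(n_threads=4, per_thread=1000):
    """Stand-in for the work bees: each thread reports per_thread items."""
    collector = QueueCollector()

    def worker(thread_id):
        for i in range(per_thread):
            collector.on_received((thread_id, i))

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()

    collected = []
    # Poll get_data() while the workers are still producing.
    while any(t.is_alive() for t in threads):
        collected.extend(collector.get_data())
    for t in threads:
        t.join()
    # Pick up anything queued after the last poll.
    collected.extend(collector.get_data())
    return collected
```

Calling ``demo()`` should hand back every ``(thread_id, i)`` pair exactly once, though not in any particular order across threads.  The point of the design is that ``get_data()`` never rebinds anything the workers touch, so there's no interleaving left to simulate in your head.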


Jeremy Jones