collect data using threads
Jeremy Jones
zanesdad at bellsouth.net
Tue Jun 14 12:15:57 EDT 2005
Kent Johnson wrote:
>Peter Hansen wrote:
>
>
>>Qiangning Hong wrote:
>>
>>
>>
>>>A class Collector, it spawns several threads to read from serial port.
>>>Collector.get_data() will get all the data they have read since last
>>>call. Who can tell me whether my implementation correct?
>>>
>>>
>>[snip sample with a list]
>>
>>
>>
>>>I am not very sure about the get_data() method. Will it cause data lose
>>>if there is a thread is appending data to self.data at the same time?
>>>
>>>
>>That will not work, and you will get data loss, as Jeremy points out.
>>
>>Normally Python lists are safe, but your key problem (in this code) is
>>that you are rebinding self.data to a new list! If another thread calls
>>on_received() just after the line "x = self.data" executes, then the new
>>data will never be seen.
>>
>>
>
>Can you explain why not? self.data is still bound to the same list as x. At least if the execution sequence is
>x = self.data
> self.data.append(a_piece_of_data)
>self.data = []
>
>ISTM it should work.
>
>I'm not arguing in favor of the original code, I'm just trying to understand your specific failure mode.
>
>Thanks,
>Kent
>
>
Here's the original code:
class Collector(object):
    def __init__(self):
        self.data = []
        spawn_work_bees(callback=self.on_received)

    def on_received(self, a_piece_of_data):
        """This callback is executed in work bee threads!"""
        self.data.append(a_piece_of_data)

    def get_data(self):
        x = self.data
        self.data = []
        return x
The more I look at this, the more I'm not sure whether data loss will
occur. For me, that's good enough reason to rewrite this code. I'd
rather be clear and certain than clever any day.
So, let's say you have a thread T1 which starts in ``get_data()`` and makes
it as far as ``x = self.data``. Then another thread T2 comes along in
``on_received()`` and gets as far as
``self.data.append(a_piece_of_data)``. ``x`` in T1's ``get_data()`` (as
you pointed out) is still pointing to the list that T2 just appended to
and T1 will return that list. But what happens if you get multiple guys
in ``get_data()`` and multiple guys in ``on_received()``? I can't prove
it, but it seems like you're going to have an uncertain outcome. If
you're just dealing with 2 threads, I can't see how that would be
unsafe. Maybe someone could come up with a use case that would disprove
that. But if you've got, say, 4 threads, 2 in each method....that's
gonna get messy.
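
To make the two-thread case concrete, here is a by-hand replay of that
interleaving in a single thread. It's only a sketch: it steps through one
particular ordering, it doesn't run real threads, and it says nothing about
any other ordering. The stripped-down ``Collector`` (no
``spawn_work_bees``) and the ``replay_two_thread_interleaving`` helper are
my own stand-ins, not part of the original code.

```python
class Collector(object):
    """Same data attribute as the original, minus the thread spawning."""

    def __init__(self):
        self.data = []


def replay_two_thread_interleaving():
    """Step through one T1/T2 ordering by hand (hypothetical helper)."""
    c = Collector()
    c.data.append("before")   # T2: on_received(), before T1 starts

    x = c.data                # T1: first line of get_data()
    c.data.append("during")   # T2: on_received() lands between T1's two lines
    c.data = []               # T1: second line of get_data(), rebinds self.data
    c.data.append("after")    # T2: on_received(), now appends to the new list

    # x is what T1's get_data() would return; c.data waits for the next call.
    return x, c.data


returned, pending = replay_two_thread_interleaving()
print(returned)
print(pending)
```

In this one ordering the item appended in between ends up in the list T1
returns, and the later item is waiting in the new list, so nothing goes
missing.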
And, honestly, I'm trying *really* hard to come up with a scenario that
would lose data and I can't. Maybe someone like Peter or Aahz or some
little 13 year old in Topeka who's smarter than me can come up with
something. But I do know this - the more I think about whether this is
unsafe or not, the more my head hurts. If you have a
piece of code that you have to spend that much time on trying to figure
out if it is threadsafe or not, why would you leave it as is? Maybe the
rest of you are more confident in your thinking and programming skills
than I am, but I would quickly slap a Queue in there, if for nothing
else than to rest from simulating in my head 1, 2, 3, 5, 10 threads in
the ``get_data()`` method while various threads are in the
``on_received()`` method. Aaaagghhh.....need....motrin......
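
For what it's worth, here is roughly what slapping a Queue in there could
look like. Treat it as a minimal sketch under a few assumptions: it uses
the Python 3 module name ``queue`` (the same module is spelled ``Queue`` in
Python 2), it leaves out ``spawn_work_bees`` since that function isn't
shown anywhere in the thread, and ``QueueCollector`` and ``demo`` are names
I made up for illustration. The ``demo`` function just fakes the work bees
with plain ``threading.Thread`` objects.

```python
import queue        # spelled "Queue" in Python 2
import threading


class QueueCollector(object):
    """Hypothetical Queue-based variant of the original Collector."""

    def __init__(self):
        self._queue = queue.Queue()

    def on_received(self, a_piece_of_data):
        """Callback run in worker threads; Queue.put does its own locking."""
        self._queue.put(a_piece_of_data)

    def get_data(self):
        """Drain and return whatever has been queued so far."""
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


def demo(n_threads=4, per_thread=1000):
    """Stand-in for the work bees: each thread reports per_thread items."""
    collector = QueueCollector()

    def worker(thread_id):
        for i in range(per_thread):
            collector.on_received((thread_id, i))

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()

    collected = []
    for t in threads:
        collected.extend(collector.get_data())  # drain while workers may run
        t.join()
    collected.extend(collector.get_data())      # final drain after all joins
    return collected


result = demo()
print(len(result), len(set(result)))
```

Each item put on the queue is handed to exactly one ``get_data()`` caller,
so there is no rebinding step left to reason about.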
Jeremy Jones