collect data using threads
Kent Johnson
kent37 at tds.net
Tue Jun 14 14:11:09 EDT 2005
Qiangning Hong wrote:
> I actually had considered Queue and pop() before I wrote the above code.
> However, because there is a lot of data to get every time I call
> get_data(), I want a more CPU friendly way to avoid the while-loop and
> empty checking, and then the above code comes out. But I am not very
> sure whether it will cause serious problem or not, so I ask here. If
> anyone can prove it is correct, I'll use it in my program, else I'll go
> back to the Queue solution.
OK, here is a real failure mode. Here is the code and the disassembly:
>>> class Collector(object):
...     def __init__(self):
...         self.data = []
...     def on_received(self, a_piece_of_data):
...         """This callback is executed in work bee threads!"""
...         self.data.append(a_piece_of_data)
...     def get_data(self):
...         x = self.data
...         self.data = []
...         return x
...
>>> import dis
>>> dis.dis(Collector.on_received)
  6           0 LOAD_FAST                0 (self)
              3 LOAD_ATTR                1 (data)
              6 LOAD_ATTR                2 (append)
              9 LOAD_FAST                1 (a_piece_of_data)
             12 CALL_FUNCTION            1
             15 POP_TOP
             16 LOAD_CONST               1 (None)
             19 RETURN_VALUE
>>> dis.dis(Collector.get_data)
  8           0 LOAD_FAST                0 (self)
              3 LOAD_ATTR                1 (data)
              6 STORE_FAST               1 (x)

  9           9 BUILD_LIST               0
             12 LOAD_FAST                0 (self)
             15 STORE_ATTR               1 (data)

 10          18 LOAD_FAST                1 (x)
             21 RETURN_VALUE
Imagine the thread calling on_received() gets as far as LOAD_ATTR (data), LOAD_ATTR (append) or LOAD_FAST (a_piece_of_data), so it already holds a reference to the current self.data list; then it is suspended and the get_data() thread runs. That thread could call get_data() and *finish processing the returned list* before the on_received() thread runs again and actually appends to the old list. The appended value will never be processed.
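The interleaving described above can be replayed deterministically in a single thread, which may make the failure easier to see. This is only a sketch: holding on to the bound method `pending_append` is my stand-in for a worker thread that was suspended after LOAD_ATTR (append), and the names `pending_append`, `batch`, `processed` and `lost` are made up for the illustration.

```python
class Collector(object):
    def __init__(self):
        self.data = []

    def on_received(self, a_piece_of_data):
        self.data.append(a_piece_of_data)

    def get_data(self):
        x = self.data
        self.data = []
        return x


c = Collector()
c.on_received("first")

# Worker thread: has executed LOAD_ATTR (data) and LOAD_ATTR (append),
# so it holds a bound method of the *current* list, then gets suspended.
pending_append = c.data.append

# Consumer thread: swaps in a new list and finishes processing the old one.
batch = c.get_data()
processed = list(batch)

# Worker thread resumes and appends to the old, already processed list.
pending_append("second")

# The new item sits in the old list, which nobody will look at again.
lost = [item for item in batch if item not in processed]
next_batch = c.get_data()

print(processed)   # items the consumer actually handled
print(lost)        # items that arrived too late in the old list
print(next_batch)  # what the next get_data() call returns
```

The late item ends up in `batch` after the consumer is done with it, and the next get_data() call does not return it either.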
If you want to avoid the overhead of a Queue.get() for each data element, you could just put your own mutex (for example a threading.Lock) into on_received() and get_data(), so that the append and the list swap can never interleave.
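One way the mutex suggestion might look is sketched below. It assumes a modern Python with the `with` statement and threading.Lock; the class name `LockedCollector`, the `worker` function and the item counts are hypothetical and only there to exercise the code.

```python
import threading


class LockedCollector(object):
    def __init__(self):
        self._lock = threading.Lock()
        self._data = []

    def on_received(self, a_piece_of_data):
        """Called from worker threads."""
        # The attribute lookup and the append happen under the lock,
        # so they cannot be split by a concurrent get_data().
        with self._lock:
            self._data.append(a_piece_of_data)

    def get_data(self):
        # Swap the lists under the same lock; the returned list is no
        # longer reachable from on_received() once the lock is released.
        with self._lock:
            x = self._data
            self._data = []
        return x


collector = LockedCollector()


def worker(worker_id, count=1000):
    for i in range(count):
        collector.on_received((worker_id, i))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()

received = []
# Drain batches while the workers are still producing.
while any(t.is_alive() for t in threads):
    received.extend(collector.get_data())
for t in threads:
    t.join()
# One final drain picks up anything appended after the last batch.
received.extend(collector.get_data())

print(len(received))
```

The lock is taken once per on_received() call and once per get_data() call, so the consumer still receives a whole batch at a time instead of one Queue.get() per element.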
Kent