Passing data across callbacks in ThreadPoolExecutor

Thu Feb 18 14:06:29 EST 2016

On Thur, Feb 17, 2016 at 9:24 AM, Ian Kelly <ian.g.kelly at gmail.com> wrote:
>> What is the pattern for chaining execution of tasks with ThreadPoolExecutor?
>> Callbacks are not an adequate facility, as each task I have will generate new output.
>
> Can you specify in more detail what your use case is?
>
> If you don't mind having threads sitting around waiting, you can just
> submit each chained task at the start and have each task wait on the
> futures of its dependencies. The advantage of this is that it's easy
> to conceptualize the dependency graph of the tasks. The disadvantage
> is that it eats up extra threads. You'll probably want to increase the
> size of the thread pool to handle the waiting tasks (or use a separate
> ThreadPoolExecutor for each chained task).
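
For reference, a minimal sketch of that wait-on-dependencies approach as I
understand it. The names task1 and task2 and the values are made up for
illustration; the only real API used is submit() and Future.result().

```python
from concurrent.futures import ThreadPoolExecutor

def task1(value):
    return value + 1

def task2(value, dep):
    # Blocks this worker thread until the dependency's future has a result.
    return (value, dep.result() * 2)

# The pool has to be large enough for the waiting tasks as well; a task
# that is blocked on a dependency still occupies a worker thread.
with ThreadPoolExecutor(max_workers=2) as executor:
    f1 = executor.submit(task1, 10)
    f2 = executor.submit(task2, 10, f1)   # pass the future, not the result
    chained_result = f2.result()
```

Every task is submitted up front, so the dependency graph is visible in one
place, at the cost of one idle thread per waiting task.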

The thing with callbacks is that each one gets the original task's result, so
a callback can't pass a new task result up and/or cancel the task "set".
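
A small sketch of what I mean, assuming a trivial hypothetical task: the
callback is handed the same Future that submit() returned, and anything the
callback returns is dropped.

```python
from concurrent.futures import ThreadPoolExecutor

seen = []

def task(x):
    return x * 2

def callback(fut):
    # fut is the original task's Future; its result is the task's result.
    seen.append(fut.result())
    # This return value is discarded; it never replaces fut's result.
    return "new result"

with ThreadPoolExecutor(max_workers=1) as executor:
    f = executor.submit(task, 21)
    f.add_done_callback(callback)

# Leaving the with-block waits for the worker, so the callback has run.
final = f.result()
```

Here final is still the task's own result, not what the callback returned.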

> Otherwise, is there some reason you can't use multiple callbacks, one
> to handle the task's output and one to submit the chained task?
> 
> E.g.:
> 
> def chain_task2(input, f1):
>     f2 = executor.submit(task2, input, f1.result())
>     f2.add_done_callback(handle_task2_done)
> 
> f1 = executor.submit(task1, input)
> f1.add_done_callback(handle_task1_done)
> f1.add_done_callback(functools.partial(chain_task2, input))
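
A self-contained variant of the snippet above that I could run as-is. The
task bodies, the results dict and the chain_done event are placeholders I
added for illustration; the event is just one way to keep shutdown() from
running before the chained submit().

```python
import functools
import threading
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
results = {}
chain_done = threading.Event()

def task1(value):
    return value + 1

def task2(value, previous):
    return value * previous

def handle_task1_done(f1):
    results["task1"] = f1.result()

def handle_task2_done(f2):
    results["task2"] = f2.result()
    chain_done.set()  # signal that the whole chain has finished

def chain_task2(value, f1):
    # Runs as a done callback of f1, so f1.result() returns immediately.
    f2 = executor.submit(task2, value, f1.result())
    f2.add_done_callback(handle_task2_done)

f1 = executor.submit(task1, 3)
f1.add_done_callback(handle_task1_done)
f1.add_done_callback(functools.partial(chain_task2, 3))

# Wait for the chain before shutting down, so the chained submit()
# cannot arrive after shutdown() has been called.
chain_done.wait(timeout=5)
executor.shutdown()
```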

Closing over the executor instance seems potentially racy. What happens when
the last task finishes and the executor checks (locking the queue) while the task also
wants to add more work? I'll check the source, but I had hoped for a more succinct way.
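
One part of this is documented behaviour rather than an implementation
detail: submit() raises RuntimeError once shutdown() has been called. A
minimal sketch, with pow as a stand-in task:

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
f1 = executor.submit(pow, 2, 5)
executor.shutdown(wait=True)   # no new work is accepted from here on

try:
    executor.submit(pow, 2, 6)  # a "late" chained submit
    late_submit_error = None
except RuntimeError as exc:
    late_submit_error = exc
```

So a chained submit that loses the race with shutdown() fails instead of
being queued. If that submit happens inside a done callback, the exception
is logged and ignored by the future, which makes it easy to miss.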

Thanks,
jlc