[Python-Dev] ThreadedProcessPoolExecutor

Roberto Martínez robertomartinezp at gmail.com
Wed Mar 21 11:03:20 EDT 2018


Hi,

I've written a custom concurrent.futures.Executor that combines
ProcessPoolExecutor and ThreadPoolExecutor.

I've published it here:

https://github.com/nilp0inter/threadedprocess

This executor is very similar to a ProcessPoolExecutor, but each process in
the pool has its own ThreadPoolExecutor inside.
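Here is a minimal sketch of the idea. It uses a much simpler design than the linked package. The hypothetical helper threaded_process_map() splits the inputs into one contiguous chunk per process, and each worker process runs its chunk on its own ThreadPoolExecutor. Unlike a real Executor subclass, it has no submit() and no per-task Future, and it partitions the work up front. fake_io_task() is a hypothetical stand-in for an I/O-bound task.

```python
import concurrent.futures as cf
import os
import time


def _run_chunk(fn, chunk, threads_per_process):
    # Runs inside one worker process: fan the chunk out over a local thread
    # pool so many I/O-bound calls can wait concurrently in a single process.
    with cf.ThreadPoolExecutor(max_workers=threads_per_process) as threads:
        return list(threads.map(fn, chunk))


def threaded_process_map(fn, items, processes=None, threads_per_process=16):
    """Hypothetical helper: map fn over items using `processes` worker
    processes, each with its own thread pool. Results keep input order."""
    items = list(items)
    if not items:
        return []
    processes = processes or os.cpu_count() or 1
    size = -(-len(items) // processes)  # ceiling division: items per process
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    results = []
    with cf.ProcessPoolExecutor(max_workers=processes) as procs:
        # fn and each chunk are pickled and sent to a worker process;
        # map() yields the per-chunk results in submission order.
        for part in procs.map(_run_chunk, [fn] * len(chunks), chunks,
                              [threads_per_process] * len(chunks)):
            results.extend(part)
    return results


def fake_io_task(x):
    # Hypothetical stand-in for a long-running I/O-bound task.
    time.sleep(0.01)
    return x * x


if __name__ == "__main__":
    print(threaded_process_map(fake_io_task, range(8),
                               processes=2, threads_per_process=4))
    # prints [0, 1, 4, 9, 16, 25, 36, 49]
```

Giving each process a batch of tasks keeps the process count at one per core, while the threads inside each process overlap their I/O waits. The `if __name__ == "__main__":` guard is required on platforms that start workers with the spawn method.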

The motivation for this executor is to mitigate a problem we have in a
project where we have a very large number of long-running I/O-bound tasks
that have to run concurrently. Those long-running tasks also contain sparse
CPU-bound operations.

To solve this problem, I considered several solutions:

   1. Use asyncio to run the I/O part as tasks and a ProcessPoolExecutor
   to run the CPU-bound operations with "run_in_executor". Unfortunately, the
   CPU operations depend on a large in-memory context. Using a
   ProcessPoolExecutor this way forces the parent process to pickle the whole
   context to send it to each task. Because the context is so large, the
   pickling itself is very CPU-demanding, so this doesn't work.
   2. Execute the I/O- and CPU-bound operations in separate processes with
   multiprocessing.Process. This actually works, but the number of idle
   processes in the system is too large, which results in a high memory footprint.
   3. Execute the I/O- and CPU-bound operations in threads. This doesn't work
   because the GIL lets only one thread execute Python code at a time. The
   combined CPU operations saturate a single core, and the other cores sit idle.
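Option 1 can be sketched as follows. The example assumes a hypothetical cpu_step() that needs a large shared context. Each run_in_executor() call with a ProcessPoolExecutor pickles the function and all of its arguments. As a result, the full context is serialized and copied to a worker on every call. The first print shows how many bytes that is for this toy context.

```python
import asyncio
import concurrent.futures as cf
import pickle


def cpu_step(context, key):
    # Hypothetical CPU-bound operation that needs the large shared context.
    return sum(context[key])


async def io_task(pool, context, key):
    await asyncio.sleep(0.01)  # stand-in for the I/O part of the task
    loop = asyncio.get_running_loop()
    # run_in_executor pickles cpu_step *and all its arguments*, so the whole
    # context is serialized and copied to a worker process on every call.
    return await loop.run_in_executor(pool, cpu_step, context, key)


async def main(n_tasks=4):
    # Hypothetical "large memory context": 200 lists of 1,000 ints each.
    context = {i: list(range(1000)) for i in range(200)}
    print("bytes pickled per call:", len(pickle.dumps(context)))
    with cf.ProcessPoolExecutor() as pool:
        return await asyncio.gather(
            *(io_task(pool, context, k) for k in range(n_tasks)))


if __name__ == "__main__":
    print(asyncio.run(main()))  # prints [499500, 499500, 499500, 499500]
```

With a real multi-gigabyte context, that serialization cost is paid again for every CPU step.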

So I wrote ThreadedProcessPoolExecutor. It helped me keep the number of
processes under control (just one process per CPU core) while still allowing
very high concurrency (hundreds of threads per process).

I have a couple of questions:

The first one is about the license. Given that I copied most of the
code from the concurrent.futures library, I understand that I have to
publish the code under the PSF License. Is this correct?

My second question is about the package namespace. Given that this is a
concurrent.futures.Executor subclass, I think the most intuitive place for
it is under concurrent.futures. Is this a suitable use case for
namespace packages? Is it a good idea?

Best regards,
Roberto