[Python-Dev] ThreadedProcessPoolExecutor

Guido van Rossum guido at python.org
Wed Mar 21 11:23:08 EDT 2018


Roberto,

That looks like an interesting class. I presume you intend to publish this
as a pip-installable package on PyPI (pypi.python.org)?

I'm no lawyer, but I believe you can license your code under a new license
(I recommend BSD) as long as you keep a copy and a mention of the PSF
license in your distribution as well. (Though perhaps you could structure
your code differently and inherit from the standard library modules rather
than copying them?)
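
To make the "compose rather than copy" idea concrete, here is a minimal
sketch that uses only the public ProcessPoolExecutor and ThreadPoolExecutor
classes. It is not the threadedprocess implementation and not a full
Executor subclass: it is a batch helper that deals the work out to one chunk
per process and lets each process fan its chunk out over a local thread
pool. The names split_round_robin, run_chunk and threaded_process_map are
hypothetical, chosen only for this illustration.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def split_round_robin(items, n_chunks):
    """Deal items into n_chunks lists, keeping each item's original index."""
    chunks = [[] for _ in range(n_chunks)]
    for index, item in enumerate(items):
        chunks[index % n_chunks].append((index, item))
    return chunks


def run_chunk(fn, indexed_items, threads):
    """Run inside one worker process: spread the chunk over a thread pool."""
    if not indexed_items:
        return []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so results line up with the indices.
        results = pool.map(fn, [item for _, item in indexed_items])
        return [(index, result)
                for (index, _), result in zip(indexed_items, results)]


def threaded_process_map(fn, items, processes=None, threads_per_process=100):
    """Apply fn to every item using several processes with a thread pool each.

    fn and the items must be picklable, because they cross a process boundary.
    Results come back in the same order as the input items.
    """
    items = list(items)
    processes = processes or os.cpu_count() or 1
    chunks = split_round_robin(items, processes)
    results = [None] * len(items)
    with ProcessPoolExecutor(max_workers=processes) as pool:
        futures = [pool.submit(run_chunk, fn, chunk, threads_per_process)
                   for chunk in chunks if chunk]
        for future in futures:
            for index, result in future.result():
                results[index] = result
    return results
```

A call such as threaded_process_map(fetch, urls, processes=4), where fetch is
a module-level function of your own, would then run with four processes and
up to 100 threads in each. On platforms that start worker processes with
"spawn", the call has to sit under an `if __name__ == "__main__":` guard.
Unlike a real Executor, this sketch hands out all the work up front and
cannot accept new tasks while running.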

In terms of the package namespace, do not put it in the same namespace as
standard library code! It probably won't work and will cause world-wide
pain and suffering for the users of your code. Invent your project name and
use that as a top-level namespace, like everyone else. :-)

Good luck with your project,

--Guido



On Wed, Mar 21, 2018 at 8:03 AM, Roberto Martínez <robertomartinezp at gmail.com> wrote:

> Hi,
>
> I've made a custom concurrent.futures.Executor mixing the
> ProcessPoolExecutor and ThreadPoolExecutor.
>
> I've published it here:
>
> https://github.com/nilp0inter/threadedprocess
>
> This executor is very similar to a ProcessPoolExecutor, but each process
> in the pool has its own ThreadPoolExecutor inside.
>
> The motivation for this executor is to mitigate a problem we have in a
> project where we have a very large number of long-running I/O-bound tasks
> that have to run concurrently. Those long-running tasks contain sparse
> CPU-bound operations.
>
> To resolve this problem I considered multiple solutions:
>
>    1. Use asyncio to run the I/O part as tasks and use a
>    ProcessPoolExecutor to run the CPU-bound operations with
>    "run_in_executor". Unfortunately the CPU operations depend on a large
>    in-memory context, and using a ProcessPoolExecutor this way forces the
>    parent process to pickle the whole context to send it to the task.
>    Because the context is so large, pickling is itself very CPU-demanding,
>    so this doesn't work.
>    2. Execute the I/O- and CPU-bound operations in separate processes with
>    multiprocessing.Process. This actually works, but it leaves too many
>    idle processes in the system, resulting in a large memory footprint.
>    3. Execute the I/O- and CPU-bound operations in threads. This doesn't
>    work because the sum of all CPU operations saturates the core where the
>    Python process is running, while the other cores sit idle.
>
> So I coded the ThreadedProcessPoolExecutor, which lets me keep the number
> of processes under control (just one process per CPU core) while still
> allowing very high concurrency (hundreds of threads per process).
>
> I have a couple of questions:
>
> The first one is about the license. Given that I copied the majority of
> the code from the concurrent.futures library, I understand that I have to
> publish the code under the PSF LICENSE. Is this correct?
>
> My second question is about the package namespace. Given that this is a
> concurrent.futures.Executor subclass, I understand that the most intuitive
> place to locate it is under concurrent.futures. Is this a suitable use case
> for namespace packages? Is this a good idea?
>
> Best regards,
> Roberto


-- 
--Guido van Rossum (python.org/~guido)