[Async-sig] async executors

Christoph Groth christoph at grothesque.org
Mon Jun 13 12:10:30 EDT 2016


Hi Yury,

Thanks for your insightful reply.

>> • Performance is better, especially when many worker processes 
>> are involved, because of the simpler code that uses less 
>> locking (the only locking that remains is inside the 
>> multiprocessing module).
>
> Yes, it will be better, but the performance improvements will be 
> visible only for 100s or 1000s of processes.  And only if they 
> are executing short-lived tasks.
>
> Is there a real need for a bit faster process pool?

You are right, of course: 1000s of processes are not a realistic 
use case for concurrent.futures.ProcessPoolExecutor on a normal 
computer.

When I started playing with the async executor I had computing 
clusters in mind.  There exist several concurrent.futures-like 
libraries that work with computing clusters and support 1000s of 
worker processes (e.g. ipyparallel, distribute, SCOOP).  Several 
of these packages use async programming (e.g. tornado or greenlet) 
for their internals (schedulers, controllers, etc.), but the 
futures that they provide use locking, just like those of 
concurrent.futures.

In order to estimate whether an async executor could be useful for 
cluster-like workloads, I did some tests on my laptop with many 
worker processes that do mostly nothing.  The result is that using 
asyncio's run_in_executor makes it possible to process up to 1000 
tasks per second.  Using aexecutor, this number grows to 5000.  
These two numbers seem reasonably robust when, for example, the 
number of workers is varied.
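
Schematically, the run_in_executor measurement looks like this (a 
minimal sketch, not the code I actually used; the task and worker 
counts are arbitrary):

    import asyncio
    import concurrent.futures
    import time

    def noop():
        # A task that does (almost) nothing.
        pass

    async def timed_run(loop, executor, n_tasks):
        # Submit n_tasks trivial tasks via run_in_executor and
        # measure how many of them complete per second.
        start = time.monotonic()
        futures = [loop.run_in_executor(executor, noop)
                   for _ in range(n_tasks)]
        await asyncio.gather(*futures)
        return n_tasks / (time.monotonic() - start)

    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        with concurrent.futures.ProcessPoolExecutor() as executor:
            rate = loop.run_until_complete(
                timed_run(loop, executor, 5000))
        print('{:.0f} tasks per second'.format(rate))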

That factor-of-5 difference is not enormous, but perhaps there's 
some potential for improvement (optimization, curio?).  Let's try 
a real-world check: I often use a cluster of 1000 cores, and each 
core is about 50% as fast as the core in my laptop.  So 
run_in_executor() will be overloaded if the average task takes 
less than 2 seconds to run.  This doesn't sound like a terrible 
restriction; one would certainly try to have tasks that run longer 
than that.  But I can certainly imagine useful workloads where 
tasks run for less than 2 seconds and the parameters and results 
of each task are only a few numbers at most, so that 
communication bandwidth shouldn't be a limit.
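
To spell out where the 2 seconds come from (assuming that the 
event loop dispatching the tasks itself runs on one of the slower 
cluster cores):

    laptop_rate = 1000    # tasks/s with run_in_executor on my laptop
    core_speed = 0.5      # cluster core speed relative to my laptop
    n_workers = 1000

    dispatch_rate = laptop_rate * core_speed         # ~500 tasks/s
    min_task_duration = n_workers / dispatch_rate    # = 2.0 seconds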

>> Based on concurrent.futures.ProcessPoolExecutor, I’ve made a 
>> proof-of-concept implementation [3] that demonstrates that the 
>> idea works.  (There have been some improvements compared to the 
>> version that I posted on python-ideas.)
>
> I think you should stabilize the code base, and release it as a 
> module on PyPI.

I will do that, once I have some confidence that there are no 
obvious blunders in it (see further below).

>> I would be grateful if any asyncio experts could have a look at 
>> the part of the main loop of the process management coroutine 
>> where the coroutine awaits new data [4].  Currently, I am using 
>> an asyncio.Event that is set by a callback via asyncio’s 
>> add_reader().  Is this the most natural way to do it currently? 
>
> I think you can use asyncio.Future for resuming that coroutine.

I'm not sure what you mean.  In the current code (like in 
concurrent.futures), a multiprocessing.SimpleQueue instance is 
used to receive results from the workers.  That queue has an 
attribute _reader._handle that is just a file descriptor.

I use BaseEventLoop.add_reader() to add a callback for that file 
descriptor.  The callback sets an asyncio.Event and the main loop 
of _queue_management_worker() waits for this event.

How could I use asyncio.Future here?

One problem with my current solution is that the event also gets 
set when there is no data in the queue yet.  That's why I use the 
reader's poll() method to verify whether there is anything to be 
read; if there isn't, the event is cleared again and the waiting 
resumes.  The spurious events could be an indication that there 
is a better way of solving the problem, hence my original 
question.
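
For concreteness, the current scheme looks roughly like this (a 
simplified sketch, not the actual aexecutor code; names are 
illustrative):

    import asyncio
    import multiprocessing

    result_queue = multiprocessing.SimpleQueue()
    reader = result_queue._reader   # Connection wrapping a file descriptor

    async def _queue_management_worker(loop):
        data_available = asyncio.Event()
        # Set the event whenever the queue's pipe becomes readable.
        loop.add_reader(reader.fileno(), data_available.set)
        try:
            while True:
                await data_available.wait()
                if not reader.poll():
                    # Spurious wake-up: nothing to read yet,
                    # clear the event and resume waiting.
                    data_available.clear()
                    continue
                result = result_queue.get()
                # ... match the result with the future awaiting it ...
        finally:
            loop.remove_reader(reader.fileno())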

Thanks,
Christoph