[Python-Dev] A more flexible task creation

Nathaniel Smith njs at pobox.com
Thu Jun 14 21:46:18 EDT 2018


On Thu, Jun 14, 2018 at 3:31 PM, Tin Tvrtković <tinchester at gmail.com> wrote:
> * my gut feeling is spawning a thousand tasks and having them all fighting
> over the same semaphore and scheduling is going to be much less efficient
> than a small number of tasks draining a queue.

Fundamentally, a Semaphore is a queue:

https://github.com/python/cpython/blob/9e7c92193cc98fd3c2d4751c87851460a33b9118/Lib/asyncio/locks.py#L437
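
Here's a rough, stripped-down sketch of the idea (not the actual
locks.py code, which handles cancellation and fairness more
carefully): acquire() parks the caller in a FIFO of waiter futures,
and release() wakes the oldest one.

import asyncio
import collections

class ToySemaphore:
    """Simplified illustration only; asyncio.Semaphore is the real thing."""

    def __init__(self, value=1):
        self._value = value
        self._waiters = collections.deque()  # the queue hiding inside

    async def acquire(self):
        while self._value <= 0:
            fut = asyncio.get_running_loop().create_future()
            self._waiters.append(fut)   # join the back of the line
            await fut                   # sleep until release() wakes us
        self._value -= 1
        return True

    def release(self):
        self._value += 1
        while self._waiters:
            fut = self._waiters.popleft()  # wake the oldest waiter
            if not fut.done():
                fut.set_result(True)
                break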

So the two approaches are more analogous than they might appear at
first. The big difference is what objects are in the queue. For a web
scraper, the options might be either a queue where each entry is a URL
represented as a str, or a queue where each entry is (effectively)
a Task object with an attached coroutine object.
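
To make the two shapes concrete, here's a minimal sketch, assuming a
hypothetical fetch(url) coroutine and an arbitrary concurrency limit
of 10 (neither name comes from the original code):

import asyncio

async def fetch(url):
    """Hypothetical placeholder for the actual scraping work."""
    await asyncio.sleep(0.01)

async def scrape_with_semaphore(urls, limit=10):
    # One Task + coroutine object per URL exists from the start;
    # the Semaphore only bounds how many run at once.
    sem = asyncio.Semaphore(limit)

    async def bounded(url):
        async with sem:
            await fetch(url)

    await asyncio.gather(*(bounded(u) for u in urls))

async def scrape_with_workers(urls, limit=10):
    # The queue holds plain str URLs; only `limit` Tasks ever exist.
    queue = asyncio.Queue()
    for u in urls:
        queue.put_nowait(u)

    async def worker():
        while True:
            url = await queue.get()
            try:
                await fetch(url)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(limit)]
    await queue.join()              # every URL has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)

Either one can be driven with e.g. asyncio.run(scrape_with_workers(urls)).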

So I think the main differences you'll see in practice are:

- a Task + coroutine aren't terribly big -- maybe a few kilobytes --
but definitely larger than a str; so the Semaphore approach will take
more RAM. Modern machines have lots of RAM, so for many use cases this
is still probably fine (50,000 tasks is really not that many). But
there will certainly be some situations where the str queue fits in
RAM but the Task queue doesn't.

- If you create all those Task objects up front, then that front-loads
a chunk of work (i.e., allocating all those objects!) that otherwise
would be spread throughout the queue processing. So you'll see a
noticeable pause up front before the code starts working.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

