[Async-sig] Asyncio, implementing a new fair scheduling reactor

Pau Freixes pfreixes at gmail.com
Thu Jun 15 17:40:04 EDT 2017


Hi guys, recently I've been working on a POC of the default
event loop implemented by asyncio, but using a fair scheduling reactor.
At the moment it is just
a POC [1], something to test the rationale, still to be evolved
into something more mature; but before that I would like to share my
thoughts and gather your comments.

The current implementation is based on a FIFO queue that is filled
with all of the callbacks that have to be executed. These callbacks
can stand for:

1) Run tasks, either to be started or resumed
2) Run future callbacks.
3) Run scheduled callbacks.
4) Run file descriptor callbacks.

It is worth mentioning that most of them are internally chained in
some way; for example, a future callback can wake up a suspended task.
Also, keep in mind that the API published by asyncio to schedule
callbacks, `call_soon` for example, can be used by asyncio itself or
by the user.
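As a quick illustration of that FIFO behaviour with the stock event
loop (a minimal sketch, not taken from the POC): callbacks queued with
`call_soon` run strictly in the order they were queued.

```python
import asyncio

# With the stock event loop, call_soon appends a handle to the internal
# FIFO queue; the handles then run strictly in insertion order.
order = []

loop = asyncio.new_event_loop()
loop.call_soon(order.append, "first")
loop.call_soon(order.append, "second")
loop.call_soon(order.append, "third")
loop.call_soon(loop.stop)   # stop once the queued callbacks have run
loop.run_forever()
loop.close()

print(order)   # ['first', 'second', 'third']
```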

The usual flow of the reactor is the following:

1) Check the file descriptors with events, and stack their handlers
into the reactor.
2) Pop all outdated scheduled callbacks and push them into the reactor.
3) Iterate over the first N elements of the queue, where N is the
number of handles stacked at that moment. Handles stacked during that
iteration won't be handled; they must wait until the next whole
iteration.
4) Go back to point 1.

As you can observe, the IO polling is done only once per loop
iteration, and it has to wait until all of the handles present at that
moment have been executed.

This implements, in some way, a natural backpressure: reads, and also
accepts of new connections, will rely on the buffers managed by the
operating system.
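The four steps above could be sketched, very roughly, like this
(loosely modelled on asyncio's BaseEventLoop._run_once; all names here
are made up for the example, not the real asyncio internals):

```python
import heapq
import selectors
import time
from collections import deque

def run_once(selector, ready, scheduled):
    # 1) Poll file descriptors and stack their handlers into the queue.
    for key, events in selector.select(timeout=0):
        ready.append(key.data)          # key.data holds the fd callback

    # 2) Pop all outdated scheduled callbacks and push them as well.
    now = time.monotonic()
    while scheduled and scheduled[0][0] <= now:
        when, callback = heapq.heappop(scheduled)
        ready.append(callback)

    # 3) Run only the N handles present right now; handles appended
    #    while running must wait for the next whole iteration.
    for _ in range(len(ready)):
        ready.popleft()()
    # 4) The caller goes back to step 1.

# Tiny demo: one handle already queued and one already-expired timer.
trace = []
ready = deque([lambda: trace.append("handle")])
scheduled = [(0.0, lambda: trace.append("timer"))]
run_once(selectors.DefaultSelector(), ready, scheduled)
```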

That implementation may look simple, but it stands on a solid
strategy and follows a KISS design that helps keep bugs away.

Why fair scheduling?

Not all code that shares the same loop has the same requirements. Some
parts might need N and other parts M. When the code cannot be
decoupled - meaning that the cost of placing the parts into separate
pieces of your architecture is too expensive - the developer has no
way to express this difference and make the underlying implementation
aware of it.

For example, an API with a regular endpoint accessed by users and
another one with the health-check of the system, which have completely
different requirements in terms of IO. Not only due to the nature of
the resources accessed, but also because of the frequency of use.
While the health-check is accessed at a known frequency of X
seconds, the other endpoint has a variable frequency of use.

Do you believe that asyncio will be able to preserve the health-check
frequency at any moment? Absolutely not.

Therefore, the idea of implementing a fair scheduling reactor is based
on the need to address these kinds of situations, giving the
developer an interface to isolate different resources.

Basic principles

The basic principles of the implementation are:

- The cost of the scheduling has to be the same as in the current
implementation: no overhead.
- The design has to follow the current one, keeping the implicit
backpressure commented on above.

I will focus on the second principle, taking into account that the
first one is a matter of implementation.

To achieve the same behavior, the new implementation only splits the
resources - handles, schedules, file descriptors - into isolated
partitions, and then implements for each partition the same algorithm
as the current one. The developer can create a new partition using a
new function called `spawn`. This function takes a coroutine as an
argument; the task that wraps that coroutine, and all of the resources
created inside the coroutine, will belong to that partition. For
example:

>>> async def background_task():
>>>     tasks = [fetch() for i in range(1000)]
>>>     return (await asyncio.gather(*tasks))
>>>
>>> async def foo():
>>>     return (await asyncio.spawn(background_task()))


All resources created inside the scope of `background_task` are
isolated in one partition. The 1000 sockets will schedule callbacks
that will be stacked in the same queue.

The partition is by default identified by the hash of the task that
wraps `background_task`, but the user can pass an alternative
value.

>>> async def foo():
>>>     return (await asyncio.spawn(healthcheck(), partition='healthcheck'))

Internally the implementation has a default ROOT partition that is
used for all of the resources that are not created inside the scope
of a spawn call. As you can guess, if you don't use the spawn method,
the reactor will run exactly as the current implementation does,
having all of the resources in the same queue.


Round robin between partitions.

The different partitions that exist at a given moment share the CPU
using a round-robin strategy. This gives all partitions the same
chance to run the same number of handles, but with one
particularity: each time a partition runs out of handles, the loop is
restarted to handle the file descriptors and the delayed calls, but
only for that specific partition.

The side effect is clear: we get the same backpressure mechanism, but
per partition.
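An illustrative sketch of one round-robin pass (again made up for this
mail, not the POC's actual code; `refill` stands in for the
per-partition polling of file descriptors and timers described above):

```python
from collections import deque

# Each partition takes its turn; a partition that has run out of
# handles re-polls its own file descriptors and timers (stubbed here
# by `refill`) before giving up the CPU to the next partition.
trace = []

def refill(name, queue):
    # Stand-in for per-partition select()/timer handling.
    queue.append(lambda: trace.append("%s: refilled" % name))

partitions = {
    "ROOT": deque([lambda: trace.append("ROOT: handle")]),
    "healthcheck": deque(),   # ran out of handles -> gets its own refill
}

for name, queue in partitions.items():
    if not queue:
        refill(name, queue)
    # Run only the handles present at the start of this partition's
    # turn; anything queued meanwhile waits for the next pass.
    for _ in range(len(queue)):
        queue.popleft()()

print(trace)
```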


The black hole of the current implementation.

There is always a but. I've found at least one situation where this
strategy can perform the same way as the current one, without
applying any fair scheduling, even though the code uses the spawn
method.

Have a look at the following snippet:

>>> async def healthcheck(request):
>>>     await request.resp()
>>>
>>> async def view(request):
>>>     return (await asyncio.spawn(healthcheck(request)))


The task that wraps the healthcheck coroutine, even though it is
isolated in a partition, won't be scheduled until the data from the
file descriptor has been read by a callback that is in fact executed
inside the ROOT partition. Therefore, in the worst-case scenario, the
fair scheduling will degrade to simple FIFO scheduling.

IMHO there is no easy way to solve that issue, at least not without
changing the current picture. And trying to solve it might end up
producing a messy implementation and buggy code.

Still, I believe that this is worth it, considering the benefits it
will bring for all of those cases where the user needs to isolate
resources.

Thoughts, comments, and others will be welcomed.

[1] https://github.com/pfreixes/qloop

-- 
--pau

