[Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation

Stefan Behnel stefan_ml at behnel.de
Sat Jan 12 07:39:42 CET 2013


Jan Kaliszewski, 12.01.2013 02:28:
> 12.01.2013 00:41, Guido van Rossum wrote:
> 
>> Here's an interesting puzzle. Check out the core of Tulip's event
>> loop: http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#672
>>
>> Specifically this does something like this:
>>
>> 1. poll for I/O, appending any ready handlers to the _ready queue
>>
>> 2. append any handlers scheduled for a time <= now to the _ready queue
>>
>> 3. while _ready:
>>        handler = _ready.popleft()
>>        call handler
>>
>> It is the latter loop that causes me some concern. In theory it is
>> possible for a bad callback to make this loop never finish, as
>> follows:
>>
>> def hogger():
>>     tulip.get_event_loop().call_soon(hogger)
>>
>> Because call_soon() appends the handler to the _ready queue, the while
>> loop will never finish.
>>
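
To make the failure mode concrete, here is a stand-alone toy sketch (not
Tulip code; the _ready, call_soon and hogger names just mirror the snippet
above) of the drain-until-empty pattern, with a guard added so that the
sketch itself actually terminates:

    from collections import deque

    _ready = deque()

    def call_soon(callback):
        _ready.append(callback)

    def hogger():
        call_soon(hogger)   # re-schedules itself on every call

    call_soon(hogger)

    calls = 0
    while _ready:            # never becomes false: hogger keeps appending
        handler = _ready.popleft()
        handler()
        calls += 1
        if calls >= 10:      # guard so this sketch terminates
            print("%d handler(s) still ready after %d calls"
                  % (len(_ready), calls))
            break

Without the guard, control would never get back to the I/O poll.
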
>> There is a simple enough solution (Tornado uses this AFAIK):
>>
>> now_ready = list(_ready)
>> _ready.clear()
>> for handler in now_ready:
>>     call handler
>>
>> However this implies that we go back to the I/O polling code more
>> frequently. While the I/O polling code sets the timeout to zero when
>> there's anything in the _ready queue, so it won't block, it still
>> isn't free; it's an expensive system call that we'd like to put off
>> until we have nothing better to do.
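
As a matching toy sketch of the snapshot approach (same toy _ready,
call_soon and hogger definitions as above, still not Tulip code): each
pass now calls the hogger exactly once and then falls through to the
poll step, so a self-rescheduling callback can no longer keep the inner
loop from returning:

    from collections import deque

    _ready = deque()

    def call_soon(callback):
        _ready.append(callback)

    def hogger():
        call_soon(hogger)

    call_soon(hogger)

    for loop_pass in range(3):
        now_ready = list(_ready)   # snapshot what is ready *now*
        _ready.clear()
        for handler in now_ready:
            handler()              # anything it schedules waits for the next pass
        print("pass %d: %d handler(s) deferred" % (loop_pass, len(_ready)))
        # ... the real loop would now go back to polling for I/O ...
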
> [...]
>> So what's more important? Avoid I/O starvation at all cost or make the
>> callbacks-posting-callbacks pattern efficient? I can see several
>> outcomes of this discussion: we could end up deciding that one or the
>> other strategy is always best; we could also leave it up to the
>> implementation (but then I still would want guidance for what to do in
>> Tulip); we could even decide this is so important that the user needs
>> to be able to control the policy here
> [...]
> 
> Maybe it could be, at least for the standard Tulip implementation,
> parameterizable with a simple integer value -- the suggested max number
> of loop iterations?
> 
> E.g. something like the following:
> 
>     # `suggested_iter_limit` is the parameter
>     actual_limit = max(len(_ready), suggested_iter_limit)
>     for i in range(actual_limit):
>         if not _ready:
>             break
>         handler = _ready.popleft()
>         call handler...

Yep, it could simply use itertools.islice() when iterating over _ready, with
an appropriate upper-bound factor relative to the length at the start of the
pass, and then cut the processed entries off after the loop. That way it
would never go, say, 50% over the initially anticipated workload. Or rather
allow a fixed number of extra callbacks, I guess, to make it more
predictable for users. That would be a user-configurable parameter of the
I/O loop.

    actual_limit = len(_ready) + max_additional_load_per_loop
    for handler in itertools.islice(_ready, None, actual_limit):
        call handler...
    del _ready[:actual_limit]
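
That sketch assumes _ready is a plain list. For a deque-based _ready (as
the popleft() in the original loop suggests), slice deletion isn't
available and appending to the deque while islice() iterates it would
raise, so the same policy can be expressed as a bounded popleft() loop
instead. A minimal sketch, with max_additional_load_per_loop kept as a
placeholder name:

    def run_ready_pass(_ready, max_additional_load_per_loop=50):
        # Process what is ready now, plus at most a fixed number of
        # extra callbacks scheduled while this pass is running.
        budget = len(_ready) + max_additional_load_per_loop
        while _ready and budget > 0:
            handler = _ready.popleft()
            handler()
            budget -= 1

With the hogger() from above, each pass then runs it at most
1 + max_additional_load_per_loop times before control returns to the
I/O poll.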

Stefan




