From pfreixes at gmail.com  Mon Jan  1 12:02:34 2018
From: pfreixes at gmail.com (Pau Freixes)
Date: Mon, 1 Jan 2018 18:02:34 +0100
Subject: [Async-sig] Asyncio loop instrumentation
In-Reply-To: <20171231200247.755b5c42@fsol>
References: <20171231200247.755b5c42@fsol>
Message-ID:

Hi Antoine,

Regarding your questions:

> What does it mean exactly? Is it the ratio of CPU time over wall clock
> time?

It can be considered a metric that tells you how much CPU your loop is consuming. In the best-case scenario, where yours is the only process running, this metric will match the CPU usage - note that it matches the usage of the specific CPU where your process is executed. With many processes fighting for the same CPU, this number will differ significantly from the system's CPU metric, since the resources are being divided among many consumers. So I would stress that this load is relative to your loop, rather than an objective value taken from the CPU metric.

To do the same with `psutil` you would have to gather the CPU usage of the specific CPU where your loop is currently running. Not an impossible problem, but it turns something trivial into something more complicated. In the case of `time.thread_time` I can't see how I could do it at all: you would gather information about the thread where your loop is running, but there is nothing straightforward that would let you account for other threads that are fighting for that same CPU.

The solution presented is not perfect, and there are still some corner cases where the load factor might not be accurate enough. The way the `load` method guesses whether the loop is fighting for CPU resources with other processes is basically by attributing, at most, the timeout as sleeping time. For example:

from time import time
from select import select

t0 = time()
select(fds, [], [], 1)  # fds: the descriptors registered with the loop
t1 = time()
sleeping_time = min(t1 - t0, 1)  # at most the timeout counts as sleeping

Therefore, if the call to select took more than 1 second - because the scheduler decided to give the CPU to another process - the time that goes beyond 1 second is counted as resource-usage time. As you can imagine, the problem with that is what happens when select became ready before 1 second had passed but the scheduler did not give the CPU back because a higher-priority process was running: in that case, that waiting time will still be attributed as sleeping time.
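To make that concrete, here is a rough sketch of how this per-tick accounting could produce a load factor (hypothetical code, not the actual POC implementation; `rlist` stands for whatever descriptors the loop has registered):

import select
import time

TIMEOUT = 1.0  # poll timeout: at most this much of a tick counts as sleeping

busy = slept = 0.0

def measure_tick(rlist):
    global busy, slept
    t0 = time.monotonic()
    select.select(rlist, [], [], TIMEOUT)
    elapsed = time.monotonic() - t0
    slept += min(elapsed, TIMEOUT)       # capped: at most TIMEOUT is sleeping
    busy += max(elapsed - TIMEOUT, 0.0)  # overrun: we were starved of CPU
    # the time spent running the loop's callbacks would also go into `busy`

def load():
    # 0.0 means a fully idle loop; values close to 1.0 mean saturation
    return busy / (busy + slept) if busy + slept else 0.0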
>> For this proposal [4], POC, I've preferred make a reduced list of events:
>>
>> * `loop_start` : Executed when the loop starts for the first time.
>> * `tick_start` : Executed when a new loop tick is started.
>> * `io_start` : Executed when a new IO process starts.
>> * `io_end` : Executed when the IO process ends.
>> * `tick_end` : Executed when the loop tick ends.
>> * `loop_stop` : Executed when the loop stops.
>
> What do you call a "IO process" in this context?

Basically the call to the `select/poll/whatever` syscall that will ask for read or write to a set of file descriptors.

Thanks,

--
--pau

From songofacandy at gmail.com  Mon Jan  1 19:34:13 2018
From: songofacandy at gmail.com (INADA Naoki)
Date: Tue, 2 Jan 2018 09:34:13 +0900
Subject: [Async-sig] Asyncio loop instrumentation
In-Reply-To:
References: <20171231200247.755b5c42@fsol>
Message-ID:

>>> For this proposal [4], POC, I've preferred make a reduced list of events:
>>>
>>> * `loop_start` : Executed when the loop starts for the first time.
>>> * `tick_start` : Executed when a new loop tick is started.
>>> * `io_start` : Executed when a new IO process starts.
>>> * `io_end` : Executed when the IO process ends.
>>> * `tick_end` : Executed when the loop tick ends.
>>> * `loop_stop` : Executed when the loop stops.
>>
>> What do you call a "IO process" in this context?
>
> Basically the call to the `select/poll/whatever` syscall that will ask
> for read or write to a set of file descriptors.

The `select/poll/whatever` syscalls don't ask for read or write. They wait for read or write (more accurately, they wait for a readable or writable state).

So poll_start / poll_end look like better names to me.

INADA Naoki

>
> Thanks,
>
> --
> --pau
> _______________________________________________
> Async-sig mailing list
> Async-sig at python.org
> https://mail.python.org/mailman/listinfo/async-sig
> Code of Conduct: https://www.python.org/psf/codeofconduct/

From pfreixes at gmail.com  Tue Jan  2 11:32:12 2018
From: pfreixes at gmail.com (Pau Freixes)
Date: Tue, 2 Jan 2018 17:32:12 +0100
Subject: [Async-sig] Asyncio loop instrumentation
In-Reply-To: <2B052CB7-FFCB-494C-97BA-DA8859B49598@gmail.com>
References: <20171231200247.755b5c42@fsol>
 <2B052CB7-FFCB-494C-97BA-DA8859B49598@gmail.com>
Message-ID:

Hi Yury,

It's good to know that we are on the same page about the lack of a feature that should be a must-have.

Since asyncio has become stable and widely used by many organizations - such as ours [1] - the need for tools that let us instrument asynchronous code running on top of asyncio has grown. A good example is how some changes in aiohttp were implemented [2] - disclaimer, I'm the author of that code - to let developers gather more information about how HTTP calls perform at both layers: application and protocol.
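For illustration, the client tracing added in [2] can be hooked up roughly like this (a sketch against aiohttp's TraceConfig interface; the timing logic is only an example):

import asyncio
import aiohttp

async def on_request_start(session, ctx, params):
    ctx.start = asyncio.get_event_loop().time()

async def on_request_end(session, ctx, params):
    elapsed = asyncio.get_event_loop().time() - ctx.start
    print("request to %s took %.3fs" % (params.url, elapsed))

trace_config = aiohttp.TraceConfig()
trace_config.on_request_start.append(on_request_start)
trace_config.on_request_end.append(on_request_end)

async def fetch(url):
    async with aiohttp.ClientSession(trace_configs=[trace_config]) as session:
        async with session.get(url) as resp:
            await resp.read()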
This proposal, just a POC, goes in the same direction and tries to close that gap for the event loop. The related work on the `load` method is circumstantial, but it helps to understand why this feature is so important.

I still believe that we can start to fill the gap for Python 3.7; if the window to implement it closes before all of the work is done, at least part of the work will have been done.

I still have some questions to be answered that might help focus this work in the right way. A few of them concern the rationale - for example, how coupled this feature has to be to the AbstractLoop, making it a specification for other loop implementations - and others are purely technical. But it's true that we should only press on with these questions if we believe that we can take advantage of all of this effort.

Regards,

[1] https://medium.com/@SkyscannerEng/running-aiohttp-at-scale-2656b7a83a09
[2] https://github.com/aio-libs/aiohttp/pull/2429

On Sun, Dec 31, 2017 at 8:12 PM, Yury Selivanov wrote:
> When PEP 567 is accepted, I plan to implement advanced instrumentation in uvloop, to monitor basically all io/callback/loop events. I'm still -1 to do this in asyncio at least in 3.7, because I'd like us to have some time to experiment with such instrumentation in real production code (preferably at scale)
>
> Yury
>
> Sent from my iPhone
>
>> On Dec 31, 2017, at 10:02 PM, Antoine Pitrou wrote:
>>
>> On Sun, 31 Dec 2017 18:32:21 +0100
>> Pau Freixes wrote:
>>>
>>> These new implementation of the load method - remember that it returns
>>> a load factor between 0.0 and 1.0 that inform you about how bussy is
>>> your loop -
>>
>> What does it mean exactly? Is it the ratio of CPU time over wall clock
>> time?
>>
>> Depending on your needs, the `psutil` library (*) and/or the new
>> `time.thread_time` function (**) may also help.
>>
>> (*) https://psutil.readthedocs.io/en/latest/
>> (**) https://docs.python.org/3.7/library/time.html#time.thread_time
>>
>>> For this proposal [4], POC, I've preferred make a reduced list of events:
>>>
>>> * `loop_start` : Executed when the loop starts for the first time.
>>> * `tick_start` : Executed when a new loop tick is started.
>>> * `io_start` : Executed when a new IO process starts.
>>> * `io_end` : Executed when the IO process ends.
>>> * `tick_end` : Executed when the loop tick ends.
>>> * `loop_stop` : Executed when the loop stops.
>>
>> What do you call a "IO process" in this context?
>>
>> Regards
>>
>> Antoine.
>>
>>
>> _______________________________________________
>> Async-sig mailing list
>> Async-sig at python.org
>> https://mail.python.org/mailman/listinfo/async-sig
>> Code of Conduct: https://www.python.org/psf/codeofconduct/
> _______________________________________________
> Async-sig mailing list
> Async-sig at python.org
> https://mail.python.org/mailman/listinfo/async-sig
> Code of Conduct: https://www.python.org/psf/codeofconduct/

--
--pau

From pfreixes at gmail.com  Tue Jan  2 12:00:18 2018
From: pfreixes at gmail.com (Pau Freixes)
Date: Tue, 2 Jan 2018 18:00:18 +0100
Subject: [Async-sig] Asyncio loop instrumentation
In-Reply-To:
References: <20171231200247.755b5c42@fsol>
Message-ID:

Agreed, poll_start and poll_end suit much better.

Thanks for the feedback.

On Tue, Jan 2, 2018 at 1:34 AM, INADA Naoki wrote:
>>>> For this proposal [4], POC, I've preferred make a reduced list of events:
>>>>
>>>> * `loop_start` : Executed when the loop starts for the first time.
>>>> * `tick_start` : Executed when a new loop tick is started.
>>>> * `io_start` : Executed when a new IO process starts.
>>>> * `io_end` : Executed when the IO process ends.
>>>> * `tick_end` : Executed when the loop tick ends.
>>>> * `loop_stop` : Executed when the loop stops.
>>>
>>> What do you call a "IO process" in this context?
>>
>> Basically the call to the `select/poll/whatever` syscall that will ask
>> for read or write to a set of file descriptors.
>
> The `select/poll/whatever` syscalls don't ask for read or write. They
> wait for read or write (more accurately, they wait for a readable or
> writable state).
>
> So poll_start / poll_end look like better names to me.
>
> INADA Naoki
>
>
>>
>> Thanks,
>>
>> --
>> --pau
>> _______________________________________________
>> Async-sig mailing list
>> Async-sig at python.org
>> https://mail.python.org/mailman/listinfo/async-sig
>> Code of Conduct: https://www.python.org/psf/codeofconduct/

--
--pau

From yselivanov at gmail.com  Tue Jan  2 12:46:04 2018
From: yselivanov at gmail.com (Yury Selivanov)
Date: Tue, 2 Jan 2018 20:46:04 +0300
Subject: [Async-sig] Asyncio loop instrumentation
In-Reply-To:
References: <20171231200247.755b5c42@fsol>
Message-ID:

I understand why it could be useful to have this in asyncio. But I'm a big -1 on rushing this functionality into 3.7. asyncio is no longer provisional, so we have to be careful when we design new APIs for it.

Example: I wanted to add support for task groups to asyncio. A similar concept exists in curio and trio and I like it; it can be a big improvement over asyncio.gather. But there are too many caveats about handling multiple exceptions properly (MultiError?) and some issues with cancellation. That's why I decided that it's safer to prototype TaskGroups in a separate package than to push a poorly thought out new API in 3.7.

Same applies to your proposal.
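For instance, such an experiment could start with something as small as this (a hypothetical sketch; `_run_once` is a private CPython detail and may change between releases):

import asyncio
import time

class InstrumentedLoop(asyncio.SelectorEventLoop):
    def __init__(self):
        super().__init__()
        self.tick_durations = []

    def _run_once(self):
        # wrap one loop iteration: the poll plus the ready callbacks
        t0 = time.monotonic()
        super()._run_once()
        self.tick_durations.append(time.monotonic() - t0)

asyncio.set_event_loop(InstrumentedLoop())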
You can easily publish a package on PyPI that provides an improved version of the asyncio event loop. You won't even need to write a lot of code, just overload a few methods.

Yury

Sent from my iPhone

> On Jan 2, 2018, at 8:00 PM, Pau Freixes wrote:
>
> Agreed, poll_start and poll_end suit much better.
>
> Thanks for the feedback.
>
> On Tue, Jan 2, 2018 at 1:34 AM, INADA Naoki wrote:
>>>>> For this proposal [4], POC, I've preferred make a reduced list of events:
>>>>>
>>>>> * `loop_start` : Executed when the loop starts for the first time.
>>>>> * `tick_start` : Executed when a new loop tick is started.
>>>>> * `io_start` : Executed when a new IO process starts.
>>>>> * `io_end` : Executed when the IO process ends.
>>>>> * `tick_end` : Executed when the loop tick ends.
>>>>> * `loop_stop` : Executed when the loop stops.
>>>>
>>>> What do you call a "IO process" in this context?
>>>
>>> Basically the call to the `select/poll/whatever` syscall that will ask
>>> for read or write to a set of file descriptors.
>>
>> The `select/poll/whatever` syscalls don't ask for read or write. They
>> wait for read or write (more accurately, they wait for a readable or
>> writable state).
>>
>> So poll_start / poll_end look like better names to me.
>>
>> INADA Naoki
>>
>>>
>>> Thanks,
>>>
>>> --
>>> --pau
>>> _______________________________________________
>>> Async-sig mailing list
>>> Async-sig at python.org
>>> https://mail.python.org/mailman/listinfo/async-sig
>>> Code of Conduct: https://www.python.org/psf/codeofconduct/
>
> --
> --pau
> _______________________________________________
> Async-sig mailing list
> Async-sig at python.org
> https://mail.python.org/mailman/listinfo/async-sig
> Code of Conduct: https://www.python.org/psf/codeofconduct/

From njs at pobox.com  Thu Jan 11 05:09:29 2018
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 11 Jan 2018 02:09:29 -0800
Subject: [Async-sig] Blog post: Timeouts and cancellation for humans
Message-ID:

Hi all,

Folks here might be interested in this new blog post:

https://vorpus.org/blog/timeouts-and-cancellation-for-humans/

It's a detailed discussion of pitfalls and design-tradeoffs in APIs for timeout and cancellation, and has a proposal for handling them in a more Pythonic way. Any feedback welcome!

-n

--
Nathaniel J. Smith -- https://vorpus.org

From dimaqq at gmail.com  Thu Jan 11 22:49:56 2018
From: dimaqq at gmail.com (Dima Tisnek)
Date: Fri, 12 Jan 2018 11:49:56 +0800
Subject: [Async-sig] Blog post: Timeouts and cancellation for humans
In-Reply-To:
References:
Message-ID:

Very nice read, Nathaniel.

The post left me wondering how cancel tokens interact or should logically interact with async composition, for example:

with move_on_after(10):
    await someio.gather(a(), b(), c())

or

with move_on_after(10):
    await someio.first/race(a(), b(), c())

or

dataset = someio.Future(large_download(), move_on_after=9999)

task a:
    with move_on_after(10):
        use((await dataset)["a"])

task b:
    with move_on_after(10):
        use((await dataset)["b"])

On 11 January 2018 at 18:09, Nathaniel Smith wrote:
> Hi all,
>
> Folks here might be interested in this new blog post:
>
> https://vorpus.org/blog/timeouts-and-cancellation-for-humans/
>
> It's a detailed discussion of pitfalls and design-tradeoffs in APIs
> for timeout and cancellation, and has a proposal for handling them in
> a more Pythonic way. Any feedback welcome!
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> Async-sig mailing list
> Async-sig at python.org
> https://mail.python.org/mailman/listinfo/async-sig
> Code of Conduct: https://www.python.org/psf/codeofconduct/

From chris.jerdonek at gmail.com  Fri Jan 12 07:17:51 2018
From: chris.jerdonek at gmail.com (Chris Jerdonek)
Date: Fri, 12 Jan 2018 04:17:51 -0800
Subject: [Async-sig] Blog post: Timeouts and cancellation for humans
In-Reply-To:
References:
Message-ID:

Thanks, Nathaniel. Very instructive, thought-provoking write-up!

One thing occurred to me around the time of reading this passage:

> "Once the cancel token is triggered, then all future operations on that token are cancelled, so the call to ws.close doesn't get stuck. It's a less error-prone paradigm. ... If you follow the path we did in this blog post, and start by thinking about applying a timeout to a complex operation composed out of multiple blocking calls, then it's obvious that if the first call uses up the whole timeout budget, then any future calls should fail immediately."

One case where it's not clear how it should be addressed is the following. It's something I've wrestled with in the context of asyncio, and it doesn't seem to be raised as a possibility in your write-up.

Say you have a complex operation that you want to be able to time out or cancel, but the process of cleanup / cancelling might also require a certain amount of time that you'd want to allow for (likely a smaller time in normal circumstances). Then it seems like you'd want to be able to allocate a separate timeout for the clean-up portion (independent of the timeout allotted for the original operation).

It's not clear to me how this case would best be handled with the primitives you described. In your text above ("then any future calls should fail immediately"), without any changes, it seems there wouldn't be "time" for any clean-up to complete.

With asyncio, one way to handle this is to await on a task with a smaller timeout after calling task.cancel(). That lets you assign a different timeout to waiting for cancellation to complete.

--Chris

On Thu, Jan 11, 2018 at 2:09 AM, Nathaniel Smith wrote:
> Hi all,
>
> Folks here might be interested in this new blog post:
>
> https://vorpus.org/blog/timeouts-and-cancellation-for-humans/
>
> It's a detailed discussion of pitfalls and design-tradeoffs in APIs
> for timeout and cancellation, and has a proposal for handling them in
> a more Pythonic way. Any feedback welcome!
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> Async-sig mailing list
> Async-sig at python.org
> https://mail.python.org/mailman/listinfo/async-sig
> Code of Conduct: https://www.python.org/psf/codeofconduct/

From njs at pobox.com  Sat Jan 13 05:32:49 2018
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 13 Jan 2018 02:32:49 -0800
Subject: [Async-sig] Blog post: Timeouts and cancellation for humans
In-Reply-To:
References:
Message-ID:

On Thu, Jan 11, 2018 at 7:49 PM, Dima Tisnek wrote:
> Very nice read, Nathaniel.
>
> The post left me wondering how cancel tokens interact or should logically interact with async composition, for example:
>
> with move_on_after(10):
>     await someio.gather(a(), b(), c())
>
> or
>
> with move_on_after(10):
>     await someio.first/race(a(), b(), c())
>
> or
>
> dataset = someio.Future(large_download(), move_on_after=9999)
>
> task a:
>     with move_on_after(10):
>         use((await dataset)["a"])
>
> task b:
>     with move_on_after(10):
>         use((await dataset)["b"])

It's funny you say "async composition"... Trio's concurrency primitive (nurseries) is closely related to the core concurrency primitive in Communicating Sequential Processes, which they call "parallel composition". (Basically, if P and Q are processes, then "P || Q" is the process that runs both P and Q in parallel and then finishes when they've both finished.) If you were using that as your primitive, then tasks would form an orderly tree and this wouldn't be a problem :-).
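In trio terms, Dima's first example becomes a nursery nested inside a cancel scope, and the timeout applies to the whole subtree as one unit (a runnable sketch; the workers are placeholders):

import trio

async def worker(name):
    await trio.sleep(1)  # stand-in for real work

async def main():
    with trio.move_on_after(10):
        # parallel composition: the nursery block finishes only when all
        # three children have finished, and cancelling the surrounding
        # scope cancels the whole subtree at once
        async with trio.open_nursery() as nursery:
            for name in ("a", "b", "c"):
                nursery.start_soon(worker, name)

trio.run(main)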
In your text above ("then any future calls > should fail immediately"), without any changes, it seems there > wouldn't be "time" for any clean-up to complete. > > With asyncio, one way to handle this is to await on a task with a > smaller timeout after calling task.cancel(). That lets you assign a > different timeout to waiting for cancellation to complete. You can get these semantics using the "shielding" feature, which the post discusses a bit later: try: await do_some_stuff() finally: # Always give this 30 seconds to clean up, even if we've # been cancelled with trio.move_on_after(30) as cscope: cscope.shield = True await do_cleanup() Here the inner scope "hides" the code inside it from any external cancel scopes, so it can continue executing even of the overall context has been cancelled. However, I think this is probably a code smell. Like all code smells, there are probably cases where it's the right thing to do, but when you see it you should stop and think carefully. If you're writing code like this, then it means that there are multiple different layers in your code that are implementing timeout policies, that might end up fighting with each other. What if the caller really needs this to finish in 15 seconds? So if you have some way to move the timeout handling into the same layer, then I suspect that will make your program easier to understand and maintain. OTOH, if you decide you want it, the code above works :-). I'm not 100% sure here; I'd definitely be interested to hear about more use cases. One thing I've thought about that might help is adding a kind of "soft cancelled" state to the cancel scopes, inspired by the "graceful shutdown" mode that you'll often see in servers where you stop accepting new connections, then try to finish up old ones (with some time limit). So in this case you might mark 'do_some_stuff()' as being cancelled immediately when we entered the 'soft cancel' phase, but let the 'do_cleanup' code keep running until the grace period expired and the region was hard-cancelled. This idea isn't fully baked yet though. (There's some more mumbling about this at https://github.com/python-trio/trio/issues/147.) -n -- Nathaniel J. Smith -- https://vorpus.org From chris.jerdonek at gmail.com Sun Jan 14 08:11:51 2018 From: chris.jerdonek at gmail.com (Chris Jerdonek) Date: Sun, 14 Jan 2018 05:11:51 -0800 Subject: [Async-sig] Blog post: Timeouts and cancellation for humans In-Reply-To: References: Message-ID: On Sun, Jan 14, 2018 at 3:33 AM, Nathaniel Smith wrote: > On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek > wrote: >> Say you have a complex operation that you want to be able to timeout >> or cancel, but the process of cleanup / cancelling might also require >> a certain amount of time that you'd want to allow time for (likely a >> smaller time in normal circumstances). Then it seems like you'd want >> to be able to allocate a separate timeout for the clean-up portion >> (independent of the timeout allotted for the original operation). >> ... > > You can get these semantics using the "shielding" feature, which the > post discusses a bit later: > ... > However, I think this is probably a code smell. I agree with this assessment. My sense was that shielding could probably do it, but it seems like it could be brittle or more of a kludge. It would be nice if the same primitive could be used to accommodate this and other variations in addition to the normal case. 
For example, a related variation might be if you wanted to let yourself extend the timeout in response to certain actions or results.

The main idea that occurs to me is letting the cancel scope be dynamic: the timeout could be allowed to change in response to certain things. Something like that seems like it has the potential to be both simple and general enough to accommodate lots of different scenarios, including adjusting the timeout in response to entering a clean-up phase. One good test would be whether shielding could be implemented using such a primitive.

--Chris

> Like all code smells, there are probably cases where it's the right thing to do, but when you see it you should stop and think carefully. If you're writing code like this, then it means that there are multiple different layers in your code that are implementing timeout policies, that might end up fighting with each other. What if the caller really needs this to finish in 15 seconds? So if you have some way to move the timeout handling into the same layer, then I suspect that will make your program easier to understand and maintain. OTOH, if you decide you want it, the code above works :-). I'm not 100% sure here; I'd definitely be interested to hear about more use cases.
>
> One thing I've thought about that might help is adding a kind of "soft cancelled" state to the cancel scopes, inspired by the "graceful shutdown" mode that you'll often see in servers where you stop accepting new connections, then try to finish up old ones (with some time limit). So in this case you might mark 'do_some_stuff()' as being cancelled immediately when we entered the 'soft cancel' phase, but let the 'do_cleanup' code keep running until the grace period expired and the region was hard-cancelled. This idea isn't fully baked yet though.
> (There's some more mumbling about this at
> https://github.com/python-trio/trio/issues/147.)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org

From nbadger1 at gmail.com  Sun Jan 14 17:45:28 2018
From: nbadger1 at gmail.com (Nick Badger)
Date: Sun, 14 Jan 2018 14:45:28 -0800
Subject: [Async-sig] Blog post: Timeouts and cancellation for humans
In-Reply-To:
References:
Message-ID:

> However, I think this is probably a code smell. Like all code smells,
> there are probably cases where it's the right thing to do, but when
> you see it you should stop and think carefully.

Huh. That's a really good point. But I'm not sure the source of the smell is the code that needs the shield logic -- I think this might instead be indicative of upstream code smell. Put a bit more concretely: if you're writing a protocol for an unreliable network (and of course, every network is unreliable), requiring a closure operation to transmit something over that network is inherently problematic, because it inevitably leads to multiple-stage timeouts or ungraceful shutdowns.

Clearly, changing anything upstream is out of scope here. So if the smell is, in fact, "upwind", there's not really much you could do about that in asyncio, Curio, Trio, etc, other than minimize the additional smell you need to accommodate smelly protocols. Unfortunately, I'm not sure there's any one approach to that problem that isn't application-specific.

Nick Badger
https://www.nickbadger.com

2018-01-14 3:33 GMT-08:00 Nathaniel Smith :

> On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek wrote:
> > Thanks, Nathaniel. Very instructive, thought-provoking write-up!
> >
> > One thing occurred to me around the time of reading this passage:
> >
> >> "Once the cancel token is triggered, then all future operations on that token are cancelled, so the call to ws.close doesn't get stuck. It's a less error-prone paradigm. ... If you follow the path we did in this blog post, and start by thinking about applying a timeout to a complex operation composed out of multiple blocking calls, then it's obvious that if the first call uses up the whole timeout budget, then any future calls should fail immediately."
> >
> > One case where it's not clear how it should be addressed is the following. It's something I've wrestled with in the context of asyncio, and it doesn't seem to be raised as a possibility in your write-up.
> >
> > Say you have a complex operation that you want to be able to time out or cancel, but the process of cleanup / cancelling might also require a certain amount of time that you'd want to allow for (likely a smaller time in normal circumstances). Then it seems like you'd want to be able to allocate a separate timeout for the clean-up portion (independent of the timeout allotted for the original operation).
> >
> > It's not clear to me how this case would best be handled with the primitives you described. In your text above ("then any future calls should fail immediately"), without any changes, it seems there wouldn't be "time" for any clean-up to complete.
> >
> > With asyncio, one way to handle this is to await on a task with a smaller timeout after calling task.cancel(). That lets you assign a different timeout to waiting for cancellation to complete.
>
> You can get these semantics using the "shielding" feature, which the post discusses a bit later:
>
> try:
>     await do_some_stuff()
> finally:
>     # Always give this 30 seconds to clean up, even if we've
>     # been cancelled
>     with trio.move_on_after(30) as cscope:
>         cscope.shield = True
>         await do_cleanup()
>
> Here the inner scope "hides" the code inside it from any external cancel scopes, so it can continue executing even if the overall context has been cancelled.
>
> However, I think this is probably a code smell. Like all code smells, there are probably cases where it's the right thing to do, but when you see it you should stop and think carefully. If you're writing code like this, then it means that there are multiple different layers in your code that are implementing timeout policies, that might end up fighting with each other. What if the caller really needs this to finish in 15 seconds? So if you have some way to move the timeout handling into the same layer, then I suspect that will make your program easier to understand and maintain. OTOH, if you decide you want it, the code above works :-). I'm not 100% sure here; I'd definitely be interested to hear about more use cases.
>
> One thing I've thought about that might help is adding a kind of "soft cancelled" state to the cancel scopes, inspired by the "graceful shutdown" mode that you'll often see in servers where you stop accepting new connections, then try to finish up old ones (with some time limit). So in this case you might mark 'do_some_stuff()' as being cancelled immediately when we entered the 'soft cancel' phase, but let the 'do_cleanup' code keep running until the grace period expired and the region was hard-cancelled. This idea isn't fully baked yet though.
> (There's some more mumbling about this at
> https://github.com/python-trio/trio/issues/147.)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> Async-sig mailing list
> Async-sig at python.org
> https://mail.python.org/mailman/listinfo/async-sig
> Code of Conduct: https://www.python.org/psf/codeofconduct/

From dimaqq at gmail.com  Sun Jan 14 21:33:20 2018
From: dimaqq at gmail.com (Dima Tisnek)
Date: Mon, 15 Jan 2018 10:33:20 +0800
Subject: [Async-sig] Blog post: Timeouts and cancellation for humans
In-Reply-To:
References:
Message-ID:

I suppose the websocket case ought to follow conventions similar to the kernel TCP API, where `close` returns immediately but continues to send packets behind the scenes. It could look something like this:

with move_on_after(10):
    await get_ws_message(url)

async def get_ws_message(url):
    async def close():
        if sock and sock.is_connected and ...:
            await sock.send(build_close_packet())
            await sock.recv()  # or something
        if sock:
            sock.close()

    sock = socket.socket()
    try:
        await sock.connect(url)
        data = sock.recv(...)
        return decode(data)
    finally:
        with move_on_after(30):
            someio.spawn_task(close())

I believe the concern is more general than supporting "broken" protocols like websocket. When someone writes `with move_on_after(N): a = await foo()` it can be understood in two ways:

* perform foo for N seconds or else, or
* I want the result in N seconds or else

The latter doesn't imply that foo should be interrupted, only that the caller wishes to proceed without the result. It makes sense if the action involves an unrelated, long-running process, where `foo()` is something like `anext(some_async_generator)`.

Both solve the original concern, that the caller should not block for more than N. I suppose one can be implemented in terms of the other.
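For what it's worth, the second reading is already expressible in today's asyncio, precisely because `asyncio.wait` gives up waiting without cancelling anything (a sketch; `foo` and `use` are placeholders):

import asyncio

async def proceed_without_result():
    task = asyncio.ensure_future(foo())   # foo() keeps running either way
    done, pending = await asyncio.wait({task}, timeout=10)
    if task in done:
        use(task.result())
    # otherwise: carry on without the result; the task is not interrupted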
Perhaps the latter is what `shield` should do? That is, detach the computation as opposed to blocking the caller past the caller's deadline?

What do you all think?

On Mon, 15 Jan 2018 at 6:45 AM, Nick Badger wrote:

>> However, I think this is probably a code smell. Like all code smells, there are probably cases where it's the right thing to do, but when you see it you should stop and think carefully.
>
> Huh. That's a really good point. But I'm not sure the source of the smell is the code that needs the shield logic -- I think this might instead be indicative of upstream code smell. Put a bit more concretely: if you're writing a protocol for an unreliable network (and of course, every network is unreliable), requiring a closure operation to transmit something over that network is inherently problematic, because it inevitably leads to multiple-stage timeouts or ungraceful shutdowns.
>
> Clearly, changing anything upstream is out of scope here. So if the smell is, in fact, "upwind", there's not really much you could do about that in asyncio, Curio, Trio, etc, other than minimize the additional smell you need to accommodate smelly protocols. Unfortunately, I'm not sure there's any one approach to that problem that isn't application-specific.
>
> Nick Badger
> https://www.nickbadger.com
>
> 2018-01-14 3:33 GMT-08:00 Nathaniel Smith :
>
>> On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek wrote:
>> > Thanks, Nathaniel. Very instructive, thought-provoking write-up!
>> >
>> > One thing occurred to me around the time of reading this passage:
>> >
>> >> "Once the cancel token is triggered, then all future operations on that token are cancelled, so the call to ws.close doesn't get stuck. It's a less error-prone paradigm. ... If you follow the path we did in this blog post, and start by thinking about applying a timeout to a complex operation composed out of multiple blocking calls, then it's obvious that if the first call uses up the whole timeout budget, then any future calls should fail immediately."
>> >
>> > One case where it's not clear how it should be addressed is the following. It's something I've wrestled with in the context of asyncio, and it doesn't seem to be raised as a possibility in your write-up.
>> >
>> > Say you have a complex operation that you want to be able to time out or cancel, but the process of cleanup / cancelling might also require a certain amount of time that you'd want to allow for (likely a smaller time in normal circumstances). Then it seems like you'd want to be able to allocate a separate timeout for the clean-up portion (independent of the timeout allotted for the original operation).
>> >
>> > It's not clear to me how this case would best be handled with the primitives you described. In your text above ("then any future calls should fail immediately"), without any changes, it seems there wouldn't be "time" for any clean-up to complete.
>> >
>> > With asyncio, one way to handle this is to await on a task with a smaller timeout after calling task.cancel(). That lets you assign a different timeout to waiting for cancellation to complete.
>>
>> You can get these semantics using the "shielding" feature, which the post discusses a bit later:
>>
>> try:
>>     await do_some_stuff()
>> finally:
>>     # Always give this 30 seconds to clean up, even if we've
>>     # been cancelled
>>     with trio.move_on_after(30) as cscope:
>>         cscope.shield = True
>>         await do_cleanup()
>>
>> Here the inner scope "hides" the code inside it from any external cancel scopes, so it can continue executing even if the overall context has been cancelled.
>>
>> However, I think this is probably a code smell. Like all code smells, there are probably cases where it's the right thing to do, but when you see it you should stop and think carefully. If you're writing code like this, then it means that there are multiple different layers in your code that are implementing timeout policies, that might end up fighting with each other. What if the caller really needs this to finish in 15 seconds? So if you have some way to move the timeout handling into the same layer, then I suspect that will make your program easier to understand and maintain. OTOH, if you decide you want it, the code above works :-). I'm not 100% sure here; I'd definitely be interested to hear about more use cases.
>>
>> One thing I've thought about that might help is adding a kind of "soft cancelled" state to the cancel scopes, inspired by the "graceful shutdown" mode that you'll often see in servers where you stop accepting new connections, then try to finish up old ones (with some time limit). So in this case you might mark 'do_some_stuff()' as being cancelled immediately when we entered the 'soft cancel' phase, but let the 'do_cleanup' code keep running until the grace period expired and the region was hard-cancelled. This idea isn't fully baked yet though.
>> (There's some more mumbling about this at
>> https://github.com/python-trio/trio/issues/147.)
>>
>> -n
>>
>> --
>> Nathaniel J. Smith -- https://vorpus.org
>> _______________________________________________
>> Async-sig mailing list
>> Async-sig at python.org
>> https://mail.python.org/mailman/listinfo/async-sig
>> Code of Conduct: https://www.python.org/psf/codeofconduct/
>
> _______________________________________________
> Async-sig mailing list
> Async-sig at python.org
> https://mail.python.org/mailman/listinfo/async-sig
> Code of Conduct: https://www.python.org/psf/codeofconduct/

From njs at pobox.com  Sun Jan 14 22:10:01 2018
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 14 Jan 2018 19:10:01 -0800
Subject: [Async-sig] Blog post: Timeouts and cancellation for humans
In-Reply-To:
References:
Message-ID:

On Sun, Jan 14, 2018 at 5:11 AM, Chris Jerdonek wrote:
> On Sun, Jan 14, 2018 at 3:33 AM, Nathaniel Smith wrote:
>> On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek wrote:
>>> Say you have a complex operation that you want to be able to time out
>>> or cancel, but the process of cleanup / cancelling might also require
>>> a certain amount of time that you'd want to allow for (likely a
>>> smaller time in normal circumstances). Then it seems like you'd want
>>> to be able to allocate a separate timeout for the clean-up portion
>>> (independent of the timeout allotted for the original operation).
>>> ...
>>
>> You can get these semantics using the "shielding" feature, which the
>> post discusses a bit later:
>> ...
>> However, I think this is probably a code smell.
>
> I agree with this assessment. My sense was that shielding could probably do it, but it seems like it could be brittle or more of a kludge. It would be nice if the same primitive could be used to accommodate this and other variations in addition to the normal case. For example, a related variation might be if you wanted to let yourself extend the timeout in response to certain actions or results.
>
> The main idea that occurs to me is letting the cancel scope be dynamic: the timeout could be allowed to change in response to certain things. Something like that seems like it has the potential to be both simple and general enough to accommodate lots of different scenarios, including adjusting the timeout in response to entering a clean-up phase. One good test would be whether shielding could be implemented using such a primitive.

Ah, if you want to change the timeout on a specific cancel scope, that's easy:

async def do_something():
    with move_on_after(10) as cscope:
        ...
        # Actually, let's give ourselves a bit more time
        cscope.deadline += 10
        ...

If you have a reference to a Trio cancel scope, you can change its timeout at any time. However, this is different from shielding. The code above only changes the deadline for that particular cancel scope. If the caller sets their own timeout:

with move_on_after(15):
    await do_something()

then the code will still get cancelled after 15 seconds when the outer cancel scope's deadline expires, even though the inner scope ended up with a 20 second timeout. Shielding is about disabling outer cancel scopes -- the ones you don't know about! -- in a particular bit of code. (If you compare to C#'s cancellation sources or Golang's context-based cancellation, it's like writing a function that intentionally chooses not to pass through the cancel token it was given into some function it calls.)
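Concretely, the interaction looks like this (runnable against trio as it exists today):

import trio

async def do_something():
    with trio.move_on_after(10) as cscope:
        cscope.deadline += 10   # the inner scope now allows 20 seconds...
        await trio.sleep(60)

async def main():
    with trio.move_on_after(15):
        await do_something()    # ...but this still unblocks at 15 seconds:
                                # the outer scope's deadline is untouched

trio.run(main)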
-n

--
Nathaniel J. Smith -- https://vorpus.org

From njs at pobox.com  Sun Jan 14 23:52:19 2018
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 14 Jan 2018 20:52:19 -0800
Subject: [Async-sig] Blog post: Timeouts and cancellation for humans
In-Reply-To:
References:
Message-ID:

On Sun, Jan 14, 2018 at 2:45 PM, Nick Badger wrote:
>> However, I think this is probably a code smell. Like all code smells,
>> there are probably cases where it's the right thing to do, but when
>> you see it you should stop and think carefully.
>
> Huh. That's a really good point. But I'm not sure the source of the smell is the code that needs the shield logic -- I think this might instead be indicative of upstream code smell. Put a bit more concretely: if you're writing a protocol for an unreliable network (and of course, every network is unreliable), requiring a closure operation to transmit something over that network is inherently problematic, because it inevitably leads to multiple-stage timeouts or ungraceful shutdowns.

I wouldn't go that far -- there are actually good reasons to design protocols like this.

SSL/TLS is a protocol that has a "goodbye" message (they call it "close-notify"). According to the spec [1], sending this is mandatory if you want to cleanly shut down an SSL/TLS connection. Why? Well, say I send you a message, "Should I buy more bitcoin?" and your reply is "Yes, but only if the price drops below $XX". Unbeknownst to us, we're being MITMed. Fortunately, we used SSL/TLS, so the MITM can't alter what we're saying. But they can manipulate the network; for example, they could cause our connection to drop after the first 3 bytes of your message, so your answer gets truncated and I think you just said "Yes" -- which is very different! But close-notify saves us -- or at least contains the damage. Since I know that you're supposed to send a close-notify at the end of your connection, and I didn't get one, I can tell that this is a truncated message. I can't tell what the rest was going to be, but at least I know the message I got isn't the message you intended to send. And an attacker can't forge a close-notify message, because they're cryptographically authenticated like all the data we send.

In websockets, the goodbye handshake is used to work around a nasty case that can happen with common TCP stacks (like, all of them):

1. A sends a message to B.
2. A is done after that, so it closes the connection.
3. Just then, B sends a message to A, like maybe a regular ping on some timer.
4. A's TCP stack receives data on a closed connection, goes "huh wut?", and sends an RST packet.
5. B goes to read the last message A sent before they closed the connection... but whoops, it's gone! The RST packet caused both TCP stacks to wipe out all their buffered data associated with this connection.

So if you have a protocol that's used for streaming indefinite amounts of data in both directions and supports stuff like pings, you kind of have to have a goodbye handshake to avoid TCP stacks accidentally corrupting your data. (The goodbye handshake can also help make sure that clients end up carrying CLOSE-WAIT states instead of servers, but that's a finicky and less important issue.)
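In code, the pattern a goodbye handshake enables looks roughly like this (a sketch with a hypothetical `ws` object, not any real library's API):

async def close_gracefully(ws):
    await ws.send_close()         # tell the peer we're done sending
    while True:
        msg = await ws.receive()  # keep draining until the peer acknowledges,
        if msg.type == "close":   # so nothing we sent can be wiped by an RST
            break
    ws.transport.close()          # only now is a bare TCP close safe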
Of course, it is absolutely true that networks are unreliable, so when your protocol specifies a goodbye handshake like this then implementations still need to have some way to cope if their peer closes the connection unexpectedly, and they may need to unilaterally close the connection in some circumstances no matter what the spec says. Correctly handling every possible case here quickly becomes, like, infinitely complicated. But nonetheless, as a library author one has to try to provide some reasonable behavior by default (while knowing that some users will end up needing to tweak things to handle special circumstances).

My tentative approach so far in Trio is (a) make cancellation stateful, as discussed in the blog post, because accidentally hanging forever just can't be a good default, and (b) in the "trio.abc.AsyncResource" interface that complex objects like trio.SSLStream implement (and we recommend libraries implement too), the semantics for the aclose and __aexit__ methods are that they're allowed to block forever trying to do a graceful shutdown, but if cancelled then they have to return promptly *while still freeing any underlying resources*, possibly in a non-graceful way. So if you write straightforward code like:

with trio.move_on_after(10):
    async with open_websocket_connection(...):
        ...

then it tries to do a proper websocket goodbye handshake by default, but if the timeout expires then it gives up and immediately closes the socket. It's not perfect, but it seems like a better default than anything else I can think of.

-n

[1] There's also this whole mess where many SSL/TLS implementations ignore the spec and don't bother sending close-notify. This is *kinda* justifiable because the original and most popular use for SSL/TLS is for wrapping HTTP connections, and HTTP has its own ways of signaling the end of the connection that are already transmitted through the encrypted tunnel, so the SSL/TLS end-of-connection handshake is redundant. Therefore lots of implementations went ahead and ignored the spec (including Python's ssl module!), so now if you're implementing HTTPS you have to do the same for interoperability. But the SSL/TLS spec can't assume you're using HTTP on top: its contract is basically "socket semantics, but cryptographically authenticated". And close() is part of socket semantics, so it kind of has to make close() cryptographically authenticated too. (trio.SSLStream handles this by implementing the standard-compliant behavior by default, but you can pass https_compatible=True to the constructor to get the HTTPS-style behavior.)

--
Nathaniel J. Smith -- https://vorpus.org

From nbadger1 at gmail.com  Mon Jan 15 01:08:26 2018
From: nbadger1 at gmail.com (Nick Badger)
Date: Sun, 14 Jan 2018 22:08:26 -0800
Subject: [Async-sig] Blog post: Timeouts and cancellation for humans
In-Reply-To:
References:
Message-ID:

Quick preface: there are definitely times when code "smell" really isn't -- nothing's perfect! -- and sometimes some system component is unavoidably inelegant. I think this is oftentimes (but not always) the result of scoping: clearly I couldn't decide, as a library author, that "it's all just broken" and rip out everything from OS to TCP to language syntax and semantics just to make my API prettier. So I pragmatically downscope the problem space, and it forces me to make design decisions to accommodate the rest of the universe. And that's okay!

With that being said, I'm still not convinced that the double-timeout-shutdown isn't an indication of upstream code smell.
From a practical standpoint, for the purposes of this discussion it really doesn't matter; Trio et al can't go mucking about in the TCP stack internals, so we do the best we can. But I'm willing to entertain the possibility (actually I think it's highly likely) that there are better solutions to the aforementioned problems than the ones used by (for example) TCP and TLS. But that rabbit hole goes very, very deep, so to circle back, what I'm trying to say is this:

- I share the inclination that shielding against cancellation (or any equivalent workaround) is likely code smell
- However, I personally suspect the source of that smell is upstream, in the network protocols themselves
- Given that, I think some amount of smell in downstream libraries like Trio is unavoidable

To that end, I really like Trio's existing approach. Shielding should definitely be used sparingly, but I think it's a justifiable, pragmatic compromise when it comes to dealing with not-quite-perfect protocols on even-less-perfect networks. And I think the connection close semantics Trio provides for these situations -- attempt to close gracefully, but if cancelled, still close unilaterally to free local resources -- is an excellent approach. But it also "lucks out" a bit, because freeing local resources is many orders of magnitude faster than the enclosing timeout is likely to be, so it's effectively a "free" operation. The relative timescales are a critical observation; if freeing local resources took one second out of a ten-second timeout, I think you'd be stuck asking the same question there, too.

Nick Badger
https://www.nickbadger.com

2018-01-14 20:52 GMT-08:00 Nathaniel Smith :

> On Sun, Jan 14, 2018 at 2:45 PM, Nick Badger wrote:
> >> However, I think this is probably a code smell. Like all code smells,
> >> there are probably cases where it's the right thing to do, but when
> >> you see it you should stop and think carefully.
> >
> > Huh. That's a really good point. But I'm not sure the source of the smell is the code that needs the shield logic -- I think this might instead be indicative of upstream code smell. Put a bit more concretely: if you're writing a protocol for an unreliable network (and of course, every network is unreliable), requiring a closure operation to transmit something over that network is inherently problematic, because it inevitably leads to multiple-stage timeouts or ungraceful shutdowns.
>
> I wouldn't go that far -- there are actually good reasons to design protocols like this.
>
> SSL/TLS is a protocol that has a "goodbye" message (they call it "close-notify"). According to the spec [1], sending this is mandatory if you want to cleanly shut down an SSL/TLS connection. Why? Well, say I send you a message, "Should I buy more bitcoin?" and your reply is "Yes, but only if the price drops below $XX". Unbeknownst to us, we're being MITMed. Fortunately, we used SSL/TLS, so the MITM can't alter what we're saying. But they can manipulate the network; for example, they could cause our connection to drop after the first 3 bytes of your message, so your answer gets truncated and I think you just said "Yes" -- which is very different! But close-notify saves us -- or at least contains the damage. Since I know that you're supposed to send a close-notify at the end of your connection, and I didn't get one, I can tell that this is a truncated message.
> I can't tell what the rest was going to be, but at least I know the message I got isn't the message you intended to send. And an attacker can't forge a close-notify message, because they're cryptographically authenticated like all the data we send.
>
> In websockets, the goodbye handshake is used to work around a nasty case that can happen with common TCP stacks (like, all of them):
>
> 1. A sends a message to B.
> 2. A is done after that, so it closes the connection.
> 3. Just then, B sends a message to A, like maybe a regular ping on some timer.
> 4. A's TCP stack receives data on a closed connection, goes "huh wut?", and sends an RST packet.
> 5. B goes to read the last message A sent before they closed the connection... but whoops, it's gone! The RST packet caused both TCP stacks to wipe out all their buffered data associated with this connection.
>
> So if you have a protocol that's used for streaming indefinite amounts of data in both directions and supports stuff like pings, you kind of have to have a goodbye handshake to avoid TCP stacks accidentally corrupting your data. (The goodbye handshake can also help make sure that clients end up carrying CLOSE-WAIT states instead of servers, but that's a finicky and less important issue.)
>
> Of course, it is absolutely true that networks are unreliable, so when your protocol specifies a goodbye handshake like this then implementations still need to have some way to cope if their peer closes the connection unexpectedly, and they may need to unilaterally close the connection in some circumstances no matter what the spec says. Correctly handling every possible case here quickly becomes, like, infinitely complicated. But nonetheless, as a library author one has to try to provide some reasonable behavior by default (while knowing that some users will end up needing to tweak things to handle special circumstances).
>
> My tentative approach so far in Trio is (a) make cancellation stateful, as discussed in the blog post, because accidentally hanging forever just can't be a good default, and (b) in the "trio.abc.AsyncResource" interface that complex objects like trio.SSLStream implement (and we recommend libraries implement too), the semantics for the aclose and __aexit__ methods are that they're allowed to block forever trying to do a graceful shutdown, but if cancelled then they have to return promptly *while still freeing any underlying resources*, possibly in a non-graceful way. So if you write straightforward code like:
>
> with trio.move_on_after(10):
>     async with open_websocket_connection(...):
>         ...
>
> then it tries to do a proper websocket goodbye handshake by default, but if the timeout expires then it gives up and immediately closes the socket. It's not perfect, but it seems like a better default than anything else I can think of.
>
> -n
>
> [1] There's also this whole mess where many SSL/TLS implementations ignore the spec and don't bother sending close-notify. This is *kinda* justifiable because the original and most popular use for SSL/TLS is for wrapping HTTP connections, and HTTP has its own ways of signaling the end of the connection that are already transmitted through the encrypted tunnel, so the SSL/TLS end-of-connection handshake is redundant. Therefore lots of implementations went ahead and ignored the spec (including Python's ssl module!), so now if you're implementing HTTPS you have to do the same for interoperability.
> But the SSL/TLS spec can't assume you're using HTTP on top: its contract is basically "socket semantics, but cryptographically authenticated". And close() is part of socket semantics, so it kind of has to make close() cryptographically authenticated too. (trio.SSLStream handles this by implementing the standard-compliant behavior by default, but you can pass https_compatible=True to the constructor to get the HTTPS-style behavior.)
>
> --
> Nathaniel J. Smith -- https://vorpus.org

From solipsis at pitrou.net  Mon Jan 15 19:41:36 2018
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 16 Jan 2018 01:41:36 +0100
Subject: [Async-sig] Blog post: Timeouts and cancellation for humans
References:
Message-ID: <20180116014136.1985193e@fsol>

Hi,

On Thu, 11 Jan 2018 02:09:29 -0800
Nathaniel Smith wrote:
> Hi all,
>
> Folks here might be interested in this new blog post:
>
> https://vorpus.org/blog/timeouts-and-cancellation-for-humans/
>
> It's a detailed discussion of pitfalls and design-tradeoffs in APIs
> for timeout and cancellation, and has a proposal for handling them in
> a more Pythonic way. Any feedback welcome!

I have little constructive feedback to share, other than that it is a very insightful write-up and the API proposal there is quite interesting.

cheers,

Antoine.

From njs at pobox.com  Tue Jan 16 04:56:33 2018
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 16 Jan 2018 01:56:33 -0800
Subject: [Async-sig] Blog post: Timeouts and cancellation for humans
In-Reply-To:
References:
Message-ID:

On Sun, Jan 14, 2018 at 6:33 PM, Dima Tisnek wrote:
> Perhaps the latter is what `shield` should do? That is, detach the
> computation as opposed to blocking the caller past the caller's deadline?

Well, it can't do that in trio :-). One of trio's core design principles is: no detached processes. And even if you don't think detached processes are inherently a bad idea, I don't think they're what you'd want in this case anyway. If your socket shutdown code has frozen, you want to kill it and close the socket, not move it into the background where it can hang around indefinitely wasting resources.
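To illustrate, "no detached processes" is a property the nursery construct enforces by itself (a minimal sketch using trio's actual API, with a sleep standing in for real work):

import trio

async def main():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(trio.sleep, 5)  # every task needs a parent nursery
    # execution only reaches this point once every child has finished or
    # been cancelled -- no task can be left running "detached"

trio.run(main)

-n

--
Nathaniel J. Smith -- https://vorpus.org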