[Async-sig] Blog post: Timeouts and cancellation for humans

Nick Badger nbadger1 at gmail.com
Sun Jan 14 17:45:28 EST 2018


>
> However, I think this is probably a code smell. Like all code smells,
> there are probably cases where it's the right thing to do, but when
> you see it you should stop and think carefully.


Huh. That's a really good point. But I'm not sure the source of the smell
is the code that needs the shield logic -- I think this might instead be
indicative of upstream code smell. Put a bit more concretely: if you're
writing a protocol for an unreliable network (and of course, every network
is unreliable), requiring a closure operation to transmit something over
that network is inherently problematic, because it inevitably leads to
multiple-stage timeouts or ungraceful shutdowns.

Clearly, changing anything upstream is out of scope here. So if the smell
is, in fact, "upwind", there's not much you can do about it in asyncio,
Curio, Trio, etc., other than minimize the additional smell needed to
accommodate smelly protocols. Unfortunately, I'm not sure there's any one
approach to that problem that isn't application-specific.
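For what it's worth, the two-stage pattern from both sub-threads below can
be sketched in plain stdlib asyncio (all names here are illustrative, not
taken from anyone's actual code): the cleanup step is shielded from the
outer cancellation and given its own, smaller timeout, so a cancelled
operation still gets a bounded window to close down gracefully.

```python
import asyncio


async def do_cleanup(log):
    # Hypothetical cleanup step -- e.g. sending a close frame over the wire.
    await asyncio.sleep(0.01)
    log.append("cleanup done")


async def operation(log):
    try:
        await asyncio.sleep(10)  # stands in for the long-running work
    finally:
        # Even though we may have just been cancelled, give the cleanup its
        # own, smaller budget: shield() keeps a further outer cancellation
        # from propagating in, and wait_for() caps how long we wait for it.
        try:
            await asyncio.wait_for(asyncio.shield(do_cleanup(log)), timeout=1.0)
        except asyncio.TimeoutError:
            log.append("cleanup timed out")


async def main():
    log = []
    task = asyncio.create_task(operation(log))
    await asyncio.sleep(0.05)
    task.cancel()  # cancel the main work; cleanup still runs in the finally
    try:
        await task
    except asyncio.CancelledError:
        pass
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Of course, this has exactly the layering problem described below: the
cleanup timeout lives in a different layer from whatever timeout policy the
caller has, so the two can fight.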

Nick Badger
https://www.nickbadger.com

2018-01-14 3:33 GMT-08:00 Nathaniel Smith <njs at pobox.com>:

> On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek
> <chris.jerdonek at gmail.com> wrote:
> > Thanks, Nathaniel. Very instructive, thought-provoking write-up!
> >
> > One thing occurred to me around the time of reading this passage:
> >
> >> "Once the cancel token is triggered, then all future operations on that
> token are cancelled, so the call to ws.close doesn't get stuck. It's a less
> error-prone paradigm. ... If you follow the path we did in this blog post,
> and start by thinking about applying a timeout to a complex operation
> composed out of multiple blocking calls, then it's obvious that if the
> first call uses up the whole timeout budget, then any future calls should
> fail immediately."
> >
> > It's not clear to me how one case should be addressed.
> > It's something I've wrestled with in the context of asyncio, and it
> > doesn't seem to be raised as a possibility in your write-up.
> >
> > Say you have a complex operation that you want to be able to timeout
> > or cancel, but the process of cleanup / cancelling might also require
> > a certain amount of time that you'd want to allow for (likely a
> > smaller amount in normal circumstances). Then it seems like you'd want
> > to be able to allocate a separate timeout for the clean-up portion
> > (independent of the timeout allotted for the original operation).
> >
> > It's not clear to me how this case would best be handled with the
> > primitives you described. In your text above ("then any future calls
> > should fail immediately"), without any changes, it seems there
> > wouldn't be "time" for any clean-up to complete.
> >
> > With asyncio, one way to handle this is to await on a task with a
> > smaller timeout after calling task.cancel(). That lets you assign a
> > different timeout to waiting for cancellation to complete.
>
> You can get these semantics using the "shielding" feature, which the
> post discusses a bit later:
>
> try:
>     await do_some_stuff()
> finally:
>     # Always give this 30 seconds to clean up, even if we've
>     # been cancelled
>     with trio.move_on_after(30) as cscope:
>         cscope.shield = True
>         await do_cleanup()
>
> Here the inner scope "hides" the code inside it from any external
> cancel scopes, so it can continue executing even if the overall
> context has been cancelled.
>
> However, I think this is probably a code smell. Like all code smells,
> there are probably cases where it's the right thing to do, but when
> you see it you should stop and think carefully. If you're writing code
> like this, then it means that there are multiple different layers in
> your code that are implementing timeout policies, that might end up
> fighting with each other. What if the caller really needs this to
> finish in 15 seconds? So if you have some way to move the timeout
> handling into the same layer, then I suspect that will make your
> program easier to understand and maintain. OTOH, if you decide you
> want it, the code above works :-). I'm not 100% sure here; I'd
> definitely be interested to hear about more use cases.
>
> One thing I've thought about that might help is adding a kind of "soft
> cancelled" state to the cancel scopes, inspired by the "graceful
> shutdown" mode that you'll often see in servers where you stop
> accepting new connections, then try to finish up old ones (with some
> time limit). So in this case you might mark 'do_some_stuff()' as being
> cancelled immediately when we entered the 'soft cancel' phase, but let
> the 'do_cleanup' code keep running until the grace period expired and
> the region was hard-cancelled. This idea isn't fully baked yet though.
> (There's some more mumbling about this at
> https://github.com/python-trio/trio/issues/147.)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> Async-sig mailing list
> Async-sig at python.org
> https://mail.python.org/mailman/listinfo/async-sig
> Code of Conduct: https://www.python.org/psf/codeofconduct/
>
