[Async-sig] Asynchronous cleanup is a problem

Nathaniel Smith njs at pobox.com
Wed Jul 6 15:54:47 EDT 2016


On Wed, Jul 6, 2016 at 6:42 AM, Cory Benfield <cory at lukasa.co.uk> wrote:
>
>> On 6 Jul 2016, at 13:09, David Beazley <dave at dabeaz.com> wrote:
>>
>> Curio uses asynchronous context managers for much more than closing sockets (which frankly is the least interesting thing).   For example, they're used extensively with synchronization primitives such as Locks, Semaphores, Events, Queues, and other such things.   The ability to use coroutines in the __aexit__() method is an essential part of these primitives because it allows task scheduling decisions to be made in conjunction with synchronization events such as lock releases.   For example, you can implement fair-locking or various forms of priority scheduling.   Curio also uses asynchronous context managers for timeouts and other related functionality where coroutines have to be used in __aexit__.  I would expect coroutines in __aexit__ to also be useful in more advanced contexts such as working with databases, dealing with transactions, and other kinds of processing where asynchronous I/O might be involved.
>
> For my own edification, Dave, do you mind if I dive into this a little bit? My suspicion is that this problem is rather unique to curio (or, at least, does not so strongly affect event loop implementations), and I’d just like to get a handle on it. It’s important to me that we don’t blow away curio, so where it differs from event loops I’d like to understand what it’s doing.
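
To make that concrete for anyone skimming: the key point is that
__aexit__ is itself a coroutine, so releasing a primitive can block and
talk to the scheduler. A toy FIFO lock, illustration only -- written
against asyncio so it's self-contained; curio's real primitives are
different:

import asyncio
import collections

class FairLock:
    # Toy "fair" lock: waiters are woken strictly in arrival order.
    # Both __aenter__ and __aexit__ are coroutines, so acquiring and
    # releasing can block and interact with the scheduler.
    def __init__(self):
        self._locked = False
        self._waiters = collections.deque()

    async def __aenter__(self):
        if not self._locked:
            self._locked = True
            return self
        fut = asyncio.get_event_loop().create_future()
        self._waiters.append(fut)
        await fut        # park this task until a release hands us the lock
        return self

    async def __aexit__(self, exc_type, exc, tb):
        if self._waiters:
            # Hand the lock to the oldest waiter (FIFO = "fair");
            # _locked stays True because ownership transfers directly.
            self._waiters.popleft().set_result(None)
        else:
            self._locked = False
        # Yield to the scheduler so the woken task gets a chance to run --
        # the kind of scheduling decision Dave is describing.
        await asyncio.sleep(0)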

The relevant difference between curio and asyncio here is that in
asyncio, there are two different mechanisms for accessing the event
loop: for some operations, you access it through its coroutine runner
interface using 'await', and for other operations, you get a direct
reference to it through some side-channel (loop= arguments, global
lookups) and then make direct method calls. To make the latter work,
asyncio code generally has to be written so that the loop object is
passed down through the whole call stack (hence the ubiquitous loop=
arguments) and always stays in sync [no pun intended] with the
coroutine runner. This has a number of downsides, but one upside is
that the loop object is still available from __del__, where the
coroutine runner isn't.
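
In cartoon form (a deliberately simplified illustration; real asyncio
code mixes the two freely):

import asyncio

async def via_coroutine_runner():
    # Mechanism 1: ask the event loop to do something through 'await';
    # the request travels via the coroutine runner.
    await asyncio.sleep(1)

def via_direct_reference(loop):
    # Mechanism 2: hold a direct reference to the loop (received as a
    # loop= argument, or looked up globally) and call methods on it.
    loop.call_later(1, print, "tick")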

Curio takes the other approach, of standardizing on 'await' as the
single blessed mechanism for accessing the event loop (or kernel or
whatever you want to call it). So this eliminates all the tiresome
loop= tracking and the potential for out-of-sync bugs, but it means
you can't do *anything* event-loop-related from __del__.
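
So idiomatic curio code looks something like this (rough sketch; see
the curio docs for the exact API):

import curio

async def main():
    # No kernel/loop object in sight: every interaction with the
    # scheduler -- sleeping, locking, I/O -- goes through 'await'.
    lock = curio.Lock()
    async with lock:          # __aenter__/__aexit__ are coroutines
        await curio.sleep(1)

curio.run(main)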

However, I'm not convinced that asyncio really has an advantage here.
Imagine some code that uses asyncio internally, but exposes a
synchronous wrapper to the outside (e.g. as proposed here ;-) [1]):

def synchronous_wrapper(...):
    # Use a new loop since we might not be in the same thread as the global loop
    loop = asyncio.new_event_loop()
    return loop.run_until_complete(work_asynchronously(..., loop=loop))

async def work_asynchronously(..., loop):
    stream = await get_asyncio_stream_writer(..., loop=loop)
    stream.write(b"hello")

stream.write(...) queues some data to be sent, but doesn't actually
send it. Then stream falls out of scope, which triggers a call to
stream.__del__, which calls stream.close(), which in turn makes some
calls on the loop asking it to flush the buffer and then close the
underlying socket. So far so good.

...but then immediately after this, the loop itself falls out of
scope, and you lose. AFAICT from a quick skim of the asyncio code, the
data will not be sent and the socket will not be closed (depending on
kernel buffering etc.).

(And even if you wrap everything with proper 'with' blocks, this is
still true... asyncio event loops don't seem to have any API for
"complete all work and then shut down"? Maybe I'm just missing it --
if not then this is possibly a rather serious bug in actual
currently-existing asyncio. But for the present purposes, the point is
that you really do need something like 'async with' around everything
here to force the I/O to complete before handing things over to the gc
-- you can't rely on the gc to do your I/O.)
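
FWIW, the fix for the toy example above is to make the flush explicit
instead of leaving it to __del__ -- something along these lines, using
StreamWriter's drain() and close() methods:

async def work_asynchronously(..., loop):
    stream = await get_asyncio_stream_writer(..., loop=loop)
    stream.write(b"hello")
    await stream.drain()   # push the buffered data out while the loop is still running
    stream.close()         # then close the transport deterministically

The point being that the I/O completes while you still have a way to
talk to the loop, instead of being left to whatever the gc gets around
to.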

-n

[1] https://github.com/kennethreitz/requests/issues/1390#issuecomment-225361421

-- 
Nathaniel J. Smith -- https://vorpus.org

