[Python-ideas] asyncore: included batteries don't fit

Ben Darnell ben at bendarnell.com
Tue Oct 9 07:12:51 CEST 2012


On Mon, Oct 8, 2012 at 8:30 AM, Guido van Rossum <guido at python.org> wrote:
>> It's a Future constructor, a (conditional) add_done_callback, plus the
>> calls to set_result or set_exception and the with statement for error
>> handling.  In full:
>>
>> def future_wrap(f):
>>     @functools.wraps(f)
>>     def wrapper(*args, **kwargs):
>>         future = Future()
>>         if kwargs.get('callback') is not None:
>>             future.add_done_callback(kwargs.pop('callback'))
>>         kwargs['callback'] = future.set_result
>>         def handle_error(typ, value, tb):
>>             future.set_exception(value)
>>             return True
>>         with ExceptionStackContext(handle_error):
>>             f(*args, **kwargs)
>>         return future
>>     return wrapper
>
> Hmm... I *think* it automatically adds a special keyword 'callback' to
> the *call* site so that you can do things like
>
>   fut = some_wrapped_func(blah, callback=my_callback)
>
> and then instead of using yield to wait for the callback, put the
> continuation of your code in the my_callback() function.

Yes.  Note that if you're passing in a callback you're probably going
to just ignore the return value.  The callback argument and the future
return value are essentially two alternative interfaces; it probably
doesn't make sense to use both at once (but as a library author it's
useful to provide both).
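
For concreteness, a minimal sketch of the two call styles (the
function name fetch and the callbacks here are hypothetical):

    # style 1: pass a callback; the returned Future is ignored
    fetch('http://example.com/', callback=my_callback)

    # style 2: omit the callback and use the returned Future instead
    future = fetch('http://example.com/')
    future.add_done_callback(lambda f: my_callback(f.result()))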

> But it also
> seems like it passes callback=future.set_result as the callback to the
> wrapped function, which looks to me like that function was apparently
> written before Futures were widely used. This seems pretty impure to
> me and I'd like to propose a "future" where such functions either be
> given the Future where the result is expected, or (more commonly) the
> function would create the Future itself.

Yes, it's impure and based on pre-Future patterns.  The caller's
callback argument and the inner function's callback are not really
related any more (they were the same in pre-Future async code, of
course).  They should probably have different names, although if the
inner function's return value were passed via exception (StopIteration
or return) the inner callback argument could just go away.

>
> Unless I'm totally missing the programming model here.
>
> PS. I'd like to learn more about ExceptionStackContext() -- I've
> struggled somewhat with getting decent tracebacks in NDB.

StackContext doesn't quite give you better tracebacks, although I
think it could be adapted to do that.  ExceptionStackContext is
essentially a try/except block that follows you around across
asynchronous operations: on entry it sets thread-local state, and all
of Tornado's asynchronous functions know to save this state when they
are passed a callback, and to restore it when they execute it.  This
has proven extremely helpful in ensuring that all exceptions get
caught by something that knows how to do the appropriate cleanup (e.g.
an asynchronous web page serves an error instead of just spinning
forever), although it has turned out to be a little more intrusive and
magical than I had originally anticipated.

https://github.com/facebook/tornado/blob/master/tornado/stack_context.py
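
To make that concrete, here is a minimal sketch of the usage pattern
(async_op is hypothetical; ExceptionStackContext, IOLoop.instance and
IOLoop.add_callback are real Tornado APIs):

    from tornado.ioloop import IOLoop
    from tornado.stack_context import ExceptionStackContext

    def handle_error(typ, value, tb):
        # do the appropriate cleanup here, e.g. serve an error page
        return True  # True means the exception is handled

    def async_op(callback):
        # add_callback wraps the callback in the current stack
        # context, so an exception raised later still reaches
        # handle_error even though the with block has exited
        IOLoop.instance().add_callback(callback)

    def on_done():
        raise ValueError('raised long after the with block exited')

    with ExceptionStackContext(handle_error):
        async_op(on_done)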

>
>>>> In Tornado the Future is created by a decorator
>>>> and hidden from the asynchronous function (it just sees the callback),
>>>
>>> Hm, interesting. NDB goes the other way, the callbacks are mostly used
>>> to make Futures work, and most code (including large swaths of
>>> internal code) uses Futures. I think NDB is similar to monocle here.
>>> In NDB, you can do
>>>
>>>   f = <some function returning a Future>
>>>   r = yield f
>>>
>>> where "yield f" is mostly equivalent to f.result(), except it gives
>>> better opportunity for concurrency.
>>
>> Yes, tornado's gen.engine does the same thing here.  However, the
>> stakes are higher than "better opportunity for concurrency" - in an
>> event loop if you call future.result() without yielding, you'll
>> deadlock if that Future's task needs to run on the same event loop.
>
> That would depend on the semantics of the event loop implementation.
> In NDB's event loop, such a .result() call would just recursively
> enter the event loop, and you'd only deadlock if you actually have two
> pieces of code waiting for each other's completion.

Hmm, I think I'd rather deadlock. :)  If the event loop is reentrant
then the application code has to be coded defensively, as if it were
preemptively multithreaded, which introduces the possibility of
deadlock or (more likely) subtler, less frequent errors.  Reentrancy
has been a significant problem in my experience, so I've been moving
towards a policy where methods in Tornado that take a callback never
run it immediately; callbacks are always scheduled on the next
iteration of the IOLoop with IOLoop.add_callback.
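
In code, the policy looks something like this (fetch_remote and the
cache are hypothetical; IOLoop.add_callback is the real API):

    from tornado.ioloop import IOLoop

    _cache = {}

    def fetch_cached(key, callback):
        if key in _cache:
            # The result is already available, but never run the
            # callback inline: schedule it for the next IOLoop
            # iteration so the caller always returns before its
            # callback runs.
            IOLoop.instance().add_callback(
                lambda: callback(_cache[key]))
        else:
            def on_fetch(value):
                _cache[key] = value
                callback(value)
            fetch_remote(key, callback=on_fetch)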

>
> [...]
>>> I am currently trying to understand if using "yield from" (and
>>> returning a value from a generator) will simplify things. For example
>>> maybe the need for a special decorator might go away. But I keep
>>> getting headaches -- perhaps there's a Monad involved. :-)
>>
>> I think if you build generator handling directly into the event loop
>> and use "yield from" for calls from one async function to another then
>> you can get by without any decorators.  But I'm not sure if you can do
>> that and maintain any compatibility with existing non-generator async
>> code.
>>
>> I think the ability to return from a generator is actually a bigger
>> deal than "yield from" (and I only learned about it from another
>> python-ideas thread today).  The only reason a generator decorated
>> with @tornado.gen.engine needs a callback passed in to it is to act as
>> a pseudo-return, and a real return would prevent the common mistake of
>> running the callback then falling through to the rest of the function.
>
> Ah, so you didn't come up with the clever hack of raising an exception
> to signify the return value. In NDB, you raise StopIteration (though
> it is given the alias 'Return' for clarity) with an argument, and the
> wrapper code that is responsible for the Future takes the value from
> the StopIteration exception and passes it to the Future's
> set_result().

I think I may have thought about "raise Return(x)" and dismissed it as
too weird.  But then, I'm abnormally comfortable with asynchronous
code that passes callbacks around.
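
For reference, a rough sketch of the trick (class and driver names
made up; NDB spells the alias Return):

    class Return(StopIteration):
        """StopIteration subclass that carries a return value."""
        def __init__(self, value=None):
            super(Return, self).__init__()
            self.value = value

    # inside whatever code drives the generator:
    try:
        gen.send(last_result)  # resume the coroutine
    except Return as e:
        future.set_result(e.value)  # 'raise Return(x)' lands here
    except StopIteration:
        future.set_result(None)     # plain return / fell off the end

Note that since Return subclasses StopIteration, the except clauses
have to appear in that order.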

>
>> For concreteness, here's a crude sketch of what the APIs I'm talking
>> about would look like in use (in a hypothetical future version of
>> tornado).
>>
>> @future_wrap
>> @gen.engine
>> def async_http_client(url, callback):
>>     parsed_url = urlparse.urlsplit(url)
>>     # works the same whether the future comes from a thread pool or @future_wrap
>
> And you need the thread pool because there's no async version of
> getaddrinfo(), right?

Right.
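
That part is just the standard concurrent.futures API (the pool name
matches the g_thread_pool in the sketch below):

    from concurrent.futures import ThreadPoolExecutor
    import socket

    g_thread_pool = ThreadPoolExecutor(max_workers=4)

    # submit() returns a Future immediately; the blocking lookup
    # runs on a worker thread, so the IOLoop itself never blocks
    future = g_thread_pool.submit(
        socket.getaddrinfo, 'example.com', 80)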

>
>>     addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port)
>>     stream = IOStream(socket.socket())
>>     yield stream.connect((addrinfo[0][-1]))
>>     stream.write('GET %s HTTP/1.0' % parsed_url.path)
>
> Why no yield in front of the write() call?

Because we don't need to wait for the write to complete before we
continue to the next statement.  write() doesn't return anything; it
just succeeds or fails, and if it fails the next read_until will fail
too (although in this case it wouldn't hurt to have the yield either).

>
>>     header_data = yield stream.read_until('\r\n\r\n')
>>     headers = parse_headers(header_data)
>>     body_data = yield stream.read_bytes(int(headers['Content-Length']))
>>     stream.close()
>>     callback(body_data)
>>
>> # another function to demonstrate composability
>> @future_wrap
>> @gen.engine
>> def fetch_some_urls(url1, url2, url3, callback):
>>     body1 = yield async_http_client(url1)
>>     # yield a list of futures for concurrency
>>     future2 = yield async_http_client(url2)
>>     future3 = yield async_http_client(url3)
>>     body2, body3 = yield [future2, future3]
>>     callback((body1, body2, body3))
>
> This second one is nearly identical to the way it's done in NDB.
> However I think you have a typo -- I doubt that there should be yields
> on the lines creating future2 and future3.

Right.

>
>> One hole in this design is how to deal with callbacks that are run
>> multiple times.  For example, the IOStream read methods take both a
>> regular callback and an optional streaming_callback (which is called
>> with each chunk of data as it arrives).  I think this needs to be
>> modeled as something like an iterator of Futures, but I haven't worked
>> out the details yet.
>
> Ah. Yes, that's a completely different kind of thing, and probably
> needs to be handled in a totally different way. I think it probably
> needs to be modeled more like an infinite loop where at the blocking
> point (e.g. a low-level read() or accept() call) you yield a Future.
> Although I can see that this doesn't work well with the IOLoop's
> concept of file descriptor (or other event source) registration.

It works just fine at the IOLoop level:  you call
IOLoop.add_handler(fd, func, READ), and you'll get read events
whenever there's new data until you call remove_handler(fd) (or
update_handler).  If you're passing callbacks around explicitly it's
pretty straightforward (as much as anything ever is in that style) to
allow for those callbacks to be run more than once.  The problem is
that generators more or less require that each callback be run exactly
once.  That's a generally desirable property, but the mismatch between
the two layers can be difficult to deal with.
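
Concretely, the multi-shot style at the IOLoop level looks like this
(handle_chunk is hypothetical; add_handler, remove_handler and READ
are the real IOLoop interface):

    import os
    from tornado.ioloop import IOLoop

    io_loop = IOLoop.instance()
    r_fd, w_fd = os.pipe()  # stand-in for any readable fd

    def on_readable(fd, events):
        data = os.read(fd, 4096)
        if data:
            handle_chunk(data)  # runs once per chunk, many times total
        else:
            io_loop.remove_handler(fd)  # EOF: unregister the handler

    io_loop.add_handler(r_fd, on_readable, IOLoop.READ)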

-Ben


