[Web-SIG] Server-side async API implementation sketches

Sun Jan 9 19:09:28 CET 2011

09.01.2011 19:03, P.J. Eby wrote:
> At 06:06 AM 1/9/2011 +0200, Alex Grönholm wrote:
>> A new feature here is that the application itself yields a (status, 
>> headers) tuple and then chunks of the body (or futures).
>
> Hm.  I'm not sure if I like that.  The typical app developer really 
> shouldn't be yielding multiple body strings in the first place.  I 
> much prefer that the canonical example of a WSGI app just return a 
> list with a single bytestring -- preferably in a single statement for 
> the entire return operation, whether it's a yield or a return.
Uh, so don't yield multiple body strings then? How is that so difficult?
>
>
> IOW, I want it to look like the normal way to do things is to just 
> return the whole response at once, and use the additional difficulty of 
> creating a second iterator to discourage people writing iterated 
> bodies when they should just write everything to a BytesIO and be done 
> with it.
I fail to understand why a second iterator is necessary when we can get 
away with just one.
>
>
> Also, it makes middleware simpler: the last line can just yield the 
> result of calling the app, or a modified version, i.e.:
>
>     yield app(environ)
>
> or:
>
>     s, h, b = app(environ)
>     # ... modify or replace s, h, b
>     yield s, h, b
Asynchronous applications may not be ready to send the status line as 
the first thing coming out of the generator. Consider an app that 
receives a file. The first thing coming out of the app is a future. The 
app needs to receive the entire file before it can determine what status 
line to send. Maybe there was an I/O error writing the file, so it needs 
to send a 500 response instead of 200. This is not possible with a body 
iterator, and if we are already iterating the application generator, I 
really don't understand why the body needs to be an iterator as well.
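
To make the upload case concrete, here is a minimal sketch of what I mean. 
It is not the gist code: FakeInput, async_read(), the "demo.store" environ 
key and the run() driver are all hypothetical names made up for this 
illustration, and the driver simply blocks on each future where a real 
server would wait in its event loop.

```python
from concurrent.futures import Future


class FakeInput:
    """Stand-in for wsgi.input; async_read() is a hypothetical method."""

    def __init__(self, payload):
        self.payload = payload

    def async_read(self):
        future = Future()
        future.set_result(self.payload)  # resolved immediately for the demo
        return future


def upload_app(environ):
    # The first item out of the generator is a future, not the status line.
    data = yield environ["wsgi.input"].async_read()
    try:
        environ["demo.store"](data)  # hypothetical storage callable
    except IOError:
        yield "500 Internal Server Error", [("Content-Type", "text/plain")]
        yield b"could not store upload\n"
    else:
        yield "200 OK", [("Content-Type", "text/plain")]
        yield ("stored %d bytes\n" % len(data)).encode("ascii")


def run(app, environ):
    """Toy driver: resolve futures, take the first tuple as (status, headers)."""
    gen = app(environ)
    head, body, to_send = None, [], None
    while True:
        try:
            item = gen.send(to_send)
        except StopIteration:
            break
        to_send = None
        if isinstance(item, Future):
            to_send = item.result()  # a real server would not block here
        elif head is None:
            head = item
        else:
            body.append(item)
    return head, b"".join(body)
```

The status line is chosen only after the future has resolved and the write 
has been attempted, and a single generator is enough to carry the future, 
the (status, headers) tuple and the body.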
>
>
> In your approach, the above samples have to be rewritten as:
>
>     return app(environ)
>
> or:
>
>     result = app(environ)
>     s, h = yield result
>     # ... modify or replace s, h
>     yield s, h
>
>     for data in result:
>          # modify data as we go
>          yield data
>
> Only that last bit doesn't actually work, because you have to be able 
> to send future results back *into* the result.  Try actually making 
> some code that runs on this protocol and yields to futures during the 
> body iteration.
Did you miss the gist posted by me (and improved by Alice)?
>
> Really, this modified protocol can't work with a full async API the 
> way my coroutine-based version does, AND the middleware is much more 
> complicated.  In my version, your do-nothing middleware looks like this:
>
>
> class NullMiddleware(object):
>     def __init__(self, app):
>         self.app = app
>
>     def __call__(self, environ):
>         # ACTION: pre-application environ mangling
>
>         s, h, body = yield self.app(environ)
>
>         # modify or replace s, h, body here
>
>         yield s, h, body
>
>
> If you want to actually process the body in some way, it looks like:
>
> class NullMiddleware(object):
>
>     def __init__(self, app):
>         self.app = app
>
>     def __call__(self, environ):
>         # ACTION: pre-application environ mangling
>
>         s, h, body = yield self.app(environ)
>
>         # modify or replace s, h, body here
>
>         yield s, h, self.process(body)
>
>     def process(self, body_iter):
>         while True:
>             chunk = yield body_iter
>             if chunk is None:
>                 break
>             # process/modify chunk here
>             yield chunk
>
> And that's still a lot simpler than your sketch.
>
> Personally, I would write both of the above as:
>
>     def null_middleware(app):
>
>         def wrapped(environ):
>             # ACTION: pre-application environ mangling
>             s, h, body = yield app(environ)
>
>             # modify or replace s, h, body here
>             yield s, h, process(body)
>
>         def process(body_iter):
>             while True:
>                 chunk = yield body_iter
>                 if chunk is None:
>                     break
>                 # process/modify chunk here
>                 yield chunk
>
>         return wrapped
>
> But that's just personal taste.  Even as a class, it's much easier to 
> write.  The above middleware pattern works with the sketches I gave on 
> the PEAK wiki, and I've now updated the wiki to include an example app 
> and middleware for clarity.
>
> Really, the only hole in this approach is dealing with applications 
> that block.  The elephant in the room here is that while it's easy to 
> write these example applications so they don't block, in practice 
> people read files and do database queries and whatnot in their 
> requests, and those APIs are generally synchronous.  So, unless they 
> somehow fold their entire application into a future, it doesn't work.
>
>
>> I liked the idea of having a separate async_read() method in 
>> wsgi.input, which would set the underlying socket in nonblocking mode 
>> and return a future. The event loop would watch the socket and read 
>> data into a buffer and trigger the callback when the given amount of 
>> data has been read. Conversely, .read() would set the socket in 
>> blocking mode. What kinds of problems would this cause?
>
> That you could never *call* the .read() method outside of a future, or 
> else you would block the server, thereby obliterating the point of 
> having the async API in the first place.
>
Outside of the application/middleware you mean? I hope there isn't any 
more confusion left about what a future is. The fact is that you cannot 
use synchronous API calls directly from an async app no matter what. 
Some workaround is always necessary.
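
One possible workaround, sketched under assumptions: push the synchronous 
call into a thread pool and yield the resulting future from the app. The 
names blocking_query, query_app and drive are hypothetical, and the toy 
driver below blocks on the future with result(), whereas a real async 
server would register a callback (for example with add_done_callback) and 
carry on serving other requests in the meantime.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def blocking_query(key):
    """Hypothetical synchronous call, e.g. a database query."""
    time.sleep(0.01)  # simulate blocking I/O
    return ("row for %s\n" % key).encode("ascii")


def query_app(environ):
    # The blocking call runs in a worker thread; the app only sees a future.
    row = yield executor.submit(blocking_query, environ["PATH_INFO"])
    yield "200 OK", [("Content-Type", "text/plain")]
    yield row


def drive(gen):
    """Toy driver: send future results back in, collect status and body."""
    head, body, to_send = None, [], None
    while True:
        try:
            item = gen.send(to_send)
        except StopIteration:
            break
        to_send = None
        if isinstance(item, Future):
            to_send = item.result()  # a real server would not block here
        elif head is None:
            head = item
        else:
            body.append(item)
    return head, b"".join(body)
```

Calling drive(query_app({"PATH_INFO": "/users/1"})) runs the query in a 
worker thread and hands the row back into the generator once it is ready.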