[Web-SIG] WSGI2: write callable?

Sat Sep 27 05:44:56 CEST 2014

On Fri, Sep 26, 2014 at 7:41 PM, Robert Collins
<robertc at robertcollins.net> wrote:
> One thing we could do with the status code in the headers dict is to
> default to 200 - the vastly common case (in the same way that throwing
> an error generates a 500). Then status wouldn't be required at all for
> trivial uses. That would make things easier, no?

At the cost of variation.  A core design principle of WSGI is that
variations make things *harder*, not easier, because it means more
alternatives that apps, servers, and middleware have to support, with
more code paths and fewer of them properly tested.  Every variation
that is part of the spec (as opposed to an extension) creates a LOT
of complexity in the field.  (Which is one reason it'll be nice to get
rid of start_response(), and all its convoluted sequencing logic.)

> So a classic example for Trailers is digitally signing streamed
> content. Using the same strawman API as above:
>
> def app(environ):
>    yield {':status': '200'}
>    md5sum = md5.new()
>    for bytes in block_reader(open('foo', 'rb'), 65536):
>        md5sum.update(bytes)
>        yield bytes
>    digest = md5sum.hexdigest()
>    signature = sign_bytes(digest.encode('utf8'))
>    yield {'Content-MD5Sum': digest, 'X-Signature': signature}
>
> Note that this doesn't need to buffer or use a closure.

Please bear in mind that another core WSGI design principle is that we
don't make apps easier to write by making servers and middleware
harder to write.  That kills adoption and growth, because the audience
that *needs* to adopt WSGI (or any successor standard) is the audience
of people who write servers and middleware.  If a feature is sinfully
ugly for the app writer, but a thing of beauty for a middleware
author, we *want* that feature.

Conversely, if a feature means that *every* piece of middleware now
has to add an extra "if" statement to support the feature in order to
make it pretty for the app writer, then we do NOT want that feature,
and it should be taken out and shot *at once*.

It's not a fair tradeoff, because only server authors and middleware
authors *have to* deal with WSGI directly.  App authors can use
libraries to pretty it up, so we don't need to pretty it for them in
advance -- especially since we don't know what their *personal* idea
of pretty is going to be.  ;-)

The above API is cute and clean for the app writer, but for a
middleware writer it's a barrel of misery.  *Every* piece of
middleware that even wants to *read* anything from the response (let
alone modify it), now needs to check types of yielded values,
accumulate headers, and maybe buffer content.  And there are many ways
to write that middleware that will be wrong, but *appear* right
because the author didn't think of all the ways that an app could
violate the middleware author's assumptions.
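
To make that cost concrete, here is a rough sketch of what a piece of
middleware might need just to *read* a response under the strawman
API.  The helper name split_response() is made up for illustration,
and it assumes the convention from the quoted example: dicts are
header or trailer blocks, and bytes are body chunks.

```python
def split_response(items):
    """Separate a generator-style response into headers, body, trailers.

    Assumption (from the strawman only, not any spec): dicts seen before
    the first body chunk are headers, dicts seen afterwards are trailers.
    """
    headers, chunks, trailers = {}, [], {}
    for item in items:
        if isinstance(item, dict):
            # Position has to be tracked to tell headers from trailers.
            target = trailers if chunks else headers
            target.update(item)
        elif isinstance(item, bytes):
            chunks.append(item)  # buffering the body defeats streaming
        else:
            raise TypeError("unexpected response item: %r" % (item,))
    return headers, chunks, trailers
```

Even this sketch is wrong in exactly the way described above: if the
app sends trailers after an *empty* body, they are silently merged
into the headers, and nothing about the code makes that obvious.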

On the other hand, if somebody wants to make a library implementing a
similar API to your proposal *on top* of WSGI, then sure, why not?
That's fine: it only adds overhead at a *single point*: the library
that implements the pretty API on top of WSGI.
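
As a minimal sketch of such a library, here is an adapter that runs a
strawman generator-style app on top of plain PEP 3333 WSGI.  The name
generator_style() is hypothetical, and the adapter assumes the first
item yielded is a headers dict with a ':status' key holding a bare
numeric code, as in the quoted example.

```python
from http.client import responses


def generator_style(gen_app):
    """Adapt a strawman generator-style app to a PEP 3333 WSGI app.

    Assumes gen_app is a generator function whose first yielded item is
    a dict of headers including ':status' (e.g. '200').
    """
    def wsgi_app(environ, start_response):
        items = gen_app(environ)
        try:
            first = dict(next(items))
            code = first.pop(':status')
            # PEP 3333 wants a reason phrase too, e.g. "200 OK".
            status = "%s %s" % (code, responses.get(int(code), 'Unknown'))
            start_response(status, list(first.items()))
            for item in items:
                if isinstance(item, bytes):
                    yield item
                # Trailer dicts are dropped: PEP 3333 cannot send them.
        finally:
            items.close()
    return wsgi_app
```

All the type checking lives in this one adapter; the servers and
middleware underneath it never see the generator-style protocol.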

> Writing that with a callback for trailers (which is the only
> alternative - it's either a callback or a generator - because until the
> body is fully handled the content of the trailers cannot be
> determined):

Doesn't look bad to me.  It'd also be fine as a method on the response
body, and that would let us stick to (status, headers, body) as a
return value.
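
A rough sketch of the method-on-the-body idea, assuming a
hypothetical trailers() method that a server would call after
consuming the last chunk.  Neither the method name nor the class is
part of any spec; the 'Content-MD5Sum' header is borrowed from the
quoted example.

```python
import hashlib


class SignedBody:
    """Body iterable whose trailers are known once iteration finishes."""

    def __init__(self, chunks):
        self._chunks = chunks
        self._md5 = hashlib.md5()

    def __iter__(self):
        for chunk in self._chunks:
            self._md5.update(chunk)
            yield chunk

    def trailers(self):
        # Hypothetical hook: only meaningful after the body is exhausted.
        return [('Content-MD5Sum', self._md5.hexdigest())]

    def close(self):
        pass  # a real body would release files or sockets here


def app(environ):
    body = SignedBody([b'hello ', b'world'])
    return '200 OK', [('Content-Type', 'text/plain')], body
```

Middleware that doesn't care about trailers just unpacks the
three-tuple and passes the body along untouched.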

>> The other alternative is to use a dict as the response object
>> (analogous to environ as the request object), with named keys for
>> status, headers, trailers, body, etc.  It would then be extensible to
>> handle things like the "Associated content" concept.
>
> That might work, though it will force more closures. One of the things
> I like about the generator style is the clarity in code that we can
> achieve.

Please try to think instead of how you could implement those things in
a "make it nice" API for app authors.  WSGI wasn't made ugly on a
whim; it's the direct result of some very important design principles.
While the need for start_response() is gone, many of the other reasons
for its ugliness remain.

(In any case, you can still implement a generator-based API for
writing WSGI apps, without needing to make WSGI *itself* be
implemented that way.)

> Here's a body-size logging middleware:
>
> def logger(app):
>     def middleware(environ):
>         wrapped = app(environ)
>         yield next(wrapped)
>         body_bytes = 0
>         for maybe_body in wrapped:
>             if type(maybe_body) is bytes:
>                 body_bytes += len(maybe_body)
>             yield maybe_body
>         logging.info("Saw %d bytes for %s" % (body_bytes, environ['PATH_INFO']))
>     return middleware

Perhaps you meant this as a sketch, but note that you're not calling
close() on the underlying iterator.  At minimum, you need a
try/finally to do that, or else you need to use the wsgi_lite closing
extension -- and you need to assume that your parent middleware or
server is calling the closing extension on your response as well.
(Just another issue with implementing the core API based on
generators, since a generator function doesn't have access to its own
return result -- i.e., generator instance.)
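
For illustration, the try/finally variant might look roughly like
this.  It is only a sketch of the quoted middleware with the close()
call added, and it still relies on whoever consumes the middleware's
generator either exhausting it or calling close() on it.

```python
import logging


def logger(app):
    def middleware(environ):
        wrapped = app(environ)
        try:
            yield next(wrapped)  # pass the headers dict through
            body_bytes = 0
            for maybe_body in wrapped:
                if type(maybe_body) is bytes:
                    body_bytes += len(maybe_body)
                yield maybe_body
            logging.info("Saw %d bytes for %s",
                         body_bytes, environ['PATH_INFO'])
        finally:
            # Runs on normal exit, on error, and when our own generator
            # is closed early by the caller.
            close = getattr(wrapped, 'close', None)
            if close is not None:
                close()
    return middleware
```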

After your original proposal, I actually gave some thought to the
benefits of implementing a pure-generator-based WSGI.  In theory,
there are some good things you can do with it.  In practice, though,
you pay a fairly high price in complexity for everything but the
already-complicated cases.

To put it another way, the common case for WSGI always was -- and
mostly still is -- to return an entire HTTP response in one go,
without any streaming or buffering or anything of that sort.  And
simple things should be simple, with complex things still being
possible.

Unfortunately, making the raw API generator-based benefits the complex
cases at the expense of the simple ones, at the middleware level.  It
should be *possible* to do the fancy things, but not at the expense of
making every piece of middleware more complex.  Not as long as app
writers can use an app-level library to get a nicer API, but
middleware and server authors have to *always* deal with the bare
metal.
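
By way of contrast, here is a sketch of the simple case under a
(status, headers, body) return protocol.  The add_header() and
hello() names are made up for the example, and the protocol shape is
the one floated earlier in this message, not a finished spec.

```python
def add_header(app, name, value):
    """Middleware for a hypothetical (status, headers, body) protocol."""
    def middleware(environ):
        status, headers, body = app(environ)
        # No type checks, no sequencing: just unpack, modify, return.
        return status, list(headers) + [(name, value)], body
    return middleware


def hello(environ):
    return '200 OK', [('Content-Type', 'text/plain')], [b'Hello, world!\n']
```

The middleware never has to iterate the body or guess what kind of
value it is looking at, which is the property worth protecting.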

So, let's trim the sharp edges for the poor middleware and server
developers, rather than polishing the bits that app writers aren't
going to be using.  (Most of them are going to be using Django,
Pyramid, Flask, or whatever the latest hotness is, anyway.)

> That's certainly a desirable property. If we've changed things too much
> to infer by the basic structure then we'll need some metadata for it.
> Works for me - I'd like to have a decorator for that:

A decorator is an API; WSGI is a protocol.  Of course people can use
decorators to implement the protocol, and wsgiref2 (or whatever)
should include some.  But the spec should define what metadata the
decorator would expose, rather than dictating the use of any
particular decorator.
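
A sketch of the distinction, using an invented attribute name
(wsgi_version) purely for illustration.  The spec would define
something like the attribute; a decorator is just one convenient way
to set it.

```python
def wsgi2(func):
    """One possible decorator: all it does is set the marker attribute."""
    func.wsgi_version = (2, 0)  # hypothetical attribute name and value
    return func


def is_wsgi2(app):
    """What a server checks: the metadata, not how it got there."""
    return getattr(app, 'wsgi_version', (1, 0)) >= (2, 0)
```

An app that sets the attribute by hand, or through some framework's
own decorator, is then indistinguishable from one that used wsgi2().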