[Web-SIG] WSGI2: write callable?

Sat Sep 27 06:20:14 CEST 2014

On 27 September 2014 15:44, PJ Eby <pje at telecommunity.com> wrote:
> On Fri, Sep 26, 2014 at 7:41 PM, Robert Collins
> <robertc at robertcollins.net> wrote:
>> One thing we could do with the status code in the headers dict is to
>> default to 200 - the vastly common case (in the same way that throwing
>> an error generates a 500). Then status wouldn't be required at all for
>> trivial uses. That would make things easier, no?
>
> At the cost of variation.  A core design principle of WSGI is that
> variations make things *harder*, not easier, because it means more
> alternatives that apps, servers, and middleware have to support, with
> more code paths and fewer of them properly tested.  Every variation
> that is part of the spec (as opposed to an extension), creates a LOT
> of complexity in the field.  (Which is one reason it'll be nice to get
> rid of start_response(), and all its convoluted sequencing logic.)

We should capture these design principles somewhere FAQ-like, since
many of the folk participating in this rework weren't part of the
original design.

Right now, anything providing the server profile has to cope with
exceptions and translate those to 500 errors, so we have the variation
of 'status and headers may not be provided'. Most middleware can be
oblivious and delegate this to the server via bubble-up. I suspect the
same would work for a default of 200 - 99% of middleware would ignore
it and it would just work. However, I'm not super attached - it was
just an idea.
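
To make that concrete, here is a rough sketch of what a server might
do under that idea. It assumes the strawman generator API from
earlier in the thread (the first yield is a headers dict that may
carry ':status'); start() is a made-up helper name, not something
from wsgiref or any existing server.

```python
def start(app, environ):
    """Hypothetical server helper: return (status, headers, rest_of_stream)."""
    try:
        stream = iter(app(environ))
        headers = dict(next(stream))      # first yielded item: headers dict
    except Exception:
        # Any failure before the headers arrive is reported as a 500.
        return '500', {}, iter([b'Internal Server Error'])
    # Assumption under discussion: a missing ':status' means 200.
    status = headers.pop(':status', '200')
    return status, headers, stream
```

Middleware that passes the first dict through untouched never needs
to look at ':status', so the default stays a server-only concern.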

>> So a classic example for Trailers is digitally signing streamed
>> content. Using the same strawman API as above:
>>
>> import hashlib
>>
>> def app(environ):
>>    yield {':status': '200'}
>>    md5sum = hashlib.md5()
>>    # block_reader and sign_bytes are helpers assumed to exist elsewhere
>>    for block in block_reader(open('foo', 'rb'), 65536):
>>        md5sum.update(block)
>>        yield block
>>    digest = md5sum.hexdigest()
>>    signature = sign_bytes(digest.encode('utf8'))
>>    yield {'Content-MD5Sum': digest, 'X-Signature': signature}
>>
>> Note that this doesn't need to buffer or use a closure.
>
> Please bear in mind that another core WSGI design principle is that we
> don't make apps easier to write by making servers and middleware
> harder to write.  That kills adoption and growth, because the audience
> that *needs* to adopt WSGI (or any successor standard) is the audience
> of people who write servers and middleware.  If a feature is sinfully
> ugly for the app writer, but a thing of beauty for a middleware
> author, we *want* that feature.

I get that to a degree - I think there is a balance to be struck. This
is why I'd like to put a few middleware examples together to compare
and contrast different start_response replacement APIs.

> Conversely, if a feature means that *every* piece of middleware now
> has to add an extra "if" statement to support the feature in order to
> make it pretty for the app writer, then we do NOT want that feature,
> and it should be taken out and shot *at once*.

Agreed.

> It's not a fair tradeoff, because only server authors and middleware
> authors *have to* deal with WSGI directly.  App authors can use
> libraries to pretty it up, so we don't need to pretty it for them in
> advance -- especially since we don't know what their *personal* idea
> of pretty is going to be.  ;-)

Server authors and middleware authors can use libraries too: we can
write functions to provide common handling for a bunch of stuff. That's
not to say we should make things bad at the API level - we shouldn't -
but it doesn't make sense to me to say that folk writing middleware
cannot use libraries.

> The above API is cute and clean for the app writer, but for a
> middleware writer it's a barrel of misery.  *Every* piece of
> middleware that even wants to *read* anything from the response (let
> alone modify it), now needs to check types of yielded values,
> accumulate headers, and maybe buffer content.  And there are many ways
> to write that middleware that will be wrong, but *appear* right
> because the author didn't think of all the ways that an app could
> violate the middleware author's assumptions.

Hang on, why would they buffer content? Buffering response content is
currently verboten, and I haven't seen any proposal to change that. I
don't understand how phrasing the API as I suggested would lead to
buffering being permitted or required.

The type checking does squick me a little, so back to the drawing board.
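
For reference, this is roughly the check that any header-reading
middleware would need under the strawman API. It is only a sketch:
header_spy and the 'seen' dict are names invented for illustration.

```python
def header_spy(app, seen):
    """Hypothetical middleware: record headers and trailers into `seen`."""
    def middleware(environ):
        for item in app(environ):
            # Dicts are headers (first) or trailers (last); bytes are body.
            if isinstance(item, dict):
                seen.update(item)
            yield item                     # pass everything through unchanged
    return middleware
```

Nothing is buffered here, but every piece of middleware that reads
the response has to repeat the isinstance() test.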

> On the other hand, if somebody wants to make a library implementing a
> similar API to your proposal *on top* of WSGI, then sure, why not?
> That's fine: it only adds overhead at a *single point*: the library
> that implements the pretty API on top of WSGI.
>
>
>> Writing that with a callback for trailers (which is the only
>> alternative - it's either a callback or a generator - because until the
>> body is fully handled the content of the trailers cannot be
>> determined):
>
> Doesn't look bad to me.  It'd also be fine as a method on the response
> body, and that would let us stick to (status, headers, body) as a
> return value.

If it's a method on the response body, then returning a list or
generator no longer works, unless you start poking random attributes
onto things. It would also be inconsistent - why would trailers be a
method on the response, but headers be a dict in the return value?

>>> The other alternative is to use a dict as the response object
>>> (analogous to environ as the request object), with named keys for
>>> status, headers, trailers, body, etc.  It would then be extensible to
>>> handle things like the "Associated content" concept.
>>
>> That might work, though it will force more closures. One of the things
>> I like about the generator style is the clarity in code that we can
>> achieve.
>
> Please try to think instead of how you could implement those things in
> a "make it nice" API for app authors.  WSGI wasn't made ugly on a
> whim; it's the direct result of some very important design principles.
> While the need for start_response() is gone, many of the other reasons
> for its ugliness remain.
>
> (In any case, you can still implement a generator-based API for
> writing WSGI apps, without needing to make WSGI *itself* be
> implemented that way.)

I don't think WSGI is ugly, but I do think that things have changed
substantially in the python world since it came to be, and we owe it
to ourselves to investigate whether we can do better now.
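
As a data point on the layering PJ describes, a generator-style app
can be adapted to a dict-shaped response without changing the
protocol underneath. This is only a sketch: as_dict_response and the
key names 'status', 'headers', 'body' and 'trailers' are assumptions
for illustration, and gen_app is assumed to return a real generator.
Note the closure it needs in order to hand back the trailers.

```python
def as_dict_response(gen_app):
    """Hypothetical adapter: generator-style app -> dict response."""
    def app(environ):
        stream = gen_app(environ)
        headers = dict(next(stream))       # first item: headers dict
        status = headers.pop(':status')
        trailers = {}

        def body():
            try:
                for item in stream:
                    if isinstance(item, dict):
                        trailers.update(item)   # trailers follow the body
                    else:
                        yield item
            finally:
                stream.close()

        return {'status': status, 'headers': headers, 'body': body(),
                # Only complete once the body has been fully consumed.
                'trailers': lambda: dict(trailers)}
    return app
```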

Is there some documentation about the other reasons that it needs to
be ugly? The last thing I want to do is waste folks' time suggesting
things that won't work.

>> Here's a body-size logging middleware:
>>
>> def logger(app):
>>     def middleware(environ):
>>         wrapped = app(environ)
>>         yield next(wrapped)
>>         body_bytes = 0
>>         for maybe_body in wrapped:
>>             if type(maybe_body) is bytes:
>>                 body_bytes += len(maybe_body)
>>             yield maybe_body
>>         logging.info("Saw %d bytes for %s" % (body_bytes, environ['PATH_INFO']))
>>     return middleware
>
> Perhaps you meant this as a sketch, but note that you're not calling
> close() on the underlying iterator.

Indeed, I forgot that, and right after I replied to one of the issues
saying the API would need a close method :).

> At minimum, you need a
> try/finally to do that, or else you need to use the wsgi_lite closing
> extension -- and you need to assume that your parent middleware or
> server is calling the closing extension on your response as well.
> (Just another issue with implementing the core API based on
> generators, since a generator function doesn't have access to its own
> return result -- i.e., generator instance.)
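
For the record, the try/finally variant of the logger would look
roughly like this. It is a sketch against the same strawman generator
API, and it assumes the wrapped app exposes an optional close()
method (generators do).

```python
import logging


def logger(app):
    def middleware(environ):
        wrapped = app(environ)
        try:
            yield next(wrapped)            # headers dict, passed through
            body_bytes = 0
            for maybe_body in wrapped:
                if isinstance(maybe_body, bytes):
                    body_bytes += len(maybe_body)
                yield maybe_body
            logging.info("Saw %d bytes for %s",
                         body_bytes, environ['PATH_INFO'])
        finally:
            # Runs on normal completion, on errors, and when our own
            # caller closes this generator early.
            close = getattr(wrapped, 'close', None)
            if close is not None:
                close()
    return middleware
```

This still relies on whoever consumes the middleware calling close()
on it if they stop iterating early.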

The original WSGI spec avoided defining objects on the basis of being
extremely minimal, to ease adoption - and it's been a wild success. How
much complexity are we starting to drive, though, as we keep avoiding
having an object - tuple return types, iterators with extra
attributes? Would a defined ABC be a burden to implementors these
days? I presume that it was the C servers like mod_python that we
would have harmed previously?
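
To give the question something to point at, a defined ABC could be as
small as the following. All of the names here (ResponseBody,
trailers, ListBody) are hypothetical and not taken from any spec or
draft.

```python
from abc import ABC, abstractmethod


class ResponseBody(ABC):
    """Hypothetical interface for a response body object."""

    @abstractmethod
    def __iter__(self):
        """Yield the body as byte strings."""

    def trailers(self):
        """Return trailers; meaningful once iteration has finished."""
        return {}

    def close(self):
        """Release resources; the default does nothing."""


class ListBody(ResponseBody):
    """Simplest concrete body: a pre-built list of byte strings."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def __iter__(self):
        return iter(self._chunks)
```

The simple case only has to implement __iter__; close() and
trailers() come with do-nothing defaults.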

> After your original proposal, I actually gave some thought to the
> benefits of implementing a pure-generator-based WSGI.  In theory,
> there are some good things you can do with it.  In practice, though,
> you pay a fairly high price in complexity for everything but the
> already-complicated cases.

> To put it another way, the common case for WSGI always was -- and
> mostly still is -- to return an entire HTTP response in one go,
> without any streaming or buffering or anything of that sort.  And
> simple things should be simple, with complex things still being
> possible.

It would be interesting to get stats on that. The WSGI spec goes to
great pains to require that streaming work and buffering be verboten
(presumably excusable for middleware like JPEG->PNG transformers that
simply cannot avoid buffering) - but even then they are required to
yield a b'' AIUI.

I know that the vast majority of things I write try not to buffer
unless absolutely necessary, and HTTP's last three major revisions -
1.1, 1.1bis and 2 - have all had polish done around making streaming
more reliable in the internet we live in. Content-Length is now an
interesting aberration only really needed for large static content -
the wire protocols have no need of it (they did in HTTP/0.9 and
HTTP/1.0). It's useful for progress bars for big downloads, and that's
about it.

But your points about simple and complex are interesting. Middleware
authors need to cater to everything - so making the simple simple
doesn't make it simple for middleware authors - they don't get to opt
out. It's only by making everything as simple - uncomplected[1] - as
possible that we keep things easy for server and middleware authors.

> Unfortunately, making the raw API generator-based benefits the complex
> cases at the expense of the simple ones, at the middleware level.  It
> should be *possible* to do the fancy things, but not at the expense of
> making every piece of middleware more complex.  Not as long as app
> writers can use an app-level library to get a nicer API, but
> middleware and server authors have to *always* deal with the bare
> metal.

I still disagree that middleware and server authors cannot get a nicer
API through libraries. The difference between middleware or server and
apps is that apps can choose not to care about things they don't care
about, whereas middleware and servers have to care - but appropriate
helper functions can still help them.
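
As an example of the kind of helper meant here, the following sketch
wraps the type check and the close() handling once, so individual
middleware only supplies the per-chunk function. It targets the
strawman generator API, and body_filter is an invented name.

```python
def body_filter(fn):
    """Hypothetical helper: build middleware applying fn to body chunks."""
    def wrap(app):
        def middleware(environ):
            stream = app(environ)
            try:
                for item in stream:
                    # Header and trailer dicts pass through untouched.
                    yield fn(item) if isinstance(item, bytes) else item
            finally:
                close = getattr(stream, 'close', None)
                if close is not None:
                    close()
        return middleware
    return wrap


# Usage: middleware that upper-cases every body chunk.
upper = body_filter(bytes.upper)
```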

> So, let's trim the sharp edges for the poor middleware and server
> developers, rather than polishing the bits that app writers aren't
> going to be using, anyway.  (Since most of them are going to be using
> Django, Pyramid, Flask, or whatever the latest hotness is, anyway.)

Do you have a hitlist of such sharp edges you'd like to see catered
for in this new spec?

>> That's certainly a desirable property. If we've changed things too much
>> to infer by the basic structure then we'll need some metadata for it.
>> Works for me - I'd like to have a decorator for that:
>
> A decorator is an API; WSGI is a protocol.  Of course people can use
> decorators to implement the protocol, and wsgiref2 (or whatever)
> should include some.  But the spec should define what metadata the
> decorator would expose, rather than dictating the use of any
> particular decorator.

That seems reasonable, presumably because any code we write will not
be backported to older standard libraries. I think it would be a
mistake to not think about the default experience as well though: we
should specify the protocol, and offer a good default API on top of
it.
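
A sketch of that split: the spec would name the metadata, and a
decorator is just one convenient way to set it. The attribute name
wsgi_version and both function names below are placeholders, not a
proposal for the actual spelling.

```python
def wsgi2(func):
    """Hypothetical decorator: mark a callable as speaking the new protocol."""
    func.wsgi_version = (2, 0)     # the metadata itself; name is assumed
    return func


def is_wsgi2(app):
    """Servers and middleware test the attribute, not the decorator."""
    return getattr(app, 'wsgi_version', (1, 0)) >= (2, 0)
```

Code that sets the attribute by hand would be just as conformant as
code that uses the decorator.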

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud