[Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

Graham Dumpleton graham.dumpleton at gmail.com
Mon Jan 4 19:12:39 EST 2016


> On 4 Jan 2016, at 11:27 PM, Cory Benfield <cory at lukasa.co.uk> wrote:
> 
> All,
> 
> **TL;DR: What do you believe WSGI 2.0 should and should not do? Should we do it at all?**
> 
> It’s a new year, and that means it’s time for another attempt to get WSGI 2.0 off the ground. Many of you may remember that we attempted to do this last year with Rob Collins leading the charge, but unfortunately personal commitments made it impossible for Rob to keep pushing that attempt forward.

Although you call this round 2, it isn’t really. Robert’s effort was not the first time someone has pushed a WSGI 2.0 variant. So this is more like round 5 or 6.

In part because of those repeated attempts by people to propose something and label it as WSGI 2.0, I am very cool on reusing the WSGI 2.0 moniker. You will find little or no mention of ‘WSGI 2.0’ as a label in:

    https://github.com/python-web-sig/wsgi-ng

That is probably somewhat due to my grumbling about the use of ‘WSGI 2.0’ back then.

Time has moved on and so the bad feelings and memories associated with the ‘WSGI 2.0’ label due to early failed efforts have faded, but I would still suggest avoiding the label ‘WSGI 2.0’ if at all possible.

My general feeling is that if any proposed changes to the existing WSGI (PEP 3333) specification cannot technically be implemented on all existing WSGI server/adapter implementations, then the new specification should not be called WSGI.

In other words, even if many of these implementations may not be used much any more, it must be able to work, without needing to mark things as optional, on CGI, FASTCGI, SCGI, mod_wsgi, gunicorn, uWSGI, Waitress, etc etc.

This is purely to avoid the confusion whereby implementations cannot or choose not to implement any new specification. The last thing any WSGI server author wants is having to deal with a constant stream of questions and bug reports about not supporting an updated specification where technically it was never going to be possible. We have some obligation not to inflict this on what are, in nearly all cases, volunteers in the Open Source world who work on these things in their spare time and who are not doing it as part of their paid employment.

> Since then, the need for a revision of WSGI has become even more apparent. Casual discussion on the web has indicated that application developers are uncomfortable with the limitations of WSGI. These limitations are providing an incentive for both application developers and server developers to take an end-run around WSGI in an attempt to get a framework that is more suitable for the modern web. A great example of the result of WSGI’s deficiencies is Andrew Godwin’s channels work[0] for Django, which represents a paradigm shift in application development that takes it far away from what WSGI is today.
> 
> For this reason, I think we need to try again to get WSGI 2.0 off the ground. But I don’t believe we can do this without getting broad consensus from the developer community that a revision to WSGI is needed, and without understanding what developers need from a new revision of WSGI. This should take into account the prior discussions we’d had on this thread: however, I’m also going to actively solicit feedback from some of the more notable WSGI implementers, to ensure that whatever comes out of this SIG is something that they would actually use.
> 
> This WG already had a list of requirements, which are as follows:
> 
> - Support servers speaking HTTP/1.x, HTTP/2 and Websockets (potentially all on a single port).

Any support for implementing WebSockets should, though, be seen as a separate requirement from implementing HTTP/2.

A specific WSGI server implementation may be able to support HTTP/2, but not support WebSockets, or it could support WebSockets via HTTP/1.x already. In fact basic request/response functionality of HTTP/2 maps into the existing WSGI API specification and doesn’t really require any changes be made to the WSGI specification.

For example, mod_wsgi already supports HTTP/2 by virtue of the fact that the mod_h2 module in Apache exists. The existing internal APIs of Apache and how mod_wsgi uses those means that HTTP/2 bridges into the WSGI world with no code changes to mod_wsgi.

To support WebSockets is a much bigger problem and is not achievable with CGI, FASTCGI, SCGI.

It may be possible to support it within the Apache/mod_wsgi implementation, but that would require major re-architecting of the mod_wsgi code. It also couldn’t be done by simply exposing a socket; a new high level abstract API which doesn’t expose the actual socket object would have to be developed. So you are really talking about a whole new API.

To me the WebSocket requirement and the need for a completely new API rules out ever doing this as part of an updated WSGI specification. It should really be treated as a completely separate thing.

The possibility of bootstrapping into a WebSocket session (or any other new protocol or its corresponding API) via a connection upgrade process has been discussed previously. In other words, you have the request actually make it to the WSGI application and it then decides to push back some response that causes the underlying server to resubmit the request back to the Python web application as a whole, but via a different API.

This idea that the WSGI application would make the decision was, though, a somewhat clumsy mechanism and could easily be messed up once people start wrapping WSGI middleware around applications, so that the decision point is nested. It would likely be impractical for implementations such as mod_wsgi, and maybe uWSGI as well, where at the point of calling into the Python code you may already be nested within the layers of some C level abstractions that exist between the WSGI application and the underlying server. You are really well past the point where the decision to use a particular protocol can sensibly be made. It is just too hard to try and unwind any server level layers and switch protocols.

So if something were to support WebSockets, it should be a decision made down in the underlying server and calling into any Python web application should be done through a distinct API from the existing WSGI application API, where the API entry point for WebSocket was defined distinct from the WSGI application one.

They are therefore two different APIs, which is why WebSocket should be dealt with in a separate specification and not carry the WSGI label at all. A specific WSGI server could still support the new WebSocket API, but purely because it decides to support both in the same process. Not because the WebSocket API makes use of the WSGI specification.

The only thing you might allow, to make it easier to have both coexist in the same code, is to add a convention that a WSGI application callable might provide a new function such as ‘__endpoint__(protocol)’ which allows the underlying server to request of the same application object an API entry point for any new protocol such as WebSocket. This may well be better done though as some new higher level abstraction encapsulating the whole concept of a web application, which supports the idea of startup/shutdown hooks, passing of configuration from the server etc. Right now there is no consistency for this between WSGI servers. If such a higher level abstraction for an entrypoint were created, even getting the ‘WSGI’ API endpoint may require the initial call to request it.
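
As a rough illustration only, the ‘__endpoint__(protocol)’ convention could look something like the following. Nothing here exists in PEP 3333 or in any server: the protocol names, the ‘websocket’ method and the ‘select_endpoint’ helper are all made up for the sketch. The point it tries to show is that the server asks for an entry point up front, rather than the application deciding mid-request.

```python
# Hypothetical sketch: '__endpoint__' is only the convention floated above,
# not part of PEP 3333 or of any existing WSGI server.

class Application:
    """A WSGI callable that can also hand out entry points for other protocols."""

    def __call__(self, environ, start_response):
        # Ordinary PEP 3333 entry point.
        body = b"hello\n"
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    def websocket(self, message):
        # Placeholder handler; a real WebSocket API is yet to be designed.
        return message

    def __endpoint__(self, protocol):
        # Return the entry point for 'protocol', or None if unsupported.
        return {"wsgi": self, "websocket": self.websocket}.get(protocol)


def select_endpoint(application, protocol):
    """What a server might do: it, not the application, picks the entry point."""
    lookup = getattr(application, "__endpoint__", None)
    if lookup is None:
        # Plain WSGI application that knows nothing of the convention.
        return application if protocol == "wsgi" else None
    return lookup(protocol)


app = Application()
wsgi_entry = select_endpoint(app, "wsgi")
websocket_entry = select_endpoint(app, "websocket")
http2_entry = select_endpoint(app, "http2")
```

An existing WSGI application without ‘__endpoint__’ keeps working unchanged, since the helper falls back to treating the callable itself as the WSGI entry point.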

Whatever way a server learns about a web application supporting additional protocols, be it through server configuration or a discoverable higher level application object abstraction, the key thing is that the server should be the one left to make the decision of what actual Python API object to call into so that the server is more readily able to set up any protocol stack with the server part to match before it is too late and it isn’t possible to undo what it may have already set up.

> - Support graceful degradation for applications that can use HTTP/2 but still support HTTP/1.x requests.

The issue here is really how much of the new functionality of HTTP/2 you expose to a Python web application.

As far as basic request/response mapping into existing WSGI interface there is no need for graceful degradation to be considered, at least not at the WSGI level, as that is an issue for the underlying server. Whether the server handles it as HTTP/1.x or HTTP/2, it still maps to the same WSGI application API and the application wouldn’t care.

For new functionality of HTTP/2, much like WebSockets, I believe a completely new API should be developed. It isn’t necessarily going to be realistic to try and shoe horn it into the existing WSGI API somehow.

> - Graceful incremental adoption path - no upgrade-all-components requirement baked into the design.

It is hard to see what your expectations are here.

Prior attempts to force ASYNC into WSGI, and in some respects WebSockets through forcing raw fd access, have not been practical. WSGI simply is not a good vehicle for it. Long term it is going to be much better to have new APIs for new WebSocket and HTTP/2 support.

The only even partly graceful path is perhaps first ignoring WebSockets and HTTP/2 and coming up with a richer higher level abstraction for the complete Python web application entry point itself. That is, the idea above of a higher level object which defines hooks for startup/shutdown, passing configuration and also perhaps the querying of what protocols are supported by the application and even optionally what specific URL endpoints those protocols are active on. You could even have an application say where static file assets live so the server could host them itself via any more optimal methods than the application itself could use.

Get this in place and existing WSGI servers could then be changed to accommodate this new higher level abstraction for the entrypoint. They may not support new protocols initially, or maybe not at all, but it at least provides a framework for the server and application to coordinate better and so allow a server to direct certain protocol types to different endpoints in the application, or even for the server to notify the application that certain protocols aren’t supported and so allow an application to use alternative mechanisms.

Down the track with HTTP/2 support, with the ability of the application to say, this is where my static assets are, you could even perhaps have a way of flagging that certain assets should be pushed back by the server knowing that they will be required. This way the server becomes responsible for that at the place where lower level access to HTTP/2 primitives is available, which might not be passed through a higher level API.

> - Support Python 2.7 and 3.x (where x is not yet discussed)

3.3 would need to be the absolute minimum. Supporting anything older in 3.x is too much of a pain.

> - Support the existing ecosystem of containers (such as mod_wsgi) with the new API. We want a clean, fast and approachable API, and we want to ensure that its no less friendly to work with than WSGI, for all that it will expose much more functionality.
> - Apps need to be able to tell what protocol is in use, and what optional features are available. For instance, HTTP/2 PUSH PROMISE is an optional feature that can be disabled by clients. Websockets needs to expose a socket like object, and so on.

I will stress my opposition to exposing any raw socket. Some existing servers will simply not be able to do that in a sensible way, because they already use an internal proxying arrangement where there is a messaging layer between processes and the raw socket is actually only available in a completely different process to the web application.

> - Support websockets
> - Support HTTP/2
> - Support HTTP/1.x (which may be just 'point at PEP-3333’.)
> - Continue to support lightweight shims being built on top such as https://github.com/Pylons/webob/blob/master/webob/request.py
> 
> I believe that all of these requirements are up for grabs, and subject to change and consensus discussion. In this thread, then, I’d like to hear from people about these requirements and others. What do you believe WSGI 2.0 should do? Just as importantly, what do you believe it should not do? What prior art should we take into account? Should we bother revising WSGI at all, or should we let the wider application ecosystem pursue its own solutions à la Django's channels? Should we simply adopt Andrew Godwin’s ASGI draft[1] on which channels is based and call *that* WSGI 2.0?

My current thinking is that what needs to be done is:

1. An optionally updated WSGI specification labelled as WSGI 1.1. This has got nothing really to do with the initiative to have a way to handle WebSockets and HTTP/2. It would simply be to integrate changes which were raised the last time the WSGI specification was updated, but which were passed over because a PEP was in the end rushed through just to deal with Python 3, ignoring other concerns. There are only a few changes which this would cover.

The first relates to the guarantee that you are able to read past CONTENT_LENGTH because a WSGI server will return an empty string on end of input. This is to support chunked request encoding and compressed request content where decompression is handled by the server. Basically, CONTENT_LENGTH becomes advisory only. WSGI applications are allowed to ignore it, except maybe for raising a 413 response, and read to end of input. The change of wsgi.version to 1.1 is needed to allow frameworks to know the guarantee exists that this will work.
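
A minimal sketch of what that could mean for application code, assuming a server that advertises the guarantee by setting wsgi.version to (1, 1). No such version exists today, and the ‘read_request_body’ helper, its size limit and its use of ValueError are purely illustrative.

```python
import io


def read_request_body(environ, limit=1024 * 1024, blksize=8192):
    """Read the request body, treating CONTENT_LENGTH as advisory if allowed."""
    stream = environ["wsgi.input"]

    if environ.get("wsgi.version", (1, 0)) < (1, 1):
        # No end-of-input guarantee: read only what CONTENT_LENGTH promises.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        if length > limit:
            raise ValueError("request body too large")  # would map to a 413
        return stream.read(length)

    # Assumed WSGI 1.1 behaviour: read() returns an empty result at end of
    # input, so chunked or server-decompressed bodies can be read in full.
    chunks, total = [], 0
    while True:
        data = stream.read(blksize)
        if not data:
            break
        total += len(data)
        if total > limit:
            raise ValueError("request body too large")  # would map to a 413
        chunks.append(data)
    return b"".join(chunks)


# Fake environ where CONTENT_LENGTH understates the (decompressed) body.
environ = {"wsgi.version": (1, 1), "CONTENT_LENGTH": "3",
           "wsgi.input": io.BytesIO(b"abcdef")}
body = read_request_body(environ)
```

With wsgi.version left at (1, 0) the same helper stops after the three bytes that CONTENT_LENGTH declares, which is the only safe behaviour without the guarantee.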

This ability has existed in Apache/mod_wsgi for a long time and the Flask builtin server also supports it with Werkzeug/Flask currently relying on looking for special non standard markers in WSGI environ to know the guarantee exists. I have blogged about this issue before.

The second relates to the wsgi.file_wrapper object being required to be a class type. I will not go into how as I have also blogged about this before, but this allows middleware to wrap a response iterable to add an action on close() but not break any optimisations for more performant sending of files.

A third change is to fix the example for wsgi.file_wrapper fallback, which doesn’t close the file descriptor properly and so results in leakage of file descriptors, with them only being cleaned up by the garbage collector.
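
Assuming the example in question is the fallback in PEP 3333 that returns a bare iter(lambda: ...) object when wsgi.file_wrapper is missing, one possible shape for a fix is a small iterable class with its own close(). The names ‘FileIterator’ and ‘send_file’ are made up for this sketch and are not taken from the PEP.

```python
import io


class FileIterator:
    """Fallback iterable used when the server offers no wsgi.file_wrapper."""

    def __init__(self, filelike, blksize=8192):
        self.filelike = filelike
        self.blksize = blksize

    def __iter__(self):
        # Yield blocks until read() returns an empty bytes object.
        return iter(lambda: self.filelike.read(self.blksize), b"")

    def close(self):
        # PEP 3333 has the server call close() on the response iterable if
        # it has one, so the file is released without waiting for the
        # garbage collector.
        if hasattr(self.filelike, "close"):
            self.filelike.close()


def send_file(environ, filelike, blksize=8192):
    # Prefer the server's optimised wrapper, else use the closing fallback.
    wrapper = environ.get("wsgi.file_wrapper", FileIterator)
    return wrapper(filelike, blksize)


# Stand-in for a server consuming the response and then closing it.
source = io.BytesIO(b"x" * 10)
response = send_file({}, source, blksize=4)
chunks = list(response)
response.close()
```

A plain generator with try/finally is not quite equivalent: if the server closes the response before iterating it at all, the generator body never starts and the finally block never runs.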

I vaguely recollect there may have been another issue for wsgi.file_wrapper around response Content-Length. I can’t remember if that definitely required a change. I will need to go back to my blog posts about that one. There could also be other things I have found as still being wrong.

As I note above, this is optional. But if we are going to close out WSGI and not develop it further, it would be nice to fix up some of the last problems with it.

2. Develop a higher level abstraction for what is a Python web application. This would cover hooks for startup/shutdown, passing configuration from the server, and querying back configuration from the application: which protocols it supports, which sub URLs those protocols are supported on, and where any static file assets live that the application may want the server to handle if that would be more performant.

I believe that such a new high level abstraction will provide a better framework to hang things off when we introduce new protocols.
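
To make (2) a little more concrete, here is one possible shape for such an object. It is only a sketch under assumptions: the class name, method names and the idea of a simple protocol-to-endpoint mapping are all invented here and not taken from any existing server or specification.

```python
class WebApplication:
    """Hypothetical object describing a whole web application to a server."""

    def __init__(self, wsgi_app, static_dirs=None):
        self._endpoints = {"wsgi": wsgi_app}
        # URL prefix -> directory the server may choose to serve itself.
        self.static_dirs = dict(static_dirs or {})
        self.config = {}
        self.running = False

    def configure(self, server_config):
        # The server passes its configuration in before startup.
        self.config.update(server_config)

    def startup(self):
        self.running = True

    def shutdown(self):
        self.running = False

    def protocols(self):
        # Lets the server query which protocols the application supports.
        return sorted(self._endpoints)

    def endpoint(self, protocol):
        # Even the plain WSGI callable is requested through this call.
        return self._endpoints.get(protocol)


def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


application = WebApplication(hello, static_dirs={"/static/": "/srv/app/static"})
application.configure({"processes": 2})
application.startup()
supported = application.protocols()
wsgi_entry = application.endpoint("wsgi")
```

A server that only speaks WSGI would ask for the "wsgi" endpoint and ignore everything else, which is what keeps the adoption path incremental.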

3. Separate WebSocket API.

Basically ignore existing WSGI specification completely. Come up with the best API one can for WebSocket interaction at the server level. This should not just be exposing a socket, but be a higher level abstraction involving passing of actual WebSocket messages.

Using a higher level abstraction allows a server to implement the details using whatever mechanisms best fit that server implementation.
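
Purely as an illustration of "messages, not sockets", an application-side handler might look like the following. The callback names and the connection object are assumptions made for this sketch; no such API has been agreed, and the ‘RecordingConnection’ class only stands in for whatever a real server would supply.

```python
class EchoHandler:
    """Hypothetical application handler; it sees messages, never a socket."""

    def on_open(self, connection):
        connection.send("welcome")

    def on_message(self, connection, message):
        # 'message' is one complete WebSocket message (text or binary).
        connection.send(message)

    def on_close(self, connection, code):
        pass


class RecordingConnection:
    """Stand-in for the server-supplied object; it just records sends."""

    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)


# A server would drive the handler; here the calls are made by hand.
handler = EchoHandler()
connection = RecordingConnection()
handler.on_open(connection)
handler.on_message(connection, "ping")
handler.on_message(connection, b"\x00\x01")
handler.on_close(connection, 1000)
```

Because the handler only ever calls send() on the object it is given, a server that proxies between processes can implement that object over its own messaging layer without a raw socket ever reaching the application.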

4. Separate HTTP/2 API.

Again, ignore the existing WSGI specification completely. Come up with the best API one can for dealing with HTTP/2.

For (3) and (4) let’s treat these as our holy grail. Rather than compromise by trying to work with WSGI, let’s first come up with what would be our ideal. Then let’s see how that can fit within existing servers, possibly integrated via the richer application abstraction of (2).

> Right now I want this to be very open. I’d like people to come up with a broad statement listing what they believe should and should not be present in WSGI. This first stage of the work is very general: I just want to get a feeling for what the community believes is important. Once we’re done with that, if the consensus is that this work is worth pursuing, I’ll come up with an initial draft that we can start making concrete changes to.
> 
> In the short term, I’m going to keep this consultation open for **at least two weeks**: that is, I will not start working on an initial draft PEP until at least the **18th of January**. If you believe there are application or server developers that should be involved in this discussion, please reach out to them and point them to this list. I personally have CC’d some people that I believe need to be involved in the discussion, but please reach out to others as well.

It isn’t clear what you expect this PEP to include, but trying to push for a PEP so quickly is unrealistic. There is likely going to need to be a fair bit of discussion and with the fact that people have real jobs, or other obligations, history has shown that rushing to a PEP just disenfranchises people and they will not contribute due to the inability to do so in too short a time frame.

> I’d really love to come to the end of 2016 with a solid direction for the future of web programming in Python. I’m looking forward to working with you all on achieving that.

Graham