[Web-SIG] Pre-PEP: The WSGI Middleware Escape for Native Server APIs

Tue Sep 30 01:19:19 CEST 2014

Per the previous discussion about HTTP/2, websockets, et al, here's my
attempt at providing something we can start using and implementing
today, as a bridge to future specifications.  If you'd prefer to read
it nicely formatted, you can find an HTML version in progress at:

    https://gist.github.com/pjeby/62e3892cd75257518eb0

I'm very interested in feedback from server and framework developers
with relevant experience to help close the "open issues and questions"
section.  Questions about the content or feedback on its presentation
would also be very helpful.

(For now, the text is in markdown, but of course I will switch it to
ReST once it begins stabilizing.)

# The WSGI Middleware Escape for Native Server APIs

# Overview

This document specifies a proposed standard WSGI extension that allows
WSGI applications to "escape" the standard WSGI API and access native
web server APIs, such as websockets, HTTP/2 features, or
Twisted/tulip-style asynchronous APIs.

The proposed extension, the Middleware Escape for Native Server APIs
or "MENSA", allows WSGI to continue to be used for the 98% of typical
web application use cases that fall within the basic HTTP/1.0
"request/response" paradigm, while allowing the 2% of use cases with
more sophisticated requirements to still benefit from "inbound" WSGI
middleware for sessions, authentication, authorization, routing, and
so forth, as well as keeping the other advantages of sharing the same
process with other WSGI code.

Specifically, the MENSA protocol allows a WSGI application to
*dynamically* switch at runtime from using a standard WSGI response,
to using a web server's "native" API to handle the current request
(and possibly subsequent ones), subject to certain conditions.

This approach provides present-day WSGI applications and frameworks
with a smooth upward migration path in the event that they require
access to websockets, HTTP/2-specific features, etc.  With it:

* Web servers can expose their native API to any WSGI application or framework

* Application developers can use existing middleware, libraries, or
frameworks to handle front-end tasks like routing and authentication

* Frameworks can offer a simple `response.use_native_api(...)` (or
similar) API to allow app developers to easily "jump out" of the
framework and request the use of a specific native server API for the
current request, and

* Even developers using frameworks that *don't* offer this escape API
can still use it, by invoking a short utility function given in this
specification, and adding a little framework-specific glue code

# Motivation

Recent discussion on the Python Web-SIG about incorporating HTTP/2
features into present-day WSGI has highlighted the extreme
difficulties of doing so without breaking certain types of middleware.
In addition, it highlighted the strong existing need for Websockets in
present-day web apps, and the ways in which existing Websocket
extensions for WSGI have the same problems.

Both HTTP/2 and Websockets are a fairly extreme break from the
request/response paradigm of HTTP/1.0 that WSGI was designed around,
making them difficult to represent within WSGI, and therefore a poor
fit for a direct extension of the existing WSGI protocol.

Such a direct extension would not only be premature for HTTP/2 (due to
a lack of existing HTTP/2 APIs for Python), but would also be
unnecessarily confining in terms of what features could be supported,
and unnecessarily complex in how those features would need to be
implemented.

Therefore, this proposal seeks to defer or *table* ("mensa" is the
Latin word for "table") the issue of creating an HTTP/2 WSGI extension
API, by making it possible for existing WSGI applications to access
*any* such API that existing web servers or server frameworks may wish
to provide.  (i.e. giving all of them "a seat at the table".)

Thus, it would not be necessary to standardize on the One True
Websocket API or One True HTTP/2 API at this time, because server
authors can simply expose their native APIs for the use of those web
applications that have need of such APIs.

This neatly resolves two current issues in the community:

1. Often, the only way to mix websockets (or HTTP/2) and WSGI is
through separate processes, usually with the need to reinvent the wheel
for routing and other functions commonly handled by WSGI front-end
middleware

2. The "chicken and egg" problem of developing an HTTP/2 API spec when
there are few such APIs existing in the field, but nobody wants to
*implement* such APIs because nobody can use them from WSGI, and
nobody wants to abandon WSGI to write their entire applications or
frameworks based on a new and largely-untested API that's not yet
blessed as a specification.

In contrast, adoption of the WSGI MENSA spec allows both server
developers and application developers to experiment with advanced
server APIs, without throwing away their WSGI investments (or native
server API investments!), and only making new investments in that
portion of the application space that requires access to more advanced
APIs.

That is, if the bulk of one's code is still in WSGI, it is still
migratable to other server platforms, with only the advanced portions
needing to be ported.  Thus, the risk of tying one's application too
tightly to one particular native API is considerably reduced.

Thus, as community experience with advanced server APIs is increased,
the practicality of actually defining a *standard* server API for
these types of applications is also increased.  Eventually, such a
standard API could then perhaps even replace WSGI, while still being
accessible from within legacy WSGI frameworks (via the MENSA escape).

# Scope

Goals of this specification include:

1. Defining a way for WSGI applications, at runtime (i.e., during the
execution of a request), to detect the existence of, and access,
"native server APIs" which can be used in place of WSGI for either
effecting a response to the current request, or initiating a more
advanced communications protocol (such as websocket connections,
associated content pushing, etc.)

2. Defining ways for WSGI middleware to:

  1. Continue to be used for request routing and other pre-response
activities for all requests, as well as post-response activities for
requests that do not require native API access

  2. Intercept and assume control of any native APIs to be used by
wrapped applications or subrequests (assuming the middleware knows how
to do this for a specific native API, and desires to do so)

  3. Disable any or even *all* native API access by its wrapped apps
-- even without prior knowledge of *which* APIs might be used -- in
the event that the middleware can only perform its intended function
by denying such access

3. Defining a way for WSGI servers to negotiate a smooth transition of
response handling between standard WSGI and their native API, while
safely detecting whether intervening middleware has taken over or
altered the response in a way that conflicts with elevating the
current request to native API processing

Non-goals include:

* Actually defining any specification for the native APIs themselves  ;-)

# Specification

The basic idea of MENSA is to add a dictionary to the WSGI
environment, under the key `wsgi.native_api_hooks`.  Within this
dictionary, a single key is reserved for each non-WSGI API offered by
the server (or implemented via middleware).

So, for example, if Twisted were to offer a MENSA escape for WSGI
apps, it might register a `twisted` key within the
`wsgi.native_api_hooks` dictionary.

## Accessing a Native API

WSGI applications query the `wsgi.native_api_hooks` dictionary in
order to access the native API of their choice, and then delegate to
it.  So, for example, a pure WSGI app that switches to the `foobar`
native API mid-request might look like this:

    def my_wsgi_app(environ, start_response):

        native_apis = environ.get('wsgi.native_api_hooks', {})
        foobar_api = native_apis.get('foobar')

        if foobar_api is None:
            # appropriate error action here, i.e. raise something, or
            # return an error response (the 501 is just one possible choice)
            start_response('501 Not Implemented',
                           [('Content-Type', 'text/plain')])
            return [b'foobar API unavailable']

        def my_foobar_app(foobar_specific_arg, another_foobar_arg, *etc):
            # code here that uses the foobar API to do something cool,
            # like maybe websockets or signed streaming trailers or
            # other buzzword-laden stuff  ;-)
            pass

        # Delegate the WSGI response to the native API
        return foobar_api(environ, start_response, my_foobar_app)

On the application side, this is all that's necessary for a pure-WSGI
application to switch to using a native server API and whatever its
advanced features permit.  (For applications using frameworks that
don't directly expose the WSGI start_response() or allow returning a
WSGI response body directly, a little extra glue code is required;
those details are covered in a later section of the spec.)

In the above example, `my_foobar_app` is a function, but depending on
the specific API involved, it could be a class or an instance of some
kind, or perhaps just a data structure of some sort.  The nature of
the "app" or other parameters passed to the API hook is completely
dependent on the design of the API being wrapped: only the first two
arguments to the hook are dictated by this specification.

So, for example, a Twisted native API might expect a `Protocol`
instance, rather than a function.  A gevent-based native API might
expect a generator, generator function, or perhaps a greenlet.  A
websocket API might take *two* parameters, for a writer and reader.
Defining and documenting the exact nature of the additional parameters
passed to the API hook is entirely up to the hook's provider.

## Providing an API

The implementation of a native API hook consists of a callable object,
looking something like this pseudocode:

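    def some_server_api_hook(environ, start_response, native_app):
        # new_unique_header_compatible_string() and native_request are
        # placeholders for whatever the server provides internally
        response_key = new_unique_header_compatible_string()
        native_request.response_registry[response_key] = native_app
        start_response('399 WSGI-Escape: '+response_key, [
            ('Content-Type', 'application/x-wsgi-escape; id='+response_key),
            ('Content-Length', str(len(response_key)))
        ])
        # WSGI response bodies are bytestrings (PEP 3333), so encode the key
        return [response_key.encode('ascii')]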

As you can see, this is a little bit like a WSGI application -- and in
fact it *is* a valid WSGI application, except for the addition of the
`native_app` parameter.

The API hook's job is to generate a unique ASCII "native string" key
for this response, and register the provided native app (or other
arguments) under that key for *future use*.
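As a minimal sketch of how a server might bind such a hook to a single request, the factory below closes over a per-request registry. The names `make_foobar_hook` and `registry`, and the `foobar-N` key format, are illustrative assumptions rather than anything this specification requires.

```python
import itertools

# Process-wide counter used to build response keys (illustrative only).
_key_counter = itertools.count(1)

def make_foobar_hook(registry):
    """Return an API hook that records native apps in `registry`.

    `registry` is a dict the server creates for each incoming request;
    both names are hypothetical stand-ins for server internals.
    """
    def foobar_hook(environ, start_response, native_app):
        response_key = 'foobar-%d' % next(_key_counter)
        # Register only; the native app must not run until the server
        # has verified the markers in the finished WSGI response.
        registry[response_key] = native_app
        start_response('399 WSGI-Escape: ' + response_key, [
            ('Content-Type', 'application/x-wsgi-escape; id=' + response_key),
            ('Content-Length', str(len(response_key))),
        ])
        return [response_key.encode('ascii')]
    return foobar_hook
```

A server following this pattern would place `{'foobar': make_foobar_hook(registry)}` under `wsgi.native_api_hooks` before calling the application.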

The server MUST NOT actually invoke or begin using the native
application until *after* the standard WSGI response process has been
completed, and it has verified that its markers are still present in
the WSGI response.

Those markers -- found in the status, headers, *and* response body --
are used to verify three things:

1. That the registered application is indeed a response to the
original incoming request, and not merely to a subrequest created by
middleware

2. That intervening middleware hasn't replaced the native API response
with a response of its own (for example, an error response created
because of an error occurring after the native app was registered, but
before it was used)

3. *Which* native application should be invoked, if more than one was registered

So, a server providing a native API must wait until it receives a WSGI
response whose status, content-type, content-length, and body all
unequivocally identify which of the native applications registered for
the current request should actually be used.

In the event that the status, content-type, content-length, and body
all match, the server MUST then activate the registered native
application, allowing the current request (and possibly subsequent
requests, depending on the API involved) to be handled via the
associated native API.  (It also discards any other applications
registered for the current request.)

In the event that neither the status nor headers designate a
registered native application, the server MUST treat the response as a
standard WSGI response, and discard all registered applications for
the current request.

In the event that the status and headers disagree on *which* native
application is to be used (or *whether* one is to be used at all), or
in the event that they *do* agree, but the body disagrees with them,
the server MUST generate an error response, and discard both the WSGI
response and any registered native applications.  (In the face of
ambiguity, refuse the temptation to guess; errors should not pass
silently.)
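The three outcomes above can be sketched as a single server-side check. This is one possible reading of the rules, not a reference implementation: `resolve_escape`, `EscapeConflict`, and the assumption that the server has already collected the response body into one bytestring are all hypothetical.

```python
import re

_STATUS_RE = re.compile(r'399 WSGI-Escape: (\S+)')
_CTYPE_RE = re.compile(r'application/x-wsgi-escape; id=(\S+)')

class EscapeConflict(Exception):
    """Status, headers, and body disagree; the server should send an error."""

def resolve_escape(status, headers, body, registry):
    """Return the native app to activate, or None for a plain WSGI response.

    `registry` maps response keys to the native apps registered during
    this request.  It is always emptied, since every outcome discards
    the registrations that were not selected.
    """
    try:
        m = _STATUS_RE.fullmatch(status)
        status_key = m.group(1) if m and m.group(1) in registry else None
        header_key = length = None
        for name, value in headers:
            if name.lower() == 'content-type':
                m = _CTYPE_RE.fullmatch(value)
                if m and m.group(1) in registry:
                    header_key = m.group(1)
            elif name.lower() == 'content-length':
                length = value
        if status_key is None and header_key is None:
            return None                      # ordinary WSGI response
        if status_key != header_key:
            raise EscapeConflict('status and headers disagree')
        expected = status_key.encode('ascii')
        if body != expected or length != str(len(expected)):
            raise EscapeConflict('body does not match status and headers')
        return registry[status_key]
    finally:
        registry.clear()
```

On `EscapeConflict` the caller would discard the WSGI response and generate an error response of its own, as required above.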

### Response Key Details

The key used to distinguish responses MUST be an ASCII "native string"
(as defined by PEP 3333).  It SHOULD also be relatively short, and
MUST contain only those characters that are valid in a MIME "token".
(That is, it may contain any non-space, non-control ASCII character,
except the special characters `(`, `)`, `<`, `>`, `@`, `,`, `;`, `:`,
`\`, `"`, `/`, `[`, `]`, `?`, and `=`.)

Response keys generated for a given API MUST be unique for the
duration of a given request, and MUST be generated in such a way so as
not to collide with keys issued for any *other* API during the same
request.  (e.g., by including the API's name in them.)

Response keys SHOULD also be unique within the lifetime of the process
that generates them, e.g. by simply including a global counter value.

(So, the simplest valid way of generating a response key is to just
append a global counter to a string identifying the native API.
However, there is nothing stopping a server from adding information
like a request ID, channel designator, or other details, as an
aid to debugging.  Just make sure there's no whitespace or special
characters involved, as mentioned above.)
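A minimal sketch of the counter-based approach, with a check that the result is a valid MIME token, might look like the following. The function name `new_response_key` and the `name-N` layout are assumptions made for illustration.

```python
import itertools
import re

# MIME "token" characters: printable ASCII other than space and the
# special characters listed above.
_TOKEN_RE = re.compile(r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+")

# Global counter, so keys stay unique for the life of the process.
_counter = itertools.count(1)

def new_response_key(api_name):
    """Build a key such as 'foobar-1' from the API name and a counter."""
    key = '%s-%d' % (api_name, next(_counter))
    if not _TOKEN_RE.fullmatch(key):
        raise ValueError('not a valid MIME token: %r' % key)
    return key
```

Including the API name in the key is what keeps keys issued for different APIs from colliding within one request.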

## Intercepting or Disabling APIs

Because all server API hooks are contained in a single WSGI
environment key, it is easy for WSGI middleware to disable access to
them when creating subrequests, by simply deleting that key before
invoking an application.

Likewise, in the event that WSGI middleware wishes to disable one
*specific* API, or intercept it, it can do so by removing or replacing
the appropriate hook within the hooks dictionary.

(Note: The `wsgi.native_api_hooks` dictionary is to be considered
volatile in the same way as the WSGI environment is.  That is, apps or
middleware are allowed to modify or delete its contents freely, so a
copy MUST be saved by middleware if it wishes to access the original
values after it has been passed to another application or middleware.)
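Both cases can be sketched as small middleware wrappers. The function names and the `make_wrapper` callback are hypothetical; the second wrapper copies the hooks dictionary before changing it, so that whoever supplied the original still sees its own values.

```python
HOOKS_KEY = 'wsgi.native_api_hooks'

def disable_native_apis(app):
    """Middleware that hides every native API hook from the wrapped app."""
    def middleware(environ, start_response):
        environ.pop(HOOKS_KEY, None)
        return app(environ, start_response)
    return middleware

def intercept_native_api(app, api_key, make_wrapper):
    """Middleware that replaces one hook with make_wrapper(original_hook)."""
    def middleware(environ, start_response):
        hooks = environ.get(HOOKS_KEY)
        if hooks and api_key in hooks:
            hooks = dict(hooks)          # copy; leave the original alone
            hooks[api_key] = make_wrapper(hooks[api_key])
            environ[HOOKS_KEY] = hooks
        return app(environ, start_response)
    return middleware
```

Disabling a single API works the same way as intercepting it, except that the copied dictionary has the key deleted instead of replaced.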

## Accessing Native APIs Inside Application Frameworks

Since relatively few applications are written in "pure WSGI", it's
necessary to show how one would go about accessing a native API from
inside an application framework that doesn't provide direct access to
the WSGI `start_response`, or allow directly returning a response
body.  Here is a simple, but fully-generic utility function that works
around this problem, provided there is at least access to the WSGI
environment:

    def use_native_api(environ, api_key, *args, **kw):

        native_api = environ.get('wsgi.native_api_hooks', {}).get(api_key)
        if native_api is None:
            raise RuntimeError("API unavailable")

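        status = headers = None
        # Capture what the hook passes to start_response; exc_info is
        # accepted (and ignored) to match the WSGI signature
        def start_response(s, h, exc_info=None):
            nonlocal status, headers
            status, headers = s, h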

        return status, headers, native_api(environ, start_response, *args, **kw)

The returned status, headers, and body can then be sent using
framework-specific APIs, so that they propagate back out through the
WSGI stack.

(Individual web frameworks, of course, can and *should* offer their
own, similar utilities to perform this function, e.g. by adding a
`use_native_api()` method on their response objects.  In that way,
developers can be spared the details of setting the status, headers,
etc.)
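As a rough illustration of what such a framework-level method could look like, here is a toy response class. `Response` and its attributes are invented for this sketch and do not correspond to any real framework's API.

```python
class Response:
    """Toy framework response object (hypothetical, for illustration)."""

    def __init__(self, environ):
        self.environ = environ
        self.status = '200 OK'
        self.headers = []
        self.body = [b'']

    def use_native_api(self, api_key, *args, **kw):
        """Replace this response with the escape response for `api_key`."""
        hooks = self.environ.get('wsgi.native_api_hooks', {})
        hook = hooks.get(api_key)
        if hook is None:
            raise RuntimeError('API unavailable: %s' % api_key)

        def start_response(status, headers, exc_info=None):
            # Store the markers so the framework sends them unchanged.
            self.status, self.headers = status, list(headers)

        self.body = hook(self.environ, start_response, *args, **kw)
        return self
```

The framework would then emit `status`, `headers`, and `body` through its normal WSGI output path, so that the markers reach the server intact.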

# Notes on Current Design Rationale

* A dictionary is used for all native APIs, so they can be easily
disabled for subrequests

* Multiple registrations are allowed, so that middleware invoking
multiple subrequests is unaffected, so long as exactly one
subrequest's response is returned

* A `Content-Type` header is part of the spec, because most
response-altering middleware should avoid altering content types it
does not understand, thereby increasing the likelihood that the
response will be passed through unchanged

# Open Questions and Issues

* What if middleware adds headers but leaves the status and
content-type unchanged?  Should that be an error?  What happens if
middleware requests setting cookies?
* Do the chosen status/headers/body signatures actually make sense?
Do they need to be more specified, or less specified?
* Are there any major obstacles to sending a special status from major
web frameworks?
* Should a different status be used?
* We need better examples!  (They should more closely resemble some
actual use cases, rather than being vague abstracts)
* Are there any other ways to corrupt, confuse, or break this?
* What else am I missing, overlooking, or getting wrong?

# Acknowledgements

(TBD, but should definitely include Robert Collins for research,
inspiration, and use cases)

# References

TBD

# Copyright

This document has been placed in the public domain.