[Web-SIG] The rewritten WSGI pre-PEP

Mon Aug 9 01:59:14 CEST 2004

This version is an almost complete rewrite, based on a new interface
approach developed by Tony Lownds and me.  As you'll see, it tries to
address as much of the list's feedback as I could absorb and remember, so
please be patient with me if I missed taking something into account.

As always, your comments and feedback are appreciated.

PEP: XXX
Title: Python Web Server Gateway Interface v1.0
Version: $Revision: 1.1 $
Last-Modified: $Date: 2004/08/08 19:48:42 $
Author: Phillip J. Eby <pje at telecommunity.com>
Discussions-To: Python Web-SIG <web-sig at python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 07-Dec-2003
Post-History: 07-Dec-2003, 08-Aug-2004

Abstract
========

This document specifies a proposed standard interface between web
servers and Python web applications or frameworks, to promote
web application portability across a variety of web servers.

Rationale
=========

Python currently boasts a wide variety of web application
frameworks, such as Zope, Quixote, Webware, Skunkware, PSO,
and Twisted Web -- to name just a few [1]_.  This wide variety
of choices can be a problem for new Python users, because
generally speaking, their choice of web framework will limit
their choice of usable web servers, and vice versa.

By contrast, although Java has just as many web application
frameworks available, Java's "servlet" API makes it possible
for applications written with any Java web application framework
to run in any web server that supports the servlet API.

The availability and widespread use of such an API in web
servers for Python -- whether those servers are written in
Python (e.g. Medusa), embed Python (e.g. mod_python), or
invoke Python via a gateway protocol (e.g. CGI, FastCGI,
etc.) -- would separate choice of framework from choice
of web server, freeing users to choose a pairing that suits
them, while freeing framework and server developers to focus
on their area of specialty.

This PEP, therefore, proposes a simple and universal interface
between web servers and web applications or frameworks: the
Python Web Server Gateway Interface (WSGI).

But the mere existence of a WSGI spec does nothing to address the
existing state of servers and frameworks for Python web applications.
Server and framework authors and maintainers must actually implement
WSGI for there to be any effect.

However, since no existing servers or frameworks support WSGI, there
is little immediate reward for an author who implements WSGI support.
Thus, WSGI *must* be easy to implement, so that an author's initial
investment in the interface can be reasonably low.

Thus, simplicity of implementation on *both* the server and framework
sides of the interface is absolutely critical to the utility of the
WSGI interface, and is therefore the principal criterion for any
design decisions.  (It should also be easy to create request
preprocessors, response postprocessors, and other "middleware"
components that look like an application to their containing server,
while acting as a server for their contained applications.)

Note, however, that simplicity of implementation for a framework
author is not the same thing as ease of use for a web application
author.  WSGI presents an absolutely "no frills" interface to the
framework author, because bells and whistles like response objects
and cookie handling would just get in the way of existing frameworks'
handling of these issues.  Again, the goal of WSGI is to facilitate
easy interconnection of existing servers and applications or
frameworks, not to create a new web framework.

Note also that this goal precludes WSGI from requiring anything that
is not already available in deployed versions of Python.  Therefore,
new standard library modules are not proposed or required by this
specification, and nothing in WSGI requires a Python version greater
than 1.5.2.  (It would be a good idea, however, for future versions
of Python to include support for this interface in web servers
provided by the standard library.)

Finally, the current version of WSGI does not prescribe any
particular mechanism for "deploying" an application for use with a
web server or server gateway.  At the present time, this is
necessarily implementation-defined by the server or gateway.
After a sufficient number of servers and frameworks have implemented
WSGI to provide field experience with varying deployment
requirements, it may make sense to create another PEP, describing
a deployment standard for WSGI servers and application frameworks.

Specification Overview
======================

The WSGI interface has two sides: the "server" or "gateway" side,
and the "application" side.  The server side invokes a callable
object that is provided by the application side.  The specifics
of how that object is provided are up to the server or gateway.
It is assumed that some servers or gateways will require an
application's deployer to write a short script to create an
instance of the server or gateway, and supply it with the
application object.  Other servers and gateways may use
configuration files or other mechanisms to specify where the
application object should be imported from.

The application object is simply a callable object that accepts
two arguments.  The term "object" should not be misconstrued as
requiring an actual object instance: a function, method, class,
or instance with a ``__call__`` method are all acceptable for
use as an application object.  Here are two example application
objects; one is a function, and the other is a class::

     def simple_app(environ, start_response):
         """Simplest possible application object"""
         status = '200 OK'
         headers = [('Content-type','text/plain')]
         write = start_response(status, headers)
         write('Hello world!\n')

     class AppClass:
         """Much the same thing, but as a class"""

         def __init__(self, environ, start_response):
             self.environ = environ
             self.start = start_response

         def __iter__(self):
             status = '200 OK'
             headers = [('Content-type','text/plain')]
             self.start(status, headers)

             yield "Hello world!\n"
             for i in range(1,11):
                 yield "Extra line %s\n" % i

The server or gateway invokes the application once for each request
it receives from a web browser.  To illustrate, here is a simple
CGI gateway, implemented as a function taking an application object
(all error handling omitted)::

     import os, sys

     def run_with_cgi(application):

         environ = {}
         environ.update(os.environ)
         environ['wsgi.input']        = sys.stdin
         environ['wsgi.errors']       = sys.stderr
         environ['wsgi.version']      = '1.0'
         environ['wsgi.multithread']  = False
         environ['wsgi.multiprocess'] = True

         def start_response(status,headers):
             print "Status:", status
             for key,val in headers:
                 print "%s: %s" % (key,val)
             print    # a blank line ends the CGI header block
             return sys.stdout.write

         result = application(environ, start_response)
         if result:
             try:
                 for data in result:
                     sys.stdout.write(data)
             finally:
                 if hasattr(result,'close'):
                     result.close()

In the next section, we will specify the precise semantics that
these illustrations are examples of.

Specification Details
=====================

The application object must accept two positional arguments.  For
the sake of illustration, we have named them ``environ``, and
``start_response``, but they are not required to have these names.
A server or gateway *must* invoke the application object using
positional (not keyword) arguments.

The first parameter is a dictionary object, containing CGI-style
environment variables.  This object *must* be a builtin Python
dictionary (*not* a subclass, ``UserDict`` or other dictionary
emulation), and the application is allowed to modify the dictionary
in any way it desires.  The dictionary must also include certain
WSGI-required variables (described in a later section), and may
also include server-specific extension variables, named according
to a convention that will be described below.

The second parameter is a callable accepting two positional
arguments: a status string of the form ``"999 Message here"``,
and a list of ``(header_name,header_value)`` tuples describing the
HTTP response headers.  This callable must return another callable
that takes one parameter: a string to write as part of the HTTP
response body.

The application object may return either ``None`` (indicating that
there is no additional output), or it may return a non-empty
iterable yielding strings.  (For example, it could be a
generator-iterator that yields strings, or it could be a
sequence such as a list of strings.)  If the application
returns an iterable, and the iterable has a ``close()`` method,
the server or gateway *must* call that method upon completion
of the current request, whether the request was completed normally,
or terminated early due to an error.  (This is to support resource
release by the application.  The specific protocol is intended to
support PEP 325, and also the simple case of an application returning
an open text file.)
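
The ``close()`` rule can be sketched as follows.  This is only an
illustration, not part of the specification: ``ClosingResult``,
``chunked_app`` and ``run_once`` are hypothetical names, the response is
collected in a list rather than sent to a client, and byte strings are
written as ``b'...'`` literals so that the sketch runs on current
interpreters.

```python
class ClosingResult:
    """Hypothetical iterable result that records whether close() ran."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.closed = False

    def __iter__(self):
        return iter(self.chunks)

    def close(self):
        # A real application would release files, connections, etc. here.
        self.closed = True


def chunked_app(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    return ClosingResult([b'Hello ', b'world!\n'])


def run_once(application, environ):
    """Hypothetical server loop for one request; output goes to a list."""
    output = []

    def start_response(status, headers):
        return output.append      # stands in for the write() callable

    result = application(environ, start_response)
    if result is not None:
        try:
            for data in result:
                output.append(data)
        finally:
            # Runs whether iteration finished normally or failed part-way.
            if hasattr(result, 'close'):
                result.close()
    return output, result
```

Calling ``run_once(chunked_app, {})`` returns the two strings in order
and leaves the result object marked as closed.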

``environ`` Variables
---------------------

The ``environ`` dictionary is required to contain CGI environment
variables, as defined by the Common Gateway Interface specification
[2]_.  In addition, it must contain the following WSGI-defined
variables:

=====================  =============================================
Variable               Value
=====================  =============================================
``wsgi.version``       The string ``"1.0"``

``wsgi.input``         An input stream from which the HTTP request
                       body can be read.

``wsgi.errors``        An output stream to which error output can
                       be written.  For most servers, this will be
                       the server's error log.

``wsgi.multithread``   This value should be true if the application
                       object may be simultaneously invoked by
                       another thread in the same process, and
                       false otherwise.

``wsgi.multiprocess``  This value should be true if an equivalent
                       application object may be simultaneously
                       invoked by another process, and false
                       otherwise.
=====================  =============================================

Finally, the ``environ`` dictionary may also contain server-defined
variables.  These variables should be named using only lower-case
letters, numbers, dots, and underscores, and should be prefixed with
a name that is unique to the defining server or gateway.  For
example, ``mod_python`` might define variables with names like
``mod_python.some_variable``.

Note: missing variables (such as ``REMOTE_USER`` when no
authentication has occurred) should be left out of the ``environ``
dictionary.  Also note that CGI-defined variables must be strings,
if they are present at all.  It is a violation of this specification
for a CGI variable's value to be of any type other than ``str``.
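
As a rough sketch of these rules, a server might assemble ``environ``
along the following lines.  ``build_environ`` and the
``my_server.request_id`` extension variable are hypothetical, and the
``io`` objects merely stand in for whatever streams a real server
provides.

```python
import io

def build_environ(cgi_vars, input_stream, error_stream):
    """Hypothetical helper: assemble the environ for one request."""
    environ = {}                      # a builtin dict, not a subclass
    for name, value in cgi_vars.items():
        if value is None:
            continue                  # missing variables are left out
        environ[name] = str(value)    # CGI values must be strings
    environ['wsgi.version'] = '1.0'
    environ['wsgi.input'] = input_stream
    environ['wsgi.errors'] = error_stream
    environ['wsgi.multithread'] = False
    environ['wsgi.multiprocess'] = True
    # Server-defined extension, prefixed with a (hypothetical) server name.
    environ['my_server.request_id'] = 1
    return environ


environ = build_environ(
    {'REQUEST_METHOD': 'GET', 'SERVER_PORT': 80, 'REMOTE_USER': None},
    io.BytesIO(b''), io.StringIO())
```

Here ``REMOTE_USER`` is absent from the result and ``SERVER_PORT`` is
the string ``'80'``.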

Input and Error Streams
~~~~~~~~~~~~~~~~~~~~~~~

The input and error streams provided by the server must support
the following methods:

===================  ==========  =====
Method               Stream      Notes
===================  ==========  =====
``read(size)``       ``input``
``readline()``       ``input``   1
``readlines(hint)``  ``input``   2
``__iter__()``       ``input``
``flush()``          ``errors``  3
``write(str)``       ``errors``
``writelines(seq)``  ``errors``
===================  ==========  =====

The semantics of each method are as documented in the Python Library
Reference, except for these notes as listed in the table above:

1. The optional "size" argument to ``readline()`` is not supported,
   as it may be complex for server authors to implement, and is not
   often used in practice.

2. Note that the ``hint`` argument to ``readlines()`` is optional for
   both caller and implementer.  The application is free not to
   supply it, and the server or gateway is free to ignore it.

3. Since the ``errors`` stream may not be rewound, a container is
   free to forward write operations immediately, without buffering.
   In this case, the ``flush()`` method may be a no-op.  Portable
   applications, however, cannot assume that output is unbuffered
   or that ``flush()`` is a no-op.  They must call ``flush()`` if
   they need to ensure that output has in fact been written.  (For
   example, to minimize intermingling of data from multiple processes
   writing to the same error log.)

The methods listed in the table above *must* be supported by all
servers conforming to this specification.  Applications conforming
to this specification *must not* use any other methods or attributes
of the ``input`` or ``errors`` objects.  In particular, applications
*must not* attempt to close these streams, even if they possess
``close()`` methods.
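
A portable application restricted to these methods might look roughly
like this.  ``echo_app`` is a hypothetical name, the ``io`` objects are
stand-ins for the streams a server would supply, and the use of the CGI
``CONTENT_LENGTH`` variable to size the read is an assumption of the
sketch.

```python
import io

def echo_app(environ, start_response):
    """Hypothetical application: echoes the request body back."""
    length = int(environ.get('CONTENT_LENGTH') or 0)
    body = environ['wsgi.input'].read(length)

    errors = environ['wsgi.errors']
    errors.write('echo_app: read %d bytes\n' % len(body))
    errors.flush()    # portable code cannot assume unbuffered output

    write = start_response('200 OK', [('Content-type', 'text/plain')])
    write(body)
    # Neither stream is closed here; both belong to the server.


# Stand-ins for what a server or gateway would provide.
environ = {
    'CONTENT_LENGTH': '5',
    'wsgi.input': io.BytesIO(b'hello'),
    'wsgi.errors': io.StringIO(),
}
sent = []
echo_app(environ, lambda status, headers: sent.append)
```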

The ``start_response()`` Callable
---------------------------------

The second parameter passed to the application object is itself a
two-argument callable, used to begin the HTTP response and return
a ``write()`` function.  The first parameter it takes is a "status"
string, of the form ``"999 Message here"``, where ``999`` is replaced
with the HTTP status code, and ``Message here`` is replaced with the
appropriate message text.  The string *must* be pure 7-bit ASCII,
containing no control characters.  In particular, it must not be
terminated with a carriage return or linefeed.

The second parameter accepted by the ``start_response()`` callable
must be a sequence of ``(header_name,header_value)`` tuples.  Each
``header_name`` must be a valid HTTP header name, without a
trailing colon or other punctuation.  Each ``header_value``
*must not* include a trailing carriage return or linefeed: it
should be a raw header value.  (These requirements are to minimize
the complexity of parsing required by servers, gateways, and
intermediate response processors that need to inspect or modify
response headers.)

The return value of the ``start_response()`` callable is a
one-argument callable, that accepts strings to write as part of the
HTTP response body.
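
A server or middleware author who wants to enforce these argument rules
could start from a checker along these lines.  It is a partial sketch
under stated assumptions: ``check_start_response_args`` is a
hypothetical name, and header names are only checked for being
non-empty and lacking a trailing colon, not for full HTTP validity.

```python
def check_start_response_args(status, headers):
    """Hypothetical checker for the two start_response() arguments."""
    code, sep, message = status.partition(' ')
    if not (len(code) == 3 and code.isdigit() and sep and message):
        raise ValueError('status must look like "999 Message here"')
    for ch in status:
        # 7-bit ASCII only, with no control characters (CR/LF included).
        if ord(ch) < 32 or ord(ch) > 126:
            raise ValueError('status contains a disallowed character')
    for name, value in headers:
        if not name or name.endswith(':'):
            raise ValueError('bad header name: %r' % (name,))
        if value.endswith('\r') or value.endswith('\n'):
            raise ValueError('header value has a trailing CR or LF')
```

For example, ``check_start_response_args('200 OK', [('Content-type',
'text/plain')])`` passes silently, while a status of ``'200 OK\n'``
raises ``ValueError``.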

Implementation/Application Notes
================================

Unicode
-------

HTTP does not directly support Unicode, and neither does this
interface.  All encoding/decoding must be handled by the application;
all strings and streams passed to or from the server must be standard
Python byte strings, not Unicode objects.  The result of using a
Unicode object where a string object is required is undefined.
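
In practice this means the application encodes its text before handing
it over.  A minimal sketch, assuming the application has chosen UTF-8
(the choice of encoding and the name ``greeting_app`` are illustrative
only):

```python
def greeting_app(environ, start_response):
    """Hypothetical application that produces non-ASCII text."""
    text = u'Gr\u00fc\u00dfe!\n'       # held internally as a Unicode object
    data = text.encode('utf-8')        # the application does the encoding
    headers = [('Content-type', 'text/plain; charset=utf-8')]
    write = start_response('200 OK', headers)
    write(data)                        # only byte strings cross the interface


sent = []
greeting_app({}, lambda status, headers: sent.append)
```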

Multiple Invocations
--------------------

Application objects must be able to be invoked more than once, since
virtually all servers/gateways will invoke them repeatedly.

Error Handling
--------------

Servers *should* trap and log exceptions raised by
applications, and *may* continue to execute, or attempt to shut down
gracefully.  Applications *should* avoid allowing exceptions to
escape their execution scope, since the result of uncaught exceptions
is server-defined.
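
One way a server might trap and log an application's exception is
sketched below.  ``call_trapping_errors`` and ``broken_app`` are
hypothetical names, and what the server does after logging (continue
or shut down) is left open, as in the text above.

```python
import io
import traceback

def call_trapping_errors(application, environ, start_response):
    """Hypothetical server helper: returns (succeeded, result)."""
    try:
        return True, application(environ, start_response)
    except Exception:
        # Write the traceback to the error stream and report failure.
        errors = environ['wsgi.errors']
        errors.write(traceback.format_exc())
        errors.flush()
        return False, None


def broken_app(environ, start_response):
    raise RuntimeError('application bug')


environ = {'wsgi.errors': io.StringIO()}
ok, result = call_trapping_errors(broken_app, environ, None)
```

A complete server would apply the same treatment while iterating over
the application's return value, since exceptions can also be raised
there.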

Thread Support
--------------

Thread support, or lack thereof, is also server-dependent.
Servers that can run multiple requests in parallel *should* also
provide the option of running an application in a single-threaded
fashion, so that applications or frameworks that are not thread-safe
may still be used with that server.
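
A threaded server could offer such an option in several ways; one
possibility, shown purely as an assumption of this sketch, is to
serialize calls with a lock.  ``single_threaded`` is a hypothetical
name, and draining the result into a list trades streaming for the
guarantee that no application code runs outside the lock.

```python
import threading

def single_threaded(application):
    """Hypothetical wrapper: one request at a time in the application."""
    lock = threading.Lock()

    def wrapper(environ, start_response):
        with lock:
            # Within this process the wrapped application is never
            # invoked simultaneously, so report that to it.
            environ['wsgi.multithread'] = False
            result = application(environ, start_response)
            if result is None:
                return None
            try:
                # Drain the iterable while the lock is still held.
                return list(result)
            finally:
                if hasattr(result, 'close'):
                    result.close()

    return wrapper
```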

Application Configuration
-------------------------

This specification does not define how a server selects or
obtains an application to invoke.  These and other configuration
options are highly server-specific matters.  It is expected that
server/gateway authors will document how to configure the server to
execute a particular application object, and with what options (such
as threading options).

Framework authors, on the other hand, should document how to create
an application object that wraps their framework's functionality.
The user, who has chosen both the server and the application
framework, must connect the two together.  However, since both the
framework and the server now have a common interface, this should
be merely a mechanical matter, rather than a significant engineering
effort for each new server/framework pair.

Middleware
----------

Note that a single object may play the role of a server with respect
to some application(s), while also acting as an application with
respect to some server(s).  Such "middleware" components can perform
functions such as:

   * Routing a request to different application objects based on the
     target URL, after rewriting the ``environ`` accordingly

   * Allowing multiple applications or frameworks to run side-by-side
     in the same process

   * Load balancing and remote processing, by forwarding requests and
     responses over a network

   * Performing content postprocessing, such as applying XSL
     stylesheets

Given the existence of applications and servers conforming to this
specification, the appearance of such reusable middleware becomes
a possibility.
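
To make the routing case concrete, here is a small sketch of a
middleware component.  All names (``make_router``, ``blog_app``,
``not_found``) are hypothetical, and moving the matched prefix from
``PATH_INFO`` to ``SCRIPT_NAME`` is one plausible way of "rewriting the
``environ``", not something this specification mandates.

```python
def make_router(routes, default_app):
    """Hypothetical middleware: dispatch on a leading path prefix."""

    def router(environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, app in routes:
            if path == prefix or path.startswith(prefix + '/'):
                # Move the matched prefix from PATH_INFO to SCRIPT_NAME.
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return app(environ, start_response)
        return default_app(environ, start_response)

    return router


def blog_app(environ, start_response):
    write = start_response('200 OK', [('Content-type', 'text/plain')])
    write(('blog saw %s\n' % environ['PATH_INFO']).encode('ascii'))


def not_found(environ, start_response):
    write = start_response('404 Not Found', [('Content-type', 'text/plain')])
    write(b'Not found\n')


# To the server this is an application; to blog_app it acts as the server.
application = make_router([('/blog', blog_app)], not_found)
```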

Questions and Answers
=====================

1. Why must ``environ`` be a dictionary?  What's wrong with using
   a subclass?

   The rationale for requiring a dictionary is to maximize
   portability between servers.  The alternative would be to define
   some subset of a dictionary's methods as being the standard and
   portable interface.  In practice, however, most servers will
   probably find a dictionary adequate to their needs, and thus
   framework authors will come to expect the full set of dictionary
   features to be available, since they will be there more often
   than not.  But, if some server chooses *not* to use a dictionary,
   then there will be interoperability problems despite that
   server's "conformance" to spec.  Therefore, making a dictionary
   mandatory simplifies the specification and guarantees
   interoperability.

   Note that this does not prevent server or framework developers
   from offering specialized services as custom variables *inside*
   the ``environ`` dictionary.  This is the recommended approach
   for offering any such value-added services.

2. Why can you call ``write()`` *and* yield strings/return an
   iterator?  Shouldn't we pick just one way?

   If we supported only the iteration approach, then current
   frameworks that assume the availability of "push" would suffer.
   But, if we supported only pushing via ``write()``, then
   server performance would suffer for transmission of e.g. large
   files (if a worker thread can't start on a new request
   until all of the output has been sent).  Thus, this compromise
   allows an application framework to support both approaches, as
   appropriate, but with only a little more burden to the server
   implementor than a push-only approach would require.

3. What's the ``close()`` for?

   When writes are done during the execution of an application
   object, the application can ensure that resources are released
   using a try/finally block.  But, if the application returns an
   iterator, any resources used will not be released until the
   iterator is garbage collected.  The ``close()`` idiom allows
   an application to release critical resources at the end of a
   request, and it's forward-compatible with the support for
   try/finally in generators that's proposed by PEP 325.

4. Why is this interface so low-level?  I want feature X!  (e.g.
   cookies, sessions, persistence, ...)

   This isn't Yet Another Python Web Framework.  It's just a way
   for frameworks to talk to web servers, and vice versa.  If you
   want these features, you need to pick a web framework that
   provides the features you want.  And if that framework lets
   you create a WSGI application, you should be able to run it
   in most WSGI-supporting servers.  Also, some WSGI servers may
   offer additional services via objects provided in their
   ``environ`` dictionary; see the applicable server documentation
   for details.  (Of course, applications that use such extensions
   will not be portable to other WSGI-based servers.)

Acknowledgements
================

Thanks go to the many folks on the Web-SIG mailing list whose
thoughtful feedback made this revised draft possible.  Especially:

  * Gregory "Grisha" Trubetskoy, author of ``mod_python``, who
    beat up on the first draft as not offering any advantages
    over "plain old CGI", thus encouraging me to look for a
    better approach.

  * Ian Bicking, who helped nag me into properly specifying
    the multithreading and multiprocess options, as well as
    badgering me to provide a mechanism for servers to supply
    custom extension data to an application.

  * Tony Lownds, who came up with the concept of a ``start_response``
    function that took the status and headers, returning a ``write``
    function.

References
==========

.. [1] The Python Wiki "Web Programming" topic
    (http://www.python.org/cgi-bin/moinmoin/WebProgramming)

.. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
    (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)

Copyright
=========

This document has been placed in the public domain.

..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    End: