[Web-SIG] "Web3" Spec (aka WSGI2)

Chris McDonough chrism at plope.com
Wed Oct 13 22:25:30 CEST 2010


Yeah, not sure why it didn't show up when it was sent... it's safe to
ignore.

- C

On Wed, 2010-10-13 at 15:17 -0500, Ian Bicking wrote:
> Huh, this just came through, but has an old date on it.  I'm assuming
> it was just stuck in some queue?
> 
> 
> On Tue, Jul 20, 2010 at 12:43 AM, Chris McDonough <chrism at plope.com>
> wrote:
>         Below is the first draft of a specification for a WSGI-like
>         protocol
>         I've tentatively named "Web3".  If it's formatted poorly in
>         this email
>         for you, the in-progress version of spec is also available at
>         http://svn.repoze.org/playground/chris/web3.txt
>         
>         Web3 is a backwards-incompatible variant of WSGI which:
>         
>         - Is compatible with Python 2.6, 2.7 and 3.1.
>         
>         - Uses bytes to represent all environment values and
>         application body,
>          status, and header values.
>         
>         - Breaks support for asynchronous servers and applications.
>         
>         - Tries to address existing problems with WSGI 1.0 (at least
>         the ones
>          I found while trolling the maillist and the WSGI site).
>         
>         Much of it is a reworking of PEP 333, with significant
>         differences from
>         WSGI called out in a section near the beginning.  It also
>         contains a
>         "Points of Contention" section near the end that anticipates
>         argument.
>         
>         My reasoning for creating the spec was to see exactly how
>         crappy it
>         would be to write to a spec that drew equivalence between
>         Python 2
>         ``str`` and Python 3 ``bytes`` rather than between the Python
>         2 ``str``
>         and Python 3 ``str`` equivalence promoted by most
>         conversations here.
>         The answer: about as crappy.  But slightly less crappy than I
>         feared.
>         
>         Here's the spec...
>         
>         PEP: XXX
>         Title: Python Web3 Interface
>         Version: $Revision$
>         Last-Modified: $Date$
>         Author: Chris McDonough <chrism at plope.com>
>         Discussions-To: Python Web-SIG <web-sig at python.org>
>         Status: Draft
>         Type: Informational
>         Content-Type: text/x-rst
>         Created: 19-Jul-2010
>         
>         Abstract
>         ========
>         
>         This document specifies a proposed second-generation standard
>         interface between web servers and Python web applications or
>         frameworks.
>         
>         Rationale and Goals
>         ===================
>         
>         This protocol and specification is influenced heavily by the
>         Web
>         Services Gateway Interface (WSGI) 1.0 standard described in
>         PEP 333
>         [1]_ .  The high-level rationale for having any standard that
>         allows
>         Python-based web servers and applications to interoperate is
>         outlined
>         in PEP 333.  This document essentially uses PEP 333 as a
>         template, and
>         changes its wording in various places for the purpose of
>         forming a
>         different standard.
>         
>         Python currently boasts a wide variety of web application
>         frameworks
>         which use the WSGI 1.0 protocol.  However, due to changes in
>         the
>         language, the WSGI 1.0 protocol is not compatible with Python
>         3.  This
>         specification describes a standardized WSGI-like protocol that
>         lets
>         Python 2.6, 2.7 and 3.1+ applications communicate with web
>         servers.
>         Web3 is clearly a WSGI derivative; it only uses a different
>         name than
>         "WSGI" in order to indicate that it is not in any way
>         backwards
>         compatible.
>         
>         Applications and servers which are written to this
>         specification are
>         meant to work properly under Python 2.6.X, Python 2.7.X and
>         Python
>         3.1+.  Neither an application nor a server that implements
>         this
>         specification can be easily written which will work under
>         Python 2
>         versions earlier than 2.6 nor Python 3 versions earlier than
>         3.1.
>         
>         .. note:: whatever Python 3 version fixed
>           http://bugs.python.org/issue4006 so os.environ['foo']
>         returns
>           surrogates (ala PEP 383) when the value of 'foo' cannot be
>         decoded
>           using the current locale instead of failing with a KeyError
>         is the
>           true minimum Python 3 version.  In particular, however,
>         Python 3.0
>           is not supported.
>         
>         Explicability and documentability are the main technical
>         drivers for
>         the decisions made within the standard.
>         
>         Differences from WSGI
>         =====================
>         
>         - Asynchronous applications and servers are supported more
>         poorly by
>          Web3 than by WSGI 1.0
>         
>         - All protocol-specific environment names are prefixed with
>         ``web3.``
>          rather than ``wsgi.``, eg. ``web3.input`` rather than
>          ``wsgi.input``.
>         
>         - All values present as environment dictionary *values* are
>         explicitly
>          *bytes* instances instead of native strings.
>         
>         - All values returned by an application must be bytes
>         instances,
>          including status code, header names and values, and the body.
>         
>         - Wherever WSGI 1.0 referred to an ``app_iter``, this
>         specification
>          refers to a ``body``.
>         
>         - No ``start_response()`` callback (and therefore no
>         ``write()``
>          callable nor ``exc_info`` data).
>         
>         - The ``readline()`` function of ``web3.input`` must support a
>         size
>          hint parameter.
>         
>         - No support for asynchronous applications that cannot yield a
>          meaningful status code and a set of headers before beginning
>         to
>          produce a body.
>         
>         - No requirement for middleware to yield an empty string if it
>         needs
>          more information from an application to produce output (e.g.
>         no
>          "Middleware Handling of Block Boundaries").
>         
>         - Filelike objects passed to a "file_wrapper" must have an
>          ``__iter__`` which returns bytes (never text).
>         
>         - "file_wrapper": don't read the entire file unless a
>          ``Content-Length`` header value has been set by the
>         application;
>          under that circumstance, the file wrapper should read only
>          ``Content-Length`` bytes from the underlying filelike
>          object.
>         
>         - ``QUERY_STRING``, ``SCRIPT_NAME``, ``PATH_INFO`` values
>         required to
>          be placed in environ by server (each as the empty bytes
>         instance if
>          no associated value is received in the HTTP request).
>         
>         - ``web3.path_info`` and ``web3.script_name`` must be put into
>         the
>          Web3 environment by the origin Web3 server.  When available,
>         each is
>          the original, plain 7-bit ASCII, URL-encoded variant of its
>         CGI
>          equivalent derived directly from the request URI (with %2F
>         segment
>          markers and other meta-characters intact).
>         
>         - This requirement was removed: "middleware components **must
>         not**
>          block iteration waiting for multiple values from an
>         application
>          iterable.  If the middleware needs to accumulate more data
>         from the
>          application before it can produce any output, it **must**
>         yield an
>          empty string."
>         
>         - ``SERVER_PORT`` must be a bytes instance (not an integer).
>         
>         Specification Overview
>         ======================
>         
>         The Web3 interface has two sides: the "server" or "gateway"
>         side, and
>         the "application" or "framework" side.  The server side
>         invokes a
>         callable object that is provided by the application side.  The
>         specifics of how that object is provided are up to the server
>         or
>         gateway.  It is assumed that some servers or gateways will
>         require an
>         application's deployer to write a short script to create an
>         instance
>         of the server or gateway, and supply it with the application
>         object.
>         Other servers and gateways may use configuration files or
>         other
>         mechanisms to specify where an application object should be
>         imported
>         from, or otherwise obtained.
>         
>         In addition to "pure" servers/gateways and
>         applications/frameworks,
>         it is also possible to create "middleware" components that
>         implement
>         both sides of this specification.  Such components act as an
>         application to their containing server, and as a server to a
>         contained application, and can be used to provide extended
>         APIs,
>         content transformation, navigation, and other useful
>         functions.
>         
>         Throughout this specification, we will use the term "a
>         callable" to
>         mean "a function, method, class, or an instance with a
>         ``__call__``
>         method".  It is up to the server, gateway, or application
>         implementing
>         the callable to choose the appropriate implementation
>         technique for
>         their needs.  Conversely, a server, gateway, or application
>         that is
>         invoking a callable **must not** have any dependency on what
>         kind of
>         callable was provided to it.  Callables are only to be called,
>         not
>         introspected upon.
>         
>         The Application/Framework Side
>         ------------------------------
>         
>         The application object is simply a callable object that
>         accepts one
>         argument.  The term "object" should not be misconstrued as
>         requiring
>         an actual object instance: a function, method, class, or
>         instance with
>         a ``__call__`` method are all acceptable for use as an
>         application
>         object.  Application objects must be able to be invoked more
>         than
>         once, as virtually all servers/gateways (other than CGI) will
>         make
>         such repeated requests.
>         
>         (Note: although we refer to it as an "application" object,
>         this should
>         not be construed to mean that application developers will use
>         Web3 as
>         a web programming API.  It is assumed that application
>         developers will
>         continue to use existing, high-level framework services to
>         develop
>         their applications.  Web3 is a tool for framework and server
>         developers, and is not intended to directly support
>         application
>         developers.)
>         
>         Here are two example application objects; one is a function,
>         and the
>         other is a class::
>         
>            def simple_app(environ):
>                """Simplest possible application object"""
>                status = b'200 OK'
>                headers = [(b'Content-type', b'text/plain')]
>                body = [b'Hello world!\n']
>                return status, headers, body
>         
>            class AppClass:
>                """Produce the same output, but using a class.
>         
>                (Note: 'AppClass' is the "application" here, so calling
>         it
>                returns an instance of 'AppClass', which is then the
>         return
>                value of the "application callable" as required by the
>         spec.
>         
>                If we wanted to use *instances* of 'AppClass' as
>         application
>                objects instead, we would have to implement a
>         '__call__'
>                method, which would be invoked to execute the
>         application,
>                and we would need to create an instance for use by the
>                server or gateway.
>                """
>                def __init__(self, environ):
>                    self.environ = environ
>         
>                def __iter__(self):
>                    status = b'200 OK'
>                    headers = [(b'Content-type', b'text/plain')]
>                    body = [b'Hello world!\n']
>                    yield status
>                    yield headers
>                    yield body
>         
>         The Server/Gateway Side
>         -----------------------
>         
>         The server or gateway invokes the application callable once
>         for each
>         request it receives from an HTTP client, that is directed at
>         the
>         application.  To illustrate, here is a simple CGI gateway,
>         implemented
>         as a function taking an application object.  Note that this
>         simple
>         example has limited error handling, because by default an
>         uncaught
>         exception will be dumped to ``sys.stderr`` and logged by the
>         web
>         server.
>         
>         ::
>         
>            import locale
>            import os
>            import sys
>         
>            encoding = locale.getpreferredencoding()
>         
>            stdout = sys.stdout
>         
>            if hasattr(sys.stdout, 'buffer'):
>                # Python 3 compatibility; we need to be able to push
>                # bytes out
>                stdout = sys.stdout.buffer
>         
>            def get_environ():
>                d = {}
>                for k, v in os.environ.items():
>                    # Python 3 compatibility
>                    if not isinstance(v, bytes):
>                        # We must explicitly encode the string to
>                        # bytes under Python 3.1+
>                        v = v.encode(encoding, 'surrogateescape')
>                    d[k] = v
>                return d
>         
>            def run_with_cgi(application):
>         
>                environ = get_environ()
>                environ['web3.input']        = sys.stdin
>                environ['web3.errors']       = sys.stderr
>                environ['web3.version']      = (1,0)
>                environ['web3.multithread']  = False
>                environ['web3.multiprocess'] = True
>                environ['web3.run_once']     = True
>         
>                if environ.get('HTTPS', b'off') in (b'on', b'1'):
>                    environ['web3.url_scheme'] = b'https'
>                else:
>                    environ['web3.url_scheme'] = b'http'
>         
>                status, headers, body = application(environ)
>         
>                CRLF = b'\r\n'
>         
>                try:
>                    stdout.write(b'Status: ' + status + CRLF)
>                    for header_name, header_val in headers:
>                        stdout.write(
>                            header_name + b': ' + header_val + CRLF)
>                    stdout.write(CRLF)
>                    for chunk in body:
>                        stdout.write(chunk)
>                    stdout.flush()
>                finally:
>                    if hasattr(body, 'close'):
>                        body.close()
>         
>         Middleware: Components that Play Both Sides
>         -------------------------------------------
>         
>         Note that a single object may play the role of a server with
>         respect
>         to some application(s), while also acting as an application
>         with
>         respect to some server(s).  Such "middleware" components can
>         perform
>         such functions as:
>         
>         * Routing a request to different application objects based on
>         the
>          target URL, after rewriting the ``environ`` accordingly.
>         
>         * Allowing multiple applications or frameworks to run
>         side-by-side in
>          the same process
>         
>         * Load balancing and remote processing, by forwarding requests
>         and
>          responses over a network
>         
>         * Perform content postprocessing, such as applying XSL
>         stylesheets
>         
>         The presence of middleware in general is transparent to both
>         the
>         "server/gateway" and the "application/framework" sides of the
>         interface, and should require no special support.  A user who
>         desires
>         to incorporate middleware into an application simply provides
>         the
>         middleware component to the server, as if it were an
>         application, and
>         configures the middleware component to invoke the application,
>         as if
>         the middleware component were a server.  Of course, the
>         "application"
>         that the middleware wraps may in fact be another middleware
>         component
>         wrapping another application, and so on, creating what is
>         referred to
>         as a "middleware stack".
>         
>         For the most part, middleware must conform to the restrictions
>         and
>         requirements of both the server and application sides of
>         Web3.  In
>         some cases, however, requirements for middleware are more
>         stringent
>         than for a "pure" server or application, and these points will
>         be
>         noted in the specification.
>         
>         Here is a (tongue-in-cheek) example of a middleware component
>         that
>         converts ``text/plain`` responses to pig latin, using Joe
>         Strout's
>         ``piglatin.py``.  (Note: a "real" middleware component would
>         probably
>         use a more robust way of checking the content type, and should
>         also
>         check for a content encoding.  Also, this simple example
>         ignores the
>         possibility that a word might be split across a block
>         boundary.)
>         
>         ::
>         
>            from piglatin import piglatin
>         
>            class LatinIter:
>         
>                """Transform iterated output to piglatin."""
>         
>                def __init__(self, result):
>                    if hasattr(result, 'close'):
>                        self.close = result.close
>                    self.result = result
>                    self._iter = iter(result)
>         
>                def __iter__(self):
>                    return self
>         
>                def __next__(self):
>                    # Decode the bytes chunk, transform it, re-encode
>                    text = next(self._iter).decode('utf-8')
>                    return piglatin(text).encode('utf-8')
>         
>                next = __next__  # Python 2 iterator protocol
>         
>            class Latinator:
>         
>                def __init__(self, application):
>                    self.application = application
>         
>                def __call__(self, environ):
>                    status, headers, body = self.application(environ)
>                    for name, value in headers:
>                        if (name.lower() == b'content-type'
>                                and value == b'text/plain'):
>                            body = LatinIter(body)
>                            # Strip content-length if present, else
>                            # it'll be wrong
>                            headers = [(name, value)
>                                       for name, value in headers
>                                       if name.lower() != b'content-length']
>                            break
>         
>                    return status, headers, body
>         
>            # Run foo_app under a Latinator's control, using the
>            # example CGI gateway
>            from foo_app import foo_app
>            run_with_cgi(Latinator(foo_app))
>         
>         Specification Details
>         =====================
>         
>         The application object must accept one positional argument.
>          For the
>         sake of illustration, we have named it ``environ``, but it is
>         not
>         required to have this name.  A server or gateway **must**
>         invoke the
>         application object using a positional (not keyword) argument.
>         (E.g. by calling ``status, headers, body =
>         application(environ)`` as
>         shown above.)
>         
>         The ``environ`` parameter is a dictionary object, containing
>         CGI-style
>         environment variables.  This object **must** be a builtin
>         Python
>         dictionary (*not* a subclass, ``UserDict`` or other dictionary
>         emulation), and the application is allowed to modify the
>         dictionary in
>         any way it desires.  The dictionary must also include certain
>         Web3-required variables (described in a later section), and
>         may also
>         include server-specific extension variables, named according
>         to a
>         convention that will be described below.
>         
>         When called by the server, the application object must return
>         an
>         iterable yielding three elements: ``status``, ``headers`` and
>         ``body``.
>         
>         The ``status`` element is a status in bytes of the form
>         ``b'999
>         Message here'``.
>         
>         ``headers`` is a Python list of ``(header_name,
>         header_value)`` pairs
>         describing the HTTP response header.  The ``headers``
>         structure must
>         be a literal Python list; it should yield two-tuples.  Both
>         ``header_name`` and ``header_value`` must be bytes values.
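
A server or middleware author could check these shape and type rules before transmitting anything. The following is a minimal sketch and not part of the draft: the helper name `check_response` is hypothetical, and raising `TypeError` or `ValueError` on a malformed result is an assumption about how strictly a server might choose to enforce the rules.

```python
import re

# Three digits, a space, then a reason phrase, e.g. b'200 OK'.
_STATUS_RE = re.compile(br'^\d{3} .+$')


def check_response(result):
    """Unpack an application's return value and validate its types.

    Hypothetical helper: the draft only states the required types,
    it does not mandate any particular validation routine.
    """
    status, headers, body = result
    if not isinstance(status, bytes):
        raise TypeError('status must be a bytes instance')
    if not _STATUS_RE.match(status):
        raise ValueError("status must have the form b'999 Message here'")
    if type(headers) is not list:
        raise TypeError('headers must be a literal Python list')
    for item in headers:
        if not (isinstance(item, tuple) and len(item) == 2):
            raise TypeError('each header must be a two-tuple')
        name, value = item
        if not (isinstance(name, bytes) and isinstance(value, bytes)):
            raise TypeError('header names and values must be bytes')
    return status, headers, body
```

A gateway could call `check_response(application(environ))` in place of the bare tuple unpacking, so that a misbehaving application fails early with a clear message.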
>         
>         The ``body`` is an iterable yielding zero or more bytes
>         instances.
>         This can be accomplished in a variety of ways, such as by
>         returning a
>         list containing bytes instances as ``body``, or by returning a
>         generator function as ``body`` that yields bytes instances, or
>         by the
>         ``body`` being a class whose instances are iterable.
>          Regardless of
>         how it is accomplished, the application object must always
>         return a
>         ``body`` iterable yielding zero or more bytes instances.
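
The three ways of producing a ``body`` mentioned above might look like the sketch below. The names `list_body`, `generator_body`, `ChunkedBody` and `app` are made up for illustration; only the requirement that iteration yields bytes instances comes from the draft.

```python
def list_body():
    # A plain list containing bytes instances.
    return [b'Hello ', b'world!\n']


def generator_body():
    # Calling this function returns a generator that yields bytes lazily.
    yield b'Hello '
    yield b'world!\n'


class ChunkedBody(object):
    """A class whose instances are iterable and yield bytes."""

    def __init__(self, chunks):
        self.chunks = chunks

    def __iter__(self):
        return iter(self.chunks)


def app(environ):
    status = b'200 OK'
    headers = [(b'Content-type', b'text/plain')]
    # Any of the three body forms could be returned here.
    return status, headers, generator_body()
```

All three forms look the same to a server, which only iterates over ``body`` and writes each chunk.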
>         
>         The server or gateway must transmit the yielded bytes to the
>         client in
>         an unbuffered fashion, completing the transmission of each set
>         of
>         bytes before requesting another one.  (In other words,
>         applications
>         **should** perform their own buffering.  See the `Buffering
>         and
>         Streaming`_ section below for more on how application output
>         must be
>         handled.)
>         
>         The server or gateway should treat the yielded bytes as binary
>         byte
>         sequences: in particular, it should ensure that line endings
>         are not
>         altered.  The application is responsible for ensuring that the
>         string(s) to be written are in a format suitable for the
>         client.  (The
>         server or gateway **may** apply HTTP transfer encodings, or
>         perform
>         other transformations for the purpose of implementing HTTP
>         features
>         such as byte-range transmission.  See `Other HTTP Features`_,
>         below,
>         for more details.)
>         
>         If a call to ``len(body)`` succeeds, the server must be able
>         to rely
>         on the result being accurate.  That is, if the ``body``
>         iterable
>         returned by the application provides a working ``__len__()``
>         method,
>         it **must** return an accurate result.  (See the `Handling the
>         Content-Length Header`_ section for information on how this
>         would
>         normally be used.)
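
One way a server might rely on an accurate ``len(body)`` is to derive a missing ``Content-Length`` header when the body consists of exactly one chunk, which is the approach PEP 333 suggests. This is a sketch under assumptions: the helper name `add_content_length` is hypothetical, and it assumes a body that reports a length (such as a list) can safely be iterated more than once.

```python
def add_content_length(headers, body):
    """Return headers, with Content-Length added when it can be derived."""
    for name, _ in headers:
        if name.lower() == b'content-length':
            return headers  # the application already set it
    try:
        count = len(body)
    except TypeError:
        return headers  # no usable __len__ (for example, a generator)
    if count != 1:
        return headers  # total size is not known without consuming the body
    # Assumption: iterating here does not exhaust the body.
    chunk = next(iter(body))
    length = str(len(chunk)).encode('ascii')
    return headers + [(b'Content-Length', length)]
```

For a generator body the helper returns the headers unchanged, and the server must use some other strategy, such as closing the connection at the end of the response.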
>         
>         If the ``body`` iterable returned by the application has a
>         ``close()``
>         method, the server or gateway **must** call that method upon
>         completion of the current request, whether the request was
>         completed
>         normally, or terminated early due to an error.  (This is to
>         support
>         resource release by the application.  This protocol is
>         intended to
>         complement PEP 325's generator support, and other common
>         iterables
>         with ``close()`` methods.)
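
A small sketch of the ``close()`` contract, seen from both sides. The names `ClosingBody` and `send` are hypothetical, and `write` stands in for whatever the server uses to transmit bytes to the client.

```python
class ClosingBody(object):
    """An iterable body that wants to release a resource when done."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.closed = False

    def __iter__(self):
        return iter(self.chunks)

    def close(self):
        # A real application would release files, locks, etc. here.
        self.closed = True


def send(body, write):
    """Server side: transmit every chunk, then always call close()."""
    try:
        for chunk in body:
            write(chunk)
    finally:
        # Runs on normal completion and on errors alike.
        if hasattr(body, 'close'):
            body.close()
```

Because the call sits in a ``finally`` clause, the body is closed even if writing to the client raises part-way through the response.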
>         
>         Finally, servers and gateways **must not** directly use any
>         other
>         attributes of the ``body`` iterable returned by the
>         application,
>         unless it is an instance of a type specific to that server or
>         gateway,
>         such as a "file wrapper" returned by ``web3.file_wrapper``
>         (see
>         `Optional Platform-Specific File Handling`_).  In the general
>         case,
>         only attributes specified here, or accessed via e.g. the PEP
>         234
>         iteration APIs are acceptable.
>         
>         ``environ`` Variables
>         ---------------------
>         
>         The ``environ`` dictionary is required to contain various CGI
>         environment variables, as defined by the Common Gateway
>         Interface
>         specification [2]_.
>         
>         The following CGI variables **must** be present.  Each key is
>         a native
>         string.  Each value is a bytes instance.
>         
>         .. note:: In Python 3.1+, a "native string" is a ``str`` type
>         decoded
>           using the ``surrogateescape`` error handler, as done by
>           ``os.environ.__getitem__``.  In Python 2.6 and 2.7, a
>         "native
>           string" is a ``str`` type representing a set of bytes.
>         
>         ``REQUEST_METHOD``
>          The HTTP request method, such as ``"GET"`` or ``"POST"``.
>         
>         ``SCRIPT_NAME``
>          The initial portion of the request URL's "path" that
>         corresponds to
>          the application object, so that the application knows its
>         virtual
>          "location".  This may be the empty bytes instance if the
>         application
>          corresponds to the "root" of the server.  SCRIPT_NAME will be
>         a
>          bytes instance representing a sequence of URL-encoded
>         segments
>          separated by the slash character (``/``).
>         
>         ``PATH_INFO``
>          The remainder of the request URL's "path", designating the
>         virtual
>          "location" of the request's target within the application.
>          This
>          **may** be an empty bytes instance if the request URL targets the
>          application root and does not have a trailing slash.
>          PATH_INFO will
>          be a bytes instance representing a sequence of URL-encoded
>         segments
>          separated by the slash character (``/``).
>         
>         ``RAW_PATH_INFO``
>          The non-URL-decoded ``PATH_INFO`` value.
>         
>          Through a historical inequity, by virtue of the CGI
>         specification,
>          ``PATH_INFO`` is present within the environment as an already
>          URL-decoded string.    This is the original URL-encoded
>         value.
>         
>         ``QUERY_STRING``
>          The portion of the request URL (in bytes) that follows the
>         ``"?"``,
>          if any, or the empty bytes instance.
>         
>         ``SERVER_NAME``, ``SERVER_PORT``
>          When combined with ``SCRIPT_NAME`` and ``PATH_INFO`` (or
>         their raw
>          equivalents), these variables can be used to complete the
>         URL.
>          Note, however, that ``HTTP_HOST``, if present, should be used
>         in
>          preference to ``SERVER_NAME`` for reconstructing the request
>         URL.
>          See the `URL Reconstruction`_ section below for more detail.
>          ``SERVER_PORT`` should be a bytes instance, not an integer.
>         
>         ``SERVER_PROTOCOL``
>          The version of the protocol the client used to send the
>         request.
>          Typically this will be something like ``"HTTP/1.0"`` or
>         ``"HTTP/1.1"``
>          and may be used by the application to determine how to treat
>         any
>          HTTP request headers.  (This variable should probably be
>         called
>          ``REQUEST_PROTOCOL``, since it denotes the protocol used in
>         the
>          request, and is not necessarily the protocol that will be
>         used in the
>          server's response.  However, for compatibility with CGI we
>         have to
>          keep the existing name.)
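
As a rough illustration of how the variables above combine, here is a sketch of URL reconstruction over an all-bytes environment. The function name `reconstruct_url` is hypothetical; the sketch assumes ``SCRIPT_NAME`` and ``PATH_INFO`` are already URL-encoded as described above (so no further quoting is applied), and it uses the ``web3.url_scheme`` key that the example CGI gateway sets.

```python
def reconstruct_url(environ):
    """Rebuild the request URL as a bytes instance."""
    scheme = environ['web3.url_scheme']
    url = scheme + b'://'
    host = environ.get('HTTP_HOST')
    if host:
        # HTTP_HOST is preferred over SERVER_NAME when present.
        url += host
    else:
        url += environ['SERVER_NAME']
        port = environ['SERVER_PORT']  # a bytes instance, not an integer
        default_port = b'443' if scheme == b'https' else b'80'
        if port != default_port:
            url += b':' + port
    url += environ['SCRIPT_NAME'] + environ['PATH_INFO']
    query = environ.get('QUERY_STRING')
    if query:
        url += b'?' + query
    return url
```

Every concatenation stays in bytes, which is the practical consequence of ``SERVER_PORT`` and the other values being bytes instances rather than native strings or integers.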
>         
>         The following CGI values **may** be present in the Web3
>         environment.
>         Each key is a native string.  Each value is a bytes instance.
>         
>         ``CONTENT_TYPE``
>          The contents of any ``Content-Type`` fields in the HTTP
>         request.
>         
>         ``CONTENT_LENGTH``
>          The contents of any ``Content-Length`` fields in the HTTP
>         request.
>         
>         ``HTTP_`` Variables
>          Variables corresponding to the client-supplied HTTP request
>         headers
>          (i.e., variables whose names begin with ``"HTTP_"``).  The
>         presence or
>          absence of these variables should correspond with the
>         presence or
>          absence of the appropriate HTTP header in the request.
>         
>         A server or gateway **should** attempt to provide as many
>         other CGI
>         variables as are applicable, each with a string for its key
>         and a
>         bytes instance for its value.  In addition, if SSL is in use,
>         the
>         server or gateway **should** also provide as many of the
>         Apache SSL
>         environment variables [5]_ as are applicable, such as
>         ``HTTPS=on`` and
>         ``SSL_PROTOCOL``.  Note, however, that an application that
>         uses any
>         CGI variables other than the ones listed above is necessarily
>         non-portable to web servers that do not support the relevant
>         extensions.  (For example, web servers that do not publish
>         files will
>         not be able to provide a meaningful ``DOCUMENT_ROOT`` or
>         ``PATH_TRANSLATED``.)
>         
>         A Web3-compliant server or gateway **should** document what
>         variables
>         it provides, along with their definitions as appropriate.
>         Applications **should** check for the presence of any
>         variables they
>         require, and have a fallback plan in the event such a variable
>         is
>         absent.
>         
>         Note that CGI-defined variable values must be bytes instances,
>         if they
>         are present at all.  It is a violation of this specification
>         for a CGI
>         variable's value to be of any type other than ``bytes``.  On
>         Python 2,
>         this means they will be of type ``str``.  On Python 3, this
>         means they
>         will be of type ``bytes``.
>         
>         In addition to the CGI-defined variables, the ``environ``
>         dictionary
>         **may** also contain arbitrary operating-system "environment
>         variables", and **must** contain the following Web3-defined
>         variables.
>         
>         =====================  ===============================================
>         Variable               Value
>         =====================  ===============================================
>         ``web3.version``       The tuple ``(1,0)``, representing Web3
>                                version 1.0.
>         
>         ``web3.url_scheme``    A bytes value representing the "scheme" portion of
>                                the URL at which the application is being
>                                invoked.  Normally, this will have the value
>                                ``b"http"`` or ``b"https"``, as appropriate.
>         
>         ``web3.input``         An input stream (file-like object) from which bytes
>                                constituting the HTTP request body can be read.
>                                (The server or gateway may perform reads
>                                on-demand as requested by the application, or
>                                it may pre-read the client's request body and
>                                buffer it in-memory or on disk, or use any
>                                other technique for providing such an input
>                                stream, according to its preference.)
>         
>         ``web3.errors``        An output stream (file-like object) to which error
>                                output text can be written, for the purpose of
>                                recording program or other errors in a
>                                standardized and possibly centralized location.
>                                This should be a "text mode" stream; i.e.,
>                                applications should use ``"\n"`` as a line
>                                ending, and assume that it will be converted to
>                                the correct line ending by the server/gateway.
>                                Applications may *not* send bytes to the
>                                'write' method of this stream; they may only
>                                send text.
>         
>                                For many servers, ``web3.errors`` will be the
>                                server's main error log. Alternatively, this
>                                may be ``sys.stderr``, or a log file of some
>                                sort.  The server's documentation should
>                                include an explanation of how to configure this
>                                or where to find the recorded output.  A server
>                                or gateway may supply different error streams
>                                to different applications, if this is desired.
>         
>         ``web3.multithread``   This value should evaluate true if the
>                                application object may be simultaneously
>                                invoked by another thread in the same process,
>                                and should evaluate false otherwise.
>         
>         ``web3.multiprocess``  This value should evaluate true if an
>                                equivalent application object may be
>                                simultaneously invoked by another process,
>                                and should evaluate false otherwise.
>         
>         ``web3.run_once``      This value should evaluate true if the server
>                                or gateway expects (but does not guarantee!)
>                                that the application will only be invoked this
>                                one time during the life of its containing
>                                process.  Normally, this will only be true for
>                                a gateway based on CGI (or something similar).
>         
>         ``web3.script_name``   The non-URL-decoded ``SCRIPT_NAME`` value.
>                                Through a historical inequity, by virtue of the
>                                CGI specification, ``SCRIPT_NAME`` is present
>                                within the environment as an already
>                                URL-decoded string.  This is the original
>                                URL-encoded value derived from the request URI.
>         
>         ``web3.path_info``     The non-URL-decoded ``PATH_INFO`` value.
>                                Through a historical inequity, by virtue of the
>                                CGI specification, ``PATH_INFO`` is present
>                                within the environment as an already
>                                URL-decoded string.  This is the original
>                                URL-encoded value derived from the request URI.
>         
>         =====================  ===============================================
>         
>         Finally, the ``environ`` dictionary may also contain
>         server-defined
>         variables.  These variables should have names which are
>         strings,
>         composed of only lower-case letters, numbers, dots, and
>         underscores,
>         and should be prefixed with a name that is unique to the
>         defining
>         server or gateway.  For example, ``mod_python`` might define
>         variables
>         with names like ``mod_python.some_variable``.
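
The naming rule for server-defined variables could be checked roughly as follows. This is only a sketch: the function name is hypothetical, and treating "prefix" as "prefix followed by a dot" is an assumption based on the ``mod_python.some_variable`` example.

```python
import re

# Lower-case letters, digits, dots and underscores only.
_ALLOWED = re.compile(r"[a-z0-9._]+\Z")


def is_valid_server_variable(name, prefix):
    """Check a server-defined environ key against the naming guidance."""
    return name.startswith(prefix + ".") and _ALLOWED.match(name) is not None
```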
>         
>         Input Stream
>         ~~~~~~~~~~~~
>         
>         The input stream (``web3.input``) provided by the server must
>         support
>         the following methods:
>         
>         ===================   ========
>         Method                Notes
>         ===================   ========
>         ``read(size)``        1,4
>         ``readline([size])``  1,2,4
>         ``readlines([size])`` 1,3,4
>         ``__iter__()``        4
>         ===================   ========
>         
>         The semantics of each method are as documented in the Python
>         Library
>         Reference, except for these notes as listed in the table
>         above:
>         
>         1. The server is not required to read past the client's
>         specified
>           ``Content-Length``, and is allowed to simulate an
>         end-of-file
>           condition if the application attempts to read past that
>         point.
>           The application **should not** attempt to read more data
>         than is
>           specified by the ``CONTENT_LENGTH`` variable.
>         
>         2. The implementation must support the optional ``size``
>         argument to
>           ``readline()``.
>         
>         3. The application is free to not supply a ``size`` argument
>         to
>           ``readlines()``, and the server or gateway is free to ignore
>         the
>           value of any supplied ``size`` argument.
>         
>         4. The ``read``, ``readline`` and ``__iter__`` methods must
>         return a
>           bytes instance.  The ``readlines`` method must return a
>         sequence
>           which contains instances of bytes.
>         
>         The methods listed in the table above **must** be supported by
>         all
>         servers conforming to this specification.  Applications
>         conforming to
>         this specification **must not** use any other methods or
>         attributes of
>         the ``input`` object.  In particular, applications **must
>         not**
>         attempt to close this stream, even if it possesses a
>         ``close()``
>         method.
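
A sketch of how an application might read the request body without reading past ``CONTENT_LENGTH``. The helper name is hypothetical, and ``io.BytesIO`` stands in for a real server-provided stream in the demo lines.

```python
import io


def read_request_body(environ):
    """Read at most CONTENT_LENGTH bytes from web3.input."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or b"0")
    except ValueError:
        length = 0  # malformed header: treat as "no body"
    if length <= 0:
        return b""
    # Only read(), readline(), readlines() and iteration are allowed;
    # the stream is never closed by the application.
    return environ["web3.input"].read(length)


# Demo with a stand-in stream that holds more data than declared.
demo_environ = {
    "CONTENT_LENGTH": b"5",
    "web3.input": io.BytesIO(b"hello, world"),
}
body = read_request_body(demo_environ)
```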
>         
>         Error Stream
>         ~~~~~~~~~~~~
>         
>         The error stream (``web3.errors``) provided by the server must
>         support
>         the following methods:
>         
>         ===================   ==========  ========
>         Method                Stream      Notes
>         ===================   ==========  ========
>         ``flush()``           ``errors``  1
>         ``write(str)``        ``errors``  2
>         ``writelines(seq)``   ``errors``  2
>         ===================   ==========  ========
>         
>         The semantics of each method are as documented in the Python
>         Library
>         Reference, except for these notes as listed in the table
>         above:
>         
>         1. Since the ``errors`` stream may not be rewound, servers and
>           gateways are free to forward write operations immediately,
>         without
>           buffering.  In this case, the ``flush()`` method may be a
>         no-op.
>           Portable applications, however, cannot assume that output is
>           unbuffered or that ``flush()`` is a no-op.  They must call
>           ``flush()`` if they need to ensure that output has in fact
>         been
>           written.  (For example, to minimize intermingling of data
>         from
>           multiple processes writing to the same error log.)
>         
>         2. The ``write()`` method must accept a string argument, but
>         needn't
>           necessarily accept a bytes argument.  The ``writelines()``
>         method
>           must accept a sequence argument that consists entirely of
>         strings,
>           but needn't necessarily accept any bytes instance as a
>         member of
>           the sequence.
>         
>         The methods listed in the table above **must** be supported by
>         all
>         servers conforming to this specification.  Applications
>         conforming to
>         this specification **must not** use any other methods or
>         attributes of
>         the ``errors`` object.  In particular, applications **must
>         not**
>         attempt to close this stream, even if it possesses a
>         ``close()``
>         method.
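
A sketch of writing to the error stream, assuming Python 3 semantics where "text" means ``str``. ``log_error`` is a hypothetical helper and ``io.StringIO`` stands in for the server's stream.

```python
import io


def log_error(environ, message):
    """Write one line of text to web3.errors and flush it."""
    errors = environ["web3.errors"]
    if isinstance(message, bytes):
        # The stream only has to accept text, so decode bytes first.
        message = message.decode("iso-8859-1")
    errors.write(message + "\n")
    # Portable code cannot assume the stream is unbuffered.
    errors.flush()


demo_errors = io.StringIO()
log_error({"web3.errors": demo_errors}, b"something broke")
```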
>         
>         Values Returned by A Web3 Application
>         -------------------------------------
>         
>         Web3 applications return an iterable in the form (``status``,
>         ``headers``, ``body``).  The return value can be any iterable
>         type
>         that returns exactly three values.
>         
>         The ``status`` value is assumed by a gateway or server to be
>         an HTTP
>         "status" bytes instance like ``b'200 OK'`` or ``b'404 Not
>         Found'``.
>         That is, it is a string consisting of a Status-Code and a
>         Reason-Phrase, in that order and separated by a single space,
>         with no
>         surrounding whitespace or other characters.  (See RFC 2616,
>         Section
>         6.1.1 for more information.)  The string **must not** contain
>         control
>         characters, and must not be terminated with a carriage return,
>         linefeed, or combination thereof.
>         
>         The ``headers`` value is assumed by a gateway or server to be
>         a
>         literal Python list of ``(header_name, header_value)``
>         tuples.  Each
>         ``header_name`` must be a bytes instance representing a valid
>         HTTP
>         header field-name (as defined by RFC 2616, Section 4.2),
>         without a
>         trailing colon or other punctuation.  Each ``header_value``
>         must be a
>         bytes instance and **must not** include any control
>         characters,
>         including carriage returns or linefeeds, either embedded or at
>         the
>         end.  (These requirements are to minimize the complexity of
>         any
>         parsing that must be performed by servers, gateways, and
>         intermediate
>         response processors that need to inspect or modify response
>         headers.)
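
For orientation, a minimal application returning the three values might look like the sketch below. The single-argument call signature is an assumption here, since the calling convention is defined elsewhere in the draft.

```python
def hello_app(environ):
    """Return (status, headers, body), all expressed as bytes."""
    body = b"Hello, world!\n"
    status = b"200 OK"
    headers = [
        (b"Content-Type", b"text/plain"),
        # Build the length via text, then encode; avoids bytes formatting.
        (b"Content-Length", str(len(body)).encode("ascii")),
    ]
    return status, headers, [body]
```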
>         
>         In general, the server or gateway is responsible for ensuring
>         that
>         correct headers are sent to the client: if the application
>         omits
>         a header required by HTTP (or other relevant specifications
>         that are in
>         effect), the server or gateway **must** add it.  For example,
>         the HTTP
>         ``Date:`` and ``Server:`` headers would normally be supplied
>         by the
>         server or gateway.
>         
>         (A reminder for server/gateway authors: HTTP header names are
>         case-insensitive, so be sure to take that into consideration
>         when
>         examining application-supplied headers!)
>         
>         Applications and middleware are forbidden from using HTTP/1.1
>         "hop-by-hop" features or headers, any equivalent features in
>         HTTP/1.0,
>         or any headers that would affect the persistence of the
>         client's
>         connection to the web server.  These features are the
>         exclusive
>         province of the actual web server, and a server or gateway
>         **should**
>         consider it a fatal error for an application to attempt
>         sending them,
>         and raise an error if they are supplied as return values from
>         an
>         application in the ``headers`` structure.  (For more specifics
>         on
>         "hop-by-hop" features and headers, please see the `Other HTTP
>         Features`_ section below.)
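
A server-side check for forbidden headers could be sketched like this. The header names are the hop-by-hop set listed in RFC 2616, section 13.5.1; the function names and the choice of ``ValueError`` are assumptions.

```python
HOP_BY_HOP = frozenset([
    b"connection", b"keep-alive", b"proxy-authenticate",
    b"proxy-authorization", b"te", b"trailers",
    b"transfer-encoding", b"upgrade",
])


def find_hop_by_hop(headers):
    """Return the names of any hop-by-hop headers, compared case-insensitively."""
    return [name for name, _value in headers if name.lower() in HOP_BY_HOP]


def check_no_hop_by_hop(headers):
    """Raise if the application supplied a hop-by-hop header."""
    offending = find_hop_by_hop(headers)
    if offending:
        raise ValueError("hop-by-hop headers not allowed: %r" % (offending,))
```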
>         
>         Handling the ``Content-Length`` Header
>         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>         
>         If the application does not supply a ``Content-Length``
>         header, a
>         server or gateway may choose one of several approaches to
>         handling it.
>         The simplest of these is to close the client connection when
>         the
>         response is completed.  Under some circumstances, however, the
>         server
>         or gateway may be able to either generate a ``Content-Length``
>         header,
>         or at least avoid the need to close the client connection.
>         
>         If the application returns a ``body`` iterable whose ``len()``
>         is 1,
>         then the server can automatically determine ``Content-Length``
>         by
>         taking the length of the first string yielded by the iterable.
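
The single-element shortcut might be implemented by a server along these lines. It is a sketch only, and ``add_content_length`` is a hypothetical name.

```python
def add_content_length(headers, body):
    """Return headers, adding Content-Length when len(body) == 1."""
    if any(name.lower() == b"content-length" for name, _ in headers):
        return headers  # the application already supplied one
    try:
        count = len(body)
    except TypeError:
        return headers  # generators and other unsized iterables
    if count != 1:
        return headers
    first = next(iter(body))
    return headers + [(b"Content-Length", str(len(first)).encode("ascii"))]
```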
>         
>         If the server and client both support HTTP/1.1 "chunked
>         encoding"
>         [3]_, then the server **may** use chunked encoding to send a
>         chunk for
>         each string yielded by the ``body`` iterable, thus prefixing
>         each chunk with its own size.  This allows the
>         server to
>         keep the client connection alive, if it wishes to do so.  Note
>         that
>         the server **must** comply fully with RFC 2616 when doing
>         this, or
>         else fall back to one of the other strategies for dealing with
>         the
>         absence of ``Content-Length``.
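
A sketch of the chunked framing a server could apply to the ``body`` iterable. Per RFC 2616, section 3.6.1, each chunk is preceded by its size in hexadecimal, and a zero-size chunk ends the body; trailers are not handled here, and ``chunked`` is a hypothetical name.

```python
def chunked(body):
    """Yield the body re-framed with HTTP/1.1 chunked transfer coding."""
    for block in body:
        if block:
            # An empty block must be skipped: a zero-size chunk would
            # terminate the response early.
            size = ("%x" % len(block)).encode("ascii")
            yield size + b"\r\n" + block + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk, no trailers
```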
>         
>         (Note: applications and middleware **must not** apply any kind
>         of
>         ``Transfer-Encoding`` to their output, such as chunking or
>         gzipping;
>         as "hop-by-hop" operations, these encodings are the province
>         of the
>         actual web server/gateway.  See `Other HTTP Features`_ below,
>         for
>         more details.)
>         
>         Dealing with Compatibility Across Python Versions
>         -------------------------------------------------
>         
>         Creating Web3 code that runs under both Python 2.6/2.7 and
>         Python 3.1+
>         requires some care on the part of the developer.  In general,
>         the Web3
>         specification assumes a certain level of equivalence between
>         the
>         Python 2 ``str`` type and the Python 3 ``bytes`` type.  For
>         example,
>         under Python 2, the values present in the Web3 ``environ``
>         will be
>         instances of the ``str`` type; in Python 3, these will be
>         instances of
>         the ``bytes`` type.  The Python 3 ``bytes`` type does not
>         possess all
>         the methods of the Python 2 ``str`` type, and some methods
>         which it
>         does possess behave differently than the Python 2 ``str``
>         type.
>         Effectively, to ensure that Web3 middleware and applications
>         work
>         across Python versions, developers must do these things:
>         
>         #) Do not assume comparison equivalence between text values
>         and bytes
>           values.  If you do so, your code may work under Python 2,
>         but it
>           will not work properly under Python 3.  For example, don't
>         write
>           ``somebytes == 'abc'``.  This will sometimes be true on
>         Python 2
>           but it will never be true on Python 3, because a sequence of
>         bytes
>           never compares equal to a string under Python 3.  Instead,
>         always
>           compare a bytes value with a bytes value, e.g.
>           ``somebytes == b'abc'``.  Code which does this is compatible
>           with and works
>         the
>           same in Python 2.6, 2.7, and 3.1.  The ``b`` in front of
>         ``'abc'``
>           signals to Python 3 that the value is a literal bytes
>         instance;
>           under Python 2 it's a forward compatibility placebo.
>         
>         #) Don't use the ``__contains__`` method (directly or
>         indirectly) of
>           items that are meant to be byteslike without ensuring that
>         its
>           argument is also a bytes instance.  If you do so, your code
>         may
>           work under Python 2, but it will not work properly under
>         Python 3.
>           For example, ``'abc' in somebytes`` will raise a
>         ``TypeError``
>           under Python 3, but it will return ``True`` under Python 2.6
>         and
>           2.7.  However, ``b'abc' in somebytes`` will work the same on
>         both
>           versions.
>         
>         #) Don't try to use the ``format`` method or the ``__mod__``
>         method of
>           instances of bytes (directly or indirectly).  In Python 2,
>         the
>           ``str`` type which we treat equivalently to Python 3's
>         ``bytes``
>           supports these methods, but actual Python 3 ``bytes``
>         instances
>           don't support them.  If you use these methods, your
>         code
>           will work under Python 2, but not under Python 3.
>         
>         #) Do not try to concatenate a bytes value with a string
>         value.  This
>           may work under Python 2, but it will not work under Python
>         3.  For
>           example, doing ``'abc' + somebytes`` will work under Python
>         2, but
>           it will result in a ``TypeError`` under Python 3.  Instead,
>         always
>           make sure you're concatenating two items of the same type,
>           e.g. ``b'abc' + somebytes``.
>         
>         Web3 expects byte values in other places, such as in all the
>         values
>         returned by an application.
>         
>         In short, to ensure compatibility of Web3 application code
>         between
>         Python 2 and Python 3, in Python 2, treat CGI and server
>         variable
>         values in the environment as if they had the Python 3
>         ``bytes`` API
>         even though they actually have a more capable API.  Likewise
>         for all
>         stringlike values returned by a Web3 application.
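
The rules above might translate into code like the following sketch, which only compares and concatenates bytes with bytes and avoids ``format`` and ``%`` on bytes. Both helper names are hypothetical.

```python
def is_json_request(environ):
    """Compare bytes with bytes only; never with a text literal."""
    content_type = environ.get("CONTENT_TYPE", b"")
    media_type = content_type.split(b";")[0].strip().lower()
    return media_type == b"application/json"


def make_location(environ, path):
    """Build an absolute URL by concatenating bytes values only."""
    return environ["web3.url_scheme"] + b"://" + environ["SERVER_NAME"] + path
```

Both functions use only operations that behave the same on Python 2 ``str`` and Python 3 ``bytes``.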
>         
>         Buffering and Streaming
>         -----------------------
>         
>         Generally speaking, applications will achieve the best
>         throughput by
>         buffering their (modestly-sized) output and sending it all at
>         once.
>         This is a common approach in existing frameworks: the output
>         is
>         buffered in a StringIO or similar object, then transmitted all
>         at
>         once, along with the response headers.
>         
>         The corresponding approach in Web3 is for the application to
>         simply
>         return a single-element ``body`` iterable (such as a list)
>         containing
>         the response body as a single string.  This is the recommended
>         approach for the vast majority of application functions, that
>         render
>         HTML pages whose text easily fits in memory.
>         
>         For large files, however, or for specialized uses of HTTP
>         streaming
>         (such as multipart "server push"), an application may need to
>         provide
>         output in smaller blocks (e.g. to avoid loading a large file
>         into
>         memory).  It's also sometimes the case that part of a response
>         may
>         be time-consuming to produce, but it would be useful to send
>         ahead the
>         portion of the response that precedes it.
>         
>         In these cases, applications will usually return a ``body``
>         iterator
>         (often a generator-iterator) that produces the output in a
>         block-by-block fashion.  These blocks may be broken to
>         coincide with
>         multipart boundaries (for "server push"), or just before
>         time-consuming tasks (such as reading another block of an
>         on-disk
>         file).
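
A block-by-block ``body`` can be produced with a generator, roughly as sketched here. ``iter_blocks`` and the block size are assumptions, and ``io.BytesIO`` stands in for an on-disk file.

```python
import io


def iter_blocks(fileobj, block_size=8192):
    """Yield the contents of a binary file object in fixed-size blocks."""
    while True:
        block = fileobj.read(block_size)
        if not block:
            break  # end of file
        yield block


# Demo: a tiny block size makes the splitting visible.
blocks = list(iter_blocks(io.BytesIO(b"abcdefgh"), block_size=3))
```

An application would return ``iter_blocks(some_file)`` as its ``body`` instead of reading the whole file into memory.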
>         
>         Web3 servers, gateways, and middleware **must not** delay the
>         transmission of any block; they **must** either fully transmit
>         the block to the client, or guarantee that they will continue
>         transmission even while the application is producing its next
>         block.
>         A server/gateway or middleware may provide this guarantee in
>         one of
>         three ways:
>         
>         1. Send the entire block to the operating system (and request
>           that any O/S buffers be flushed) before returning control
>           to the application, OR
>         
>         2. Use a different thread to ensure that the block continues
>           to be transmitted while the application produces the next
>           block.
>         
>         3. (Middleware only) send the entire block to its parent
>           gateway/server
>         
>         By providing this guarantee, Web3 allows applications to
>         ensure
>         that transmission will not become stalled at an arbitrary
>         point
>         in their output data.  This is critical for proper functioning
>         of e.g. multipart "server push" streaming, where data between
>         multipart boundaries should be transmitted in full to the
>         client.
>         
>         Unicode Issues
>         --------------
>         
>         HTTP does not directly support Unicode, and neither does this
>         interface.  All encoding/decoding must be handled by the
>         **application**; all values passed to or from the server must
>         be of
>         the Python 3 type ``bytes`` or instances of the Python 2 type
>         ``str``,
>         not Python 2 ``unicode`` or Python 3 ``str`` objects.
>         
>         All "bytes instances" referred to in this specification
>         **must**:
>         
>         - On Python 2, be of type ``str``.
>         
>         - On Python 3, be of type ``bytes``.
>         
>         All "bytes instances" **must not** :
>         
>         - On Python 2,  be of type ``unicode``.
>         
>         - On Python 3, be of type ``str``.
>         
>         The result of using a textlike object where a byteslike object
>         is
>         required is undefined.
>         
>         Values returned from a Web3 app as a status or as response
>         headers
>         **must** follow RFC 2616 with respect to encoding.  That is,
>         the bytes
>         returned must contain a character stream of ISO-8859-1
>         characters, or
>         the character stream should use RFC 2047 MIME encoding.
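
An application holding header values as text could convert them before returning, as in this sketch. ``to_header_bytes`` is a hypothetical helper, and the example is written for Python 3, where text is ``str``.

```python
def to_header_bytes(value):
    """Return a header value as bytes, encoding text as ISO-8859-1.

    Raises UnicodeEncodeError for characters outside ISO-8859-1; such
    values would need RFC 2047 encoding, which is not shown here.
    """
    if isinstance(value, bytes):
        return value
    return value.encode("iso-8859-1")
```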
>         
>         On Python platforms which do not have a native bytes-like type
>         (e.g. Jython, IronPython, etc.), but instead which generally
>         use
>         textlike strings to represent bytes data, the definition of
>         "bytes
>         instance" can be changed: their "bytes instances" must be
>         native
>         strings that contain only code points representable in
>         ISO-8859-1
>         encoding (``\u0000`` through ``\u00FF``, inclusive).  It is a
>         fatal
>         error for an application on such a platform to supply strings
>         containing any other Unicode character or code point.
>         Similarly,
>         servers and gateways on those platforms **must not** supply
>         strings to
>         an application containing any other Unicode characters.
>         
>         HTTP 1.1 Expect/Continue
>         ------------------------
>         
>         Servers and gateways that implement HTTP 1.1 **must** provide
>         transparent support for HTTP 1.1's "expect/continue"
>         mechanism.  This
>         may be done in any of several ways:
>         
>         1. Respond to requests containing an ``Expect: 100-continue``
>         header
>           with an immediate "100 Continue" response, and proceed
>         normally.
>         
>         2. Proceed with the request normally, but provide the
>         application
>           with a ``web3.input`` stream that will send the "100
>         Continue"
>           response if/when the application first attempts to read from
>         the
>           input stream.  The read request must then remain blocked
>         until the
>           client responds.
>         
>         3. Wait until the client decides that the server does not
>         support
>           expect/continue, and sends the request body on its own.
>           (This
>           is suboptimal, and is not recommended.)
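
Option 2 could be sketched as a wrapper around the real input stream. The class name and the ``send_continue`` callback are hypothetical; a real server would write the interim response to the client socket inside that callback.

```python
import io


class ContinueOnRead(object):
    """Input stream wrapper that triggers "100 Continue" on first read."""

    def __init__(self, stream, send_continue):
        self._stream = stream
        self._send_continue = send_continue
        self._sent = False

    def _before_read(self):
        if not self._sent:
            self._sent = True
            self._send_continue()

    def read(self, size):
        self._before_read()
        return self._stream.read(size)

    def readline(self, size=-1):
        self._before_read()
        return self._stream.readline(size)

    def readlines(self, hint=-1):
        self._before_read()
        return self._stream.readlines(hint)

    def __iter__(self):
        self._before_read()
        return iter(self._stream)


# Demo: record what would have been sent to the client.
sent = []
wrapped = ContinueOnRead(
    io.BytesIO(b"payload"),
    lambda: sent.append(b"HTTP/1.1 100 Continue\r\n\r\n"),
)
sent_before_read = list(sent)
first = wrapped.read(3)
rest = wrapped.read(100)
```

The interim response goes out only if the application actually asks for the body, and only once.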
>         
>         Note that these behavior restrictions do not apply for HTTP
>         1.0
>         requests, or for requests that are not directed to an
>         application
>         object.  For more information on HTTP 1.1 Expect/Continue, see
>         RFC
>         2616, sections 8.2.3 and 10.1.1.
>         
>         
>         Other HTTP Features
>         -------------------
>         
>         In general, servers and gateways should "play dumb" and allow
>         the
>         application complete control over its output.  They should
>         only make
>         changes that do not alter the effective semantics of the
>         application's
>         response.  It is always possible for the application developer
>         to add
>         middleware components to supply additional features, so
>         server/gateway
>         developers should be conservative in their implementation.  In
>         a sense,
>         a server should consider itself to be like an HTTP "gateway
>         server",
>         with the application being an HTTP "origin server".  (See RFC
>         2616,
>         section 1.3, for the definition of these terms.)
>         
>         However, because Web3 servers and applications do not
>         communicate via
>         HTTP, what RFC 2616 calls "hop-by-hop" headers do not apply to
>         Web3
>         internal communications.  Web3 applications **must not**
>         generate any
>         "hop-by-hop" headers [4]_, attempt to use HTTP features that
>         would
>         require them to generate such headers, or rely on the content
>         of
>         any incoming "hop-by-hop" headers in the ``environ``
>         dictionary.
>         Web3 servers **must** handle any supported inbound
>         "hop-by-hop" headers
>         on their own, such as by decoding any inbound
>         ``Transfer-Encoding``,
>         including chunked encoding if applicable.
>         
>         Applying these principles to a variety of HTTP features, it
>         should be
>         clear that a server **may** handle cache validation via the
>         ``If-None-Match`` and ``If-Modified-Since`` request headers
>         and the
>         ``Last-Modified`` and ``ETag`` response headers.  However, it
>         is
>         not required to do this, and the application **should**
>         perform its
>         own cache validation if it wants to support that feature,
>         since
>         the server/gateway is not required to do such validation.
>         
>         Similarly, a server **may** re-encode or transport-encode an
>         application's response, but the application **should** use a
>         suitable content encoding on its own, and **must not** apply a
>         transport encoding.  A server **may** transmit byte ranges of
>         the
>         application's response if requested by the client, and the
>         application doesn't natively support byte ranges.  Again,
>         however,
>         the application **should** perform this function on its own if
>         desired.
>         
>         Note that these restrictions on applications do not
>         necessarily mean
>         that every application must reimplement every HTTP feature;
>         many HTTP
>         features can be partially or fully implemented by middleware
>         components, thus freeing both server and application authors
>         from
>         implementing the same features over and over again.
>         
>         Thread Support
>         --------------
>         
>         Thread support, or lack thereof, is also server-dependent.
>         Servers that can run multiple requests in parallel **should**
>         also
>         provide the option of running an application in a
>         single-threaded
>         fashion, so that applications or frameworks that are not
>         thread-safe
>         may still be used with that server.
>         
>         Implementation/Application Notes
>         ================================
>         
>         Server Extension APIs
>         ---------------------
>         
>         Some server authors may wish to expose more advanced APIs,
>         that
>         application or framework authors can use for specialized
>         purposes.
>         For example, a gateway based on ``mod_python`` might wish to
>         expose
>         part of the Apache API as a Web3 extension.
>         
>         In the simplest case, this requires nothing more than defining
>         an
>         ``environ`` variable, such as ``mod_python.some_api``.  But,
>         in many
>         cases, the possible presence of middleware can make this
>         difficult.
>         For example, an API that offers access to the same HTTP
>         headers that
>         are found in ``environ`` variables, might return different
>         data if
>         ``environ`` has been modified by middleware.
>         
>         In general, any extension API that duplicates, supplants, or
>         bypasses
>         some portion of Web3 functionality runs the risk of being
>         incompatible
>         with middleware components.  Server/gateway developers should
>         *not*
>         assume that nobody will use middleware, because some framework
>         developers specifically organize their frameworks to function
>         almost
>         entirely as middleware of various kinds.
>         
>         So, to provide maximum compatibility, servers and gateways
>         that
>         provide extension APIs that replace some Web3 functionality,
>         **must**
>         design those APIs so that they are invoked using the portion
>         of the
>         API that they replace.  For example, an extension API to
>         access HTTP
>         request headers must require the application to pass in its
>         current
>         ``environ``, so that the server/gateway may verify that HTTP
>         headers
>         accessible via the API have not been altered by middleware.
>         If the
>         extension API cannot guarantee that it will always agree with
>         ``environ`` about the contents of HTTP headers, it must refuse
>         service
>         to the application, e.g. by raising an error, returning
>         ``None``
>         instead of a header collection, or whatever is appropriate to
>         the API.
>         
>         These guidelines also apply to middleware that adds
>         information such
>         as parsed cookies, form variables, sessions, and the like to
>         ``environ``.  Specifically, such middleware should provide
>         these
>         features as functions which operate on ``environ``, rather
>         than simply
>         stuffing values into ``environ``.  This helps ensure that
>         information
>         is calculated from ``environ`` *after* any middleware has done
>         any URL
>         rewrites or other ``environ`` modifications.
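
A minimal sketch of the function-style approach described above. The helper name ``get_cookies`` is hypothetical (it is not defined by this specification), and the sketch assumes ``HTTP_COOKIE`` is a bytes value as this draft requires:

```python
def get_cookies(environ):
    """Parse cookies from the *current* environ each time it is called.

    Hypothetical helper: because it reads ``HTTP_COOKIE`` at call time,
    it reflects any changes made by middleware earlier in the stack.
    """
    raw = environ.get('HTTP_COOKIE', b'')
    cookies = {}
    for part in raw.split(b';'):
        name, sep, value = part.strip().partition(b'=')
        if sep and name:
            cookies[name] = value
    return cookies
```

Because nothing is cached in ``environ``, a later rewrite of ``HTTP_COOKIE`` by middleware is picked up on the next call.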
>         
>         It is very important that these "safe extension" rules be
>         followed by
>         both server/gateway and middleware developers, in order to
>         avoid a
>         future in which middleware developers are forced to delete any
>         and all
>         extension APIs from ``environ`` to ensure that their mediation
>         isn't
>         being bypassed by applications using those extensions!
>         
>         Application Configuration
>         -------------------------
>         
>         This specification does not define how a server selects or
>         obtains an
>         application to invoke.  These and other configuration options
>         are
>         highly server-specific matters.  It is expected that
>         server/gateway
>         authors will document how to configure the server to execute a
>         particular application object, and with what options (such as
>         threading options).
>         
>         Framework authors, on the other hand, should document how to
>         create an
>         application object that wraps their framework's
>         functionality.  The
>         user, who has chosen both the server and the application
>         framework,
>         must connect the two together.  However, since both the
>         framework and
>         the server have a common interface, this should be merely a
>         mechanical
>         matter, rather than a significant engineering effort for each
>         new
>         server/framework pair.
>         
>         Finally, some applications, frameworks, and middleware may
>         wish to use
>         the ``environ`` dictionary to receive simple string
>         configuration
>         options.  Servers and gateways **should** support this by
>         allowing an
>         application's deployer to specify name-value pairs to be
>         placed in
>         ``environ``.  In the simplest case, this support can consist
>         merely of
>         copying all operating system-supplied environment variables
>         from
>         ``os.environ`` into the ``environ`` dictionary, since the
>         deployer in
>         principle can configure these externally to the server, or in
>         the CGI
>         case they may be able to be set via the server's configuration
>         files.
>         
>         Applications **should** try to keep such required variables to
>         a
>         minimum, since not all servers will support easy configuration
>         of
>         them.  Of course, even in the worst case, persons deploying an
>         application can create a script to supply the necessary
>         configuration
>         values::
>         
>           from the_app import application
>         
>           def new_app(environ):
>               environ['the_app.configval1'] = 'something'
>               return application(environ)
>         
>         But, most existing applications and frameworks will probably
>         only need
>         a single configuration value from ``environ``, to indicate the
>         location
>         of their application or framework-specific configuration
>         file(s).  (Of
>         course, applications should cache such configuration, to avoid
>         having
>         to re-read it upon each invocation.)
>         
>         URL Reconstruction
>         ------------------
>         
>         If an application wishes to reconstruct a request's complete
>         URL (as a
>         bytes object), it may do so using the following algorithm::
>         
>            host = environ.get('HTTP_HOST')
>         
>            scheme = environ['web3.url_scheme']
>            port = environ['SERVER_PORT']
>            query = environ['QUERY_STRING']
>         
>            url = scheme + b'://'
>         
>            if host:
>                url += host
>            else:
>                url += environ['SERVER_NAME']
>         
>                if scheme == b'https':
>                    if port != b'443':
>                        url += b':' + port
>                else:
>                    if port != b'80':
>                        url += b':' + port
>         
>            url += environ['web3.script_name']
>            url += environ['web3.path_info']
>            if query:
>                url += b'?' + query
>         
>         Note that such a reconstructed URL may not be precisely the
>         same URI
>         as requested by the client.  Server rewrite rules, for
>         example, may
>         have modified the client's originally requested URL to place
>         it in a
>         canonical form.
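
As a worked example, the algorithm above can be wrapped in a function and run against a sample ``environ``. The function name ``reconstruct_url`` and the sample values are invented for illustration; only the ``environ`` keys come from the draft:

```python
def reconstruct_url(environ):
    """Rebuild the request URL as bytes from a Web3-style environ."""
    scheme = environ['web3.url_scheme']
    url = scheme + b'://'
    host = environ.get('HTTP_HOST')
    if host:
        url += host
    else:
        url += environ['SERVER_NAME']
        port = environ['SERVER_PORT']
        # Only append the port when it is not the scheme's default.
        default_port = b'443' if scheme == b'https' else b'80'
        if port != default_port:
            url += b':' + port
    url += environ['web3.script_name']
    url += environ['web3.path_info']
    query = environ.get('QUERY_STRING')
    if query:
        url += b'?' + query
    return url


sample_environ = {
    'web3.url_scheme': b'http',
    'SERVER_NAME': b'example.com',
    'SERVER_PORT': b'8080',
    'web3.script_name': b'/app',
    'web3.path_info': b'/a%20b',
    'QUERY_STRING': b'x=1',
}
url = reconstruct_url(sample_environ)
```

Every piece being concatenated is bytes, so the separators must be bytes literals as well; mixing in ``':'`` or ``'?'`` raises ``TypeError`` under Python 3.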
>         
>         Optional Platform-Specific File Handling
>         ----------------------------------------
>         
>         Some operating environments provide special high-performance
>         file-
>         transmission facilities, such as the Unix ``sendfile()`` call.
>         Servers and gateways **may** expose this functionality via an
>         optional
>         ``web3.file_wrapper`` key in the ``environ``.  An application
>         **may**
>         use this "file wrapper" to convert a file or file-like object
>         into the
>         ``body`` iterable that it then returns, e.g.::
>         
>            if 'web3.file_wrapper' in environ:
>                body = environ['web3.file_wrapper'](filelike, block_size)
>            else:
>                body = iter(lambda: filelike.read(block_size), b'')
>         
>         If the server or gateway supplies ``web3.file_wrapper``, it
>         must be a
>         callable that accepts one required positional parameter, and
>         one
>         optional positional parameter.  The first parameter is the
>         file-like
>         object to be sent, and the second parameter is an optional
>         block size
>         "suggestion" (which the server/gateway need not use).  The
>         callable
>         **must** return an iterable object, and **must not** perform
>         any data
>         transmission until and unless the server/gateway actually
>         receives the
>         iterable as a return value from the application.  (To do
>         otherwise
>         would prevent middleware from being able to interpret or
>         override the
>         response data.)
>         
>         To be considered "file-like", the object supplied by the
>         application
>         must have a ``read()`` method that takes an optional size
>         argument.
>         The ``read()`` method of the object must return *bytes*, never
>         *text*.
>         It **may** have a ``close()`` method, and if so, the iterable
>         returned
>         by ``web3.file_wrapper`` **must** have a ``close()`` method
>         that
>         invokes the original file-like object's ``close()`` method.
>          If the
>         "file-like" object has any other methods or attributes with
>         names
>         matching those of Python built-in file objects (e.g.
>         ``fileno()``),
>         the ``web3.file_wrapper`` **may** assume that these methods or
>         attributes have the same semantics as those of a built-in file
>         object.
>         
>         The actual implementation of any platform-specific file
>         handling
>         must occur **after** the application returns, and the server
>         or
>         gateway checks to see if a wrapper object was returned.
>          (Again,
>         because of the presence of middleware, error handlers, and the
>         like,
>         it is not guaranteed that any wrapper created will actually be
>         used.)
>         
>         Apart from the handling of ``close()``, the semantics of
>         returning a
>         file wrapper from the application should be the same as if the
>         application had returned ``iter(filelike.read, b'')``.  In
>         other words,
>         transmission should begin at the current position within the
>         "file" at
>         the time that transmission begins, and continue until the end
>         is
>         reached unless a ``Content-Length`` header value has been set
>         by the
>         application; under that circumstance, only ``Content-Length``
>         bytes
>         are read from the "file".
>         
>         Of course, platform-specific file transmission APIs don't
>         usually
>         accept arbitrary "file-like" objects.  Therefore, a
>         ``web3.file_wrapper`` has to introspect the supplied object
>         for things
>         such as a ``fileno()`` (Unix-like OSes) or a
>         ``java.nio.FileChannel``
>         (under Jython) in order to determine if the file-like object
>         is
>         suitable for use with the platform-specific API it supports.
>         
>         Note that even if the object is *not* suitable for the platform
>         API, the ``web3.file_wrapper`` **must** still return an iterable.
>         The iterable must wrap the underlying filelike object's ``close()``
>         method.  The iterable **may** be the underlying file object itself
>         but also may need to be a wrapper if the underlying filelike object
>         is not iterable.  Here's a simple platform-agnostic file wrapper
>         class::
>         
>            class FileWrapper(object):
>                def __init__(self, filelike, blksize=8192):
>                    self.filelike = filelike
>                    self.blksize = blksize
>                    if hasattr(filelike, 'close'):
>                        self.close = filelike.close
>         
>                def __iter__(self):
>                    try:
>                        return iter(self.filelike)
>                    except TypeError:
>                        # underlying filelike object is not iterable
>                        return self
>         
>                def next(self):
>                    data = self.filelike.read(self.blksize)
>                    if data:
>                        return data
>                    raise StopIteration
>         
>                __next__ = next  # Python 3 spelling of the iterator method
>         
>         and here is a snippet from a server/gateway that uses it to
>         provide
>         access to a platform-specific API::
>         
>            environ['web3.file_wrapper'] = FileWrapper
>            result = application(environ)
>         
>            try:
>                if isinstance(result, FileWrapper):
>                    # check if result.filelike is usable w/platform-specific
>                    # API, and if so, use that API to transmit the result.
>                    # If not, fall through to normal iterable handling
>                    # loop below.
>         
>                for data in result:
>                    # etc.
>         
>            finally:
>                if hasattr(result,'close'):
>                    result.close()
>         
>         Points of Contention
>         ====================
>         
>         Outlined below are potential points of contention regarding
>         this
>         specification.
>         
>         WSGI 1.0 Compatibility
>         ----------------------
>         
>         Components written using the WSGI 1.0 specification will not
>         transparently interoperate with components written using this
>         specification.  That's because the goals of this proposal and
>         the
>         goals of WSGI 1.0 are not directly aligned.
>         
>         WSGI 1.0 is obliged to provide specification-level backwards
>         compatibility with versions of Python between 2.2 and 2.7.
>          This
>         specification, however, ditches Python 2.5 and lower
>         compatibility in
>         order to provide compatibility between relatively recent
>         versions of
>         Python 2 (2.6 and 2.7) as well as relatively recent versions
>         of Python
>         3 (3.1).
>         
>         It is currently impossible to write components which work
>         reliably
>         under both Python 2 and Python 3 using the WSGI 1.0
>         specification,
>         because the specification implicitly posits that CGI and
>         server
>         variable values in the environ and values returned via
>         ``start_response`` represent a sequence of bytes that can be
>         addressed
>         using the Python 2 string API.  It posits such a thing because
>         that
>         sort of data type was the sensible way to represent bytes in
>         all
>         Python 2 versions, and WSGI 1.0 was conceived before Python 3
>         existed.
>         
>         Python 3's ``str`` type supports the full API provided by the
>         Python 2 ``str`` type, but it does not represent a sequence of
>         bytes; it represents text.  Therefore, using it
>         to represent environ values also requires that the environ
>         byte
>         sequence be decoded to text via some encoding.  We cannot
>         decode these
>         bytes to text (at least in any way where the decoding has any
>         meaning
>         other than as a tunnelling mechanism) without widening the
>         scope of
>         WSGI to include server and gateway knowledge of decoding
>         policies and
>         mechanics.  WSGI 1.0 never concerned itself with encoding and
>         decoding.  It made statements about allowable transport
>         values, and
>         suggested that various values might be best decoded as one
>         encoding or
>         another, but it never required a server to *perform* any
>         decoding
>         before
>         
>         Python 3 does not have a stringlike type that can be used
>         instead to
>         represent bytes: it has a ``bytes`` type.  A bytes type
>         operates quite
>         a bit like a Python 2 ``str`` in Python 3.1+, but it lacks
>         behavior
>         equivalent to ``str.__mod__`` and its iteration protocol, and
>         containment and equivalence comparisons are different.
>         
>         In either case, there is no type in Python 3 that behaves just
>         like
>         the Python 2 ``str`` type, and a way to create such a type
>         doesn't
>         exist because there is no such thing as a "String ABC" which
>         would
>         allow a suitable type to be built.  Due to this design
>         incompatibility, existing WSGI 1.0 servers, middleware, and
>         applications will not work under Python 3, even after they are
>         run
>         through ``2to3``.
>         
>         Existing Web-SIG discussions about updating the WSGI
>         specification so
>         that it is possible to write a WSGI application that runs in
>         both
>         Python 2 and Python 3 tend to revolve around creating a
>         specification-level equivalence between the Python 2 ``str``
>         type
>         (which represents a sequence of bytes) and the Python 3
>         ``str`` type
>         (which represents text).  Such an equivalence becomes strained
>         in
>         various areas, given the different roles of these types.  An
>         arguably
>         more straightforward equivalence exists between the Python 3
>         ``bytes``
>         type API and a subset of the Python 2 ``str`` type API.  This
>         specification exploits this subset equivalence.
>         
>         In the meantime, aside from any Python 2 vs. Python 3
>         compatibility
>         issue, as various discussions on Web-SIG have pointed out, the
>         WSGI
>         1.0 specification is too general, providing support for
>         asynchronous
>         applications at the expense of implementation complexity.
>          This
>         specification uses the fundamental incompatibility between
>         WSGI 1.0
>         and Python 3 as a natural divergence point to create a
>         specification
>         with reduced complexity by removing specialized support for
>         asynchronous applications.
>         
>         To provide backwards compatibility for older WSGI 1.0
>         applications, so
>         that they may run on a Web3 stack, it is presumed that Web3
>         middleware
>         will be created which can be used "in front" of existing WSGI
>         1.0
>         applications, allowing those existing WSGI 1.0 applications to
>         run
>         under a Web3 stack.  This middleware will require, when under
>         Python
>         3, an equivalence to be drawn between Python 3 ``str`` types
>         and the
>         bytes values represented by the HTTP request and all the
>         attendant
>         encoding-guessing (or configuration) it implies.
>         
>         .. note:: Such middleware *might* in the future, instead of
>           drawing an equivalence between Python 3 ``str`` and HTTP byte
>           values, make use of a yet-to-be-created "ebytes" type (aka
>           "bytes-with-benefits"), particularly if a String ABC proposal
>           is accepted into the Python core and implemented.
>         
>         Conversely, it is presumed that WSGI 1.0 middleware will be
>         created
>         which will allow a Web3 application to run behind a WSGI 1.0
>         stack on
>         the Python 2 platform.
>         
>         Environ and Response Values as Bytes
>         ------------------------------------
>         
>         Casual middleware and application writers may consider the use
>         of
>         bytes as environment values and response values inconvenient.
>          In
>         particular, they won't be able to use common string formatting
>         functions such as ``('%s' % bytes_val)`` or
>         ``bytes_val.format('123')`` because bytes don't have the same
>         API as
>         strings on platforms such as Python 3 where the two types
>         differ.
>         Likewise, on such platforms, stdlib HTTP-related API support
>         for using
>         bytes interchangeably with text can be spotty.  In places
>         where bytes
>         are inconvenient or incompatible with library APIs, middleware
>         and
>         application writers will have to decode such bytes to text
>         explicitly.
>         This is particularly inconvenient for middleware writers: to
>         work with
>         environment values as strings, they'll have to decode them
>         from an
>         implied encoding and if they need to mutate an environ value,
>         they'll
>         then need to encode the value into a byte stream before
>         placing it
>         into the environ.  While the use of bytes by the specification
>         as
>         environ values might be inconvenient for casual developers, it
>         provides several benefits.
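
The decode, modify, re-encode cycle described above might look like the following sketch. The middleware name, the choice of UTF-8, and the lowercasing policy are all assumptions made for illustration; the only things taken from the draft are the bytes-valued ``environ`` and the ``application(environ)`` calling convention used in its examples:

```python
def lowercase_path_middleware(app, encoding='utf-8'):
    """Hypothetical middleware: decode, modify, and re-encode one value."""
    def middleware(environ):
        raw = environ['web3.path_info']      # bytes, per this draft
        text = raw.decode(encoding)          # the encoding is our own guess
        # Work with text, then put bytes back before calling downstream.
        environ['web3.path_info'] = text.lower().encode(encoding)
        return app(environ)
    return middleware
```

The encoding decision lives in the middleware, where the author can see and change it, rather than in the server.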
>         
>         Using bytes types to represent HTTP and server values to an
>         application most closely matches reality because HTTP is
>         fundamentally
>         a bytes-oriented protocol.  If the environ values are mandated
>         to be
>         strings, each server will need to use heuristics to guess
>         about the
>         encoding of various values provided by the HTTP environment.
>          Using
>         all strings might increase casual middleware writer
>         convenience, but
>         will also lead to ambiguity and confusion when a value cannot
>         be
>         decoded to a meaningful non-surrogate string.
>         
>         Use of bytes as environ values avoids any potential for the
>         need for
>         the specification to mandate that a participating server be
>         informed
>         of encoding configuration parameters.  If environ values are
>         treated
>         as strings, and so must be decoded from bytes, configuration
>         parameters may eventually become necessary as policy clues
>         from the
>         application deployer.  Such a policy would be used to guess an
>         appropriate decoding strategy in various circumstances,
>         effectively
>         placing the burden for enforcing a particular application
>         encoding
>         policy upon the server.  If the server must serve more than
>         one
>         application, such configuration would quickly become complex.
>          Many
>         policies would also be impossible to express declaratively.
>         
>         In reality, HTTP is a complicated and legacy-fraught protocol
>         that, to
>         make sense of, requires a complex set of heuristics.  It would
>         be nice
>         if we could allow this protocol to protect us from this
>         complexity,
>         but we cannot do so reliably while still providing to
>         application
>         writers a level of control commensurate with reality.  Python
>         applications must often deal with data embedded in the
>         environment
>         which not only must be parsed by legacy heuristics, but *does
>         not
>         conform even to any existing HTTP specification*.  While these
>         eventualities are unpleasant, they crop up with regularity,
>         making it
>         impossible and undesirable to hide them from application
>         developers,
>         as application developers are the only people who are able to
>         decide
>         upon an appropriate action when an HTTP specification
>         violation is
>         detected.
>         
>         Some have argued for mixed use of bytes and string values as
>         environ
>         values.  This proposal avoids that strategy.  Sole use of
>         bytes as
>         environ values makes it possible to fit this specification
>         entirely in
>         one's head; you won't need to guess about which values are
>         strings and
>         which are bytes.
>         
>         This protocol would also fit in a developer's head if all
>         environ
>         values were strings, but this specification doesn't use that
>         strategy.
>         This will likely be the point of greatest contention regarding
>         the use
>         of bytes.  In defense of bytes: developers often prefer
>         protocols with
>         consistent contracts, even if the contracts themselves are
>         suboptimal.
>         If we hide encoding issues from a developer until a value that
>         contains surrogates causes problems after it has already
>         reached
>         beyond the I/O boundary of their application, they will need
>         to do a
>         lot more work to fix assumptions made by their application
>         than if we
>         were to just present the problem much earlier in terms of
>         "here's some
>         bytes, you decode them".  This is also a counter-argument to
>         the
>         "bytes are inconvenient" assumption: while presenting bytes to
>         an
>         application developer may be inconvenient for a casual
>         application
>         developer who doesn't care about edge cases, they are
>         extremely
>         convenient for the application developer who needs to deal
>         with
>         complex, dirty eventualities, because use of bytes allows him
>         the
>         appropriate level of control with a clear separation of
>         responsibility.
>         
>         If the protocol uses bytes, it is presumed that libraries will
>         be
>         created to make working with bytes-only in the environ and
>         within
>         return values more pleasant; for example, analogues of the
>         WSGI 1.0
>         libraries named "WebOb" and "Werkzeug".  Such libraries will
>         fill the
>         gap between convenience and control, allowing the spec to
>         remain
>         simple and regular while still allowing casual authors a
>         convenient
>         way to create Web3 middleware and application components.
>          This seems
>         to be a reasonable alternative to baking encoding policy into
>         the
>         protocol, because many such libraries can be created
>         independently
>         from the protocol, and application developers can choose the
>         one that
>         provides them the appropriate levels of control and
>         convenience for a
>         particular job.
>         
>         Here are some alternatives to using all bytes:
>         
>         - Have the server decode all values representing CGI and server
>          environ values into strings using the ``latin-1`` encoding, which
>          is lossless.  Smuggle any undecodable bytes within the resulting
>          string.
>         
>         - Decode all CGI and server environ values into strings using the
>          ``utf-8`` encoding with the ``surrogateescape`` error handler.
>          This does not work under any existing Python 2.
>         
>         - Represent some values as bytes and other values as strings, as
>          decided by their typical usages.
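
To make the first two alternatives concrete, here is a small Python 3 sketch of how each one round-trips an arbitrary byte string. The sample value is invented; ``surrogateescape`` is the standard error handler available from Python 3.1:

```python
raw = b'/caf\xc3\xa9/\xff'   # sample bytes; \xff is not valid UTF-8

# latin-1 maps every byte 0-255 to the code point with the same number,
# so decoding never fails and re-encoding restores the original bytes.
as_latin1 = raw.decode('latin-1')
latin1_roundtrip = as_latin1.encode('latin-1')

# surrogateescape decodes valid UTF-8 normally and smuggles each
# undecodable byte as a lone surrogate (here U+DCFF for byte 0xFF).
as_utf8 = raw.decode('utf-8', 'surrogateescape')
utf8_roundtrip = as_utf8.encode('utf-8', 'surrogateescape')
```

Both round-trips recover ``raw`` exactly, but neither intermediate string is ordinary text: the ``latin-1`` result shows mojibake for the UTF-8 sequence, and the ``surrogateescape`` result contains a lone surrogate that a strict encoder will reject.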
>         
>         Applications Should be Allowed to Read ``web3.input`` Past ``CONTENT_LENGTH``
>         -----------------------------------------------------------------------------
>         
>         At
>         http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html,
>         Graham Dumpleton makes the assertion that ``wsgi.input``
>         should be
>         required to return the empty string as a signifier of
>         out-of-data, and
>         that applications should be allowed to read past the number of
>         bytes
>         specified in ``CONTENT_LENGTH``, depending only upon the empty
>         string
>         as an EOF marker.  WSGI relies on an application "being well
>         behaved
>         and once all data specified by ``CONTENT_LENGTH`` is read,
>         that it
>         processes the data and returns any response. That same socket
>         connection could then be used for a subsequent request."
>          Graham would
>         like WSGI adapters to be required to wrap raw socket
>         connections:
>         "this wrapper object will need to count how much data has been
>         read,
>         and when the amount of data reaches that as defined by
>         ``CONTENT_LENGTH``, any subsequent reads should return an
>         empty string
>         instead."  This may be useful to support chunked encoding and
>         input
>         filters.
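
A rough sketch of the kind of counting wrapper described in the quotation above. The class name ``LengthLimitedInput`` is hypothetical, and ``io.BytesIO`` stands in for a raw socket connection carrying one request body followed by the start of the next request:

```python
import io


class LengthLimitedInput(object):
    """Hypothetical wrapper that reports EOF after CONTENT_LENGTH bytes."""

    def __init__(self, stream, content_length):
        self.stream = stream
        self.remaining = content_length

    def read(self, size=-1):
        if self.remaining <= 0:
            return b''                      # past CONTENT_LENGTH: act like EOF
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining           # never read into the next request
        data = self.stream.read(size)
        self.remaining -= len(data)
        return data


connection = io.BytesIO(b'hello' + b'NEXT REQUEST')
body = LengthLimitedInput(connection, 5)
first = body.read(3)
rest = body.read()
after_end = body.read()
leftover = connection.read()
```

With such a wrapper the application can simply read until it gets an empty bytes object, and the bytes belonging to the following request stay on the connection.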
>         
>         ``web3.input`` Unknown Length
>         ------------------------------
>         
>         There's no documented way to indicate that there is content in
>         ``environ['web3.input']``, but the content length is unknown.
>         
>         ``read()`` of ``web3.input`` Should Support No-Size Calling Convention
>         ----------------------------------------------------------------------
>         
>         At
>         http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html,
>         Graham Dumpleton makes the assertion that the ``read()``
>         method of
>         ``wsgi.input`` should be callable without arguments, and that
>         the
>         result should be "all available request content".  Needs
>         discussion.
>         
>         Input Filters should set environ ``CONTENT_LENGTH`` to -1
>         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>         
>         At
>         http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html,
>         Graham Dumpleton suggests that an input filter might set
>         ``environ['CONTENT_LENGTH']`` to -1 to indicate that it
>         mutated the
>         input.
>         
>         ``headers`` as Literal List of Two-Tuples
>         -----------------------------------------
>         
>         Why do we make applications return a ``headers`` structure
>         that is a
>         literal list of two-tuples?  I think the iterability of
>         ``headers``
>         needs to be maintained while it moves up the stack, but I
>         don't think
>         we need to be able to mutate it in place at all times.  Could
>         we
>         loosen that requirement?
>         
>         Removed Requirement that Middleware Not Block
>         ---------------------------------------------
>         
>         This requirement was removed: "middleware components **must
>         not**
>         block iteration waiting for multiple values from an
>         application
>         iterable.  If the middleware needs to accumulate more data
>         from the
>         application before it can produce any output, it **must**
>         yield an
>         empty string."  This requirement existed to support
>         asynchronous
>         applications and servers (see PEP 333's "Middleware Handling
>         of Block
>         Boundaries").  We might reintroduce this requirement if we
>         want to
>         support asynchronous applications and servers minimally.
>         
>         ``web3.script_name`` and ``web3.path_info``
>         -------------------------------------------
>         
>         These values are required to be placed into the environment by the
>         origin server under this specification.  Unlike ``SCRIPT_NAME`` and
>         ``PATH_INFO``, these must be the original *URL-encoded*
>         variants
>         derived from the request URI.  We probably need to figure out
>         how
>         these should be computed originally, and what their values
>         should be
>         if the server performs URL rewriting.
>         
>         Long Response Headers
>         ---------------------
>         
>         Bob Brewer notes in
>         http://mail.python.org/pipermail/web-sig/2006-September/002244.html:
>         
>         "Each header_value must not include any control characters,
>         including
>         carriage returns or linefeeds, either embedded or at the end.
>         (These
>         requirements are to minimize the complexity of any parsing
>         that must
>         be performed by servers, gateways, and intermediate response
>         processors that need to inspect or modify response
>         headers.)" [1]
>         
>         That's understandable, but HTTP headers are defined as (mostly)
>         *TEXT, and "words of *TEXT MAY contain characters from character
>         sets other than ISO-8859-1 only when encoded according to the
>         rules of RFC 2047." [2] And RFC 2047 specifies that "an
>         'encoded-word' may not be more than 75 characters long...If it
>         is desirable to encode more text than will fit in an
>         'encoded-word' of 75 characters, multiple 'encoded-word's
>         (separated by CRLF SPACE) may be used." [3] This satisfies HTTP
>         header folding rules, as well: "Header fields can be extended
>         over multiple lines by preceding each extra line with at least
>         one SP or HT." [1, again]
>         
>         So in my reading of HTTP, some code somewhere should introduce
>         newlines in longish, encoded response header values. I see three
>         options:
>         
>          1. Keep things as they are and disallow response header values
>             if they contain words over 75 chars that are outside the
>             ISO-8859-1 character set
>         
>          2. Allow newline characters in WSGI response headers
>         
>          3. Require/strongly suggest WSGI servers to do the encoding and
>             folding before sending the value over HTTP.
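
As an illustration of what option 3 would ask of a server, here is a minimal sketch of RFC 2047 encoding plus folding. The function name ``encode_and_fold`` is made up for this example, the choice of UTF-8 with base64 ("b") encoding is an assumption, and nothing here is mandated by WSGI or Web3.

```python
import base64

PREFIX = "=?utf-8?b?"
SUFFIX = "?="
MAX_WORD = 75  # RFC 2047 limit on the length of one encoded-word
# Room left for base64 text: 75 - 10 - 2 = 63 chars, of which 60 are usable
# (base64 output comes in groups of 4), which corresponds to 45 raw bytes.
MAX_RAW = ((MAX_WORD - len(PREFIX) - len(SUFFIX)) // 4) * 3


def encode_and_fold(text):
    """Return text as RFC 2047 encoded-words joined by CRLF SPACE."""
    chunks, chunk = [], b""
    for char in text:
        raw = char.encode("utf-8")
        # Start a new encoded-word rather than split a multi-byte character.
        if chunk and len(chunk) + len(raw) > MAX_RAW:
            chunks.append(chunk)
            chunk = b""
        chunk += raw
    if chunk:
        chunks.append(chunk)
    words = [PREFIX + base64.b64encode(c).decode("ascii") + SUFFIX for c in chunks]
    return "\r\n ".join(words)


folded = encode_and_fold("Grüße " * 30)
```

The result is a native string made only of ASCII characters. A Web3 server would still need ``folded.encode("ascii")`` to get the bytes value that goes on the wire.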
>         
>         Request Trailers and Chunked Transfer Encoding
>         ----------------------------------------------
>         
>         When using chunked transfer encoding on request content, the
>         RFCs allow there to be request trailers. These are like request
>         headers but come after the final null data chunk. These trailers
>         are only available when the chunked data stream is finite length
>         and when it has all been read in.  Neither WSGI nor Web3
>         currently supports them.
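
For readers unfamiliar with trailers, here is a rough sketch of where they sit in a chunked body. ``read_chunked`` is a hypothetical helper, not part of WSGI or Web3. It ignores chunk extensions and does no error handling.

```python
import io


def read_chunked(stream):
    """Read a chunked request body from a binary stream.

    Returns (body, trailers), where trailers is a list of (name, value)
    byte pairs found after the final zero-length chunk.
    """
    body = b""
    while True:
        size_line = stream.readline()
        # Chunk extensions after ";" are ignored in this sketch.
        size = int(size_line.split(b";", 1)[0].strip().decode("ascii"), 16)
        if size == 0:
            break
        body += stream.read(size)
        stream.readline()  # consume the CRLF that ends the chunk data
    trailers = []
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # blank line (or end of stream) ends the trailer block
        name, _, value = line.partition(b":")
        trailers.append((name.strip(), value.strip()))
    return body, trailers


raw = b"5\r\nhello\r\n6\r\n world\r\n0\r\nContent-MD5: abc\r\n\r\n"
body, trailers = read_chunked(io.BytesIO(raw))
```

The trailer list only becomes available once the loop has consumed the zero-length chunk, which is why trailers cannot be exposed before the whole body has been read.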
>         
>         References
>         ==========
>         
>         .. [1] PEP 333: Python Web Server Gateway Interface
>           (http://www.python.org/dev/peps/pep-0333/)
>         
>         .. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
>           (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)
>         
>         .. [3] "Chunked Transfer Coding" -- HTTP/1.1, Section 3.6.1
>           (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1)
>         
>         .. [4] "End-to-end and Hop-by-hop Headers" -- HTTP/1.1, Section 13.5.1
>           (http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.1)
>         
>         .. [5] mod_ssl Reference, "Environment Variables"
>           (http://www.modssl.org/docs/2.8/ssl_reference.html#ToC25)
>         
> 
> 
> 
> -- 
> Ian Bicking  |  http://blog.ianbicking.org
