[Web-SIG] "Web3" Spec (aka WSGI2)
Chris McDonough
chrism at plope.com
Wed Oct 13 22:25:30 CEST 2010
Yeah, not sure why it didn't show up when it was sent... it's safe to
ignore.
- C
On Wed, 2010-10-13 at 15:17 -0500, Ian Bicking wrote:
> Huh, this just came through, but has an old date on it. I'm assuming
> it was just stuck in some queue?
>
>
> On Tue, Jul 20, 2010 at 12:43 AM, Chris McDonough <chrism at plope.com>
> wrote:
> Below is the first draft of a specification for a WSGI-like
> protocol
> I've tentatively named "Web3". If it's formatted poorly in
> this email
> for you, the in-progress version of spec is also available at
> http://svn.repoze.org/playground/chris/web3.txt
>
> Web3 is a backwards-incompatible variant of WSGI which:
>
> - Is compatible with Python 2.6, 2.7 and 3.1.
>
> - Uses bytes to represent all environment values and
> application body,
> status, and header values.
>
> - Breaks support for asynchronous servers and applications.
>
> - Tries to address existing problems with WSGI 1.0 (at least
> the ones
> I found while trolling the maillist and the WSGI site).
>
> Much of it is a reworking of PEP 333, with significant
> differences from
> WSGI called out in a section near the beginning. It also
> contains a
> "Points of Contention" section near the end that anticipates
> argument.
>
> My reasoning for creating the spec was to see exactly how
> crappy it
> would be to write to a spec that drew equivalence between
> Python 2
> ``str`` and Python 3 ``bytes`` rather than between the Python
> 2 ``str``
> and Python 3 ``str`` equivalence promoted by most
> conversations here.
> The answer: about as crappy. But slightly less crappy than I
> feared.
>
> Here's the spec...
>
> PEP: XXX
> Title: Python Web3 Interface
> Version: $Revision$
> Last-Modified: $Date$
> Author: Chris McDonough <chrism at plope.com>
> Discussions-To: Python Web-SIG <web-sig at python.org>
> Status: Draft
> Type: Informational
> Content-Type: text/x-rst
> Created: 19-Jul-2010
>
> Abstract
> ========
>
> This document specifies a proposed second-generation standard
> interface between web servers and Python web applications or
> frameworks.
>
> Rationale and Goals
> ===================
>
> This protocol and specification is influenced heavily by the
> Web
> Services Gateway Interface (WSGI) 1.0 standard described in
> PEP 333
> [1]_ . The high-level rationale for having any standard that
> allows
> Python-based web servers and applications to interoperate is
> outlined
> in PEP 333. This document essentially uses PEP 333 as a
> template, and
> changes its wording in various places for the purpose of
> forming a
> different standard.
>
> Python currently boasts a wide variety of web application
> frameworks
> which use the WSGI 1.0 protocol. However, due to changes in
> the
> language, the WSGI 1.0 protocol is not compatible with Python
> 3. This
> specification describes a standardized WSGI-like protocol that
> lets
> Python 2.6, 2.7 and 3.1+ applications communicate with web
> servers.
> Web3 is clearly a WSGI derivative; it only uses a different
> name than
> "WSGI" in order to indicate that it is not in any way
> backwards
> compatible.
>
> Applications and servers which are written to this
> specification are
> meant to work properly under Python 2.6.X, Python 2.7.X and
> Python
> 3.1+. Neither an application nor a server that implements
> this
> specification can be easily written which will work under
> Python 2
> versions earlier than 2.6 nor Python 3 versions earlier than
> 3.1.
>
> .. note:: whatever Python 3 version fixed
> http://bugs.python.org/issue4006 so os.environ['foo']
> returns
> surrogates (ala PEP 383) when the value of 'foo' cannot be
> decoded
> using the current locale instead of failing with a KeyError
> is the
> true minimum Python 3 version. In particular, however,
> Python 3.0
> is not supported.
>
> Explicability and documentability are the main technical
> drivers for
> the decisions made within the standard.
>
> Differences from WSGI
> =====================
>
> - Asynchronous applications and servers are supported more
> poorly by
> Web3 than by WSGI 1.0
>
> - All protocol-specific environment names are prefixed with
> ``web3.``
> rather than ``wsgi.``, e.g. ``web3.input`` rather than
> ``wsgi.input``.
>
> - All values present as environment dictionary *values* are
> explicitly
> *bytes* instances instead of native strings.
>
> - All values returned by an application must be bytes
> instances,
> including status code, header names and values, and the body.
>
> - Wherever WSGI 1.0 referred to an ``app_iter``, this
> specification
> refers to a ``body``.
>
> - No ``start_response()`` callback (and therefore no
> ``write()``
> callable nor ``exc_info`` data).
>
> - The ``readline()`` function of ``web3.input`` must support a
> size
> hint parameter.
>
> - No support for asynchronous applications that cannot yield a
> meaningful status code and a set of headers before beginning
> to
> produce a body.
>
> - No requirement for middleware to yield an empty string if it
> needs
> more information from an application to produce output (e.g.
> no
> "Middleware Handling of Block Boundaries").
>
> - Filelike objects passed to a "file_wrapper" must have an
> ``__iter__`` which returns bytes (never text).
>
> - "file_wrapper": don't read the entire file unless a
> ``Content-Length`` header value has been set by the
> application;
> under that circumstance, the file wrapper should read only
> ``Content-Length`` bytes from the underlying filelike
> object.
>
> - ``QUERY_STRING``, ``SCRIPT_NAME``, ``PATH_INFO`` values
> required to
> be placed in environ by server (each as the empty bytes
> instance if
> no associated value is received in the HTTP request).
>
> - ``web3.path_info`` and ``web3.script_name`` must be put into
> the
> WSGI environment by the origin WSGI server. When available,
> each is
> the original, plain 7-bit ASCII, URL-encoded variant of its
> CGI
> equivalent derived directly from the request URI (with %2F
> segment
> markers and other meta-characters intact).
>
> - This requirement was removed: "middleware components **must
> not**
> block iteration waiting for multiple values from an
> application
> iterable. If the middleware needs to accumulate more data
> from the
> application before it can produce any output, it **must**
> yield an
> empty string."
>
> - ``SERVER_PORT`` must be a bytes instance (not an integer).
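The differences listed above can be made concrete with a small adapter. What follows is only a sketch, assuming Python 3 and a PEP 3333 style WSGI server on the outside; the name `web3_to_wsgi` is hypothetical and not part of the draft. It renames `wsgi.*` keys to `web3.*`, encodes native-string environ values to bytes, and decodes the bytes status and headers for `start_response()`. It does not populate `web3.script_name` or `web3.path_info`, which a real origin server would have to supply.

```python
def web3_to_wsgi(web3_app):
    """Wrap a Web3 application so that a WSGI server can call it.

    Hypothetical helper for illustration; Python 3 only.
    """
    def wsgi_app(environ, start_response):
        web3_environ = {}
        for key, value in environ.items():
            # Web3 uses the ``web3.`` prefix instead of ``wsgi.``
            if key.startswith('wsgi.'):
                key = 'web3.' + key[len('wsgi.'):]
            # Web3 environ values are bytes, not native strings
            if isinstance(value, str):
                value = value.encode('latin-1')
            web3_environ[key] = value

        # No start_response() in Web3: the app returns everything at once
        status, headers, body = web3_app(web3_environ)

        start_response(
            status.decode('latin-1'),
            [(name.decode('latin-1'), value.decode('latin-1'))
             for name, value in headers],
        )
        return body
    return wsgi_app
```

The latin-1 codec is chosen here because it round-trips every byte value; whether that is the right choice for a given server is an assumption of this sketch.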
>
> Specification Overview
> ======================
>
> The Web3 interface has two sides: the "server" or "gateway"
> side, and
> the "application" or "framework" side. The server side
> invokes a
> callable object that is provided by the application side. The
> specifics of how that object is provided are up to the server
> or
> gateway. It is assumed that some servers or gateways will
> require an
> application's deployer to write a short script to create an
> instance
> of the server or gateway, and supply it with the application
> object.
> Other servers and gateways may use configuration files or
> other
> mechanisms to specify where an application object should be
> imported
> from, or otherwise obtained.
>
> In addition to "pure" servers/gateways and
> applications/frameworks,
> it is also possible to create "middleware" components that
> implement
> both sides of this specification. Such components act as an
> application to their containing server, and as a server to a
> contained application, and can be used to provide extended
> APIs,
> content transformation, navigation, and other useful
> functions.
>
> Throughout this specification, we will use the term "a
> callable" to
> mean "a function, method, class, or an instance with a
> ``__call__``
> method". It is up to the server, gateway, or application
> implementing
> the callable to choose the appropriate implementation
> technique for
> their needs. Conversely, a server, gateway, or application
> that is
> invoking a callable **must not** have any dependency on what
> kind of
> callable was provided to it. Callables are only to be called,
> not
> introspected upon.
>
> The Application/Framework Side
> ------------------------------
>
> The application object is simply a callable object that
> accepts one
> argument. The term "object" should not be misconstrued as
> requiring
> an actual object instance: a function, method, class, or
> instance with
> a ``__call__`` method are all acceptable for use as an
> application
> object. Application objects must be able to be invoked more
> than
> once, as virtually all servers/gateways (other than CGI) will
> make
> such repeated requests.
>
> (Note: although we refer to it as an "application" object,
> this should
> not be construed to mean that application developers will use
> Web3 as
> a web programming API. It is assumed that application
> developers will
> continue to use existing, high-level framework services to
> develop
> their applications. Web3 is a tool for framework and server
> developers, and is not intended to directly support
> application
> developers.)
>
> Here are two example application objects; one is a function,
> and the
> other is a class::
>
>     def simple_app(environ):
>         """Simplest possible application object"""
>         status = b'200 OK'
>         headers = [(b'Content-type', b'text/plain')]
>         body = [b'Hello world!\n']
>         return status, headers, body
>
>     class AppClass:
>         """Produce the same output, but using a class.
>
>         (Note: 'AppClass' is the "application" here, so calling it
>         returns an instance of 'AppClass', which is then the return
>         value of the "application callable" as required by the spec.
>
>         If we wanted to use *instances* of 'AppClass' as application
>         objects instead, we would have to implement a '__call__'
>         method, which would be invoked to execute the application,
>         and we would need to create an instance for use by the
>         server or gateway.)
>         """
>         def __init__(self, environ):
>             self.environ = environ
>
>         def __iter__(self):
>             status = b'200 OK'
>             headers = [(b'Content-type', b'text/plain')]
>             body = [b'Hello world!\n']
>             yield status
>             yield headers
>             yield body
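To see why both forms satisfy the same contract, here is a minimal sketch of how a caller might consume them. The helper name `call_app` and the tiny `environ` dictionary are made up for illustration. Unpacking works for the class because iterating an `AppClass` instance yields exactly three values.

```python
def simple_app(environ):
    return b'200 OK', [(b'Content-type', b'text/plain')], [b'Hello world!\n']


class AppClass:
    def __init__(self, environ):
        self.environ = environ

    def __iter__(self):
        yield b'200 OK'
        yield [(b'Content-type', b'text/plain')]
        yield [b'Hello world!\n']


def call_app(application, environ):
    """Invoke an application and collect its body (hypothetical helper)."""
    # Tuple unpacking iterates the return value, so any three-element
    # iterable is acceptable, not only a tuple.
    status, headers, body = application(environ)
    return status, headers, b''.join(body)


environ = {'REQUEST_METHOD': b'GET'}
function_result = call_app(simple_app, environ)
class_result = call_app(AppClass, environ)
```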
>
> The Server/Gateway Side
> -----------------------
>
> The server or gateway invokes the application callable once
> for each
> request it receives from an HTTP client, that is directed at
> the
> application. To illustrate, here is a simple CGI gateway,
> implemented
> as a function taking an application object. Note that this
> simple
> example has limited error handling, because by default an
> uncaught
> exception will be dumped to ``sys.stderr`` and logged by the
> web
> server.
>
> ::
>
>     import locale
>     import os
>     import sys
>
>     encoding = locale.getpreferredencoding()
>
>     stdin = sys.stdin
>     stdout = sys.stdout
>
>     if hasattr(sys.stdout, 'buffer'):
>         # Python 3 compatibility; we need to be able to push bytes out
>         stdout = sys.stdout.buffer
>
>     if hasattr(sys.stdin, 'buffer'):
>         # Python 3 compatibility; web3.input must return bytes
>         stdin = sys.stdin.buffer
>
>     def get_environ():
>         d = {}
>         for k, v in os.environ.items():
>             # Python 3 compatibility
>             if not isinstance(v, bytes):
>                 # We must explicitly encode the string to bytes under
>                 # Python 3.1+
>                 v = v.encode(encoding, 'surrogateescape')
>             d[k] = v
>         return d
>
>     def run_with_cgi(application):
>
>         environ = get_environ()
>         environ['web3.input'] = stdin
>         environ['web3.errors'] = sys.stderr
>         environ['web3.version'] = (1,0)
>         environ['web3.multithread'] = False
>         environ['web3.multiprocess'] = True
>         environ['web3.run_once'] = True
>
>         if environ.get('HTTPS', b'off') in (b'on', b'1'):
>             environ['web3.url_scheme'] = b'https'
>         else:
>             environ['web3.url_scheme'] = b'http'
>
>         status, headers, body = application(environ)
>
>         CRLF = b'\r\n'
>
>         try:
>             stdout.write(b'Status: ' + status + CRLF)
>             for header_name, header_val in headers:
>                 stdout.write(header_name + b': ' + header_val + CRLF)
>             stdout.write(CRLF)
>             for chunk in body:
>                 stdout.write(chunk)
>                 stdout.flush()
>         finally:
>             if hasattr(body, 'close'):
>                 body.close()
>
> Middleware: Components that Play Both Sides
> -------------------------------------------
>
> Note that a single object may play the role of a server with
> respect
> to some application(s), while also acting as an application
> with
> respect to some server(s). Such "middleware" components can
> perform
> such functions as:
>
> * Routing a request to different application objects based on
> the
> target URL, after rewriting the ``environ`` accordingly.
>
> * Allowing multiple applications or frameworks to run
> side-by-side in
> the same process
>
> * Load balancing and remote processing, by forwarding requests
> and
> responses over a network
>
> * Performing content postprocessing, such as applying XSL
> stylesheets
>
> The presence of middleware in general is transparent to both
> the
> "server/gateway" and the "application/framework" sides of the
> interface, and should require no special support. A user who
> desires
> to incorporate middleware into an application simply provides
> the
> middleware component to the server, as if it were an
> application, and
> configures the middleware component to invoke the application,
> as if
> the middleware component were a server. Of course, the
> "application"
> that the middleware wraps may in fact be another middleware
> component
> wrapping another application, and so on, creating what is
> referred to
> as a "middleware stack".
>
> For the most part, middleware must conform to the restrictions
> and
> requirements of both the server and application sides of
> Web3. In
> some cases, however, requirements for middleware are more
> stringent
> than for a "pure" server or application, and these points will
> be
> noted in the specification.
>
> Here is a (tongue-in-cheek) example of a middleware component
> that
> converts ``text/plain`` responses to pig latin, using Joe
> Strout's
> ``piglatin.py``. (Note: a "real" middleware component would
> probably
> use a more robust way of checking the content type, and should
> also
> check for a content encoding. Also, this simple example
> ignores the
> possibility that a word might be split across a block
> boundary.)
>
> ::
>
>     from piglatin import piglatin
>
>     class LatinIter:
>
>         """Transform iterated output to piglatin."""
>
>         def __init__(self, result):
>             if hasattr(result, 'close'):
>                 self.close = result.close
>             self.result = result
>             self._iter = iter(result)
>
>         def __iter__(self):
>             return self
>
>         def __next__(self):
>             # Decode each chunk, translate it, and re-encode to bytes
>             text = next(self._iter).decode('utf-8')
>             return piglatin(text).encode('utf-8')
>
>         next = __next__  # Python 2 spelling of the iterator protocol
>
>     class Latinator:
>
>         def __init__(self, application):
>             self.application = application
>
>         def __call__(self, environ):
>             status, headers, body = self.application(environ)
>             for name, value in headers:
>                 if (name.lower() == b'content-type'
>                         and value == b'text/plain'):
>                     body = LatinIter(body)
>                     # Strip content-length if present, else it'll be wrong
>                     headers = [(name, value) for name, value in headers
>                                if name.lower() != b'content-length']
>                     break
>
>             return status, headers, body
>
>     # Run foo_app under a Latinator's control, using the example CGI gateway
>     from foo_app import foo_app
>     run_with_cgi(Latinator(foo_app))
>
> Specification Details
> =====================
>
> The application object must accept one positional argument.
> For the
> sake of illustration, we have named it ``environ``, but it is
> not
> required to have this name. A server or gateway **must**
> invoke the
> application object using a positional (not keyword) argument.
> (E.g. by calling ``status, headers, body =
> application(environ)`` as
> shown above.)
>
> The ``environ`` parameter is a dictionary object, containing
> CGI-style
> environment variables. This object **must** be a builtin
> Python
> dictionary (*not* a subclass, ``UserDict`` or other dictionary
> emulation), and the application is allowed to modify the
> dictionary in
> any way it desires. The dictionary must also include certain
> Web3-required variables (described in a later section), and
> may also
> include server-specific extension variables, named according
> to a
> convention that will be described below.
>
> When called by the server, the application object must return
> an
> iterable yielding three elements: ``status``, ``headers`` and
> ``body``.
>
> The ``status`` element is a status in bytes of the form
> ``b'999
> Message here'``.
>
> ``headers`` is a Python list of ``(header_name,
> header_value)`` pairs
> describing the HTTP response header. The ``headers``
> structure must
> be a literal Python list; it should yield two-tuples. Both
> ``header_name`` and ``header_value`` must be bytes values.
>
> The ``body`` is an iterable yielding zero or more bytes
> instances.
> This can be accomplished in a variety of ways, such as by
> returning a
> list containing bytes instances as ``body``, or by returning a
> generator as ``body`` that yields bytes instances, or by
> ``body`` being an instance of an iterable class.
> Regardless of
> how it is accomplished, the application object must always
> return a
> ``body`` iterable yielding zero or more bytes instances.
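The three ways of producing a ``body`` mentioned above might look like the following sketch. All names here (`list_body_app`, `generator_body_app`, `ChunkedBody`, `class_body_app`) are invented for illustration, and the empty header lists are only there to keep the example short.

```python
def list_body_app(environ):
    # body is a plain list of bytes instances
    return b'200 OK', [], [b'one', b'two']


def generator_body_app(environ):
    def generate():
        yield b'one'
        yield b'two'
    # body is the generator object, not the generator function
    return b'200 OK', [], generate()


class ChunkedBody:
    """An iterable class; its instances can serve as a body."""

    def __init__(self, chunks):
        self.chunks = chunks

    def __iter__(self):
        return iter(self.chunks)


def class_body_app(environ):
    return b'200 OK', [], ChunkedBody([b'one', b'two'])


# A server only iterates the body, so all three look the same to it.
collected = [
    b''.join(app({})[2])
    for app in (list_body_app, generator_body_app, class_body_app)
]
```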
>
> The server or gateway must transmit the yielded bytes to the
> client in
> an unbuffered fashion, completing the transmission of each set
> of
> bytes before requesting another one. (In other words,
> applications
> **should** perform their own buffering. See the `Buffering
> and
> Streaming`_ section below for more on how application output
> must be
> handled.)
>
> The server or gateway should treat the yielded bytes as binary
> byte
> sequences: in particular, it should ensure that line endings
> are not
> altered. The application is responsible for ensuring that the
> string(s) to be written are in a format suitable for the
> client. (The
> server or gateway **may** apply HTTP transfer encodings, or
> perform
> other transformations for the purpose of implementing HTTP
> features
> such as byte-range transmission. See `Other HTTP Features`_,
> below,
> for more details.)
>
> If a call to ``len(body)`` succeeds, the server must be able
> to rely
> on the result being accurate. That is, if the ``body``
> iterable
> returned by the application provides a working ``__len__()``
> method,
> it **must** return an accurate result. (See the `Handling the
> Content-Length Header`_ section for information on how this
> would
> normally be used.)
>
> If the ``body`` iterable returned by the application has a
> ``close()``
> method, the server or gateway **must** call that method upon
> completion of the current request, whether the request was
> completed
> normally, or terminated early due to an error. (This is to
> support
> resource release by the application. This protocol is
> intended to
> complement PEP 325's generator support, and other common
> iterables
> with ``close()`` methods.)
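A server-side loop that honours the ``close()`` requirement could be sketched as below. `transmit_body`, `TrackedBody` and `failing_write` are hypothetical names used only to demonstrate that ``close()`` runs both on normal completion and when transmission fails part-way.

```python
class TrackedBody:
    """Example body that records whether close() was called."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.closed = False

    def __iter__(self):
        return iter(self.chunks)

    def close(self):
        self.closed = True


def transmit_body(body, write):
    """Send each chunk, then close the body if it supports closing."""
    try:
        for chunk in body:
            write(chunk)
    finally:
        # Runs on success and on error alike
        close = getattr(body, 'close', None)
        if close is not None:
            close()


sent = []
ok_body = TrackedBody([b'a', b'b'])
transmit_body(ok_body, sent.append)


def failing_write(chunk):
    raise IOError('client went away')


failed_body = TrackedBody([b'a'])
try:
    transmit_body(failed_body, failing_write)
except IOError:
    pass
```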
>
> Finally, servers and gateways **must not** directly use any
> other
> attributes of the ``body`` iterable returned by the
> application,
> unless it is an instance of a type specific to that server or
> gateway,
> such as a "file wrapper" returned by ``web3.file_wrapper``
> (see
> `Optional Platform-Specific File Handling`_). In the general
> case,
> only attributes specified here, or accessed via e.g. the PEP
> 234
> iteration APIs are acceptable.
>
> ``environ`` Variables
> ---------------------
>
> The ``environ`` dictionary is required to contain various CGI
> environment variables, as defined by the Common Gateway
> Interface
> specification [2]_.
>
> The following CGI variables **must** be present. Each key is
> a native
> string. Each value is a bytes instance.
>
> .. note:: In Python 3.1+, a "native string" is a ``str`` type
> decoded
> using the ``surrogateescape`` error handler, as done by
> ``os.environ.__getitem__``. In Python 2.6 and 2.7, a
> "native
> string" is a ``str`` type representing a set of bytes.
>
> ``REQUEST_METHOD``
> The HTTP request method, such as ``"GET"`` or ``"POST"``.
>
> ``SCRIPT_NAME``
> The initial portion of the request URL's "path" that
> corresponds to
> the application object, so that the application knows its
> virtual
> "location". This may be the empty bytes instance if the
> application
> corresponds to the "root" of the server. SCRIPT_NAME will be
> a
> bytes instance representing a sequence of URL-encoded
> segments
> separated by the slash character (``/``).
>
> ``PATH_INFO``
> The remainder of the request URL's "path", designating the
> virtual
> "location" of the request's target within the application.
> This
> **may** be the empty bytes instance if the request URL targets the
> application root and does not have a trailing slash.
> PATH_INFO will
> be a bytes instance representing a sequence of URL-encoded
> segments
> separated by the slash character (``/``).
>
> ``RAW_PATH_INFO``
> The non-URL-decoded ``PATH_INFO`` value.
>
> Through a historical inequity, by virtue of the CGI
> specification,
> ``PATH_INFO`` is present within the environment as an already
> URL-decoded string. This is the original URL-encoded
> value.
>
> ``QUERY_STRING``
> The portion of the request URL (in bytes) that follows the
> ``"?"``,
> if any, or the empty bytes instance.
>
> ``SERVER_NAME``, ``SERVER_PORT``
> When combined with ``SCRIPT_NAME`` and ``PATH_INFO`` (or
> their raw
> equivalents), these variables can be used to complete the
> URL.
> Note, however, that ``HTTP_HOST``, if present, should be used
> in
> preference to ``SERVER_NAME`` for reconstructing the request
> URL.
> See the `URL Reconstruction`_ section below for more detail.
> ``SERVER_PORT`` should be a bytes instance, not an integer.
>
> ``SERVER_PROTOCOL``
> The version of the protocol the client used to send the
> request.
> Typically this will be something like ``"HTTP/1.0"`` or
> ``"HTTP/1.1"``
> and may be used by the application to determine how to treat
> any
> HTTP request headers. (This variable should probably be
> called
> ``REQUEST_PROTOCOL``, since it denotes the protocol used in
> the
> request, and is not necessarily the protocol that will be
> used in the
> server's response. However, for compatibility with CGI we
> have to
> keep the existing name.)
>
> The following CGI values **may** be present in the Web3
> environment.
> Each key is a native string. Each value is a bytes instance.
>
> ``CONTENT_TYPE``
> The contents of any ``Content-Type`` fields in the HTTP
> request.
>
> ``CONTENT_LENGTH``
> The contents of any ``Content-Length`` fields in the HTTP
> request.
>
> ``HTTP_`` Variables
> Variables corresponding to the client-supplied HTTP request
> headers
> (i.e., variables whose names begin with ``"HTTP_"``). The
> presence or
> absence of these variables should correspond with the
> presence or
> absence of the appropriate HTTP header in the request.
>
> A server or gateway **should** attempt to provide as many
> other CGI
> variables as are applicable, each with a string for its key
> and a
> bytes instance for its value. In addition, if SSL is in use,
> the
> server or gateway **should** also provide as many of the
> Apache SSL
> environment variables [5]_ as are applicable, such as
> ``HTTPS=on`` and
> ``SSL_PROTOCOL``. Note, however, that an application that
> uses any
> CGI variables other than the ones listed above is necessarily
> non-portable to web servers that do not support the relevant
> extensions. (For example, web servers that do not publish
> files will
> not be able to provide a meaningful ``DOCUMENT_ROOT`` or
> ``PATH_TRANSLATED``.)
>
> A Web3-compliant server or gateway **should** document what
> variables
> it provides, along with their definitions as appropriate.
> Applications **should** check for the presence of any
> variables they
> require, and have a fallback plan in the event such a variable
> is
> absent.
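As one illustration of such a fallback plan, an application that needs the request host might prefer ``HTTP_HOST`` and fall back to the required ``SERVER_NAME`` and ``SERVER_PORT`` values. This is a sketch only: `get_host` is a made-up helper, and omitting the port when it is the scheme's default is a common convention assumed here, not something this section mandates.

```python
def get_host(environ):
    """Return the request host as bytes, with a fallback (hypothetical)."""
    host = environ.get('HTTP_HOST')
    if host:
        # Optional variable is present: prefer it
        return host

    # Fall back to variables the server is required to provide
    host = environ['SERVER_NAME']
    port = environ['SERVER_PORT']  # bytes, not an integer
    default_port = b'443' if environ['web3.url_scheme'] == b'https' else b'80'
    if port != default_port:
        host += b':' + port
    return host
```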
>
> Note that CGI-defined variable values must be bytes instances,
> if they
> are present at all. It is a violation of this specification
> for a CGI
> variable's value to be of any type other than ``bytes``. On
> Python 2,
> this means they will be of type ``str``. On Python 3, this
> means they
> will be of type ``bytes``.
>
> In addition to the CGI-defined variables, the ``environ``
> dictionary
> **may** also contain arbitrary operating-system "environment
> variables", and **must** contain the following Web3-defined
> variables.
>
> =====================  ===============================================
> Variable               Value
> =====================  ===============================================
> ``web3.version``       The tuple ``(1,0)``, representing Web3
>                        version 1.0.
>
> ``web3.url_scheme``    A bytes value representing the "scheme" portion
>                        of the URL at which the application is being
>                        invoked. Normally, this will have the value
>                        ``b"http"`` or ``b"https"``, as appropriate.
>
> ``web3.input``         An input stream (file-like object) from which
>                        bytes constituting the HTTP request body can be
>                        read. (The server or gateway may perform reads
>                        on-demand as requested by the application, or
>                        it may pre-read the client's request body and
>                        buffer it in-memory or on disk, or use any
>                        other technique for providing such an input
>                        stream, according to its preference.)
>
> ``web3.errors``        An output stream (file-like object) to which
>                        error output text can be written, for the
>                        purpose of recording program or other errors in
>                        a standardized and possibly centralized
>                        location. This should be a "text mode" stream;
>                        i.e., applications should use ``"\n"`` as a
>                        line ending, and assume that it will be
>                        converted to the correct line ending by the
>                        server/gateway. Applications may *not* send
>                        bytes to the 'write' method of this stream;
>                        they may only send text.
>
>                        For many servers, ``web3.errors`` will be the
>                        server's main error log. Alternatively, this
>                        may be ``sys.stderr``, or a log file of some
>                        sort. The server's documentation should
>                        include an explanation of how to configure this
>                        or where to find the recorded output. A server
>                        or gateway may supply different error streams
>                        to different applications, if this is desired.
>
> ``web3.multithread``   This value should evaluate true if the
>                        application object may be simultaneously
>                        invoked by another thread in the same process,
>                        and should evaluate false otherwise.
>
> ``web3.multiprocess``  This value should evaluate true if an
>                        equivalent application object may be
>                        simultaneously invoked by another process,
>                        and should evaluate false otherwise.
>
> ``web3.run_once``      This value should evaluate true if the server
>                        or gateway expects (but does not guarantee!)
>                        that the application will only be invoked this
>                        one time during the life of its containing
>                        process. Normally, this will only be true for
>                        a gateway based on CGI (or something similar).
>
> ``web3.script_name``   The non-URL-decoded ``SCRIPT_NAME`` value.
>                        Through a historical inequity, by virtue of the
>                        CGI specification, ``SCRIPT_NAME`` is present
>                        within the environment as an already
>                        URL-decoded string. This is the original
>                        URL-encoded value derived from the request URI.
>
> ``web3.path_info``     The non-URL-decoded ``PATH_INFO`` value.
>                        Through a historical inequity, by virtue of the
>                        CGI specification, ``PATH_INFO`` is present
>                        within the environment as an already
>                        URL-decoded string. This is the original
>                        URL-encoded value derived from the request URI.
> =====================  ===============================================
>
> Finally, the ``environ`` dictionary may also contain
> server-defined
> variables. These variables should have names which are
> strings,
> composed of only lower-case letters, numbers, dots, and
> underscores,
> and should be prefixed with a name that is unique to the
> defining
> server or gateway. For example, ``mod_python`` might define
> variables
> with names like ``mod_python.some_variable``.
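A server author could check candidate extension names against this convention with something like the sketch below. The function name `is_server_variable_name` is hypothetical, and requiring a dot right after the prefix is an assumption drawn from the ``mod_python.some_variable`` example rather than from the wording of the rule.

```python
import re

# Lower-case letters, digits, dots and underscores only
_ALLOWED_NAME = re.compile(r'[a-z0-9._]+\Z')


def is_server_variable_name(name, server_prefix):
    """Check a server-defined environ key against the naming convention."""
    return (
        _ALLOWED_NAME.match(name) is not None
        and name.startswith(server_prefix + '.')
    )
```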
>
> Input Stream
> ~~~~~~~~~~~~
>
> The input stream (``web3.input``) provided by the server must
> support
> the following methods:
>
> =====================  ========
> Method                 Notes
> =====================  ========
> ``read(size)``         1,4
> ``readline([size])``   1,2,4
> ``readlines([size])``  1,3,4
> ``__iter__()``         4
> =====================  ========
>
> The semantics of each method are as documented in the Python
> Library
> Reference, except for these notes as listed in the table
> above:
>
> 1. The server is not required to read past the client's
> specified
> ``Content-Length``, and is allowed to simulate an
> end-of-file
> condition if the application attempts to read past that
> point.
> The application **should not** attempt to read more data
> than is
> specified by the ``CONTENT_LENGTH`` variable.
>
> 2. The implementation must support the optional ``size``
> argument to
> ``readline()``.
>
> 3. The application is free to not supply a ``size`` argument
> to
> ``readlines()``, and the server or gateway is free to ignore
> the
> value of any supplied ``size`` argument.
>
> 4. The ``read``, ``readline`` and ``__iter__`` methods must
> return a
> bytes instance. The ``readlines`` method must return a
> sequence
> which contains instances of bytes.
>
> The methods listed in the table above **must** be supported by
> all
> servers conforming to this specification. Applications
> conforming to
> this specification **must not** use any other methods or
> attributes of
> the ``input`` object. In particular, applications **must
> not**
> attempt to close this stream, even if it possesses a
> ``close()``
> method.
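An application that stays within these rules might read the request body as in the following sketch. `read_request_body` and its `chunk_size` parameter are invented for illustration. It uses only ``read(size)``, never reads past ``CONTENT_LENGTH``, and never closes the stream.

```python
import io


def read_request_body(environ, chunk_size=8192):
    """Read at most CONTENT_LENGTH bytes from web3.input (hypothetical)."""
    try:
        remaining = max(int(environ.get('CONTENT_LENGTH', b'')), 0)
    except ValueError:
        # Missing or malformed length: treat the body as empty
        remaining = 0

    stream = environ['web3.input']
    chunks = []
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            # Server simulated end-of-file before the declared length
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)


example_environ = {
    'CONTENT_LENGTH': b'11',
    'web3.input': io.BytesIO(b'hello world plus trailing data'),
}
example_body = read_request_body(example_environ, chunk_size=4)
```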
>
> Error Stream
> ~~~~~~~~~~~~
>
> The error stream (``web3.errors``) provided by the server must
> support
> the following methods:
>
> ===================  ==========  ========
> Method               Stream      Notes
> ===================  ==========  ========
> ``flush()``          ``errors``  1
> ``write(str)``       ``errors``  2
> ``writelines(seq)``  ``errors``  2
> ===================  ==========  ========
>
> The semantics of each method are as documented in the Python
> Library
> Reference, except for these notes as listed in the table
> above:
>
> 1. Since the ``errors`` stream may not be rewound, servers and
> gateways are free to forward write operations immediately,
> without
> buffering. In this case, the ``flush()`` method may be a
> no-op.
> Portable applications, however, cannot assume that output is
> unbuffered or that ``flush()`` is a no-op. They must call
> ``flush()`` if they need to ensure that output has in fact
> been
> written. (For example, to minimize intermingling of data
> from
> multiple processes writing to the same error log.)
>
> 2. The ``write()`` method must accept a string argument, but
> needn't
> necessarily accept a bytes argument. The ``writelines()``
> method
> must accept a sequence argument that consists entirely of
> strings,
> but needn't necessarily accept any bytes instance as a
> member of
> the sequence.
>
> The methods listed in the table above **must** be supported by
> all
> servers conforming to this specification. Applications
> conforming to
> this specification **must not** use any other methods or
> attributes of
> the ``errors`` object. In particular, applications **must
> not**
> attempt to close this stream, even if it possesses a
> ``close()``
> method.
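Since almost everything else in the environment is bytes, it is easy to hand bytes to the error stream by mistake. A minimal Python 3 sketch of a logging helper follows; `log_error` is a hypothetical name, and decoding as UTF-8 with replacement is an arbitrary choice made for this example. It converts to text first, uses only ``write()`` and ``flush()``, and leaves the stream open.

```python
import io


def log_error(environ, message):
    """Write one line of text to web3.errors (hypothetical helper)."""
    if isinstance(message, bytes):
        # The stream only has to accept text, so decode before writing
        message = message.decode('utf-8', 'replace')
    errors = environ['web3.errors']
    errors.write(message + '\n')
    # Portable applications cannot assume the stream is unbuffered
    errors.flush()


example_environ = {'web3.errors': io.StringIO()}
log_error(example_environ, b'bad path: /x')
log_error(example_environ, 'second line')
logged = example_environ['web3.errors'].getvalue()
```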
>
> Values Returned by A Web3 Application
> -------------------------------------
>
> Web3 applications return an iterable in the form (``status``,
> ``headers``, ``body``). The return value can be any iterable
> type
> that returns exactly three values.
>
> The ``status`` value is assumed by a gateway or server to be
> an HTTP
> "status" bytes instance like ``b'200 OK'`` or ``b'404 Not
> Found'``.
> That is, it is a string consisting of a Status-Code and a
> Reason-Phrase, in that order and separated by a single space,
> with no
> surrounding whitespace or other characters. (See RFC 2616,
> Section
> 6.1.1 for more information.) The string **must not** contain
> control
> characters, and must not be terminated with a carriage return,
> linefeed, or combination thereof.
>
> The ``headers`` value is assumed by a gateway or server to be
> a
> literal Python list of ``(header_name, header_value)``
> tuples. Each
> ``header_name`` must be a bytes instance representing a valid
> HTTP
> header field-name (as defined by RFC 2616, Section 4.2),
> without a
> trailing colon or other punctuation. Each ``header_value``
> must be a
> bytes instance and **must not** include any control
> characters,
> including carriage returns or linefeeds, either embedded or at
> the
> end. (These requirements are to minimize the complexity of
> any
> parsing that must be performed by servers, gateways, and
> intermediate
> response processors that need to inspect or modify response
> headers.)
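>
> As a rough illustration of the rules above, a gateway could check
> an application's ``status`` and ``headers`` along these lines. The
> helper names ``check_status`` and ``check_headers`` are hypothetical
> and not part of this specification, and the field-name check is
> deliberately simplified.
>

```python
import re

# Bytes that count as control characters for this sketch (C0 range and DEL).
_CONTROL = re.compile(b'[\x00-\x1f\x7f]')


def check_status(status):
    """Hypothetical helper: validate a Web3 status value such as b'200 OK'."""
    if not isinstance(status, bytes):
        raise TypeError('status must be a bytes instance')
    code, sep, reason = status.partition(b' ')
    if len(code) != 3 or not code.isdigit() or not sep or not reason:
        raise ValueError('status must look like b"200 OK"')
    if reason.startswith(b' ') or status != status.strip():
        raise ValueError('status has extra whitespace')
    if _CONTROL.search(status):
        raise ValueError('status contains control characters')
    return True


def check_headers(headers):
    """Hypothetical helper: validate a list of (name, value) bytes tuples."""
    if not isinstance(headers, list):
        raise TypeError('headers must be a list')
    for name, value in headers:
        if not isinstance(name, bytes) or not isinstance(value, bytes):
            raise TypeError('header names and values must be bytes')
        # Simplified: a full check would test the RFC 2616 token grammar.
        if not name or name.endswith(b':') or _CONTROL.search(name):
            raise ValueError('invalid header name')
        if _CONTROL.search(value):
            raise ValueError('header value contains control characters')
    return True


def hello_app(environ):
    """A minimal Web3-style application returning (status, headers, body)."""
    return (b'200 OK', [(b'Content-Type', b'text/plain')], [b'Hello'])
```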
>
> In general, the server or gateway is responsible for ensuring
> that
> correct headers are sent to the client: if the application
> omits
> a header required by HTTP (or other relevant specifications
> that are in
> effect), the server or gateway **must** add it. For example,
> the HTTP
> ``Date:`` and ``Server:`` headers would normally be supplied
> by the
> server or gateway.
>
> (A reminder for server/gateway authors: HTTP header names are
> case-insensitive, so be sure to take that into consideration
> when
> examining application-supplied headers!)
>
> Applications and middleware are forbidden from using HTTP/1.1
> "hop-by-hop" features or headers, any equivalent features in
> HTTP/1.0,
> or any headers that would affect the persistence of the
> client's
> connection to the web server. These features are the
> exclusive
> province of the actual web server, and a server or gateway
> **should**
> consider it a fatal error for an application to attempt
> sending them,
> and raise an error if they are supplied as return values from
> an
> application in the ``headers`` structure. (For more specifics
> on
> "hop-by-hop" features and headers, please see the `Other HTTP
> Features`_ section below.)
>
> Handling the ``Content-Length`` Header
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> If the application does not supply a ``Content-Length``
> header, a
> server or gateway may choose one of several approaches to
> handling it.
> The simplest of these is to close the client connection when
> the
> response is completed. Under some circumstances, however, the
> server
> or gateway may be able to either generate a ``Content-Length``
> header,
> or at least avoid the need to close the client connection.
>
> If the application returns a ``body`` iterable whose ``len()``
> is 1,
> then the server can automatically determine ``Content-Length``
> by
> taking the length of the first string yielded by the iterable.
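>
> A minimal sketch of that single-element case, assuming the ``body``
> is a re-iterable container such as a list. The function name
> ``add_content_length`` is hypothetical.
>

```python
def add_content_length(headers, body):
    """Return headers, adding Content-Length when body has exactly one item.

    Hypothetical helper; assumes `body` can be iterated more than once
    (for example a list or tuple), so peeking at it does not consume it.
    """
    if any(name.lower() == b'content-length' for name, _ in headers):
        return headers  # the application already supplied one
    try:
        count = len(body)
    except TypeError:
        return headers  # generators and other iterators have no len()
    if count != 1:
        return headers
    first = next(iter(body))
    length = str(len(first)).encode('ascii')
    return headers + [(b'Content-Length', length)]
```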
>
> If the server and client both support HTTP/1.1 "chunked
> encoding"
> [3]_, then the server **may** use chunked encoding to send a
> chunk for
> each string yielded by the ``body`` iterable, prefixing each
> chunk with its own size so that no ``Content-Length`` header
> is needed. This allows the
> server to
> keep the client connection alive, if it wishes to do so. Note
> that
> the server **must** comply fully with RFC 2616 when doing
> this, or
> else fall back to one of the other strategies for dealing with
> the
> absence of ``Content-Length``.
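>
> For illustration, chunked framing of a ``body`` iterable could look
> roughly like the sketch below. In chunked encoding each chunk is
> preceded by its size in hexadecimal and the stream ends with a
> zero-length chunk. The generator name ``chunked`` is hypothetical,
> and trailers and chunk extensions are ignored.
>

```python
def chunked(body):
    """Frame each block of `body` as one HTTP/1.1 chunk (sketch only)."""
    for block in body:
        if not block:
            # A zero-length chunk would end the response early, so skip it.
            continue
        size = ('%x' % len(block)).encode('ascii')
        yield size + b'\r\n' + block + b'\r\n'
    # The last-chunk marker, followed by an empty trailer section.
    yield b'0\r\n\r\n'
```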
>
> (Note: applications and middleware **must not** apply any kind
> of
> ``Transfer-Encoding`` to their output, such as chunking or
> gzipping;
> as "hop-by-hop" operations, these encodings are the province
> of the
> actual web server/gateway. See `Other HTTP Features`_ below,
> for
> more details.)
>
> Dealing with Compatibility Across Python Versions
> -------------------------------------------------
>
> Creating Web3 code that runs under both Python 2.6/2.7 and
> Python 3.1+
> requires some care on the part of the developer. In general,
> the Web3
> specification assumes a certain level of equivalence between
> the
> Python 2 ``str`` type and the Python 3 ``bytes`` type. For
> example,
> under Python 2, the values present in the Web3 ``environ``
> will be
> instances of the ``str`` type; in Python 3, these will be
> instances of
> the ``bytes`` type. The Python 3 ``bytes`` type does not
> possess all
> the methods of the Python 2 ``str`` type, and some methods
> which it
> does possess behave differently than the Python 2 ``str``
> type.
> Effectively, to ensure that Web3 middleware and applications
> work
> across Python versions, developers must do these things:
>
> #) Do not assume comparison equivalence between text values
> and bytes
> values. If you do so, your code may work under Python 2,
> but it
> will not work properly under Python 3. For example, don't
> write
> ``somebytes == 'abc'``. This will sometimes be true on
> Python 2
> but it will never be true on Python 3, because a sequence of
> bytes
> never compares equal to a string under Python 3. Instead,
> always
> compare a bytes value with a bytes value, e.g.
> ``somebytes == b'abc'``. Code which does this is compatible with and works
> the
> same in Python 2.6, 2.7, and 3.1. The ``b`` in front of
> ``'abc'``
> signals to Python 3 that the value is a literal bytes
> instance;
> under Python 2 it's a forward compatibility placebo.
>
> #) Don't use the ``__contains__`` method (directly or
> indirectly) of
> items that are meant to be byteslike without ensuring that
> its
> argument is also a bytes instance. If you do so, your code
> may
> work under Python 2, but it will not work properly under
> Python 3.
> For example, ``'abc' in somebytes`` will raise a
> ``TypeError``
> under Python 3, but it will return ``True`` under Python 2.6
> and
> 2.7. However, ``b'abc' in somebytes`` will work the same on
> both
> versions.
>
> #) Don't try to use the ``format`` method or the ``__mod__``
> method of instances of bytes (directly or indirectly). In
> Python 2, the ``str`` type which we treat equivalently to
> Python 3's ``bytes`` supports these methods, but actual
> Python 3 ``bytes`` instances don't support them. If you use
> these methods, your code will work under Python 2, but not
> under Python 3.
>
> #) Do not try to concatenate a bytes value with a string
> value. This
> may work under Python 2, but it will not work under Python
> 3. For
> example, doing ``'abc' + somebytes`` will work under Python
> 2, but
> it will result in a ``TypeError`` under Python 3. Instead,
> always
> make sure you're concatenating two items of the same type,
> e.g. ``b'abc' + somebytes``.
>
> Web3 expects byte values in other places, such as in all the
> values
> returned by an application.
>
> In short, to ensure compatibility of Web3 application code
> between
> Python 2 and Python 3, in Python 2, treat CGI and server
> variable
> values in the environment as if they had the Python 3
> ``bytes`` API
> even though they actually have a more capable API. Likewise
> for all
> stringlike values returned by a Web3 application.
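>
> The rules above can be sketched as follows. The function names are
> hypothetical, and the choice of UTF-8 for the decoded text is an
> assumption made for the example only; Web3 itself does not
> prescribe it.
>

```python
def is_form_post(environ):
    """Compare and search bytes with bytes, never with text."""
    method = environ.get('REQUEST_METHOD', b'')
    content_type = environ.get('CONTENT_TYPE', b'')
    return method == b'POST' and b'urlencoded' in content_type


def greeting(name):
    """Build a response fragment from a bytes value.

    Formatting is done on text (bytes has no usable format/% here),
    then the result is encoded back to bytes before concatenation.
    """
    text = 'Hello, %s!' % name.decode('utf-8')
    return b'<p>' + text.encode('utf-8') + b'</p>'
```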
>
> Buffering and Streaming
> -----------------------
>
> Generally speaking, applications will achieve the best
> throughput by
> buffering their (modestly-sized) output and sending it all at
> once.
> This is a common approach in existing frameworks: the output
> is
> buffered in a StringIO or similar object, then transmitted all
> at
> once, along with the response headers.
>
> The corresponding approach in Web3 is for the application to
> simply
> return a single-element ``body`` iterable (such as a list)
> containing
> the response body as a single string. This is the recommended
> approach for the vast majority of application functions that
> render
> HTML pages whose text easily fits in memory.
>
> For large files, however, or for specialized uses of HTTP
> streaming
> (such as multipart "server push"), an application may need to
> provide
> output in smaller blocks (e.g. to avoid loading a large file
> into
> memory). It's also sometimes the case that part of a response
> may
> be time-consuming to produce, but it would be useful to send
> ahead the
> portion of the response that precedes it.
>
> In these cases, applications will usually return a ``body``
> iterator
> (often a generator-iterator) that produces the output in a
> block-by-block fashion. These blocks may be broken to
> coincide with
> multipart boundaries (for "server push"), or just before
> time-consuming tasks (such as reading another block of an
> on-disk
> file).
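>
> A block-by-block ``body`` might be produced with a generator such
> as the one sketched here. ``file_blocks`` and ``download_app`` are
> hypothetical names, and an in-memory ``io.BytesIO`` stands in for a
> large on-disk file.
>

```python
import io


def file_blocks(fileobj, block_size=8192):
    """Yield the contents of `fileobj` in blocks of at most `block_size`."""
    while True:
        block = fileobj.read(block_size)
        if not block:
            break
        yield block


def download_app(environ):
    """Web3-style application that streams its body instead of buffering."""
    source = io.BytesIO(b'x' * 20000)  # stand-in for a large file
    headers = [(b'Content-Type', b'application/octet-stream')]
    return (b'200 OK', headers, file_blocks(source))
```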
>
> Web3 servers, gateways, and middleware **must not** delay the
> transmission of any block; they **must** either fully transmit
> the block to the client, or guarantee that they will continue
> transmission even while the application is producing its next
> block.
> A server/gateway or middleware may provide this guarantee in
> one of
> three ways:
>
> 1. Send the entire block to the operating system (and request
> that any O/S buffers be flushed) before returning control
> to the application, OR
>
> 2. Use a different thread to ensure that the block continues
> to be transmitted while the application produces the next
> block, OR
>
> 3. (Middleware only) send the entire block to its parent
> gateway/server.
>
> By providing this guarantee, Web3 allows applications to
> ensure
> that transmission will not become stalled at an arbitrary
> point
> in their output data. This is critical for proper functioning
> of e.g. multipart "server push" streaming, where data between
> multipart boundaries should be transmitted in full to the
> client.
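>
> The first strategy in the list above could be sketched like this.
> ``transmit`` and ``RecordingSocket`` are hypothetical; the latter
> only records calls so the ordering is visible, where a real server
> would write to the client connection.
>

```python
class RecordingSocket(object):
    """Hypothetical stand-in for a client connection; records each call."""

    def __init__(self):
        self.events = []

    def write(self, data):
        self.events.append(('write', data))

    def flush(self):
        self.events.append(('flush',))


def transmit(body, out):
    """Write and flush each block before asking the application for more."""
    for block in body:
        out.write(block)
        out.flush()  # do not hold the block back while the next is produced
```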
>
> Unicode Issues
> --------------
>
> HTTP does not directly support Unicode, and neither does this
> interface. All encoding/decoding must be handled by the
> **application**; all values passed to or from the server must
> be of
> the Python 3 type ``bytes`` or instances of the Python 2 type
> ``str``,
> not Python 2 ``unicode`` or Python 3 ``str`` objects.
>
> All "bytes instances" referred to in this specification
> **must**:
>
> - On Python 2, be of type ``str``.
>
> - On Python 3, be of type ``bytes``.
>
> All "bytes instances" **must not**:
>
> - On Python 2, be of type ``unicode``.
>
> - On Python 3, be of type ``str``.
>
> The result of using a textlike object where a byteslike object
> is
> required is undefined.
>
> Values returned from a Web3 app as a status or as response
> headers
> **must** follow RFC 2616 with respect to encoding. That is,
> the bytes
> returned must contain a character stream of ISO-8859-1
> characters, or
> the character stream should use RFC 2047 MIME encoding.
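>
> One possible way for an application to produce such header bytes
> from text is sketched below. ``encode_header_value`` is a
> hypothetical helper; it relies on the standard library's
> ``email.header.Header`` for the RFC 2047 fallback, and it assumes
> short values, since long encoded values may be folded across lines,
> which Web3 forbids.
>

```python
from email.header import Header


def encode_header_value(text):
    """Encode text for use as a Web3 header value (sketch only)."""
    try:
        # Preferred: the value is representable in ISO-8859-1.
        return text.encode('iso-8859-1')
    except UnicodeEncodeError:
        # Fallback: RFC 2047 encoded-word, which is pure ASCII.
        return Header(text, 'utf-8').encode().encode('ascii')
```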
>
> On Python platforms which do not have a native bytes-like type
> (e.g. Jython, IronPython, etc.), but instead which generally
> use
> textlike strings to represent bytes data, the definition of
> "bytes
> instance" can be changed: their "bytes instances" must be
> native
> strings that contain only code points representable in
> ISO-8859-1
> encoding (``\u0000`` through ``\u00FF``, inclusive). It is a
> fatal
> error for an application on such a platform to supply strings
> containing any other Unicode character or code point.
> Similarly,
> servers and gateways on those platforms **must not** supply
> strings to
> an application containing any other Unicode characters.
>
> HTTP 1.1 Expect/Continue
> ------------------------
>
> Servers and gateways that implement HTTP 1.1 **must** provide
> transparent support for HTTP 1.1's "expect/continue"
> mechanism. This
> may be done in any of several ways:
>
> 1. Respond to requests containing an ``Expect: 100-continue``
> header
> with an immediate "100 Continue" response, and proceed
> normally.
>
> 2. Proceed with the request normally, but provide the
> application
> with a ``web3.input`` stream that will send the "100
> Continue"
> response if/when the application first attempts to read from
> the
> input stream. The read request must then remain blocked
> until the
> client responds.
>
> 3. Wait until the client decides that the server does not
> support
> expect/continue, and sends the request body on its own.
> (This
> is suboptimal, and is not recommended.)
>
> Note that these behavior restrictions do not apply for HTTP
> 1.0
> requests, or for requests that are not directed to an
> application
> object. For more information on HTTP 1.1 Expect/Continue, see
> RFC
> 2616, sections 8.2.3 and 10.1.1.
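>
> The second strategy above might be implemented with an input
> wrapper along these lines. The class name ``ContinueOnReadInput``
> and the ``send_continue`` callback are assumptions made for this
> sketch; only ``read()`` and ``readline()`` are shown.
>

```python
import io


class ContinueOnReadInput(object):
    """Send "100 Continue" lazily, the first time the body is read."""

    def __init__(self, stream, send_continue):
        self._stream = stream
        self._send_continue = send_continue  # hypothetical server callback
        self._sent = False

    def _ensure_continue(self):
        if not self._sent:
            self._sent = True
            self._send_continue()

    def read(self, size=-1):
        self._ensure_continue()
        return self._stream.read(size)

    def readline(self, size=-1):
        self._ensure_continue()
        return self._stream.readline(size)
```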
>
>
> Other HTTP Features
> -------------------
>
> In general, servers and gateways should "play dumb" and allow
> the
> application complete control over its output. They should
> only make
> changes that do not alter the effective semantics of the
> application's
> response. It is always possible for the application developer
> to add
> middleware components to supply additional features, so
> server/gateway
> developers should be conservative in their implementation. In
> a sense,
> a server should consider itself to be like an HTTP "gateway
> server",
> with the application being an HTTP "origin server". (See RFC
> 2616,
> section 1.3, for the definition of these terms.)
>
> However, because Web3 servers and applications do not
> communicate via
> HTTP, what RFC 2616 calls "hop-by-hop" headers do not apply to
> Web3
> internal communications. Web3 applications **must not**
> generate any
> "hop-by-hop" headers [4]_, attempt to use HTTP features that
> would
> require them to generate such headers, or rely on the content
> of
> any incoming "hop-by-hop" headers in the ``environ``
> dictionary.
> Web3 servers **must** handle any supported inbound
> "hop-by-hop" headers
> on their own, such as by decoding any inbound
> ``Transfer-Encoding``,
> including chunked encoding if applicable.
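>
> A server or gateway could reject application-supplied "hop-by-hop"
> headers with a check like the following sketch. The header names
> are those listed in RFC 2616, section 13.5.1; the function name
> ``reject_hop_by_hop`` is hypothetical.
>

```python
# Hop-by-hop header names from RFC 2616, section 13.5.1, lowercased.
HOP_BY_HOP = frozenset([
    b'connection', b'keep-alive', b'proxy-authenticate',
    b'proxy-authorization', b'te', b'trailers',
    b'transfer-encoding', b'upgrade',
])


def reject_hop_by_hop(headers):
    """Raise ValueError if any header is hop-by-hop; else return headers."""
    for name, _ in headers:
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() in HOP_BY_HOP:
            raise ValueError('hop-by-hop header not allowed: '
                             + name.decode('iso-8859-1'))
    return headers
```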
>
> Applying these principles to a variety of HTTP features, it
> should be
> clear that a server **may** handle cache validation via the
> ``If-None-Match`` and ``If-Modified-Since`` request headers
> and the
> ``Last-Modified`` and ``ETag`` response headers. However, it
> is
> not required to do this, and the application **should**
> perform its
> own cache validation if it wants to support that feature,
> since
> the server/gateway is not required to do such validation.
>
> Similarly, a server **may** re-encode or transport-encode an
> application's response, but the application **should** use a
> suitable content encoding on its own, and **must not** apply a
> transport encoding. A server **may** transmit byte ranges of
> the
> application's response if requested by the client, and the
> application doesn't natively support byte ranges. Again,
> however,
> the application **should** perform this function on its own if
> desired.
>
> Note that these restrictions on applications do not
> necessarily mean
> that every application must reimplement every HTTP feature;
> many HTTP
> features can be partially or fully implemented by middleware
> components, thus freeing both server and application authors
> from
> implementing the same features over and over again.
>
> Thread Support
> --------------
>
> Thread support, or lack thereof, is also server-dependent.
> Servers that can run multiple requests in parallel, **should**
> also
> provide the option of running an application in a
> single-threaded
> fashion, so that applications or frameworks that are not
> thread-safe
> may still be used with that server.
>
> Implementation/Application Notes
> ================================
>
> Server Extension APIs
> ---------------------
>
> Some server authors may wish to expose more advanced APIs,
> that
> application or framework authors can use for specialized
> purposes.
> For example, a gateway based on ``mod_python`` might wish to
> expose
> part of the Apache API as a Web3 extension.
>
> In the simplest case, this requires nothing more than defining
> an
> ``environ`` variable, such as ``mod_python.some_api``. But,
> in many
> cases, the possible presence of middleware can make this
> difficult.
> For example, an API that offers access to the same HTTP
> headers that
> are found in ``environ`` variables, might return different
> data if
> ``environ`` has been modified by middleware.
>
> In general, any extension API that duplicates, supplants, or
> bypasses
> some portion of Web3 functionality runs the risk of being
> incompatible
> with middleware components. Server/gateway developers should
> *not*
> assume that nobody will use middleware, because some framework
> developers specifically organize their frameworks to function
> almost
> entirely as middleware of various kinds.
>
> So, to provide maximum compatibility, servers and gateways
> that
> provide extension APIs that replace some Web3 functionality,
> **must**
> design those APIs so that they are invoked using the portion
> of the
> API that they replace. For example, an extension API to
> access HTTP
> request headers must require the application to pass in its
> current
> ``environ``, so that the server/gateway may verify that HTTP
> headers
> accessible via the API have not been altered by middleware.
> If the
> extension API cannot guarantee that it will always agree with
> ``environ`` about the contents of HTTP headers, it must refuse
> service
> to the application, e.g. by raising an error, returning
> ``None``
> instead of a header collection, or whatever is appropriate to
> the API.
>
> These guidelines also apply to middleware that adds
> information such
> as parsed cookies, form variables, sessions, and the like to
> ``environ``. Specifically, such middleware should provide
> these
> features as functions which operate on ``environ``, rather
> than simply
> stuffing values into ``environ``. This helps ensure that
> information
> is calculated from ``environ`` *after* any middleware has done
> any URL
> rewrites or other ``environ`` modifications.
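>
> For example, cookie parsing could be offered as a function of
> ``environ`` rather than as a precomputed ``environ`` entry. The
> sketch below uses a hypothetical ``get_cookies`` helper with a
> deliberately simplified parser (no quoting or attribute handling).
>

```python
def get_cookies(environ):
    """Parse cookies from the current environ each time it is called."""
    raw = environ.get('HTTP_COOKIE', b'')
    cookies = {}
    for pair in raw.split(b';'):
        name, sep, value = pair.strip().partition(b'=')
        if sep and name:
            cookies[name] = value
    return cookies
```

> Because the value is computed on demand, a later change to
> ``HTTP_COOKIE`` by other middleware is reflected in the result.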
>
> It is very important that these "safe extension" rules be
> followed by
> both server/gateway and middleware developers, in order to
> avoid a
> future in which middleware developers are forced to delete any
> and all
> extension APIs from ``environ`` to ensure that their mediation
> isn't
> being bypassed by applications using those extensions!
>
> Application Configuration
> -------------------------
>
> This specification does not define how a server selects or
> obtains an
> application to invoke. These and other configuration options
> are
> highly server-specific matters. It is expected that
> server/gateway
> authors will document how to configure the server to execute a
> particular application object, and with what options (such as
> threading options).
>
> Framework authors, on the other hand, should document how to
> create an
> application object that wraps their framework's
> functionality. The
> user, who has chosen both the server and the application
> framework,
> must connect the two together. However, since both the
> framework and
> the server have a common interface, this should be merely a
> mechanical
> matter, rather than a significant engineering effort for each
> new
> server/framework pair.
>
> Finally, some applications, frameworks, and middleware may
> wish to use
> the ``environ`` dictionary to receive simple string
> configuration
> options. Servers and gateways **should** support this by
> allowing an
> application's deployer to specify name-value pairs to be
> placed in
> ``environ``. In the simplest case, this support can consist
> merely of
> copying all operating system-supplied environment variables
> from
> ``os.environ`` into the ``environ`` dictionary, since the
> deployer in
> principle can configure these externally to the server, or in
> the CGI
> case they may be able to be set via the server's configuration
> files.
>
> Applications **should** try to keep such required variables to
> a
> minimum, since not all servers will support easy configuration
> of
> them. Of course, even in the worst case, persons deploying an
> application can create a script to supply the necessary
> configuration
> values::
>
>     from the_app import application
>
>     def new_app(environ):
>         environ['the_app.configval1'] = 'something'
>         return application(environ)
>
> But, most existing applications and frameworks will probably
> only need
> a single configuration value from ``environ``, to indicate the
> location
> of their application or framework-specific configuration
> file(s). (Of
> course, applications should cache such configuration, to avoid
> having
> to re-read it upon each invocation.)
>
> URL Reconstruction
> ------------------
>
> If an application wishes to reconstruct a request's complete
> URL (as a
> bytes object), it may do so using the following algorithm::
>
>     host = environ.get('HTTP_HOST')
>
>     scheme = environ['web3.url_scheme']
>     port = environ['SERVER_PORT']
>     query = environ['QUERY_STRING']
>
>     url = scheme + b'://'
>
>     if host:
>         # HTTP_HOST already carries any non-default port
>         url += host
>     else:
>         url += environ['SERVER_NAME']
>
>         if scheme == b'https':
>             if port != b'443':
>                 url += b':' + port
>         else:
>             if port != b'80':
>                 url += b':' + port
>
>     url += environ['web3.script_name']
>     url += environ['web3.path_info']
>     if query:
>         url += b'?' + query
>
> Note that such a reconstructed URL may not be precisely the
> same URI
> as requested by the client. Server rewrite rules, for
> example, may
> have modified the client's originally requested URL to place
> it in a
> canonical form.
>
> Optional Platform-Specific File Handling
> ----------------------------------------
>
> Some operating environments provide special high-performance
> file-
> transmission facilities, such as the Unix ``sendfile()`` call.
> Servers and gateways **may** expose this functionality via an
> optional
> ``web3.file_wrapper`` key in the ``environ``. An application
> **may**
> use this "file wrapper" to convert a file or file-like object
> into the
> ``body`` iterable that it then returns, e.g.::
>
>     if 'web3.file_wrapper' in environ:
>         body = environ['web3.file_wrapper'](filelike, block_size)
>     else:
>         # read() returns bytes, so the sentinel must be b''
>         body = iter(lambda: filelike.read(block_size), b'')
>
> If the server or gateway supplies ``web3.file_wrapper``, it
> must be a
> callable that accepts one required positional parameter, and
> one
> optional positional parameter. The first parameter is the
> file-like
> object to be sent, and the second parameter is an optional
> block size
> "suggestion" (which the server/gateway need not use). The
> callable
> **must** return an iterable object, and **must not** perform
> any data
> transmission until and unless the server/gateway actually
> receives the
> iterable as a return value from the application. (To do
> otherwise
> would prevent middleware from being able to interpret or
> override the
> response data.)
>
> To be considered "file-like", the object supplied by the
> application
> must have a ``read()`` method that takes an optional size
> argument.
> The ``read()`` method of the object must return *bytes*, never
> *text*.
> It **may** have a ``close()`` method, and if so, the iterable
> returned
> by ``web3.file_wrapper`` **must** have a ``close()`` method
> that
> invokes the original file-like object's ``close()`` method.
> If the
> "file-like" object has any other methods or attributes with
> names
> matching those of Python built-in file objects (e.g.
> ``fileno()``),
> the ``web3.file_wrapper`` **may** assume that these methods or
> attributes have the same semantics as those of a built-in file
> object.
>
> The actual implementation of any platform-specific file
> handling
> must occur **after** the application returns, and the server
> or
> gateway checks to see if a wrapper object was returned.
> (Again,
> because of the presence of middleware, error handlers, and the
> like,
> it is not guaranteed that any wrapper created will actually be
> used.)
>
> Apart from the handling of ``close()``, the semantics of
> returning a
> file wrapper from the application should be the same as if the
> application had returned ``iter(filelike.read, b'')``. In
> other words,
> transmission should begin at the current position within the
> "file" at
> the time that transmission begins, and continue until the end
> is
> reached unless a ``Content-Length`` header value has been set
> by the
> application; under that circumstance, only ``Content-Length``
> bytes
> are read from the "file".
>
> Of course, platform-specific file transmission APIs don't
> usually
> accept arbitrary "file-like" objects. Therefore, a
> ``web3.file_wrapper`` has to introspect the supplied object
> for things
> such as a ``fileno()`` (Unix-like OSes) or a
> ``java.nio.FileChannel``
> (under Jython) in order to determine if the file-like object
> is
> suitable for use with the platform-specific API it supports.
>
> Note that even if the object is *not* suitable for the
> platform API,
> the ``web3.file_wrapper`` **must** still return an
> iterable. The
> iterable must wrap the underlying filelike object's
> ``close()``
> method. The iterable **may** be the underlying file object
> itself but
> also may need to be a wrapper if the underlying filelike
> object is not
> iterable. Here's a simple platform-agnostic file wrapper
> class::
>
>     class FileWrapper(object):
>
>         def __init__(self, filelike, blksize=8192):
>             self.filelike = filelike
>             self.blksize = blksize
>             if hasattr(filelike, 'close'):
>                 self.close = filelike.close
>
>         def __iter__(self):
>             try:
>                 return iter(self.filelike)
>             except TypeError:
>                 # underlying filelike object is not iterable
>                 return self
>
>         def next(self):
>             data = self.filelike.read(self.blksize)
>             if data:
>                 return data
>             raise StopIteration
>
>         # Python 3 spells the iterator method __next__
>         __next__ = next
>
> and here is a snippet from a server/gateway that uses it to
> provide
> access to a platform-specific API::
>
>     environ['web3.file_wrapper'] = FileWrapper
>     status, headers, body = application(environ)
>
>     try:
>         if isinstance(body, FileWrapper):
>             # check if body.filelike is usable w/platform-specific
>             # API, and if so, use that API to transmit the body.
>             # If not, fall through to normal iterable handling
>             # loop below.
>             pass
>
>         for data in body:
>             # etc.
>             pass
>
>     finally:
>         if hasattr(body, 'close'):
>             body.close()
>
> Points of Contention
> ====================
>
> Outlined below are potential points of contention regarding
> this
> specification.
>
> WSGI 1.0 Compatibility
> ----------------------
>
> Components written using the WSGI 1.0 specification will not
> transparently interoperate with components written using this
> specification. That's because the goals of this proposal and
> the
> goals of WSGI 1.0 are not directly aligned.
>
> WSGI 1.0 is obliged to provide specification-level backwards
> compatibility with versions of Python between 2.2 and 2.7.
> This
> specification, however, ditches Python 2.5 and lower
> compatibility in
> order to provide compatibility between relatively recent
> versions of
> Python 2 (2.6 and 2.7) as well as relatively recent versions
> of Python
> 3 (3.1).
>
> It is currently impossible to write components which work
> reliably
> under both Python 2 and Python 3 using the WSGI 1.0
> specification,
> because the specification implicitly posits that CGI and
> server
> variable values in the environ and values returned via
> ``start_response`` represent a sequence of bytes that can be
> addressed
> using the Python 2 string API. It posits such a thing because
> that
> sort of data type was the sensible way to represent bytes in
> all
> Python 2 versions, and WSGI 1.0 was conceived before Python 3
> existed.
>
> Python 3's ``str`` type supports the full API provided by the
> Python 2
> ``str`` type, but Python 3's ``str`` type does not
> represent a
> sequence of bytes; it represents text. Therefore,
> using it
> to represent environ values also requires that the environ
> byte
> sequence be decoded to text via some encoding. We cannot
> decode these
> bytes to text (at least in any way where the decoding has any
> meaning
> other than as a tunnelling mechanism) without widening the
> scope of
> WSGI to include server and gateway knowledge of decoding
> policies and
> mechanics. WSGI 1.0 never concerned itself with encoding and
> decoding. It made statements about allowable transport
> values, and
> suggested that various values might be best decoded as one
> encoding or
> another, but it never required a server to *perform* any
> decoding
> before
>
> Python 3 does not have a stringlike type that can be used
> instead to
> represent bytes: it has a ``bytes`` type. A bytes type
> operates quite
> a bit like a Python 2 ``str`` in Python 3.1+, but it lacks
> behavior
> equivalent to ``str.__mod__`` and its iteration protocol, and
> containment and equivalence comparisons are different.
>
> In either case, there is no type in Python 3 that behaves just
> like
> the Python 2 ``str`` type, and a way to create such a type
> doesn't
> exist because there is no such thing as a "String ABC" which
> would
> allow a suitable type to be built. Due to this design
> incompatibility, existing WSGI 1.0 servers, middleware, and
> applications will not work under Python 3, even after they are
> run
> through ``2to3``.
>
> Existing Web-SIG discussions about updating the WSGI
> specification so
> that it is possible to write a WSGI application that runs in
> both
> Python 2 and Python 3 tend to revolve around creating a
> specification-level equivalence between the Python 2 ``str``
> type
> (which represents a sequence of bytes) and the Python 3
> ``str`` type
> (which represents text). Such an equivalence becomes strained
> in
> various areas, given the different roles of these types. An
> arguably
> more straightforward equivalence exists between the Python 3
> ``bytes``
> type API and a subset of the Python 2 ``str`` type API. This
> specification exploits this subset equivalence.
>
> In the meantime, aside from any Python 2 vs. Python 3
> compatibility
> issue, as various discussions on Web-SIG have pointed out, the
> WSGI
> 1.0 specification is too general, providing support for
> asynchronous
> applications at the expense of implementation complexity.
> This
> specification uses the fundamental incompatibility between
> WSGI 1.0
> and Python 3 as a natural divergence point to create a
> specification
> with reduced complexity by removing specialized support for
> asynchronous applications.
>
> To provide backwards compatibility for older WSGI 1.0
> applications, so
> that they may run on a Web3 stack, it is presumed that Web3
> middleware
> will be created which can be used "in front" of existing WSGI
> 1.0
> applications, allowing those existing WSGI 1.0 applications to
> run
> under a Web3 stack. This middleware will require, when under
> Python
> 3, an equivalence to be drawn between Python 3 ``str`` types
> and the
> bytes values represented by the HTTP request and all the
> attendant
> encoding-guessing (or configuration) it implies.
>
> .. note:: Such middleware *might* in the future, instead of
> drawing an
> equivalence between Python 3 ``str`` and HTTP byte values,
> make use
> of a yet-to-be-created "ebytes" type (aka
> "bytes-with-benefits"),
> particularly if a String ABC proposal is accepted into the
> Python
> core and implemented.
>
> Conversely, it is presumed that WSGI 1.0 middleware will be
> created
> which will allow a Web3 application to run behind a WSGI 1.0
> stack on
> the Python 2 platform.
>
> Environ and Response Values as Bytes
> ------------------------------------
>
> Casual middleware and application writers may consider the use of
> bytes as environment values and response values inconvenient.  In
> particular, they won't be able to use common string formatting
> functions such as ``('%s' % bytes_val)`` or
> ``bytes_val.format('123')`` because bytes don't have the same API as
> strings on platforms such as Python 3 where the two types differ.
> Likewise, on such platforms, stdlib HTTP-related API support for using
> bytes interchangeably with text can be spotty.  In places where bytes
> are inconvenient or incompatible with library APIs, middleware and
> application writers will have to decode such bytes to text explicitly.
> This is particularly inconvenient for middleware writers: to work with
> environment values as strings, they'll have to decode them from an
> implied encoding, and if they need to mutate an environ value, they'll
> then need to encode the value back into bytes before placing it into
> the environ.  While the use of bytes by the specification as environ
> values might be inconvenient for casual developers, it provides
> several benefits.
>
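The decode, mutate, re-encode cycle described above might look roughly like the following under Python 3. This is only a sketch: the plain dict standing in for the environ, the use of native-string keys, the helper name ``prefix_path_info`` and the choice of UTF-8 are all assumptions made for illustration, not requirements of this draft.

```python
# Sketch only: a plain dict stands in for a Web3 environ with bytes values.
ASSUMED_ENCODING = "utf-8"  # assumption: chosen by the application, not the spec


def prefix_path_info(environ, prefix, encoding=ASSUMED_ENCODING):
    """Decode PATH_INFO to text, prepend a text prefix, store it back as bytes."""
    text = environ["PATH_INFO"].decode(encoding)            # bytes -> str
    environ["PATH_INFO"] = (prefix + text).encode(encoding)  # str -> bytes
    return environ


environ = {"PATH_INFO": b"/caf\xc3\xa9"}
prefix_path_info(environ, "/app")
```

A middleware author who only needs to concatenate can stay in bytes (``b"/app" + environ["PATH_INFO"]``); the explicit decode is needed once text-only APIs come into play.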
> Using bytes types to represent HTTP and server values to an
> application most closely matches reality because HTTP is fundamentally
> a bytes-oriented protocol.  If the environ values are mandated to be
> strings, each server will need to use heuristics to guess about the
> encoding of various values provided by the HTTP environment.  Using
> all strings might increase casual middleware writer convenience, but
> will also lead to ambiguity and confusion when a value cannot be
> decoded to a meaningful non-surrogate string.
>
> Use of bytes as environ values avoids any need for the specification
> to mandate that a participating server be informed of encoding
> configuration parameters.  If environ values are treated as strings,
> and so must be decoded from bytes, configuration parameters may
> eventually become necessary as policy clues from the application
> deployer.  Such a policy would be used to guess an appropriate
> decoding strategy in various circumstances, effectively placing the
> burden for enforcing a particular application encoding policy upon the
> server.  If the server must serve more than one application, such
> configuration would quickly become complex.  Many policies would also
> be impossible to express declaratively.
>
> In reality, HTTP is a complicated and legacy-fraught protocol which
> requires a complex set of heuristics to make sense of.  It would be
> nice if we could allow this protocol to protect us from this
> complexity, but we cannot do so reliably while still providing to
> application writers a level of control commensurate with reality.
> Python applications must often deal with data embedded in the
> environment which not only must be parsed by legacy heuristics, but
> *does not conform even to any existing HTTP specification*.  While
> these eventualities are unpleasant, they crop up with regularity,
> making it impossible and undesirable to hide them from application
> developers, as application developers are the only people who are able
> to decide upon an appropriate action when an HTTP specification
> violation is detected.
>
> Some have argued for mixed use of bytes and string values as environ
> values.  This proposal avoids that strategy.  Sole use of bytes as
> environ values makes it possible to fit this specification entirely in
> one's head; you won't need to guess about which values are strings and
> which are bytes.
>
> This protocol would also fit in a developer's head if all environ
> values were strings, but this specification doesn't use that strategy.
> This will likely be the point of greatest contention regarding the use
> of bytes.  In defense of bytes: developers often prefer protocols with
> consistent contracts, even if the contracts themselves are suboptimal.
> If we hide encoding issues from a developer until a value that
> contains surrogates causes problems after it has already reached
> beyond the I/O boundary of their application, they will need to do a
> lot more work to fix assumptions made by their application than if we
> were to just present the problem much earlier in terms of "here's some
> bytes, you decode them".  This is also a counter-argument to the
> "bytes are inconvenient" assumption: while presenting bytes to an
> application developer may be inconvenient for a casual application
> developer who doesn't care about edge cases, they are extremely
> convenient for the application developer who needs to deal with
> complex, dirty eventualities, because use of bytes allows them the
> appropriate level of control with a clear separation of
> responsibility.
>
> If the protocol uses bytes, it is presumed that libraries will be
> created to make working with bytes-only in the environ and within
> return values more pleasant; for example, analogues of the WSGI 1.0
> libraries named "WebOb" and "Werkzeug".  Such libraries will fill the
> gap between convenience and control, allowing the spec to remain
> simple and regular while still allowing casual authors a convenient
> way to create Web3 middleware and application components.  This seems
> to be a reasonable alternative to baking encoding policy into the
> protocol, because many such libraries can be created independently
> from the protocol, and application developers can choose the one that
> provides them the appropriate levels of control and convenience for a
> particular job.
>
> Here are some alternatives to using all bytes:
>
> - Have the server decode all values representing CGI and server
>   environ values into strings using the ``latin-1`` encoding, which is
>   lossless.  Smuggle any undecodable bytes within the resulting
>   string.
>
> - Decode all CGI and server environ values to strings using the
>   ``utf-8`` encoding with the ``surrogateescape`` error handler.  This
>   does not work under any existing Python 2.
>
> - Represent some values as bytes and other values as strings, as
>   decided by their typical usages.
>
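For comparison, the first two alternatives can be tried directly under Python 3. The sketch below uses an arbitrary byte string that is not valid UTF-8; the sample value and the helper names are illustrative assumptions, not part of this draft.

```python
RAW = b"/caf\xe9/\xff"  # arbitrary sample; not valid UTF-8


def roundtrip_latin1(raw):
    """latin-1 maps every byte to one code point, so decoding never fails."""
    text = raw.decode("latin-1")
    return text, text.encode("latin-1")


def roundtrip_surrogateescape(raw):
    """Undecodable bytes are smuggled through as lone surrogates."""
    text = raw.decode("utf-8", "surrogateescape")
    return text, text.encode("utf-8", "surrogateescape")


def strict_encode_fails(text):
    """A smuggled surrogate cannot be encoded without the same error handler."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


latin1_text, latin1_bytes = roundtrip_latin1(RAW)
escaped_text, escaped_bytes = roundtrip_surrogateescape(RAW)
```

Both strategies recover the original bytes, but only if the code that re-encodes knows which strategy the server used; the surrogate-carrying string fails as soon as it reaches a strict encoder.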
> Applications Should be Allowed to Read ``web3.input`` Past ``CONTENT_LENGTH``
> -----------------------------------------------------------------------------
>
> At
> http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html,
> Graham Dumpleton makes the assertion that ``wsgi.input`` should be
> required to return the empty string as a signifier of out-of-data, and
> that applications should be allowed to read past the number of bytes
> specified in ``CONTENT_LENGTH``, depending only upon the empty string
> as an EOF marker.  WSGI relies on an application "being well behaved
> and once all data specified by ``CONTENT_LENGTH`` is read, that it
> processes the data and returns any response. That same socket
> connection could then be used for a subsequent request."  Graham would
> like WSGI adapters to be required to wrap raw socket connections:
> "this wrapper object will need to count how much data has been read,
> and when the amount of data reaches that as defined by
> ``CONTENT_LENGTH``, any subsequent reads should return an empty string
> instead."  This may be useful to support chunked encoding and input
> filters.
>
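A counting wrapper of the kind Graham describes could be sketched as below. The class name ``LengthLimitedInput`` is hypothetical, ``io.BytesIO`` stands in for a raw socket file, and only ``read()`` is covered; a real adapter would also need ``readline()`` and iteration.

```python
import io


class LengthLimitedInput:
    """Hypothetical wrapper: stop at CONTENT_LENGTH, then return b'' forever."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = max(content_length, 0)

    def read(self, size=-1):
        if self._remaining <= 0:
            return b""  # EOF marker; never touches the next request's bytes
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining  # clamp to what is left of this body
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data


# One connection carrying a 5-byte body followed by the next request.
raw = io.BytesIO(b"hello" + b"GET /next HTTP/1.1\r\n")
body = LengthLimitedInput(raw, 5)
first = body.read(3)
rest = body.read()     # no-size call: everything left in this body
at_eof = body.read(10)
```

Because the wrapper clamps every read, an application can loop until it sees the empty bytes value without consuming data that belongs to a subsequent request on the same connection.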
> ``web3.input`` Unknown Length
> ------------------------------
>
> There's no documented way to indicate that
> ``environ['web3.input']`` has content whose length is unknown.
>
> ``read()`` of ``web3.input`` Should Support No-Size Calling Convention
> ----------------------------------------------------------------------
>
> At
> http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html,
> Graham Dumpleton makes the assertion that the ``read()`` method of
> ``wsgi.input`` should be callable without arguments, and that the
> result should be "all available request content".  Needs discussion.
>
> Input Filters should set environ ``CONTENT_LENGTH`` to -1
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> At
> http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html,
> Graham Dumpleton suggests that an input filter might set
> ``environ['CONTENT_LENGTH']`` to -1 to indicate that it mutated the
> input.
>
> ``headers`` as Literal List of Two-Tuples
> -----------------------------------------
>
> Why do we make applications return a ``headers`` structure that is a
> literal list of two-tuples?  I think the iterability of ``headers``
> needs to be maintained while it moves up the stack, but I don't think
> we need to be able to mutate it in place at all times.  Could we
> loosen that requirement?
>
> Removed Requirement that Middleware Not Block
> ---------------------------------------------
>
> This requirement was removed: "middleware components **must not**
> block iteration waiting for multiple values from an application
> iterable.  If the middleware needs to accumulate more data from the
> application before it can produce any output, it **must** yield an
> empty string."  This requirement existed to support asynchronous
> applications and servers (see PEP 333's "Middleware Handling of Block
> Boundaries").  We might reintroduce this requirement if we want to
> support asynchronous applications and servers minimally.
>
> ``web3.script_name`` and ``web3.path_info``
> -------------------------------------------
>
> These values are required to be placed into the environment by the
> origin server under this specification.  Unlike ``SCRIPT_NAME`` and
> ``PATH_INFO``, these must be the original *URL-encoded* variants
> derived from the request URI.  We probably need to figure out how
> these should be computed originally, and what their values should be
> if the server performs URL rewriting.
>
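The difference matters when a path segment contains an escaped slash. The sketch below uses a hypothetical request path and a plain dict as the environ, with Python 3's ``urllib.parse.unquote_to_bytes`` standing in for whatever decoding the server applies to produce ``PATH_INFO``.

```python
from urllib.parse import unquote_to_bytes

raw_path = b"/files/a%2Fb%20c"  # hypothetical path, as sent in the request URI

environ = {
    "web3.path_info": raw_path,               # original URL-encoded form
    "PATH_INFO": unquote_to_bytes(raw_path),  # decoded form
}

encoded_segments = environ["web3.path_info"].split(b"/")
decoded_segments = environ["PATH_INFO"].split(b"/")
```

Splitting the encoded value yields three pieces (the last being ``a%2Fb%20c``), while the decoded value yields four, because the escaped slash has become indistinguishable from a real separator.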
> Long Response Headers
> ---------------------
>
> Bob Brewer notes in
> http://mail.python.org/pipermail/web-sig/2006-September/002244.html:
>
> "Each header_value must not include any control characters, including
> carriage returns or linefeeds, either embedded or at the end.  (These
> requirements are to minimize the complexity of any parsing that must
> be performed by servers, gateways, and intermediate response
> processors that need to inspect or modify response headers.)" [1]
>
> That's understandable, but HTTP headers are defined as (mostly) *TEXT,
> and "words of *TEXT MAY contain characters from character sets other
> than ISO-8859-1 only when encoded according to the rules of RFC 2047."
> [2] And RFC 2047 specifies that "an 'encoded-word' may not be more
> than 75 characters long...If it is desirable to encode more text than
> will fit in an 'encoded-word' of 75 characters, multiple
> 'encoded-word's (separated by CRLF SPACE) may be used." [3] This
> satisfies HTTP header folding rules, as well: "Header fields can be
> extended over multiple lines by preceding each extra line with at
> least one SP or HT." [1, again]
>
> So in my reading of HTTP, some code somewhere should introduce
> newlines in longish, encoded response header values.  I see three
> options:
>
> 1. Keep things as they are and disallow response header values if
>    they contain words over 75 chars that are outside the ISO-8859-1
>    character set
>
> 2. Allow newline characters in WSGI response headers
>
> 3. Require/strongly suggest WSGI servers to do the encoding and
>    folding before sending the value over HTTP.
>
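To see what the encoding and folding in option 3 involves, Python 3's stdlib ``email.header`` module can produce RFC 2047 encoded-words split across lines. This is a sketch under assumptions: the helper names are hypothetical, and nothing in this draft says a Web3 server would use this module.

```python
from email.header import Header, decode_header, make_header


def fold_header_value(text, charset="utf-8"):
    """Encode text as RFC 2047 encoded-words, folded with CRLF + space."""
    return Header(text, charset).encode(linesep="\r\n")


def unfold_header_value(folded):
    """Reverse the folding and decoding to recover the original text."""
    return str(make_header(decode_header(folded)))


long_value = "\u00e9" * 100  # 100 non-ASCII characters, too long for one word
folded = fold_header_value(long_value)
```

The folded result is pure ASCII and contains embedded CRLF sequences, which is exactly what the current "no control characters" rule forbids an application from returning; hence the question of which layer should do this work.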
> Request Trailers and Chunked Transfer Encoding
> ----------------------------------------------
>
> When using chunked transfer encoding on request content, the RFCs
> allow there to be request trailers.  These are like request headers
> but come after the final null data chunk.  These trailers are only
> available when the chunked data stream is finite length and when it
> has all been read in.  Neither WSGI nor Web3 currently supports them.
>
> References
> ==========
>
> .. [1] PEP 333: Python Web Server Gateway Interface
>    (http://www.python.org/dev/peps/pep-0333/)
>
> .. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
>    (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)
>
> .. [3] "Chunked Transfer Coding" -- HTTP/1.1, section 3.6.1
>    (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1)
>
> .. [4] "End-to-end and Hop-by-hop Headers" -- HTTP/1.1, Section 13.5.1
>    (http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.1)
>
> .. [5] mod_ssl Reference, "Environment Variables"
>    (http://www.modssl.org/docs/2.8/ssl_reference.html#ToC25)
>
>
>
>
>
>
> --
> Ian Bicking | http://blog.ianbicking.org