[Web-SIG] Draft PEP: WSGI 1.1

Graham Dumpleton graham.dumpleton at gmail.com
Fri Apr 16 03:41:55 CEST 2010


I haven't read what you have done yet, but if you have done so
already, ensure you read:

 http://bitbucket.org/ianb/wsgi-peps/src/

This is Ian's and Armin's previous go at new specification. It though
tried to go further than what you are doing.

Also read:

 http://blog.dscpl.com.au/2009/09/roadmap-for-python-wsgi-specification.html

I explain what I mean by native strings in that.

Graham

On 15 April 2010 22:54, Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:
> Mostly taking Graham's list of issues and incorporating it into PEP 333.
>
> Latest revision: http://hg.xavamedia.nl/peps/file/tip/wsgi-1.1.txt
>
> Let's have comments here (comments in the form of diffs are
> particularly welcome, of course). Remember, the idea is not to change
> or improve WSGI right now, but only to improve the spec, improving
> interoperability and enabling Python 3 support.
>
> Graham, I hope I did a good job with your suggestions. (Since so much
> of this is yours, I've just listed you as the second author.) I tried
> to clarify exactly what you meant by "native strings", can you check
> that out?
>
> Cheers,
>
> Dirkjan
>
> --- pep-0333.txt        2010-04-15 14:46:02.000000000 +0200
> +++ wsgi-1.1.txt        2010-04-15 14:51:39.000000000 +0200
> @@ -1,114 +1,124 @@
> -PEP: 333
> -Title: Python Web Server Gateway Interface v1.0
> +PEP: 0000
> +Title: Python Web Server Gateway Interface 1.1
>  Version: $Revision$
>  Last-Modified: $Date$
> -Author: Phillip J. Eby <pje at telecommunity.com>
> +Author: Dirkjan Ochtman <dirkjan at ochtman.nl>,
> +        Graham Dumpleton <graham.dumpleton at gmail.com>
>  Discussions-To: Python Web-SIG <web-sig at python.org>
>  Status: Draft
>  Type: Informational
>  Content-Type: text/x-rst
> -Created: 07-Dec-2003
> -Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004
> +Created: 15-04-2010
> +Post-History: Not yet
>
>
>  Abstract
>  ========
>
> -This document specifies a proposed standard interface between web
> -servers and Python web applications or frameworks, to promote web
> -application portability across a variety of web servers.
> +This document specifies a revision of the proposed standard interface
> +between web servers and Python web applications or frameworks, to
> +promote web application portability across a variety of web servers.
>
>
>  Rationale and Goals
>  ===================
>
> -Python currently boasts a wide variety of web application frameworks,
> -such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to
> -name just a few [1]_.  This wide variety of choices can be a problem
> -for new Python users, because generally speaking, their choice of web
> -framework will limit their choice of usable web servers, and vice
> -versa.
> -
> -By contrast, although Java has just as many web application frameworks
> -available, Java's "servlet" API makes it possible for applications
> -written with any Java web application framework to run in any web
> -server that supports the servlet API.
> -
> -The availability and widespread use of such an API in web servers for
> -Python -- whether those servers are written in Python (e.g. Medusa),
> -embed Python (e.g. mod_python), or invoke Python via a gateway
> -protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of
> -framework from choice of web server, freeing users to choose a pairing
> -that suits them, while freeing framework and server developers to
> -focus on their preferred area of specialization.
> -
> -This PEP, therefore, proposes a simple and universal interface between
> -web servers and web applications or frameworks: the Python Web Server
> -Gateway Interface (WSGI).
> -
> -But the mere existence of a WSGI spec does nothing to address the
> -existing state of servers and frameworks for Python web applications.
> -Server and framework authors and maintainers must actually implement
> -WSGI for there to be any effect.
> -
> -However, since no existing servers or frameworks support WSGI, there
> -is little immediate reward for an author who implements WSGI support.
> -Thus, WSGI **must** be easy to implement, so that an author's initial
> -investment in the interface can be reasonably low.
> -
> -Thus, simplicity of implementation on *both* the server and framework
> -sides of the interface is absolutely critical to the utility of the
> -WSGI interface, and is therefore the principal criterion for any
> -design decisions.
> -
> -Note, however, that simplicity of implementation for a framework
> -author is not the same thing as ease of use for a web application
> -author.  WSGI presents an absolutely "no frills" interface to the
> -framework author, because bells and whistles like response objects and
> -cookie handling would just get in the way of existing frameworks'
> -handling of these issues.  Again, the goal of WSGI is to facilitate
> -easy interconnection of existing servers and applications or
> -frameworks, not to create a new web framework.
> -
> -Note also that this goal precludes WSGI from requiring anything that
> -is not already available in deployed versions of Python.  Therefore,
> -new standard library modules are not proposed or required by this
> -specification, and nothing in WSGI requires a Python version greater
> -than 2.2.2.  (It would be a good idea, however, for future versions
> -of Python to include support for this interface in web servers
> -provided by the standard library.)
> -
> -In addition to ease of implementation for existing and future
> -frameworks and servers, it should also be easy to create request
> -preprocessors, response postprocessors, and other WSGI-based
> -"middleware" components that look like an application to their
> -containing server, while acting as a server for their contained
> -applications.
> -
> -If middleware can be both simple and robust, and WSGI is widely
> -available in servers and frameworks, it allows for the possibility
> -of an entirely new kind of Python web application framework: one
> -consisting of loosely-coupled WSGI middleware components.  Indeed,
> -existing framework authors may even choose to refactor their
> -frameworks' existing services to be provided in this way, becoming
> -more like libraries used with WSGI, and less like monolithic
> -frameworks.  This would then allow application developers to choose
> -"best-of-breed" components for specific functionality, rather than
> -having to commit to all the pros and cons of a single framework.
> -
> -Of course, as of this writing, that day is doubtless quite far off.
> -In the meantime, it is a sufficient short-term goal for WSGI to
> -enable the use of any framework with any server.
> -
> -Finally, it should be mentioned that the current version of WSGI
> -does not prescribe any particular mechanism for "deploying" an
> -application for use with a web server or server gateway.  At the
> -present time, this is necessarily implementation-defined by the
> -server or gateway.  After a sufficient number of servers and
> -frameworks have implemented WSGI to provide field experience with
> -varying deployment requirements, it may make sense to create
> -another PEP, describing a deployment standard for WSGI servers and
> -application frameworks.
> +WSGI 1.0, specified in PEP 333, did a great job in making it easier
> +for web applications and web servers to interface with each other.
> +It has become very much the standard it was meant to be and an
> +important part of the Python web development infrastructure.
> +
> +After several implementations were built by different developers,
> +it inevitably turned out that the specification wasn't perfect. It
> +left out some details that were implemented by all the web server
> +interfaces because they were critical for many applications (or
> +application frameworks). Additionally, the specification was written
> +before Python 3.x was specified, resulting in a lack of clear
> +specification on what to do with unicode strings.
> +
> +While there are some ideas around to improve WSGI further in less
> +compatible ways, we feel that there is value to be had in first
> +specifying a minor revision of the specification, which is largely
> +compatible with existing implementations. Further simplification
> +and experimentation are therefore deferred to a 2.0 version.
> +
> +
> +Differences with WSGI 1.0
> +=========================
> +
> +Descriptive changes
> +-------------------
> +
> +The following changes were made to realign the spec with
> +implementations 'in the wild'.
> +
> +1. The 'readline()' function of 'wsgi.input' must optionally take
> +   a size hint. This is required because many applications use
> +   cgi.FieldStorage, which uses this functionality.
> +
> +2. The 'wsgi.input' functions for reading input must return an empty
> +   string as end of input stream marker. This is required for support
> +   of HTTP 1.1 request pipelining. A correctly implemented WSGI
> +   middleware already has to cope with an empty string as end
> +   sentinel anyway to detect premature end of input.
> +
> +3. Any WSGI application or middleware should not itself return, or
> +   consume from a wrapped WSGI component, more data than specified by
> +   the Content-Length response header if defined. Middleware that
> +   does this is arguably broken and can generate incorrect data.
> +   This is just a clarification of obligations.
> +
> +4. The WSGI adapter must not pass on to the server any data above
> +   what the Content-Length response header defines, if supplied.
> +   Doing this is technically a violation of HTTP. This is another
> +   clarification of obligations.
> +
> +
> +String handling changes
> +-----------------------
> +
> +The following changes were made to make WSGI work on Python 3.x.
> +
> +1. The application is passed an instance of a Python dictionary
> +   containing what is referred to as the WSGI environment. All keys
> +   in this dictionary are native strings. For CGI variables, all names
> +   are going to be ISO-8859-1 and so where native strings are
> +   unicode strings, that encoding is used for the names of CGI
> +   variables.
> +
> +2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
> +   environment, the value of the variable should be a native string.
> +
> +3. For the CGI variables contained in the WSGI environment, the values
> +   of the variables are native strings. Where native strings are
> +   unicode strings, ISO-8859-1 encoding would be used such that the
> +   original character data is preserved and as necessary the unicode
> +   string can be converted back to bytes and thence decoded to unicode
> +   again using a different encoding.
> +
> +4. The WSGI input stream 'wsgi.input' contained in the WSGI environment
> +   and from which request content is read, should yield byte strings.
> +
> +5. The status line specified by the WSGI application should be a byte
> +   string. Where native strings are unicode strings, the native string
> +   type can also be returned in which case it would be encoded as
> +   ISO-8859-1.
> +
> +6. The list of response headers specified by the WSGI application should
> +   contain tuples consisting of two values, where each value is a byte
> +   string. Where native strings are unicode strings, the native string
> +   type can also be returned in which case it would be encoded as
> +   ISO-8859-1.
> +
> +7. The iterable returned by the application and from which response
> +   content is derived, should yield byte strings. Where native strings
> +   are unicode strings, the native string type can also be returned in
> +   which case it would be encoded as ISO-8859-1.
> +
> +8. The value passed to the 'write()' callback returned by
> +   'start_response()' should be a byte string. Where native strings
> +   are unicode strings, a native string type can also be supplied, in
> +   which case it would be encoded as ISO-8859-1.
>
>
>  Specification Overview
> @@ -447,6 +457,13 @@
>  Streaming`_ section below for more on how application output must be
>  handled.)
>
> +Further on, several places specify constraints upon string types used
> +in the WSGI API. The term native string is used to mean the 'str' class
> +in both Python 2.x and 3.x. The spec tries to ensure optimal
> +compatibility and ease of use by allowing implementations running on
> +Python 3.x to encode strings (which are Unicode strings with no
> +specified encoding) as ISO-8859-1 where a 3.x string is passed in.
> +
>  The server or gateway should treat the yielded strings as binary byte
>  sequences: in particular, it should ensure that line endings are
>  not altered.  The application is responsible for ensuring that the
> @@ -489,12 +506,22 @@
>  ``environ`` Variables
>  ---------------------
>
> +All keys in this dictionary are native strings. For CGI variables,
> +all names are going to be ISO-8859-1 and so where native strings are
> +unicode strings, that encoding is used for the names of CGI variables.
> +
>  The ``environ`` dictionary is required to contain these CGI
>  environment variables, as defined by the Common Gateway Interface
>  specification [2]_.  The following variables **must** be present,
>  unless their value would be an empty string, in which case they
>  **may** be omitted, except as otherwise noted below.
>
> +The values for CGI variables are native strings. Where native strings
> +are unicode strings, ISO-8859-1 encoding would be used such that the
> +original character data is preserved and as necessary the unicode
> +string can be converted back to bytes and thence decoded to unicode
> +again using a different encoding.
> +
>  ``REQUEST_METHOD``
>   The HTTP request method, such as ``"GET"`` or ``"POST"``.  This
>   cannot ever be an empty string, and so is always required.
> @@ -575,13 +602,14 @@
>  =====================  ===============================================
>  Variable               Value
>  =====================  ===============================================
> -``wsgi.version``       The tuple ``(1,0)``, representing WSGI
> +``wsgi.version``       The tuple ``(1, 0)``, representing WSGI
>                        version 1.0.
>
>  ``wsgi.url_scheme``    A string representing the "scheme" portion of
>                        the URL at which the application is being
>                        invoked.  Normally, this will have the value
> -                       ``"http"`` or ``"https"``, as appropriate.
> +                       ``"http"`` or ``"https"``, as appropriate. The
> +                       value is a native string.
>
>  ``wsgi.input``         An input stream (file-like object) from which
>                        the HTTP request body can be read.  (The server
> @@ -646,7 +674,7 @@
>  Method               Stream      Notes
>  ===================  ==========  ========
>  ``read(size)``       ``input``   1
> -``readline()``       ``input``   1,2
> +``readline(hint)``   ``input``   1,2
>  ``readlines(hint)``  ``input``   1,3
>  ``__iter__()``       ``input``
>  ``flush()``          ``errors``  4
> @@ -661,11 +689,12 @@
>    ``Content-Length``, and is allowed to simulate an end-of-file
>    condition if the application attempts to read past that point.
>    The application **should not** attempt to read more data than is
> -   specified by the ``CONTENT_LENGTH`` variable.
> +   specified by the ``CONTENT_LENGTH`` variable. All read functions
> +   are required to return an empty string as the end of input stream
> +   marker. They must yield byte strings.
>
> -2. The optional "size" argument to ``readline()`` is not supported,
> -   as it may be complex for server authors to implement, and is not
> -   often used in practice.
> +2. The optional "size" argument to ``readline()`` is required for
> +   the implementer, but optional for callers.
>
>  3. Note that the ``hint`` argument to ``readlines()`` is optional for
>    both caller and implementer.  The application is free not to
> @@ -692,12 +721,15 @@
>  ---------------------------------
>
>  The second parameter passed to the application object is a callable
> -of the form ``start_response(status,response_headers,exc_info=None)``.
> +of the form ``start_response(status, response_headers, exc_info=None)``.
>  (As with all WSGI callables, the arguments must be supplied
>  positionally, not by keyword.)  The ``start_response`` callable is
>  used to begin the HTTP response, and it must return a
>  ``write(body_data)`` callable (see the `Buffering and Streaming`_
> -section, below).
> +section, below). Values passed to the ``write(body_data)`` callable
> +should be byte strings. Where native strings are unicode strings, a
> +native strings type can also be supplied, in which case it would be
> +encoded as ISO-8859-1.
>
>  The ``status`` argument is an HTTP "status" string like ``"200 OK"``
>  or ``"404 Not Found"``.  That is, it is a string consisting of a
> @@ -705,14 +737,20 @@
>  single space, with no surrounding whitespace or other characters.
>  (See RFC 2616, Section 6.1.1 for more information.)  The string
>  **must not** contain control characters, and must not be terminated
> -with a carriage return, linefeed, or combination thereof.
> +with a carriage return, linefeed, or combination thereof. This
> +value should be a byte string. Where native strings are unicode
> +strings, the native string type can also be returned, in which
> +case it would be encoded as ISO-8859-1.
>
>  The ``response_headers`` argument is a list of ``(header_name,
>  header_value)`` tuples.  It must be a Python list; i.e.
> -``type(response_headers) is ListType``, and the server **may** change
> +``type(response_headers) is list``, and the server **may** change
>  its contents in any way it desires.  Each ``header_name`` must be a
>  valid HTTP header field-name (as defined by RFC 2616, Section 4.2),
> -without a trailing colon or other punctuation.
> +without a trailing colon or other punctuation. Both the header_name
> +and the header_value should be byte strings. Where native strings
> +are unicode strings, the native string type can also be returned,
> +in which case it would be encoded as ISO-8859-1.
>
>  Each ``header_value`` **must not** include *any* control characters,
>  including carriage returns or linefeeds, either embedded or at the end.
> @@ -809,6 +847,14 @@
>  Handling the ``Content-Length`` Header
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> +If an application or middleware layer chooses to return a
> +Content-Length header, it should not return more data than specified
> +by the header value. Any wrapping middleware layer should not
> +consume more data than specified in the header value from the
> +wrapped component (either middleware or application). Any WSGI
> +adapter must similarly not pass on data above what the
> +Content-Length response header value defines.
> +
>  If the application does not supply a ``Content-Length`` header, a
>  server or gateway may choose one of several approaches to handling
>  it.  The simplest of these is to close the client connection when
> @@ -1569,55 +1615,13 @@
>    developers.
>
>
> -Proposed/Under Discussion
> -=========================
> -
> -These items are currently being discussed on the Web-SIG and elsewhere,
> -or are on the PEP author's "to-do" list:
> -
> -* Should ``wsgi.input`` be an iterator instead of a file?  This would
> -  help for asynchronous applications and chunked-encoding input
> -  streams.
> -
> -* Optional extensions are being discussed for pausing iteration of an
> -  application's ouptut until input is available or until a callback
> -  occurs.
> -
> -* Add a section about synchronous vs. asynchronous apps and servers,
> -  the relevant threading models, and issues/design goals in these
> -  areas.
> -
> -
>  Acknowledgements
>  ================
>
> -Thanks go to the many folks on the Web-SIG mailing list whose
> -thoughtful feedback made this revised draft possible.  Especially:
> +Thanks go to many folks on the Web-SIG mailing list for helping the work
> +on clarifying and improving this specification. In particular:
>
> -* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up
> -  on the first draft as not offering any advantages over "plain old
> -  CGI", thus encouraging me to look for a better approach.
> -
> -* Ian Bicking, who helped nag me into properly specifying the
> -  multithreading and multiprocess options, as well as badgering me to
> -  provide a mechanism for servers to supply custom extension data to
> -  an application.
> -
> -* Tony Lownds, who came up with the concept of a ``start_response``
> -  function that took the status and headers, returning a ``write``
> -  function.  His input also guided the design of the exception handling
> -  facilities, especially in the area of allowing for middleware that
> -  overrides application error messages.
> -
> -* Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython
> -  (well before the spec was finalized) helped to shape the "supporting
> -  older versions of Python" section, as well as the optional
> -  ``wsgi.file_wrapper`` facility.
> -
> -* Mark Nottingham, who reviewed the spec extensively for issues with
> -  HTTP RFC compliance, especially with regard to HTTP/1.1 features that
> -  I didn't even know existed until he pointed them out.
> -
> +* Phillip J. Eby, for writing/editing the 1.0 specification.
>
>  References
>  ==========
> @@ -1643,8 +1647,6 @@
>
>  This document has been placed in the public domain.
>
> -
> -
>  ..
>    Local Variables:
>    mode: indented-text
>


More information about the Web-SIG mailing list