[Web-SIG] Draft PEP: WSGI 1.1

Paul Davis paul.joseph.davis at gmail.com
Fri Apr 16 05:29:29 CEST 2010


On Thu, Apr 15, 2010 at 10:08 PM, Graham Dumpleton
<graham.dumpleton at gmail.com> wrote:
> On 16 April 2010 11:41, Graham Dumpleton <graham.dumpleton at gmail.com> wrote:
>> I haven't read what you have done yet
>
> And still haven't. Don't know when I will get a chance to do so.
>
> Two points from a quick scan of emails.
>
> 1. The following section of PEP needs to be updated:
>
> """
> Apart from the handling of ``close()``, the semantics of returning a
> file wrapper from the application should be the same as if the
> application had returned ``iter(filelike.read, '')``.  In other words,
> transmission should begin at the current position within the "file"
> at the time that transmission begins, and continue until the end is
> reached.
> """
>
> It can't say to read until the end of the file is reached, as
> Content-Length must be honoured and less returned if Content-Length
> is less than what is available in the remainder of the file, as per
> descriptive changes (3) and (4).
>
> 2. On the question about readline() arguments and whether -1 or
> None is allowed: I would say no, they are not. It must be a positive
> integer or no argument supplied at all.
>
> Different implementations use -1 or None as the value of a default
> argument to know when an argument wasn't supplied. One can't rely,
> though, on one or the other being used, so one can't assume that
> supplying those values explicitly means the same thing as supplying
> no argument. In other words, supplying anything but a positive
> integer or no argument at all is undefined.
>
> The same issue arises with read(), except that only a positive integer
> can technically be supplied and the argument is not optional. Although
> any implementation which implements wsgi.input as a proper file-like
> object is going to accept no argument to mean read all input, this
> is outside of the WSGI specification and calling with no argument is
> undefined.
>
> Graham

I happened to have just started hitting the body reading functions on
an HTTP parser I've been working on. I'd be interested to hear a
response on what happens when the various read functions are called
with a size hint of zero.

I realize that zero is not a positive integer, but I'm not quite sure
what the recommended return value would be. I can see None and -1
being obvious flags for "no size hint", but zero is a tad weird. I
want to say that it'd either return "" (which could sorta kinda
violate #2) or raise an exception. I really haven't got any reason to
prefer one over the other though.
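
To make the zero case concrete, here is a rough sketch of the first
option (return an empty string). It assumes a hypothetical wrapper
class, LimitedInput, that caps reads at CONTENT_LENGTH; neither the
class nor its behaviour comes from the draft, and rejecting negative
sizes with ValueError is purely an illustrative choice.

```python
import io


class LimitedInput:
    """Hypothetical wsgi.input wrapper that never reads past CONTENT_LENGTH."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    def read(self, size):
        # Assumption: anything other than a non-negative integer is
        # rejected, since the thread treats such calls as undefined.
        if not isinstance(size, int) or size < 0:
            raise ValueError("size must be a non-negative integer")
        if size == 0:
            # Option one from the discussion: return an empty byte string.
            # Note this looks exactly like the end-of-input marker.
            return b""
        data = self._stream.read(min(size, self._remaining))
        self._remaining -= len(data)
        return data


body = LimitedInput(io.BytesIO(b"hello world"), 5)
print(body.read(0))   # empty, although input remains
print(body.read(3))
print(body.read(10))  # capped at the 2 bytes left under CONTENT_LENGTH
print(body.read(10))  # empty again, this time a genuine end of input
```

The first and last calls return the same value for different reasons,
which is the ambiguity with change #2; raising an exception for zero
would avoid it at the cost of surprising callers used to file objects.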

As an aside, I think that "honoring Content-Length" should probably be
rephrased as "middleware should not break HTTP", coupled with a page
that lists common ways that middleware breaks HTTP. I reckon it's the
same reasoning as for 333's dictation that hop-by-hop headers are server
only, though there are plenty of other ways I could violate RFC 2616
as a middleware author without violating WSGI. Pie in the sky, the
common ways would be included with wsgiref's validate decorator.
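
One such check could look roughly like the sketch below: a middleware
that refuses to pass on more body data than the Content-Length response
header allows, in the spirit of descriptive changes (3) and (4). The
name enforce_content_length is made up for illustration, it is not part
of wsgiref, and the sketch ignores data sent through the write()
callable.

```python
def enforce_content_length(app):
    """Hypothetical middleware: pass on at most Content-Length body bytes."""

    def wrapper(environ, start_response):
        state = {"remaining": None}  # None means no Content-Length was seen

        def capture(status, headers, exc_info=None):
            for name, value in headers:
                if name.lower() == "content-length":
                    state["remaining"] = int(value)
            return start_response(status, headers, exc_info)

        result = app(environ, capture)
        try:
            for chunk in result:
                remaining = state["remaining"]
                if remaining is None:
                    yield chunk
                    continue
                chunk = chunk[:remaining]  # drop anything past the limit
                state["remaining"] = remaining - len(chunk)
                if chunk:
                    yield chunk
                if state["remaining"] == 0:
                    break  # stop consuming from the wrapped component
        finally:
            # WSGI requires close() to be called on the iterable if present.
            if hasattr(result, "close"):
                result.close()

    return wrapper


def demo_app(environ, start_response):
    # Deliberately misbehaving app: declares 5 bytes but yields more.
    start_response("200 OK", [("Content-Length", "5")])
    return [b"hel", b"lo world", b"extra"]


def demo_start_response(status, headers, exc_info=None):
    return None  # stand-in for the server's write() callable


print(b"".join(enforce_content_length(demo_app)({}, demo_start_response)))
```

Because the wrapper is a generator, the wrapped application is only
called once the server starts iterating, which is fine for a sketch
but worth knowing about before using the pattern for real.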

Paul

>> but if you have done so
>> already, ensure you read:
>>
>>  http://bitbucket.org/ianb/wsgi-peps/src/
>>
>> This is Ian's and Armin's previous go at new specification. It though
>> tried to go further than what you are doing.
>>
>> Also read:
>>
>>  http://blog.dscpl.com.au/2009/09/roadmap-for-python-wsgi-specification.html
>>
>> I explain what I mean by native strings in that.
>>
>> Graham
>>
>> On 15 April 2010 22:54, Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:
>>> Mostly taking Graham's list of issues and incorporating it into PEP 333.
>>>
>>> Latest revision: http://hg.xavamedia.nl/peps/file/tip/wsgi-1.1.txt
>>>
>>> Let's have comments here (comments in the form of diffs are
>>> particularly welcome, of course). Remember, the idea is not to change
>>> or improve WSGI right now, but only to improve the spec, improving
>>> interoperability and enabling Python 3 support.
>>>
>>> Graham, I hope I did a good job with your suggestions. (Since so much
>>> of this is yours, I've just listed you as the second author.) I tried
>>> to clarify exactly what you meant by "native strings", can you check
>>> that out?
>>>
>>> Cheers,
>>>
>>> Dirkjan
>>>
>>> --- pep-0333.txt        2010-04-15 14:46:02.000000000 +0200
>>> +++ wsgi-1.1.txt        2010-04-15 14:51:39.000000000 +0200
>>> @@ -1,114 +1,124 @@
>>> -PEP: 333
>>> -Title: Python Web Server Gateway Interface v1.0
>>> +PEP: 0000
>>> +Title: Python Web Server Gateway Interface 1.1
>>>  Version: $Revision$
>>>  Last-Modified: $Date$
>>> -Author: Phillip J. Eby <pje at telecommunity.com>
>>> +Author: Dirkjan Ochtman <dirkjan at ochtman.nl>,
>>> +        Graham Dumpleton <graham.dumpleton at gmail.com>
>>>  Discussions-To: Python Web-SIG <web-sig at python.org>
>>>  Status: Draft
>>>  Type: Informational
>>>  Content-Type: text/x-rst
>>> -Created: 07-Dec-2003
>>> -Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004
>>> +Created: 15-04-2010
>>> +Post-History: Not yet
>>>
>>>
>>>  Abstract
>>>  ========
>>>
>>> -This document specifies a proposed standard interface between web
>>> -servers and Python web applications or frameworks, to promote web
>>> -application portability across a variety of web servers.
>>> +This document specifies a revision of the proposed standard interface
>>> +between web servers and Python web applications or frameworks, to
>>> +promote web application portability across a variety of web servers.
>>>
>>>
>>>  Rationale and Goals
>>>  ===================
>>>
>>> -Python currently boasts a wide variety of web application frameworks,
>>> -such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to
>>> -name just a few [1]_.  This wide variety of choices can be a problem
>>> -for new Python users, because generally speaking, their choice of web
>>> -framework will limit their choice of usable web servers, and vice
>>> -versa.
>>> -
>>> -By contrast, although Java has just as many web application frameworks
>>> -available, Java's "servlet" API makes it possible for applications
>>> -written with any Java web application framework to run in any web
>>> -server that supports the servlet API.
>>> -
>>> -The availability and widespread use of such an API in web servers for
>>> -Python -- whether those servers are written in Python (e.g. Medusa),
>>> -embed Python (e.g. mod_python), or invoke Python via a gateway
>>> -protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of
>>> -framework from choice of web server, freeing users to choose a pairing
>>> -that suits them, while freeing framework and server developers to
>>> -focus on their preferred area of specialization.
>>> -
>>> -This PEP, therefore, proposes a simple and universal interface between
>>> -web servers and web applications or frameworks: the Python Web Server
>>> -Gateway Interface (WSGI).
>>> -
>>> -But the mere existence of a WSGI spec does nothing to address the
>>> -existing state of servers and frameworks for Python web applications.
>>> -Server and framework authors and maintainers must actually implement
>>> -WSGI for there to be any effect.
>>> -
>>> -However, since no existing servers or frameworks support WSGI, there
>>> -is little immediate reward for an author who implements WSGI support.
>>> -Thus, WSGI **must** be easy to implement, so that an author's initial
>>> -investment in the interface can be reasonably low.
>>> -
>>> -Thus, simplicity of implementation on *both* the server and framework
>>> -sides of the interface is absolutely critical to the utility of the
>>> -WSGI interface, and is therefore the principal criterion for any
>>> -design decisions.
>>> -
>>> -Note, however, that simplicity of implementation for a framework
>>> -author is not the same thing as ease of use for a web application
>>> -author.  WSGI presents an absolutely "no frills" interface to the
>>> -framework author, because bells and whistles like response objects and
>>> -cookie handling would just get in the way of existing frameworks'
>>> -handling of these issues.  Again, the goal of WSGI is to facilitate
>>> -easy interconnection of existing servers and applications or
>>> -frameworks, not to create a new web framework.
>>> -
>>> -Note also that this goal precludes WSGI from requiring anything that
>>> -is not already available in deployed versions of Python.  Therefore,
>>> -new standard library modules are not proposed or required by this
>>> -specification, and nothing in WSGI requires a Python version greater
>>> -than 2.2.2.  (It would be a good idea, however, for future versions
>>> -of Python to include support for this interface in web servers
>>> -provided by the standard library.)
>>> -
>>> -In addition to ease of implementation for existing and future
>>> -frameworks and servers, it should also be easy to create request
>>> -preprocessors, response postprocessors, and other WSGI-based
>>> -"middleware" components that look like an application to their
>>> -containing server, while acting as a server for their contained
>>> -applications.
>>> -
>>> -If middleware can be both simple and robust, and WSGI is widely
>>> -available in servers and frameworks, it allows for the possibility
>>> -of an entirely new kind of Python web application framework: one
>>> -consisting of loosely-coupled WSGI middleware components.  Indeed,
>>> -existing framework authors may even choose to refactor their
>>> -frameworks' existing services to be provided in this way, becoming
>>> -more like libraries used with WSGI, and less like monolithic
>>> -frameworks.  This would then allow application developers to choose
>>> -"best-of-breed" components for specific functionality, rather than
>>> -having to commit to all the pros and cons of a single framework.
>>> -
>>> -Of course, as of this writing, that day is doubtless quite far off.
>>> -In the meantime, it is a sufficient short-term goal for WSGI to
>>> -enable the use of any framework with any server.
>>> -
>>> -Finally, it should be mentioned that the current version of WSGI
>>> -does not prescribe any particular mechanism for "deploying" an
>>> -application for use with a web server or server gateway.  At the
>>> -present time, this is necessarily implementation-defined by the
>>> -server or gateway.  After a sufficient number of servers and
>>> -frameworks have implemented WSGI to provide field experience with
>>> -varying deployment requirements, it may make sense to create
>>> -another PEP, describing a deployment standard for WSGI servers and
>>> -application frameworks.
>>> +WSGI 1.0, specified in PEP 333, did a great job in making it easier
>>> +for web applications and web servers to interface with each other.
>>> +It has become very much the standard it was meant to be and an
>>> +important part of the Python web development infrastructure.
>>> +
>>> +After several implementations were built by different developers,
>>> +it inevitably turned out that the specification wasn't perfect. It
>>> +left out some details that were implemented by all the web server
>>> +interfaces because they were critical for many applications (or
>>> +application frameworks). Additionally, the specification was written
>>> +before Python 3.x was specified, resulting in a lack of clear
>>> +specification on what to do with unicode strings.
>>> +
>>> +While there are some ideas around to improve WSGI further in less
>>> +compatible ways, we feel that there is value to be had in first
>>> +specifying a minor revision of the specification, which is largely
>>> +compatible with existing implementations. Further simplification
>>> +and experimentation are therefore deferred to a 2.0 version.
>>> +
>>> +
>>> +Differences with WSGI 1.0
>>> +=========================
>>> +
>>> +Descriptive changes
>>> +-------------------
>>> +
>>> +The following changes were made to realign the spec with
>>> +implementations 'in the wild'.
>>> +
>>> +1. The 'readline()' function of 'wsgi.input' must optionally take
>>> +   a size hint. This is required because many applications use
>>> +   cgi.FieldStorage, which uses this functionality.
>>> +
>>> +2. The 'wsgi.input' functions for reading input must return an empty
>>> +   string as end of input stream marker. This is required for support
>>> +   of HTTP 1.1 request pipelining. A correctly implemented WSGI
>>> +   middleware already has to cope with an empty string as end
>>> +   sentinel anyway to detect premature end of input.
>>> +
>>> +3. Any WSGI application or middleware should not itself return, or
>>> +   consume from a wrapped WSGI component, more data than specified by
>>> +   the Content-Length response header if defined. Middleware that
>>> +   does this is arguably broken and can generate incorrect data.
>>> +   This is just a clarification of obligations.
>>> +
>>> +4. The WSGI adapter must not pass on to the server any data above
>>> +   what the Content-Length response header defines, if supplied.
>>> +   Doing this is technically a violation of HTTP. This is another
>>> +   clarification of obligations.
>>> +
>>> +
>>> +String handling changes
>>> +-----------------------
>>> +
>>> +The following changes were made to make WSGI work on Python 3.x.
>>> +
>>> +1. The application is passed an instance of a Python dictionary
>>> +   containing what is referred to as the WSGI environment. All keys
>>> +   in this dictionary are native strings. For CGI variables, all names
>>> +   are going to be ISO-8859-1 and so where native strings are
>>> +   unicode strings, that encoding is used for the names of CGI
>>> +   variables.
>>> +
>>> +2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
>>> +   environment, the value of the variable should be a native string.
>>> +
>>> +3. For the CGI variables contained in the WSGI environment, the values
>>> +   of the variables are native strings. Where native strings are
>>> +   unicode strings, ISO-8859-1 encoding would be used such that the
>>> +   original character data is preserved and as necessary the unicode
>>> +   string can be converted back to bytes and thence decoded to unicode
>>> +   again using a different encoding.
>>> +
>>> +4. The WSGI input stream 'wsgi.input' contained in the WSGI environment
>>> +   and from which request content is read, should yield byte strings.
>>> +
>>> +5. The status line specified by the WSGI application should be a byte
>>> +   string. Where native strings are unicode strings, the native string
>>> +   type can also be returned in which case it would be encoded as
>>> +   ISO-8859-1.
>>> +
>>> +6. The list of response headers specified by the WSGI application should
>>> +   contain tuples consisting of two values, where each value is a byte
>>> +   string. Where native strings are unicode strings, the native string
>>> +   type can also be returned in which case it would be encoded as
>>> +   ISO-8859-1.
>>> +
>>> +7. The iterable returned by the application and from which response
>>> +   content is derived, should yield byte strings. Where native strings
>>> +   are unicode strings, the native string type can also be returned in
>>> +   which case it would be encoded as ISO-8859-1.
>>> +
>>> +8. The value passed to the 'write()' callback returned by
>>> +   'start_response()' should be a byte string. Where native strings
>>> +   are unicode strings, a native string type can also be supplied, in
>>> +   which case it would be encoded as ISO-8859-1.
>>>
>>>
>>>  Specification Overview
>>> @@ -447,6 +457,13 @@
>>>  Streaming`_ section below for more on how application output must be
>>>  handled.)
>>>
>>> +Further on, several places specify constraints upon string types used
>>> +in the WSGI API. The term native string is used to mean the 'str' class
>>> +in both Python 2.x and 3.x. The spec tries to ensure optimal
>>> +compatibility and ease of use by allowing implementations running on
>>> +Python 3.x to encode strings (which are Unicode strings with no
>>> +specified encoding) as ISO-8859-1 where a 3.x string is passed in.
>>> +
>>>  The server or gateway should treat the yielded strings as binary byte
>>>  sequences: in particular, it should ensure that line endings are
>>>  not altered.  The application is responsible for ensuring that the
>>> @@ -489,12 +506,22 @@
>>>  ``environ`` Variables
>>>  ---------------------
>>>
>>> +All keys in this dictionary are native strings. For CGI variables,
>>> +all names are going to be ISO-8859-1 and so where native strings are
>>> +unicode strings, that encoding is used for the names of CGI variables.
>>> +
>>>  The ``environ`` dictionary is required to contain these CGI
>>>  environment variables, as defined by the Common Gateway Interface
>>>  specification [2]_.  The following variables **must** be present,
>>>  unless their value would be an empty string, in which case they
>>>  **may** be omitted, except as otherwise noted below.
>>>
>>> +The values for CGI variables are native strings. Where native strings
>>> +are unicode strings, ISO-8859-1 encoding would be used such that the
>>> +original character data is preserved and as necessary the unicode
>>> +string can be converted back to bytes and thence decoded to unicode
>>> +again using a different encoding.
>>> +
>>>  ``REQUEST_METHOD``
>>>   The HTTP request method, such as ``"GET"`` or ``"POST"``.  This
>>>   cannot ever be an empty string, and so is always required.
>>> @@ -575,13 +602,14 @@
>>>  =====================  ===============================================
>>>  Variable               Value
>>>  =====================  ===============================================
>>> -``wsgi.version``       The tuple ``(1,0)``, representing WSGI
>>> +``wsgi.version``       The tuple ``(1, 0)``, representing WSGI
>>>                        version 1.0.
>>>
>>>  ``wsgi.url_scheme``    A string representing the "scheme" portion of
>>>                        the URL at which the application is being
>>>                        invoked.  Normally, this will have the value
>>> -                       ``"http"`` or ``"https"``, as appropriate.
>>> +                       ``"http"`` or ``"https"``, as appropriate. The
>>> +                       value is a native string.
>>>
>>>  ``wsgi.input``         An input stream (file-like object) from which
>>>                        the HTTP request body can be read.  (The server
>>> @@ -646,7 +674,7 @@
>>>  Method               Stream      Notes
>>>  ===================  ==========  ========
>>>  ``read(size)``       ``input``   1
>>> -``readline()``       ``input``   1,2
>>> +``readline(hint)``   ``input``   1,2
>>>  ``readlines(hint)``  ``input``   1,3
>>>  ``__iter__()``       ``input``
>>>  ``flush()``          ``errors``  4
>>> @@ -661,11 +689,12 @@
>>>    ``Content-Length``, and is allowed to simulate an end-of-file
>>>    condition if the application attempts to read past that point.
>>>    The application **should not** attempt to read more data than is
>>> -   specified by the ``CONTENT_LENGTH`` variable.
>>> +   specified by the ``CONTENT_LENGTH`` variable. All read functions
>>> +   are required to return an empty string as the end of input stream
>>> +   marker. They must yield byte strings.
>>>
>>> -2. The optional "size" argument to ``readline()`` is not supported,
>>> -   as it may be complex for server authors to implement, and is not
>>> -   often used in practice.
>>> +2. The optional "size" argument to ``readline()`` is required for
>>> +   the implementer, but optional for callers.
>>>
>>>  3. Note that the ``hint`` argument to ``readlines()`` is optional for
>>>    both caller and implementer.  The application is free not to
>>> @@ -692,12 +721,15 @@
>>>  ---------------------------------
>>>
>>>  The second parameter passed to the application object is a callable
>>> -of the form ``start_response(status,response_headers,exc_info=None)``.
>>> +of the form ``start_response(status, response_headers, exc_info=None)``.
>>>  (As with all WSGI callables, the arguments must be supplied
>>>  positionally, not by keyword.)  The ``start_response`` callable is
>>>  used to begin the HTTP response, and it must return a
>>>  ``write(body_data)`` callable (see the `Buffering and Streaming`_
>>> -section, below).
>>> +section, below). Values passed to the ``write(body_data)`` callable
>>> +should be byte strings. Where native strings are unicode strings, a
>>> +native strings type can also be supplied, in which case it would be
>>> +encoded as ISO-8859-1.
>>>
>>>  The ``status`` argument is an HTTP "status" string like ``"200 OK"``
>>>  or ``"404 Not Found"``.  That is, it is a string consisting of a
>>> @@ -705,14 +737,20 @@
>>>  single space, with no surrounding whitespace or other characters.
>>>  (See RFC 2616, Section 6.1.1 for more information.)  The string
>>>  **must not** contain control characters, and must not be terminated
>>> -with a carriage return, linefeed, or combination thereof.
>>> +with a carriage return, linefeed, or combination thereof. This
>>> +value should be a byte string. Where native strings are unicode
>>> +strings, the native string type can also be returned, in which
>>> +case it would be encoded as ISO-8859-1.
>>>
>>>  The ``response_headers`` argument is a list of ``(header_name,
>>>  header_value)`` tuples.  It must be a Python list; i.e.
>>> -``type(response_headers) is ListType``, and the server **may** change
>>> +``type(response_headers) is list``, and the server **may** change
>>>  its contents in any way it desires.  Each ``header_name`` must be a
>>>  valid HTTP header field-name (as defined by RFC 2616, Section 4.2),
>>> -without a trailing colon or other punctuation.
>>> +without a trailing colon or other punctuation. Both the header_name
>>> +and the header_value should be byte strings. Where native strings
>>> +are unicode strings, the native string type can also be returned,
>>> +in which case it would be encoded as ISO-8859-1.
>>>
>>>  Each ``header_value`` **must not** include *any* control characters,
>>>  including carriage returns or linefeeds, either embedded or at the end.
>>> @@ -809,6 +847,14 @@
>>>  Handling the ``Content-Length`` Header
>>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> +If an application or middleware layer chooses to return a
>>> +Content-Length header, it should not return more data than specified
>>> +by the header value. Any wrapping middleware layer should not
>>> +consume more data than specified in the header value from the
>>> +wrapped component (either middleware or application). Any WSGI
>>> +adapter must similarly not pass on data above what the
>>> +Content-Length response header value defines.
>>> +
>>>  If the application does not supply a ``Content-Length`` header, a
>>>  server or gateway may choose one of several approaches to handling
>>>  it.  The simplest of these is to close the client connection when
>>> @@ -1569,55 +1615,13 @@
>>>    developers.
>>>
>>>
>>> -Proposed/Under Discussion
>>> -=========================
>>> -
>>> -These items are currently being discussed on the Web-SIG and elsewhere,
>>> -or are on the PEP author's "to-do" list:
>>> -
>>> -* Should ``wsgi.input`` be an iterator instead of a file?  This would
>>> -  help for asynchronous applications and chunked-encoding input
>>> -  streams.
>>> -
>>> -* Optional extensions are being discussed for pausing iteration of an
>>> -  application's ouptut until input is available or until a callback
>>> -  occurs.
>>> -
>>> -* Add a section about synchronous vs. asynchronous apps and servers,
>>> -  the relevant threading models, and issues/design goals in these
>>> -  areas.
>>> -
>>> -
>>>  Acknowledgements
>>>  ================
>>>
>>> -Thanks go to the many folks on the Web-SIG mailing list whose
>>> -thoughtful feedback made this revised draft possible.  Especially:
>>> +Thanks go to many folks on the Web-SIG mailing list for helping the work
>>> +on clarifying and improving this specification. In particular:
>>>
>>> -* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up
>>> -  on the first draft as not offering any advantages over "plain old
>>> -  CGI", thus encouraging me to look for a better approach.
>>> -
>>> -* Ian Bicking, who helped nag me into properly specifying the
>>> -  multithreading and multiprocess options, as well as badgering me to
>>> -  provide a mechanism for servers to supply custom extension data to
>>> -  an application.
>>> -
>>> -* Tony Lownds, who came up with the concept of a ``start_response``
>>> -  function that took the status and headers, returning a ``write``
>>> -  function.  His input also guided the design of the exception handling
>>> -  facilities, especially in the area of allowing for middleware that
>>> -  overrides application error messages.
>>> -
>>> -* Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython
>>> -  (well before the spec was finalized) helped to shape the "supporting
>>> -  older versions of Python" section, as well as the optional
>>> -  ``wsgi.file_wrapper`` facility.
>>> -
>>> -* Mark Nottingham, who reviewed the spec extensively for issues with
>>> -  HTTP RFC compliance, especially with regard to HTTP/1.1 features that
>>> -  I didn't even know existed until he pointed them out.
>>> -
>>> +* Phillip J. Eby, for writing/editing the 1.0 specification.
>>>
>>>  References
>>>  ==========
>>> @@ -1643,8 +1647,6 @@
>>>
>>>  This document has been placed in the public domain.
>>>
>>> -
>>> -
>>>  ..
>>>    Local Variables:
>>>    mode: indented-text
>>>
>>

