[Web-SIG] Draft PEP: WSGI 1.1
Paul Davis
paul.joseph.davis at gmail.com
Fri Apr 16 05:29:29 CEST 2010
On Thu, Apr 15, 2010 at 10:08 PM, Graham Dumpleton
<graham.dumpleton at gmail.com> wrote:
> On 16 April 2010 11:41, Graham Dumpleton <graham.dumpleton at gmail.com> wrote:
>> I haven't read what you have done yet
>
> And still haven't. Don't know when I will get a chance to do so.
>
> Two points from a quick scan of emails.
>
> 1. The following section of PEP needs to be updated:
>
> """
> Apart from the handling of ``close()``, the semantics of returning a
> file wrapper from the application should be the same as if the
> application had returned ``iter(filelike.read, '')``. In other words,
> transmission should begin at the current position within the "file"
> at the time that transmission begins, and continue until the end is
> reached.
> """
>
> It can't say read until 'end is reached' of file as Content-Length
> must be honoured and less returned if Content-Length is less than what
> is available in the remainder of the file as per descriptive changes
> (3) and (4).
>
> In respect of the question about readline() arguments and whether -1
> or None is allowed, I would say no, they are not. It must be a positive
> integer or no argument supplied at all.
>
> Different implementations use -1 or None as the value of a default
> argument to know when an argument wasn't supplied. One can't rely,
> though, on one or the other being used, nor on supplying those
> values explicitly meaning the same thing as no argument supplied. In
> other words, supplying anything but a positive integer or no argument
> at all is undefined.
>
> The same issue arises with read(), except that only a positive integer
> can technically be supplied and the argument is not optional. Although
> any implementation which implements wsgi.input as a proper file-like
> object is going to accept no argument to mean read all input, this
> is outside of the WSGI specification and calling with no argument is
> undefined.
>
> Graham
I happened to have just started hitting the body reading functions on
an HTTP parser I've been working on. I'd be interested to hear a
response on what happens when the various read functions are called
with a size hint of zero.
I realize that zero is not a positive integer, but I'm not quite sure
what the recommended return value would be. I can see None and -1
being obvious flags for "no size hint", but zero is a tad weird. I
want to say that it'd either return "" (which could sorta kinda
violate #2) or raise an exception. I really haven't got any reason to
prefer one over the other though.
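For reference, here is a minimal sketch of what Python's own in-memory
byte stream does with a zero size. It uses io.BytesIO purely as a
stand-in for wsgi.input (my assumption for illustration); a real
server's input stream is not required to behave the same way.

```python
import io

# io.BytesIO stands in for wsgi.input here. That is an assumption made
# for illustration only, not something the WSGI spec mandates.
stream = io.BytesIO(b"first line\nsecond line\n")

zero_read = stream.read(0)      # explicitly ask for zero bytes
zero_line = stream.readline(0)  # size hint of zero
rest = stream.read()            # the body has not been consumed at all
at_eof = stream.read(1)         # a genuine end-of-stream read

print(repr(zero_read), repr(zero_line), repr(at_eof))
```

With this stand-in, both zero-size calls hand back an empty byte string
while data is still pending, so the caller cannot tell them apart from
the end-of-stream sentinel described in #2.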
As an aside, I think that "honoring Content-Length" should probably be
rephrased to "middleware should not break HTTP", coupled with a page
that lists common ways that middleware breaks HTTP. I reckon it's the
same reasoning as PEP 333's dictation that hop-by-hop headers are server
only, though there are plenty of other ways I could violate RFC 2616
as a middleware author without violating WSGI. Pie in the sky, checks
for the common ways would be included with wsgiref's validate decorator.
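To make the Content-Length obligation concrete, here is a rough sketch
of a middleware that never passes on more body data than the declared
length. The names limit_to_content_length and demo_app are hypothetical,
and the sketch ignores the write() callable path, so treat it as an
illustration of descriptive changes (3) and (4), not a reference
implementation.

```python
def limit_to_content_length(app):
    """Hypothetical middleware: never yield more than Content-Length bytes."""
    def wrapper(environ, start_response):
        remaining = None  # None means no Content-Length header was seen

        def capture(status, headers, exc_info=None):
            nonlocal remaining
            for name, value in headers:
                if name.lower() == "content-length":
                    remaining = int(value)
            return start_response(status, headers, exc_info)

        result = app(environ, capture)
        try:
            for chunk in result:
                if remaining is None:
                    yield chunk          # no declared length: pass through
                    continue
                if remaining <= 0:
                    break                # declared length already satisfied
                chunk = chunk[:remaining]
                remaining -= len(chunk)
                yield chunk
        finally:
            # WSGI requires close() to be called on the wrapped iterable
            if hasattr(result, "close"):
                result.close()
    return wrapper


def demo_app(environ, start_response):
    # Deliberately returns more data than it declares.
    start_response("200 OK", [("Content-Length", "5")])
    return [b"hello", b" world"]


def fake_start_response(status, headers, exc_info=None):
    return None  # placeholder; a real server returns a write() callable


body = b"".join(limit_to_content_length(demo_app)({}, fake_start_response))
print(body)
```

Because wrapper is a generator, the wrapped application is only invoked
once the server starts iterating. That is fine for this sketch, but it
is one of the subtleties a real middleware would need to think about.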
Paul
>> but if you have done so
>> already, ensure you read:
>>
>> http://bitbucket.org/ianb/wsgi-peps/src/
>>
>> This is Ian's and Armin's previous go at new specification. It though
>> tried to go further than what you are doing.
>>
>> Also read:
>>
>> http://blog.dscpl.com.au/2009/09/roadmap-for-python-wsgi-specification.html
>>
>> I explain what I mean by native strings in that.
>>
>> Graham
>>
>> On 15 April 2010 22:54, Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:
>>> Mostly taking Graham's list of issues and incorporating it into PEP 333.
>>>
>>> Latest revision: http://hg.xavamedia.nl/peps/file/tip/wsgi-1.1.txt
>>>
>>> Let's have comments here (comments in the form of diffs are
>>> particularly welcome, of course). Remember, the idea is not to change
>>> or improve WSGI right now, but only to improve the spec, improving
>>> interoperability and enabling Python 3 support.
>>>
>>> Graham, I hope I did a good job with your suggestions. (Since so much
>>> of this is yours, I've just listed you as the second author.) I tried
>>> to clarify exactly what you meant by "native strings", can you check
>>> that out?
>>>
>>> Cheers,
>>>
>>> Dirkjan
>>>
>>> --- pep-0333.txt 2010-04-15 14:46:02.000000000 +0200
>>> +++ wsgi-1.1.txt 2010-04-15 14:51:39.000000000 +0200
>>> @@ -1,114 +1,124 @@
>>> -PEP: 333
>>> -Title: Python Web Server Gateway Interface v1.0
>>> +PEP: 0000
>>> +Title: Python Web Server Gateway Interface 1.1
>>> Version: $Revision$
>>> Last-Modified: $Date$
>>> -Author: Phillip J. Eby <pje at telecommunity.com>
>>> +Author: Dirkjan Ochtman <dirkjan at ochtman.nl>,
>>> + Graham Dumpleton <graham.dumpleton at gmail.com>
>>> Discussions-To: Python Web-SIG <web-sig at python.org>
>>> Status: Draft
>>> Type: Informational
>>> Content-Type: text/x-rst
>>> -Created: 07-Dec-2003
>>> -Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004
>>> +Created: 15-04-2010
>>> +Post-History: Not yet
>>>
>>>
>>> Abstract
>>> ========
>>>
>>> -This document specifies a proposed standard interface between web
>>> -servers and Python web applications or frameworks, to promote web
>>> -application portability across a variety of web servers.
>>> +This document specifies a revision of the proposed standard interface
>>> +between web servers and Python web applications or frameworks, to
>>> +promote web application portability across a variety of web servers.
>>>
>>>
>>> Rationale and Goals
>>> ===================
>>>
>>> -Python currently boasts a wide variety of web application frameworks,
>>> -such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to
>>> -name just a few [1]_. This wide variety of choices can be a problem
>>> -for new Python users, because generally speaking, their choice of web
>>> -framework will limit their choice of usable web servers, and vice
>>> -versa.
>>> -
>>> -By contrast, although Java has just as many web application frameworks
>>> -available, Java's "servlet" API makes it possible for applications
>>> -written with any Java web application framework to run in any web
>>> -server that supports the servlet API.
>>> -
>>> -The availability and widespread use of such an API in web servers for
>>> -Python -- whether those servers are written in Python (e.g. Medusa),
>>> -embed Python (e.g. mod_python), or invoke Python via a gateway
>>> -protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of
>>> -framework from choice of web server, freeing users to choose a pairing
>>> -that suits them, while freeing framework and server developers to
>>> -focus on their preferred area of specialization.
>>> -
>>> -This PEP, therefore, proposes a simple and universal interface between
>>> -web servers and web applications or frameworks: the Python Web Server
>>> -Gateway Interface (WSGI).
>>> -
>>> -But the mere existence of a WSGI spec does nothing to address the
>>> -existing state of servers and frameworks for Python web applications.
>>> -Server and framework authors and maintainers must actually implement
>>> -WSGI for there to be any effect.
>>> -
>>> -However, since no existing servers or frameworks support WSGI, there
>>> -is little immediate reward for an author who implements WSGI support.
>>> -Thus, WSGI **must** be easy to implement, so that an author's initial
>>> -investment in the interface can be reasonably low.
>>> -
>>> -Thus, simplicity of implementation on *both* the server and framework
>>> -sides of the interface is absolutely critical to the utility of the
>>> -WSGI interface, and is therefore the principal criterion for any
>>> -design decisions.
>>> -
>>> -Note, however, that simplicity of implementation for a framework
>>> -author is not the same thing as ease of use for a web application
>>> -author. WSGI presents an absolutely "no frills" interface to the
>>> -framework author, because bells and whistles like response objects and
>>> -cookie handling would just get in the way of existing frameworks'
>>> -handling of these issues. Again, the goal of WSGI is to facilitate
>>> -easy interconnection of existing servers and applications or
>>> -frameworks, not to create a new web framework.
>>> -
>>> -Note also that this goal precludes WSGI from requiring anything that
>>> -is not already available in deployed versions of Python. Therefore,
>>> -new standard library modules are not proposed or required by this
>>> -specification, and nothing in WSGI requires a Python version greater
>>> -than 2.2.2. (It would be a good idea, however, for future versions
>>> -of Python to include support for this interface in web servers
>>> -provided by the standard library.)
>>> -
>>> -In addition to ease of implementation for existing and future
>>> -frameworks and servers, it should also be easy to create request
>>> -preprocessors, response postprocessors, and other WSGI-based
>>> -"middleware" components that look like an application to their
>>> -containing server, while acting as a server for their contained
>>> -applications.
>>> -
>>> -If middleware can be both simple and robust, and WSGI is widely
>>> -available in servers and frameworks, it allows for the possibility
>>> -of an entirely new kind of Python web application framework: one
>>> -consisting of loosely-coupled WSGI middleware components. Indeed,
>>> -existing framework authors may even choose to refactor their
>>> -frameworks' existing services to be provided in this way, becoming
>>> -more like libraries used with WSGI, and less like monolithic
>>> -frameworks. This would then allow application developers to choose
>>> -"best-of-breed" components for specific functionality, rather than
>>> -having to commit to all the pros and cons of a single framework.
>>> -
>>> -Of course, as of this writing, that day is doubtless quite far off.
>>> -In the meantime, it is a sufficient short-term goal for WSGI to
>>> -enable the use of any framework with any server.
>>> -
>>> -Finally, it should be mentioned that the current version of WSGI
>>> -does not prescribe any particular mechanism for "deploying" an
>>> -application for use with a web server or server gateway. At the
>>> -present time, this is necessarily implementation-defined by the
>>> -server or gateway. After a sufficient number of servers and
>>> -frameworks have implemented WSGI to provide field experience with
>>> -varying deployment requirements, it may make sense to create
>>> -another PEP, describing a deployment standard for WSGI servers and
>>> -application frameworks.
>>> +WSGI 1.0, specified in PEP 333, did a great job in making it easier
>>> +for web applications and web servers to interface with each other.
>>> +It has become very much the standard it was meant to be and an
>>> +important part of the Python web development infrastructure.
>>> +
>>> +After several implementations were built by different developers,
>>> +it inevitably turned out that the specification wasn't perfect. It
>>> +left out some details that were implemented by all the web server
>>> +interfaces because they were critical for many applications (or
>>> +application frameworks). Additionally, the specification was written
>>> +before Python 3.x was specified, resulting in a lack of clear
>>> +specification on what to do with unicode strings.
>>> +
>>> +While there are some ideas around to improve WSGI further in less
>>> +compatible ways, we feel that there is value to be had in first
>>> +specifying a minor revision of the specification, which is largely
>>> +compatible with existing implementations. Further simplification
>>> +and experimentation are therefore deferred to a 2.0 version.
>>> +
>>> +
>>> +Differences with WSGI 1.0
>>> +=========================
>>> +
>>> +Descriptive changes
>>> +-------------------
>>> +
>>> +The following changes were made to realign the spec with
>>> +implementations 'in the wild'.
>>> +
>>> +1. The 'readline()' function of 'wsgi.input' must optionally take
>>> + a size hint. This is required because many applications use
>>> + cgi.FieldStorage, which uses this functionality.
>>> +
>>> +2. The 'wsgi.input' functions for reading input must return an empty
>>> + string as end of input stream marker. This is required for support
>>> + of HTTP 1.1 request pipelining. A correctly implemented WSGI
>>> + middleware already has to cope with an empty string as end
>>> + sentinel anyway to detect premature end of input.
>>> +
>>> +3. Any WSGI application or middleware should not itself return, or
>>> + consume from a wrapped WSGI component, more data than specified by
>>> + the Content-Length response header if defined. Middleware that
>>> + does this is arguably broken and can generate incorrect data.
>>> + This is just a clarification of obligations.
>>> +
>>> +4. The WSGI adapter must not pass on to the server any data above
>>> + what the Content-Length response header defines, if supplied.
>>> + Doing this is technically a violation of HTTP. This is another
>>> + clarification of obligations.
>>> +
>>> +
>>> +String handling changes
>>> +-----------------------
>>> +
>>> +The following changes were made to make WSGI work on Python 3.x.
>>> +
>>> +1. The application is passed an instance of a Python dictionary
>>> + containing what is referred to as the WSGI environment. All keys
>>> + in this dictionary are native strings. For CGI variables, all names
>>> + are going to be ISO-8859-1 and so where native strings are
>>> + unicode strings, that encoding is used for the names of CGI
>>> + variables.
>>> +
>>> +2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
>>> + environment, the value of the variable should be a native string.
>>> +
>>> +3. For the CGI variables contained in the WSGI environment, the values
>>> + of the variables are native strings. Where native strings are
>>> + unicode strings, ISO-8859-1 encoding would be used such that the
>>> + original character data is preserved and as necessary the unicode
>>> + string can be converted back to bytes and thence decoded to unicode
>>> + again using a different encoding.
>>> +
>>> +4. The WSGI input stream 'wsgi.input' contained in the WSGI environment
>>> + and from which request content is read, should yield byte strings.
>>> +
>>> +5. The status line specified by the WSGI application should be a byte
>>> + string. Where native strings are unicode strings, the native string
>>> + type can also be returned in which case it would be encoded as
>>> + ISO-8859-1.
>>> +
>>> +6. The list of response headers specified by the WSGI application should
>>> + contain tuples consisting of two values, where each value is a byte
>>> + string. Where native strings are unicode strings, the native string
>>> + type can also be returned in which case it would be encoded as
>>> + ISO-8859-1.
>>> +
>>> +7. The iterable returned by the application and from which response
>>> + content is derived, should yield byte strings. Where native strings
>>> + are unicode strings, the native string type can also be returned in
>>> + which case it would be encoded as ISO-8859-1.
>>> +
>>> +8. The value passed to the 'write()' callback returned by
>>> + 'start_response()' should be a byte string. Where native strings
>>> + are unicode strings, a native string type can also be supplied, in
>>> + which case it would be encoded as ISO-8859-1.
>>>
>>>
>>> Specification Overview
>>> @@ -447,6 +457,13 @@
>>> Streaming`_ section below for more on how application output must be
>>> handled.)
>>>
>>> +Further on, several places specify constraints upon string types used
>>> +in the WSGI API. The term native string is used to mean the 'str' class
>>> +in both Python 2.x and 3.x. The spec tries to ensure optimal
>>> +compatibility and ease of use by allowing implementations running on
>>> +Python 3.x to encode strings (which are Unicode strings with no
>>> +specified encoding) as ISO-8859-1 where a 3.x string is passed in.
>>> +
>>> The server or gateway should treat the yielded strings as binary byte
>>> sequences: in particular, it should ensure that line endings are
>>> not altered. The application is responsible for ensuring that the
>>> @@ -489,12 +506,22 @@
>>> ``environ`` Variables
>>> ---------------------
>>>
>>> +All keys in this dictionary are native strings. For CGI variables,
>>> +all names are going to be ISO-8859-1 and so where native strings are
>>> +unicode strings, that encoding is used for the names of CGI variables.
>>> +
>>> The ``environ`` dictionary is required to contain these CGI
>>> environment variables, as defined by the Common Gateway Interface
>>> specification [2]_. The following variables **must** be present,
>>> unless their value would be an empty string, in which case they
>>> **may** be omitted, except as otherwise noted below.
>>>
>>> +The values for CGI variables are native strings. Where native strings
>>> +are unicode strings, ISO-8859-1 encoding would be used such that the
>>> +original character data is preserved and as necessary the unicode
>>> +string can be converted back to bytes and thence decoded to unicode
>>> +again using a different encoding.
>>> +
>>> ``REQUEST_METHOD``
>>> The HTTP request method, such as ``"GET"`` or ``"POST"``. This
>>> cannot ever be an empty string, and so is always required.
>>> @@ -575,13 +602,14 @@
>>> ===================== ===============================================
>>> Variable Value
>>> ===================== ===============================================
>>> -``wsgi.version`` The tuple ``(1,0)``, representing WSGI
>>> +``wsgi.version`` The tuple ``(1, 0)``, representing WSGI
>>> version 1.0.
>>>
>>> ``wsgi.url_scheme`` A string representing the "scheme" portion of
>>> the URL at which the application is being
>>> invoked. Normally, this will have the value
>>> - ``"http"`` or ``"https"``, as appropriate.
>>> + ``"http"`` or ``"https"``, as appropriate. The
>>> + value is a native string.
>>>
>>> ``wsgi.input`` An input stream (file-like object) from which
>>> the HTTP request body can be read. (The server
>>> @@ -646,7 +674,7 @@
>>> Method Stream Notes
>>> =================== ========== ========
>>> ``read(size)`` ``input`` 1
>>> -``readline()`` ``input`` 1,2
>>> +``readline(hint)`` ``input`` 1,2
>>> ``readlines(hint)`` ``input`` 1,3
>>> ``__iter__()`` ``input``
>>> ``flush()`` ``errors`` 4
>>> @@ -661,11 +689,12 @@
>>> ``Content-Length``, and is allowed to simulate an end-of-file
>>> condition if the application attempts to read past that point.
>>> The application **should not** attempt to read more data than is
>>> - specified by the ``CONTENT_LENGTH`` variable.
>>> + specified by the ``CONTENT_LENGTH`` variable. All read functions
>>> + are required to return an empty string as the end of input stream
>>> + marker. They must yield byte strings.
>>>
>>> -2. The optional "size" argument to ``readline()`` is not supported,
>>> - as it may be complex for server authors to implement, and is not
>>> - often used in practice.
>>> +2. The optional "size" argument to ``readline()`` is required for
>>> + the implementer, but optional for callers.
>>>
>>> 3. Note that the ``hint`` argument to ``readlines()`` is optional for
>>> both caller and implementer. The application is free not to
>>> @@ -692,12 +721,15 @@
>>> ---------------------------------
>>>
>>> The second parameter passed to the application object is a callable
>>> -of the form ``start_response(status,response_headers,exc_info=None)``.
>>> +of the form ``start_response(status, response_headers, exc_info=None)``.
>>> (As with all WSGI callables, the arguments must be supplied
>>> positionally, not by keyword.) The ``start_response`` callable is
>>> used to begin the HTTP response, and it must return a
>>> ``write(body_data)`` callable (see the `Buffering and Streaming`_
>>> -section, below).
>>> +section, below). Values passed to the ``write(body_data)`` callable
>>> +should be byte strings. Where native strings are unicode strings, a
>>> +native strings type can also be supplied, in which case it would be
>>> +encoded as ISO-8859-1.
>>>
>>> The ``status`` argument is an HTTP "status" string like ``"200 OK"``
>>> or ``"404 Not Found"``. That is, it is a string consisting of a
>>> @@ -705,14 +737,20 @@
>>> single space, with no surrounding whitespace or other characters.
>>> (See RFC 2616, Section 6.1.1 for more information.) The string
>>> **must not** contain control characters, and must not be terminated
>>> -with a carriage return, linefeed, or combination thereof.
>>> +with a carriage return, linefeed, or combination thereof. This
>>> +value should be a byte string. Where native strings are unicode
>>> +strings, the native string type can also be returned, in which
>>> +case it would be encoded as ISO-8859-1.
>>>
>>> The ``response_headers`` argument is a list of ``(header_name,
>>> header_value)`` tuples. It must be a Python list; i.e.
>>> -``type(response_headers) is ListType``, and the server **may** change
>>> +``type(response_headers) is list``, and the server **may** change
>>> its contents in any way it desires. Each ``header_name`` must be a
>>> valid HTTP header field-name (as defined by RFC 2616, Section 4.2),
>>> -without a trailing colon or other punctuation.
>>> +without a trailing colon or other punctuation. Both the header_name
>>> +and the header_value should be byte strings. Where native strings
>>> +are unicode strings, the native string type can also be returned,
>>> +in which case it would be encoded as ISO-8859-1.
>>>
>>> Each ``header_value`` **must not** include *any* control characters,
>>> including carriage returns or linefeeds, either embedded or at the end.
>>> @@ -809,6 +847,14 @@
>>> Handling the ``Content-Length`` Header
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> +If an application or middleware layer chooses to return a
>>> +Content-Length header, it should not return more data than specified
>>> +by the header value. Any wrapping middleware layer should not
>>> +consume more data than specified in the header value from the
>>> +wrapped component (either middleware or application). Any WSGI
>>> +adapter must similarly not pass on data above what the
>>> +Content-Length response header value defines.
>>> +
>>> If the application does not supply a ``Content-Length`` header, a
>>> server or gateway may choose one of several approaches to handling
>>> it. The simplest of these is to close the client connection when
>>> @@ -1569,55 +1615,13 @@
>>> developers.
>>>
>>>
>>> -Proposed/Under Discussion
>>> -=========================
>>> -
>>> -These items are currently being discussed on the Web-SIG and elsewhere,
>>> -or are on the PEP author's "to-do" list:
>>> -
>>> -* Should ``wsgi.input`` be an iterator instead of a file? This would
>>> - help for asynchronous applications and chunked-encoding input
>>> - streams.
>>> -
>>> -* Optional extensions are being discussed for pausing iteration of an
>>> - application's ouptut until input is available or until a callback
>>> - occurs.
>>> -
>>> -* Add a section about synchronous vs. asynchronous apps and servers,
>>> - the relevant threading models, and issues/design goals in these
>>> - areas.
>>> -
>>> -
>>> Acknowledgements
>>> ================
>>>
>>> -Thanks go to the many folks on the Web-SIG mailing list whose
>>> -thoughtful feedback made this revised draft possible. Especially:
>>> +Thanks go to many folks on the Web-SIG mailing list for helping the work
>>> +on clarifying and improving this specification. In particular:
>>>
>>> -* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up
>>> - on the first draft as not offering any advantages over "plain old
>>> - CGI", thus encouraging me to look for a better approach.
>>> -
>>> -* Ian Bicking, who helped nag me into properly specifying the
>>> - multithreading and multiprocess options, as well as badgering me to
>>> - provide a mechanism for servers to supply custom extension data to
>>> - an application.
>>> -
>>> -* Tony Lownds, who came up with the concept of a ``start_response``
>>> - function that took the status and headers, returning a ``write``
>>> - function. His input also guided the design of the exception handling
>>> - facilities, especially in the area of allowing for middleware that
>>> - overrides application error messages.
>>> -
>>> -* Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython
>>> - (well before the spec was finalized) helped to shape the "supporting
>>> - older versions of Python" section, as well as the optional
>>> - ``wsgi.file_wrapper`` facility.
>>> -
>>> -* Mark Nottingham, who reviewed the spec extensively for issues with
>>> - HTTP RFC compliance, especially with regard to HTTP/1.1 features that
>>> - I didn't even know existed until he pointed them out.
>>> -
>>> +* Phillip J. Eby, for writing/editing the 1.0 specification.
>>>
>>> References
>>> ==========
>>> @@ -1643,8 +1647,6 @@
>>>
>>> This document has been placed in the public domain.
>>>
>>> -
>>> -
>>> ..
>>> Local Variables:
>>> mode: indented-text
>>>
>>