[Web-SIG] Latest WSGI Draft

Sun Aug 22 19:12:02 CEST 2004

At 11:29 PM 8/21/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>Note also that this goal precludes WSGI from requiring anything that
>>is not already available in deployed versions of Python.  Therefore,
>>new standard library modules are not proposed or required by this
>>specification, and nothing in WSGI requires a Python version greater
>>than 1.5.2.  (It would be a good idea, however, for future versions
>>of Python to include support for this interface in web servers
>>provided by the standard library.)
>
>Like you said, maybe 1.5.2 is optimistic.  The spec works for 1.5.2, but 
>most servers and applications will have higher requirements, and the 
>iteration is annoying to handle in those versions.

Fine, we'll say 2.2.2, since that version had True and False as well as 
__iter__.

>>If middleware can be both simple and robust, and WSGI is widely
>>available in servers and frameworks, it allows for the possibility
>>of an entirely new kind of Python web application framework: one
>>consisting of loosely-coupled WSGI middleware components.  Indeed,
>>existing framework authors may even choose to refactor their
>>frameworks' existing services to be provided in this way, becoming
>>more like libraries used with WSGI, and less like monolithic
>>frameworks.  This would then allow application developers to choose
>>"best-of-breed" components for specific functionality, rather than
>>having to commit to all the pros and cons of a single framework.
>>Of course, as of this writing, that day is doubtless quite far off.
>>In the meantime, it is a sufficient short-term goal for WSGI to
>>enable the use of any framework with any server.
>
>That's an awfully pessimistic paragraph ;)

Are you being ironic?  I'm not sure I follow you here.

>>The WSGI interface has two sides: the "server" or "gateway" side,
>>and the "application" side.  The server side invokes a callable
>>object that is provided by the application side.  The specifics
>>of how that object is provided are up to the server or gateway.
>>It is assumed that some servers or gateways will require an
>>application's deployer to write a short script to create an
>>instance of the server or gateway, and supply it with the
>>application object.  Other servers and gateways may use
>>configuration files or other mechanisms to specify where the
>>application object should be imported from.
>
>Maybe "gateway" is just distracting.

Do you have a specific suggestion here?

>>     class AppClass:
>>         """Much the same thing, but as a class"""
>>         def __init__(self, environ, start_response):
>>             self.environ = environ
>>             self.start = start_response
>>         def __iter__(self):
>>             status = '200 OK'
>>             headers = [('Content-type','text/plain')]
>>             self.start(status, headers)
>>             yield "Hello world!\n"
>>             for i in range(1,11):
>>                 yield "Extra line %s\n" % i
>
>This second example confuses me.  Though as I reread it I realize more 
>clearly what it's doing; __init__ is the callable (in essence), but self 
>is automatically returned.  I think an instance with a __call__ method 
>would be easier to understand.  OTOH, there's more concurrency 
>overhead.  I dunno.  Anyway, that one confused me.

Perhaps you could suggest some text to add to the docstring that would have 
prevented your initial confusion?
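
For comparison, a minimal sketch of the two shapes under discussion: the class-as-application from the draft, and an instance with a __call__ method. AppClass follows the draft (shortened); CallableApp, run, and fake_start_response are hypothetical names introduced only for illustration, and run is a toy driver, not a real server.

```python
class AppClass:
    """The class itself is the application object: calling
    AppClass(environ, start_response) runs __init__ and returns a new
    instance, which the server then iterates over."""
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        self.start('200 OK', [('Content-type', 'text/plain')])
        yield "Hello world!\n"


class CallableApp:
    """Hypothetical alternative: a single shared instance is the
    application object, invoked through __call__."""
    def __call__(self, environ, start_response):
        start_response('200 OK', [('Content-type', 'text/plain')])
        return ["Hello world!\n"]


def run(application):
    """Toy driver: call the application, collect what it produced."""
    calls = []

    def fake_start_response(status, headers):
        calls.append((status, headers))
        return lambda data: None  # stand-in for the write() callable

    body = "".join(application({}, fake_start_response))
    return calls, body


class_result = run(AppClass)          # the class is the callable
instance_result = run(CallableApp())  # an instance is the callable
```

From the driver's point of view both forms look the same; the difference is only whether per-request state lives on a fresh instance or has to be kept out of the shared one.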

>>The application object must accept two positional arguments.  For
>>the sake of illustration, we have named them ``environ``, and
>>``start_response``, but they are not required to have these names.
>>A server or gateway *must* invoke the application object using
>>positional (not keyword) arguments.
>>The first parameter is a dictionary object, containing CGI-style
>>environment variables.
>
>I think the spec is easier to understand if you use names here, i.e., 
>"environ is a dictionary object".  Or remind the reader of the invocation, 
>i.e., note application(environ, start_response) is called.

I'll try to do something with this.

>>The second parameter is a callable accepting two positional
>>arguments: a status string of the form ``"999 Message here"``,
>>and a list of ``(header_name,header_value)`` tuples describing the
>>HTTP response header.  This callable must return another callable
>>that takes one parameter: a string to write as part of the HTTP
>>response body.
>
>"This callable must return a writing function: a function that takes a 
>single string as an argument, which is written as the HTTP response body."

I'll work on this one too.
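
A minimal sketch of the calling sequence described above, assuming the draft's two-argument start_response(status, headers) that returns a write callable. The helper make_start_response, the CGI-style "Status:" line, and the use of an in-memory buffer are assumptions made for illustration, not part of the draft.

```python
import io


def make_start_response(out):
    """Build a start_response callable that emits headers to `out`
    (any object with a write() method) and returns the body writer."""
    def start_response(status, headers):
        out.write("Status: %s\r\n" % status)
        for name, value in headers:
            out.write("%s: %s\r\n" % (name, value))
        out.write("\r\n")   # blank line ends the header block
        return out.write    # the "writing function" for the body
    return start_response


def simple_app(environ, start_response):
    # The application may use write() and/or return an iterable of strings.
    write = start_response('200 OK', [('Content-type', 'text/plain')])
    write("Hello ")
    return ["world!\n"]


out = io.StringIO()
for chunk in simple_app({}, make_start_response(out)):
    out.write(chunk)
response = out.getvalue()
```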

>I guess "function" is more specific than "callable", but it seems easier 
>to understand.  Though honestly, I find the CGI example the easiest way to 
>understand this, so maybe being more accurate here is fine.

I've got to explain *somewhere* that these are any callable.  Maybe I 
should preface the overview with an explanation of what "a callable" means, 
and reinforce it once or twice in the form "such and such is a callable 
(function, method, class, callable instance, etc.) that blah blah blah".

>>  * ``SERVER_NAME`` and ``SERVER_PORT`` (which, when combined with
>>    ``SCRIPT_NAME`` and ``PATH_INFO``, should complete the
>
>You forgot to finish your sentence.  Also SERVER_NAME is a fallback if 
>HTTP_HOST isn't present; generally SERVER_NAME indicates the canonical 
>host name, not necessarily the actual host name.

Ah yes.  Tony already provided a patch for the typo, but I'll add something 
about HTTP_HOST.
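
A rough sketch of the fallback Ian describes: prefer HTTP_HOST when the client sent it, otherwise combine SERVER_NAME and SERVER_PORT. The function name base_url, the scheme parameter, and its 'http' default are assumptions for illustration; the draft text quoted here does not say how the scheme is determined.

```python
from urllib.parse import quote


def base_url(environ, scheme='http'):
    """Rebuild the request URL (without query string) from CGI-style keys."""
    host = environ.get('HTTP_HOST')
    if not host:
        # Fallback: the server's canonical name, plus the port if it
        # is not the default one for the scheme.
        host = environ['SERVER_NAME']
        port = str(environ['SERVER_PORT'])
        default_port = {'http': '80', 'https': '443'}.get(scheme)
        if port != default_port:
            host = '%s:%s' % (host, port)
    path = quote(environ.get('SCRIPT_NAME', '')) + quote(environ.get('PATH_INFO', ''))
    return '%s://%s%s' % (scheme, host, path)


with_host = base_url({
    'HTTP_HOST': 'www.example.com:8080',
    'SERVER_NAME': 'internal-name', 'SERVER_PORT': '8080',
    'SCRIPT_NAME': '/app', 'PATH_INFO': '/page',
})
without_host = base_url({
    'SERVER_NAME': 'example.org', 'SERVER_PORT': '80',
    'SCRIPT_NAME': '/app', 'PATH_INFO': '/page',
})
```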

>>``wsgi.last_call``     This value should be true if this is expected
>>                        to be the last invocation of the application
>>                        in this process.  This is provided to allow
>>                        applications to optimize their setup for
>>                        long-running vs. short-running scenarios.
>>                        This flag should normally only be true for
>>                        CGI applications, or while a server is doing
>>                        some kind of "graceful shutdown".  Note that
>>                        a server or gateway is still allowed to invoke
>>                        the application again; this flag is only
>>                        a "suggestion" to the application that it is
>>                        unlikely to be reinvoked.
>
>wsgi.last_call seems too complicated from this.

It's precisely what you agreed to as a solution for your issue.  Granted, I 
was also surprised by how long the "official" explanation of the feature 
turned out to be.

>Really, it's for CGI and nothing else.  Maybe just wsgi.cgi?
>wsgi.run_once?  I think the semantics shouldn't be any more general
>than that.  Then we can also guarantee that it won't be called again.

I'm really reluctant to require the server to make such a guarantee.  My 
understanding of your use case is really more like, "I'm not likely to run 
you again for a while, so don't optimize for frequent execution."

Hm.  Now that I'm thinking about it more, it seems to me that this could be 
just as easily handled by application/framework-side configuration, and I'm 
inclined to remove it from the spec altogether.

>>The ``start_response()`` Callable
>>---------------------------------
>>The second parameter passed to the application object is itself a
>>two-argument callable, used to begin the HTTP response and return
>>a ``write()`` callable.
>
>"The second parameter passed to the application object (start_response) 
>is a callable, used like ``start_response(status, headers)``.

I'll work on this.

>The status argument is a string like "404 Not Found" or "200 OK".  This 
>string must be pure 7-bit ASCII, containing no control characters, and not 
>terminated with a return or linefeed.
>
>The headers argument is a sequence of ``(header_name, header_value)`` 
>tuples.  Each ``header_name`` must be a valid... (and continuing on with 
>your text).

I'll work on this.
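
The status constraints quoted above (pure 7-bit ASCII, no control characters, no trailing return or linefeed, and the "999 Message here" shape) could be checked along these lines. This is only a sketch; check_status is a hypothetical helper, and the draft does not require servers to perform such a check.

```python
def check_status(status):
    """Return True if `status` meets the constraints quoted above."""
    for ch in status:
        code = ord(ch)
        # Reject control characters (which covers CR and LF) and
        # anything outside printable 7-bit ASCII.
        if code < 32 or code >= 127:
            return False
    # Shape check: three digits, a space, then the message.
    return len(status) >= 4 and status[:3].isdigit() and status[3] == ' '


ok = check_status("404 Not Found")
trailing_newline = check_status("200 OK\r\n")
non_ascii = check_status("200 \u00d6K")
no_code = check_status("OK")
```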

>Though I'm not clear what "folding" means.  I'm guessing you mean:
>
>Header: blah
>     continuing Header content

Yes.

>Does the HTTP spec care about folding?  Seems like a distraction to 
>mention it.

I'll check.

>>Middleware components that transform the request or response data
>>should in general remove WSGI extension data from the ``environ``
>>that the middleware does not understand, to prevent applications
>>from inadvertently bypassing the middleware's mediation of the
>>interaction by use of a server extension.  The simplest way to do
>>this is to just delete keys from ``environ`` that are all lowercase
>>and do not begin with ``"wsgi."``, before passing the ``environ``
>>on to the application.
>
>I don't understand this.  To me it seems more reasonable that middleware 
>leave the extra arguments in place.
>
>For instance, let's say I have a URL redirecting middleware.  There's a 
>chance I need to look at the parsed form of QUERY_STRING, and I cache the 
>result as a dictionary in, say, webkit.query_vars.  That's just as valid 
>later.  Oh, well, unless someone rewrites QUERY_STRING.  So to be safe, I 
>put the query string I parsed in webkit.query_string.
>
>But maybe I have some other middleware that handles configuration.  It 
>runs after the URL parser, for localized configuration.  It doesn't 
>necessarily know about the query string, or about the other piece of 
>middleware.  And it shouldn't know about it, because what would be the 
>point of that?  They are decoupled.  But I don't want it throwing away 
>that information.
>
>In that case, it's just some lost time reparsing the URL, but I can 
>imagine more important things, and a lot of pieces of middleware where the 
>only point is that they add something to the environ dictionary. E.g., a 
>session-handling middleware.  There's no point to these if other 
>middleware is going to throw information away.
>
>If there are reliability issues -- like middleware rewriting QUERY_STRING, 
>but passing through a cached parse of the old QUERY_STRING that it didn't 
>know about -- these can be handled pretty easily.  But if one middleware 
>throws away keys it doesn't know about, it messes up the whole stack.

You're right.  The extension mechanism needs to be clearer.  Instead of 
throwing away everything, there needs to be a way to identify that a 
server-supplied value may be used in place of some WSGI functionality, so 
that middleware can remove only those items, rather than every item.

Hmmm.  Maybe we should have a 'wsgi.extensions' key that contains a 
dictionary for items that middleware *must* either understand, or not pass 
through.  If a framework or middleware author did your hypothetical query 
string parsing, he would have to place it in 'wsgi.extensions' if he did 
not implement the cross-check you describe.

Sigh.  This will probably need to be a new section on "WSGI Extensions and 
Middleware".
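
To make the 'wsgi.extensions' idea concrete, here is a rough sketch of middleware that passes ordinary environ keys through untouched and drops only the extension entries it does not understand. Everything here is hypothetical: the idea is only floated in this message and is not part of the draft, and the names strip_unknown_extensions, FilteringMiddleware, and the 'example.fast_write' extension key are invented for illustration.

```python
def strip_unknown_extensions(environ, understood=()):
    """Return a copy of environ whose 'wsgi.extensions' dictionary
    keeps only the entries named in `understood`."""
    filtered = dict(environ)  # shallow copy; the caller's dict is untouched
    extensions = environ.get('wsgi.extensions', {})
    filtered['wsgi.extensions'] = dict(
        [(name, value) for name, value in extensions.items()
         if name in understood])
    return filtered


class FilteringMiddleware:
    """Wraps an application and filters extensions before calling it."""
    def __init__(self, application, understood=()):
        self.application = application
        self.understood = understood

    def __call__(self, environ, start_response):
        return self.application(
            strip_unknown_extensions(environ, self.understood),
            start_response)


seen = {}


def inner_app(environ, start_response):
    seen.update(environ)  # record what actually reached the application
    start_response('200 OK', [('Content-type', 'text/plain')])
    return ["ok\n"]


environ = {
    'QUERY_STRING': 'a=1',
    'webkit.query_vars': {'a': '1'},   # ordinary key: passed through
    'wsgi.extensions': {'example.fast_write': object()},  # not understood
}
body = FilteringMiddleware(inner_app)(environ, lambda status, headers: None)
```

Under this scheme the cached query parse from Ian's example survives the middleware stack, while a server-supplied shortcut that could bypass the middleware does not.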