[Web-SIG] Latest WSGI Draft

Sun Aug 22 21:18:52 CEST 2004

Phillip J. Eby wrote:
>> That's an awfully pessimistic paragraph ;)
> 
> 
> Are you being ironic?  I'm not sure I follow you here.

I don't know if I was being ironic.  But it was just an offhanded 
comment, not a suggestion to change anything.

>>> The WSGI interface has two sides: the "server" or "gateway" side,
>>> and the "application" side.  The server side invokes a callable
>>> object that is provided by the application side.  The specifics
>>> of how that object is provided are up to the server or gateway.
>>> It is assumed that some servers or gateways will require an
>>> application's deployer to write a short script to create an
>>> instance of the server or gateway, and supply it with the
>>> application object.  Other servers and gateways may use
>>> configuration files or other mechanisms to specify where the
>>> application object should be imported from.
>>
>>
>> Maybe "gateway" is just distracting.
> 
> 
> Do you have a specific suggestion here?

Use only the term "server".

>>>     class AppClass:
>>>         """Much the same thing, but as a class"""
>>>         def __init__(self, environ, start_response):
>>>             self.environ = environ
>>>             self.start = start_response
>>>         def __iter__(self):
>>>             status = '200 OK'
>>>             headers = [('Content-type','text/plain')]
>>>             self.start(status, headers)
>>>             yield "Hello world!\n"
>>>             for i in range(1,11):
>>>                 yield "Extra line %s\n" % i
>>
>>
>> This second example confuses me.  Though as I reread it I realize more 
>> clearly what it's doing; __init__ is the callable (in essence), but 
>> self is automatically returned.  I think an instance with a __call__ 
>> method would be easier to understand.  OTOH, there's more concurrency 
>> overhead.  I dunno.  Anyway, that one confused me.
> 
> 
> Perhaps you could suggest some text to add to the docstring that would 
> have prevented your initial confusion?

I think it makes sense when you see it in action, i.e.:

AppClass *is* the application object (*not* instances of AppClass). 
AppClass(environ, start_response) starts the response; it returns an 
instance of itself, which is an iterator that produces the content.

I see what really confused me.  Shouldn't that be more like:

class AppClass:
    def __init__(self, environ, start_response):
        self.environ = environ
        status = '200 OK'
        headers = [('Content-type', 'text/plain')]
        start_response(status, headers)
        # return self is implicit
    def __iter__(self):
        yield "Hello world!\n"
        for i in range(1, 11):
            yield "Extra line %s\n" % i

Running start_response in __iter__ seems strange to me.  Maybe it's 
correct, but I expect the call sequence to be:

application(environ, start_response)
   start_response(status, headers) returns write()
   possible write() calls
application returns iterable
server uses iterable

In this example, the write() function is only created after you start 
the iteration.  Maybe that's fine, I'm not sure -- it's a little odd, 
because when you start the iteration you expect to be getting the body, 
but the headers haven't been sent yet.  Of course, you ensure the 
headers get sent, but it definitely confuses me.
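
A minimal sketch of the two orderings being compared, assuming a
hypothetical trace() helper that stands in for a server and records the
order of events.  LazyApp follows the draft's example (start_response is
called inside __iter__); EagerApp follows the variant proposed above.
The names LazyApp, EagerApp, make_start_response and trace are
illustrative only and do not come from the draft.

```python
def make_start_response(log):
    """Build a start_response that records when it is called."""
    def start_response(status, headers):
        log.append("start_response")

        def write(data):
            log.append("write")
        return write
    return start_response


class LazyApp:
    """Draft style: start_response is called inside __iter__."""
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        self.start('200 OK', [('Content-type', 'text/plain')])
        yield "Hello world!\n"


class EagerApp:
    """Proposed variant: start_response is called in __init__."""
    def __init__(self, environ, start_response):
        self.environ = environ
        start_response('200 OK', [('Content-type', 'text/plain')])

    def __iter__(self):
        yield "Hello world!\n"


def trace(application):
    """Hypothetical server stand-in: call the app, then iterate it."""
    log = []
    result = application({}, make_start_response(log))
    log.append("application returned")
    for chunk in result:
        log.append("chunk")
    return log


lazy_log = trace(LazyApp)
eager_log = trace(EagerApp)
```

With LazyApp the application returns before start_response has run, so
write() does not exist until the server starts iterating.  With EagerApp
start_response runs before the application returns, which matches the
call sequence listed above.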

>> I guess "function" is more specific than "callable", but it seems 
>> easier to understand.  Though honestly, I find the CGI example the 
>> easiest way to understand this, so maybe being more accurate here is 
>> fine.
> 
> 
> I've got to explain *somewhere* that these are any callable.  Maybe I 
> should preface the overview with an explanation of what "a callable" 
> means, and reinforce it once or twice in the form "such and such is a 
> callable (function, method, class, callable instance, etc.) that blah 
> blah blah".

Sure.  But it might not be that big a deal -- I think just using names 
more often might help.  "The write callable", for instance, instead of 
"a callable".

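To illustrate the "any callable" point, here is a rough sketch in which
a function, a class, and a callable instance are all driven the same
way.  The run() helper is a hypothetical stand-in for a server, and the
bodies are plain strings as in the draft's own examples.

```python
def app_function(environ, start_response):
    """A plain function as the application object."""
    start_response('200 OK', [('Content-type', 'text/plain')])
    return ["from a function\n"]


class AppClass:
    """The class itself is the application object."""
    def __init__(self, environ, start_response):
        self.start = start_response

    def __iter__(self):
        self.start('200 OK', [('Content-type', 'text/plain')])
        yield "from a class\n"


class AppInstance:
    """An instance of this class is the application object."""
    def __call__(self, environ, start_response):
        start_response('200 OK', [('Content-type', 'text/plain')])
        return ["from a callable instance\n"]


def run(application):
    """Hypothetical server stand-in: returns (status, body)."""
    sent = {}

    def start_response(status, headers):
        sent['status'] = status

        def write(data):
            pass  # unused in this sketch
        return write

    # Join while iterating, so apps that start the response lazily work too.
    body = "".join(application({}, start_response))
    return sent['status'], body


results = [run(app) for app in (app_function, AppClass, AppInstance())]
```

The server-side code never needs to know which kind of callable it was
handed.
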
>>> ``wsgi.last_call``     This value should be true if this is expected
>>>                        to be the last invocation of the application
>>>                        in this process.  This is provided to allow
>>>                        applications to optimize their setup for
>>>                        long-running vs. short-running scenarios.
>>>                        This flag should normally only be true for
>>>                        CGI applications, or while a server is doing
>>>                        some kind of "graceful shutdown".  Note that
>>>                        a server or gateway is still allowed to invoke
>>>                        the application again; this flag is only
>>>                        a "suggestion" to the application that it is
>>>                        unlikely to be reinvoked.
>>
>>
>> wsgi.last_call seems too complicated from this.
> 
> 
> It's precisely what you agreed to as a solution for your issue.  
> Granted, I was also surprised by how long the "official" explanation of 
> the feature turned out to be.

Yes, it's what I agreed to.  But looking at the length of the 
description, I think I was wrong; it shouldn't be that complicated to 
explain.

>>   Really, it's for CGI and nothing else.  Maybe just wsgi.cgi?  
>> wsgi.run_once?  I think the semantics shouldn't be any more general 
>> than that.  Then we can also guarantee that it won't be called again.
> 
> 
> I'm really reluctant to require the server to make such a guarantee.  My 
> understanding of your use case is really more like, "I'm not likely to 
> run you again for a while, so don't optimize for frequent execution."
> 
> Hm.  Now that I'm thinking about it more, it seems to me that this could 
> be just as easily handled by application/framework-side configuration, 
> and I'm inclined to remove it from the spec altogether.

That was initially how multithreaded and multiprocess were going to be 
handled too, but I think it's really important that those be 
specified.  CGI is the only realistic use case for this feature, but 
it's a really common use case (since it's really just a widely supported 
standard that we are building on), and it presents a distinct set of 
problems for Python.  I don't see any reason not to just be explicit 
about being in a CGI environment -- every server will clearly know if 
it's in a CGI environment, every application can ignore it if it 
chooses, everyone will know exactly what it means in the spec.
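
A minimal sketch of how an application might use such a flag, assuming
the key is spelled 'wsgi.run_once' (one of the names floated above, not
something the draft defines).  PAGES, load_page and load_log are
hypothetical stand-ins for an expensive setup step.

```python
PAGES = {"index": "Index page\n", "about": "About page\n"}
load_log = []   # records what was loaded, for illustration only
_cache = {}


def load_page(name):
    """Hypothetical stand-in for an expensive per-page setup step."""
    load_log.append(name)
    return PAGES[name]


def application(environ, start_response):
    # Unknown page names are not handled in this sketch.
    name = environ.get('PATH_INFO', '/').strip('/') or 'index'
    if environ.get('wsgi.run_once'):
        # One-shot (CGI-style) process: load only what this request needs.
        body = load_page(name)
    else:
        # Long-running process: fill the cache once, then reuse it.
        if not _cache:
            for key in PAGES:
                _cache[key] = load_page(key)
        body = _cache[name]
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [body]
```

An application that ignores the key entirely still works; the flag only
changes how much setup is done up front.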

>>> Middleware components that transform the request or response data
>>> should in general remove WSGI extension data from the ``environ``
>>> that the middleware does not understand, to prevent applications
>>> from inadvertently bypassing the middleware's mediation of the
>>> interaction by use of a server extension.  The simplest way to do
>>> this is to just delete keys from ``environ`` that are all lowercase
>>> and do not begin with ``"wsgi."``, before passing the ``environ``
>>> on to the application.
>>
>>
>> I don't understand this.  To me it seems more reasonable that 
>> middleware leave the extra arguments in place.
>>
>> For instance, let's say I have a URL redirecting middleware.  There's a 
>> chance I need to look at the parsed form of QUERY_STRING, and I cache 
>> the result as a dictionary in, say, webkit.query_vars.  That's just as 
>> valid later.  Oh, well, unless someone rewrites QUERY_STRING.  So to 
>> be safe, I put the query string I parsed in webkit.query_string.
>>
>> But maybe I have some other middleware that handles configuration.  It 
>> runs after the URL parser, for localized configuration.  It doesn't 
>> necessarily know about the query string, or about the other piece of 
>> middleware.  And it shouldn't know about it, because what would be the 
>> point of that?  They are decoupled.  But I don't want it throwing away 
>> that information.
>>
>> In that case, it's just some lost time reparsing the URL, but I can 
>> imagine more important things, and a lot of pieces of middleware where 
>> the only point is that they add something to the environ dictionary. 
>> E.g., a session-handling middleware.  There's no point to these if 
>> other middleware is going to throw information away.
>>
>> If there are reliability issues -- like middleware rewriting 
>> QUERY_STRING, but passing through a cached parse of the old 
>> QUERY_STRING that it didn't know about -- these can be handled pretty 
>> easily.  But if one middleware throws away keys it doesn't know about, 
>> it messes up the whole stack.
> 
> 
> You're right.  The extension mechanism needs to be clearer.  Instead of 
> throwing away everything, there needs to be a way to identify that a 
> server-supplied value may be used in place of some WSGI functionality, 
> so that middleware can remove only those items, rather than every item.
> 
> Hmmm.  Maybe we should have a 'wsgi.extensions' key that contains a 
> dictionary for items that middleware *must* either understand, or not 
> pass through.  If a framework or middleware author did your hypothetical 
> query string parsing, he would have to place it in 'wsgi.extensions' if 
> he did not implement the cross-check you describe.

I'm quite comfortable with solving this on an ad hoc basis.  Generally 
the issue is middleware that rewrites the environment, but some 
extension depends on a value in the environment and isn't simultaneously 
updated.  In general, keeping a note about what the value of the key was 
will work fine in the small number of cases where it is an issue. 
Then it's up to the extension-using application (and middleware) to 
agree on a reliable way to do things, and other pieces of middleware 
don't need to worry about any of it.
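
The "keep a note" approach could look roughly like this, reusing the
webkit.query_vars and webkit.query_string keys from the example quoted
above.  The query_vars() helper is hypothetical; it re-parses whenever
QUERY_STRING no longer matches the value that was noted.

```python
from urllib.parse import parse_qs


def query_vars(environ):
    """Return the parsed query, re-parsing if QUERY_STRING has changed."""
    current = environ.get('QUERY_STRING', '')
    if environ.get('webkit.query_string') != current:
        # Either nothing is cached yet, or some middleware rewrote
        # QUERY_STRING after the cached parse was made.
        environ['webkit.query_string'] = current
        environ['webkit.query_vars'] = parse_qs(current)
    return environ['webkit.query_vars']
```

Middleware that knows nothing about these keys can pass them through
untouched, and a stale parse is detected by whoever actually uses it.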

I guess the problem is that someone might build in a dependency, but not 
be careful about it, and bugs would only arise in the presence of some 
middleware which the author didn't test with.  It's the same issue if 
the author doesn't set wsgi.extensions properly, though that's more 
explicit and maybe harder to miss.
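
For comparison, a sketch of what the proposed 'wsgi.extensions' rule
might mean for a piece of middleware.  This is only an assumption about
how the proposal would work; filter_extensions and the
'example.sendfile' key are hypothetical names.

```python
def filter_extensions(application, understood=()):
    """Pass through only the 'wsgi.extensions' entries named in understood."""
    def middleware(environ, start_response):
        environ = dict(environ)  # shallow copy; the caller's dict is untouched
        extensions = environ.get('wsgi.extensions', {})
        environ['wsgi.extensions'] = dict(
            (key, value) for key, value in extensions.items()
            if key in understood)
        # Ordinary keys (e.g. a session object) are left in place.
        return application(environ, start_response)
    return middleware


seen = {}


def application(environ, start_response):
    """Records the environ it received, for illustration only."""
    seen.update(environ)
    start_response('200 OK', [('Content-type', 'text/plain')])
    return ["ok\n"]


wrapped = filter_extensions(application, understood=('example.sendfile',))
```

Anything placed outside 'wsgi.extensions' survives the middleware, so a
cached parse stored there would still need the cross-check.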

> Sigh.  This will probably need to be a new section on "WSGI Extensions 
> and Middleware".

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org