[Web-SIG] Python Web Modules - Version 0.5.0

Tue Feb 1 00:48:56 CET 2005

James Gardner wrote:
> Ian Bicking wrote:
> 
>> web.wsgi.error: one standard I'd like for middleware would be some key 
>> you could set that would indicate that some error handler exists, and 
>> applications further down the stack shouldn't catch unexpected 
>> exceptions (of course expected exceptions are a different matter).  
>> Then the best error handler available would eventually get the error, 
>> and process it somehow (e.g., mailing a report, displaying an error, 
>> starting a debugger, etc).  Anyway, something to think about for this.
> 
> 
> That could be useful. Presumably the middleware component nearest the 
> server is likely to have the best error handling (as you would put the 
> best error handler in a position to catch the most errors). So this 
> could be as simple as agreeing a variable name like wsgi.error for the 
> environ dictionary which the highest middleware component up the chain 
> would set to True and ones lower down wouldn't provide error handling if 
> it was already set.

Right.  Except when you don't want that ;)  Other times you may want to 
override the error handler locally; e.g., maybe you have a section of 
the site where you want to use a different error handler that shows 
exceptions to the browser (e.g., a development section).  But presumably 
you could add an option to the middleware to force it to catch 
exceptions even when the environment advised not to.
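
To make the idea concrete, here is a minimal sketch of such a piece of middleware. The environ key name and the `force` option are assumptions for illustration only; nothing in this thread has settled on a name, and a real handler would mail or log the report rather than just display it.

```python
import sys
import traceback

# Hypothetical key name; the thread only proposes agreeing on *some* key.
ERROR_HANDLER_KEY = "web.wsgi.error_handler_present"


class ErrorMiddleware:
    """Catch unexpected exceptions unless an outer handler announced itself."""

    def __init__(self, app, force=False):
        self.app = app
        # force=True catches errors locally even when an outer handler exists.
        self.force = force

    def __call__(self, environ, start_response):
        outer_exists = environ.get(ERROR_HANDLER_KEY, False)
        if outer_exists and not self.force:
            # Defer: let any exception propagate to the outer handler.
            return self.app(environ, start_response)
        environ[ERROR_HANDLER_KEY] = True
        try:
            # Consume the body here so errors raised while iterating
            # over the response are caught as well.
            return list(self.app(environ, start_response))
        except Exception:
            report = traceback.format_exc()
            start_response(
                "500 Internal Server Error",
                [("Content-Type", "text/plain; charset=utf-8")],
                sys.exc_info(),
            )
            return [report.encode("utf-8")]
```

Wrapping a development section in `ErrorMiddleware(app, force=True)` would then give the local override described above, while the default behaviour leaves errors to the handler nearest the server.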

> Another thing I noticed when writing the error handler is that if an 
> application or middleware component doesn't form a header or set the 
> status correctly it can be tricky to track down where the error 
> occurred. If the application used a special object for headers and 
> status in the start_response callable which raised an error when it was 
> set with an invalid value that would make life easier.
>
> (Alternatively, if you wanted to change the way things were programmed a 
> bit you could write your application as middleware and specify a 
> terminator which set the headers and status using these special objects. 
> Probably not necessary though!)

I'm not sure I understand you here.  What's the exact situation where 
you encounter this?

>> So I was thinking that status codes should be sufficient to 
>> communicate authorization: 401 for login required, 403 for forbidden.  
>> If you are doing cookie logins (which I generally prefer from a UI 
>> perspective) the middleware can translate the 401 into a redirect to 
>> the login page.  And the 403 can turn into a nicer error page --
> 
> 
> So in a new version the authentication middleware would display a sign 
> in box if no user was signed in, the authorization middleware would 
> provide objects for the application to test authorisation and would also 
> look for headers to determine whether the application thought the user 
> was authorised and would display a sign in if not.

Basically.  If REMOTE_USER wasn't set (or was empty) and the application 
required login (based on whatever criteria it has) then it should return 
a 401 code.  The authentication middleware doesn't know if login is 
required, but it would be nice if it could tell whether you are logged 
in anyway (not possible with HTTP Basic auth, but ignoring that case).
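
A rough sketch of that division of labour, assuming a cookie-login setup: the application only answers 401, and the middleware turns that into a redirect. The login URL and the class and function names are made up for the example, and applications that use the legacy write() callable are not handled.

```python
LOGIN_URL = "/login"  # hypothetical location of the login page


def protected_app(environ, start_response):
    """Application: answers 401 when REMOTE_USER is missing or empty."""
    user = environ.get("REMOTE_USER")
    if not user:
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"Login required"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("Hello, %s" % user).encode("utf-8")]


class CookieLoginMiddleware:
    """Translate a 401 from the application into a redirect to the login page."""

    def __init__(self, app, login_url=LOGIN_URL):
        self.app = app
        self.login_url = login_url

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the response start until we know whether to rewrite it.
            captured["status"] = status
            captured["headers"] = headers

        body = list(self.app(environ, capture))
        if captured["status"].startswith("401"):
            start_response("302 Found", [("Location", self.login_url)])
            return [b""]
        # Anything else passes through unchanged.
        start_response(captured["status"], captured["headers"])
        return body
```

The same capture-and-rewrite approach could replace a bare 403 with a nicer error page.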

>> web.wsgi.session: I'd like to have some sort of standard for these 
>> objects, at least some aspects.  Not the details of storage, but 
>> mostly access; along the lines of web.session.manager and/or .store.  
>> I'm not sure how I feel about the manager with multiple applications, 
>> each of which has a store -- I feel like this should be part of the 
>> configuration somehow, which isn't necessarily part of the standard 
>> user-visible API.
> 
> 
> I've been thinking about the way series of applications can work 
> together, which is what the web.wsgi.environment code is about. Perhaps 
> it would be better to specify the application name in 
> web.wsgi.environment (which is more to do with configuration) so that 
> the web.wsgi.session and web.wsgi.auth objects all use the same 
> application name and then the manager becomes more redundant because a 
> store for the particular application is already created.

OK, I was trying to figure out what wsgi.environment was about.  Is it 
basically a way of indicating local configuration (like a configuration 
realm or something)?  I still lack a good intuition for how 
configuration should work.

>> web.wsgi.cgi: is this safe when a piece of middleware changes 
>> QUERY_STRING or otherwise rewrites the request?  You can test for this 
>> by saving the QUERY_STRING that you originally parsed alongside the 
>> resulting FieldStorage, and then reparsing if they don't match.  You 
>> can even test for matching with "is", since you're really checking for 
>> modifications instead of equality.  The same should be possible for 
>> wsgi.input and POST requests.
> 
> 
> The web.wsgi.cgi module actually builds the FieldStorage from the 
> environ dictionary, not QUERY_STRING so this should mean that middleware 
> can do what it likes and the underlying middleware and application will 
> respond to the changes.. is this not a good way of doing it?

Well, FieldStorage looks at particular keys, and I guess the result is 
derivative of all of those.  But the keys are fairly limited -- I think 
it's just QUERY_STRING, REQUEST_METHOD, CONTENT_TYPE, and CONTENT_LENGTH, 
though this could be confirmed by reading the cgi module.  So even 
though you pass a complete environment, every time you retrieve the value 
from the environment you want to check that these values haven't changed 
(along with wsgi.input).

If I did it, I'd lazily parse the query string, and then reparse if 
those keys had changed.  I guess wsgikit.wsgilib.get_cookies is an 
example of this: http://svn.colorstudy.com/trunk/WSGIKit/wsgikit/wsgilib.py
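
A minimal sketch of the lazy-parse-and-recheck idea for the query string only. The cache key and function name are invented for the example, and it uses urllib.parse.parse_qs rather than FieldStorage to keep it short; this is not a copy of what wsgilib does.

```python
from urllib.parse import parse_qs

CACHE_KEY = "example.parsed_query"  # hypothetical environ key for the cache


def get_query(environ):
    """Return parsed query fields, reparsing only if QUERY_STRING was replaced."""
    source = environ.get("QUERY_STRING", "")
    cached = environ.get(CACHE_KEY)
    # "is" rather than "==": we are checking whether middleware swapped the
    # string for another one, not comparing contents. The cache keeps the old
    # string alive, so an identity match cannot be a stale coincidence.
    if cached is not None and cached[0] is source:
        return cached[1]
    parsed = parse_qs(source, keep_blank_values=True)
    environ[CACHE_KEY] = (source, parsed)
    return parsed
```

If middleware rewrites QUERY_STRING between two calls, the second call notices the new string object and reparses; otherwise the cached result is returned as-is. A fuller version would track the other keys and wsgi.input the same way.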

> One other thing I've been meaning to ask.. The WSGI specification 
> currently allows no way for an application or middleware components to 
> pass custom information back up the middleware chain so that an 
> application can ask a middleware component not to perform a certain task 
> if it needs to. Communication up the chain can only be provided through 
> status, headers, exc_info and content. There could very easily also be a 
> response dictionary added as another parameter to start_response, 
> similar to environ which sent information up the chain. Was this 
> deliberately avoided so that the system wouldn't get complicated?

I was thinking about this too.  It certainly makes it simpler to make 
the response fairly plain and HTTP-like, but I can imagine lots of 
useful information that doesn't fit well into headers or response codes.  
E.g., if you are sending a 403 error message, maybe you want to pass 
some extra information along about why it happened.  You could write 
that out as the HTML response, but then it becomes somewhat opaque if 
that gets rewritten.  Something like the extension information that gets 
put in the request environment; it's always purely optional, but there 
to allow cooperation between components.  There's no escape mechanism 
like that for the response.

Well... there is a way, actually -- you can add callbacks to the 
request.  For instance, in my session handler I add a callable to the 
request that returns the session object.  If you don't call that at all 
then the session isn't even created, and no session ID is assigned 
(assuming you didn't already have a session).  If you do call it, then 
the middleware modifies the response to add a session ID.  So there's 
really some communication from the application that affects the 
response, but it isn't being expressed as part of the response stream 
(the status, headers, and body).

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org