From james at pythonweb.org  Tue Feb  1 00:17:11 2005
From: james at pythonweb.org (James Gardner)
Date: Tue Feb  1 00:17:31 2005
Subject: [Web-SIG] Python Web Modules - Version 0.5.0
In-Reply-To: <41FE839C.4030007@colorstudy.com>
References: <41FE7A51.1010205@pythonweb.org> <41FE839C.4030007@colorstudy.com>
Message-ID: <41FEBC77.7090008@pythonweb.org>

Ian Bicking wrote:

> web.wsgi.error: one standard I'd like for middleware would be some key 
> you could set that would indicate that some error handler exists, and 
> applications further down the stack shouldn't catch unexpected 
> exceptions (of course expected exceptions are a different matter).  
> Then the best error handler available would eventually get the error, 
> and process it somehow (e.g., mailing a report, displaying an error, 
> starting a debugger, etc).  Anyway, something to think about for this.

That could be useful. Presumably the middleware component nearest the 
server is likely to have the best error handling (as you would put the 
best error handler in a position to catch the most errors). So this 
could be as simple as agreeing a variable name like wsgi.error for the 
environ dictionary which the highest middleware component up the chain 
would set to True and ones lower down wouldn't provide error handling if 
it was already set.
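
A rough sketch of how that convention could look, assuming the key is 
literally called wsgi.error as suggested above (nothing has been agreed, 
and the ErrorMiddleware class is hypothetical):

```python
import sys

# Hypothetical environ key; the thread only suggests "a variable name like wsgi.error".
ERROR_KEY = "wsgi.error"


class ErrorMiddleware:
    """Catch unexpected exceptions unless a handler higher up already exists."""

    def __init__(self, application, name):
        self.application = application
        self.name = name

    def __call__(self, environ, start_response):
        if environ.get(ERROR_KEY):
            # A handler nearer the server has claimed errors: stay out of the way.
            return self.application(environ, start_response)
        environ[ERROR_KEY] = True
        try:
            return self.application(environ, start_response)
        except Exception:
            start_response(
                "500 Internal Server Error",
                [("Content-Type", "text/plain")],
                sys.exc_info(),
            )
            return [("Error handled by %s" % self.name).encode("utf-8")]


def broken_app(environ, start_response):
    raise ValueError("unexpected failure")


def demo():
    """Stack two handlers and report which one produced the response."""
    statuses = []

    def start_response(status, headers, exc_info=None):
        statuses.append(status)

    stack = ErrorMiddleware(ErrorMiddleware(broken_app, "inner"), "outer")
    body = b"".join(stack({}, start_response))
    return statuses, body
```

With two handlers stacked, only the outer one (nearest the server) ends 
up handling the exception. This sketch only catches errors raised while 
the application is called, not ones raised later while the response is 
iterated.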

Another thing I noticed when writing the error handler is that if an 
application or middleware component doesn't form a header or set the 
status correctly it can be tricky to track down where the error 
occurred. If the application used a special object for headers and 
status in the start_response callable which raised an error when it was 
set with an invalid value that would make life easier.

(Alternatively, if you wanted to change the way things were programmed a 
bit you could write your application as middleware and specify a 
terminator which set the headers and status using these special objects. 
Probably not necessary though!)

> web.wsgi.auth: I've been thinking lot about this as well, particularly 
> about the external interface.  REMOTE_USER seems like a reasonable 
> enough place to put the login information.  I'd like to keep 
> authorization and authentication separate -- one middleware determines 
> who you are, another (might) determine if you are allowed access. 
> Frequently only the application really knows if you are authorized, 
> based on logic that's beyond any ability to make it generic.

Agreed, the underlying API makes this even more explicit than the 
web.wsgi.auth module.. I'll split web.wsgi.auth.Auth into two 
components, one for authentication and one for authorization. The 
existing web.wsgi.auth.Auth will just be a chain of the two components 
and then will have the same functionality.

> So I was thinking that status codes should be sufficient to 
> communicate authorization: 401 for login required, 403 for forbidden.  
> If you are doing cookie logins (which I generally prefer from a UI 
> perspective) the middleware can translate the 401 into a redirect to 
> the login page.  And the 403 can turn into a nicer error page --

So in a new version the authentication middleware would display a sign-in 
box if no user was signed in. The authorization middleware would provide 
objects for the application to test authorisation, and would also look 
for headers to determine whether the application thought the user was 
authorised, displaying a sign-in box if not.

> a piece of middleware for indicating error pages would also be nice 
> (similar to Apache's ErrorDocument directive).

Agreed, I'll write one.

> web.wsgi.session: I'd like to have some sort of standard for these 
> objects, at least some aspects.  Not the details of storage, but 
> mostly access; along the lines of web.session.manager and/or .store.  
> I'm not sure how I feel about the manager with multiple applications, 
> each of which has a store -- I feel like this should be part of the 
> configuration somehow, which isn't necessarily part of the standard 
> user-visible API.

I've been thinking about the way series of applications can work 
together, which is what the web.wsgi.environment code is about. Perhaps 
it would be better to specify the application name in 
web.wsgi.environment (which is more to do with configuration) so that 
the web.wsgi.session and web.wsgi.auth objects all use the same 
application name and then the manager becomes more redundant because a 
store for the particular application is already created.

> web.wsgi.cgi: is this safe when a piece of middleware changes 
> QUERY_STRING or otherwise rewrites the request?  You can test for this 
> by saving the QUERY_STRING that you originally parsed alongside the 
> resulting FieldStorage, and then reparsing if they don't match.  You 
> can even test for matching with "is", since you're really checking for 
> modifications instead of equality.  The same should be possible for 
> wsgi.input and POST requests.

The web.wsgi.cgi module actually builds the FieldStorage from the 
environ dictionary, not QUERY_STRING so this should mean that middleware 
can do what it likes and the underlying middleware and application will 
respond to the changes.. is this not a good way of doing it?

One other thing I've been meaning to ask.. The WSGI specification 
currently allows no way for an application or middleware components to 
pass custom information back up the middleware chain so that an 
application can ask a middleware component not to perform a certain task 
if it needs to. Communication up the chain can only be provided through 
status, headers, exc_info and content. There could very easily also be a 
response dictionary added as another parameter to start_response, 
similar to environ which sent information up the chain. Was this 
deliberately avoided so that the system wouldn't get complicated?

Thanks again for your comments Ian, much appreciated.

James
--
James Gardner
http://www.pythonweb.org


From ianb at colorstudy.com  Tue Feb  1 00:28:13 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue Feb  1 00:29:26 2005
Subject: [Web-SIG] WSGI middleware library
Message-ID: <41FEBF0D.2070807@colorstudy.com>

What do people think about collaborating on a kind of "standard" library 
of WSGI middleware?  (Not standard like distributed-with-Python, just 
well publicized.)  This is what I've tried to put together a little with 
WSGIKit, though not all parts of it would apply.  And other people are, 
I think, starting to develop the same things, perhaps with some overlap. 
Maybe we can pool our efforts together.

The criteria I'd consider:

* Should be something we could do Right, i.e., can become "complete". 
E.g., a proxying WSGI application could be complete.  A commenting 
system can't.

* Shouldn't involve much UI.  Mostly because it can't be very complete.

* Shouldn't be tied to anything very specific.  E.g., if there's a 
templating middleware (um, don't ask me exactly what that would look 
like) it shouldn't be bound to any particular templating language. 
Those kinds of bindings should probably be part of the upstream libraries.

* Provide robust architecture more than a pleasant API.  E.g., WSGIKit 
implements Webware (more or less), but when you use that you see very 
little of the middleware that WSGIKit uses.  And that middleware looks 
kind of ugly, what with the environment and string keys and the 
sometimes funny semantics.

* Be really well documented and stable (at least once we come to 
consensus on an interface), so that people could reliably and easily 
use these middleware components in their frameworks.

* Testable and tested.

Some candidates I imagine:

* Sessions middleware
* Logging middleware/library (based on the standard library of course)
* Error reporting middleware/library
* Test frameworks (?)
* A file application (handling If-Modified-Since, etc)
* A proxy application
* Libraries for parsing query strings and all that.  Most of what is in 
Phillip's wsgiref.
* Authentication (this would be on the ambitious end)
* URL parsers (several, but maybe we could distill this down to a few 
primary models for parsing)
* And maybe a few of the more boring servers, like the CGI server, which 
will otherwise be homeless (or widely repeated).

I'd expect everyone involved to have ulterior motives, i.e., they'd all 
have their own separate pet projects and whatnot, and wouldn't be 
looking to this library (alone) to solve all their needs.  And that 
would be good, another part of what would keep this from being Yet 
Another Framework.  Together this should be attractive to people who 
like to delete code ;)  (Code deleted is code debugged!)

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From ianb at colorstudy.com  Tue Feb  1 00:48:56 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue Feb  1 00:50:14 2005
Subject: [Web-SIG] Python Web Modules - Version 0.5.0
In-Reply-To: <41FEBC77.7090008@pythonweb.org>
References: <41FE7A51.1010205@pythonweb.org> <41FE839C.4030007@colorstudy.com>
	<41FEBC77.7090008@pythonweb.org>
Message-ID: <41FEC3E8.3000902@colorstudy.com>

James Gardner wrote:
> Ian Bicking wrote:
> 
>> web.wsgi.error: one standard I'd like for middleware would be some key 
>> you could set that would indicate that some error handler exists, and 
>> applications further down the stack shouldn't catch unexpected 
>> exceptions (of course expected exceptions are a different matter).  
>> Then the best error handler available would eventually get the error, 
>> and process it somehow (e.g., mailing a report, displaying an error, 
>> starting a debugger, etc).  Anyway, something to think about for this.
> 
> 
> That could be useful. Presumably the middleware component nearest the 
> server is likely to have the best error handling (as you would put the 
> best error handler in a position to catch the most errors). So this 
> could be as simple as agreeing a variable name like wsgi.error for the 
> environ dictionary which the highest middleware component up the chain 
> would set to True and ones lower down wouldn't provide error handling if 
> it was already set.

Right.  Except when you don't want that ;)  Other times you may want to 
override the error handler locally; e.g., maybe you have a section of 
the site where you want to use a different error handler that shows 
exceptions to the browser (e.g., a development section).  But presumably 
you could add an option to the middleware to force it to catch 
exceptions even when the environment advised not to.

> Another thing I noticed when writing the error handler is that if an 
> application or middleware component doesn't form a header or set the 
> status correctly it can be tricky to track down where the error 
> occurred. If the application used a special object for headers and 
> status in the start_response callable which raised an error when it was 
> set with an invalid value that would make life easier.
>
> (Alternatively, if you wanted to change the way things were programmed a 
> bit you could write your application as middleware and specify a 
> terminator which set the headers and status using these special objects. 
> Probably not necessary though!)

I'm not sure I understand you here.  What's the exact situation where 
you encounter this?

>> So I was thinking that status codes should be sufficient to 
>> communicate authorization: 401 for login required, 403 for forbidden.  
>> If you are doing cookie logins (which I generally prefer from a UI 
>> perspective) the middleware can translate the 401 into a redirect to 
>> the login page.  And the 403 can turn into a nicer error page --
> 
> 
> So in a new version the authentication middleware would display a sign-in 
> box if no user was signed in. The authorization middleware would provide 
> objects for the application to test authorisation, and would also look 
> for headers to determine whether the application thought the user was 
> authorised, displaying a sign-in box if not.

Basically.  If REMOTE_USER wasn't set (or was empty) and the application 
required login (based on whatever criteria it has) then it should return 
a 401 code.  The authentication middleware doesn't know if login is 
required, but it would be nice if it can tell if you are logged in 
anyway (not possible with HTTP Basic auth, but ignoring that case).
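
To make the 401 idea concrete, here is a minimal sketch. The names 
protected_app and login_redirect and the /login URL are made up for 
illustration, and the middleware buffers the whole response, which a 
real implementation would probably want to avoid:

```python
def protected_app(environ, start_response):
    """The application decides that login is required; here it always is."""
    if not environ.get("REMOTE_USER"):
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"Login required"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("Hello, %s" % environ["REMOTE_USER"]).encode("utf-8")]


def login_redirect(application, login_url="/login"):
    """Turn a 401 from the wrapped application into a redirect to login_url."""

    def middleware(environ, start_response):
        captured = {}
        chunks = []

        def capture(status, headers, exc_info=None):
            # Hold the status back until we know whether to rewrite it.
            captured["status"] = status
            captured["headers"] = headers
            return chunks.append

        result = application(environ, capture)
        try:
            for chunk in result:
                chunks.append(chunk)
        finally:
            if hasattr(result, "close"):
                result.close()

        if captured["status"].startswith("401"):
            start_response("302 Found", [("Location", login_url)])
            return [b""]
        start_response(captured["status"], captured["headers"])
        return chunks

    return middleware
```

The application never needs to know how login is presented; it only 
reports 401, and whichever middleware is installed decides what that 
turns into.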

>> web.wsgi.session: I'd like to have some sort of standard for these 
>> objects, at least some aspects.  Not the details of storage, but 
>> mostly access; along the lines of web.session.manager and/or .store.  
>> I'm not sure how I feel about the manager with multiple applications, 
>> each of which has a store -- I feel like this should be part of the 
>> configuration somehow, which isn't necessarily part of the standard 
>> user-visible API.
> 
> 
> I've been thinking about the way series of applications can work 
> together, which is what the web.wsgi.environment code is about. Perhaps 
> it would be better to specify the application name in 
> web.wsgi.environment (which is more to do with configuration) so that 
> the web.wsgi.session and web.wsgi.auth objects all use the same 
> application name and then the manager becomes more redundant because a 
> store for the particular application is already created.

OK, I was trying to figure out what wsgi.environment was about.  Is it 
basically a way of indicating local configuration (like a configuration 
realm or something)?  I still lack a good intuition for how 
configuration should work.

>> web.wsgi.cgi: is this safe when a piece of middleware changes 
>> QUERY_STRING or otherwise rewrites the request?  You can test for this 
>> by saving the QUERY_STRING that you originally parsed alongside the 
>> resulting FieldStorage, and then reparsing if they don't match.  You 
>> can even test for matching with "is", since you're really checking for 
>> modifications instead of equality.  The same should be possible for 
>> wsgi.input and POST requests.
> 
> 
> The web.wsgi.cgi module actually builds the FieldStorage from the 
> environ dictionary, not QUERY_STRING so this should mean that middleware 
> can do what it likes and the underlying middleware and application will 
> respond to the changes.. is this not a good way of doing it?

Well, FieldStorage looks at particular keys, and I guess the result is 
derivative of all of those.  But the keys are fairly limited -- I think 
it's just QUERY_STRING, REQUEST_METHOD, CONTENT_TYPE, and CONTENT_LENGTH, 
though this could be confirmed by reading the cgi module.  So even 
though you pass a complete environment, every time you retrieve the value 
from the environment you want to check that these values haven't changed 
(along with wsgi.input).

If I did it, I'd lazily parse the query string, and then reparse if 
those keys had changed.  I guess wsgikit.wsgilib.get_cookies is an 
example of this: http://svn.colorstudy.com/trunk/WSGIKit/wsgikit/wsgilib.py
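
A minimal sketch of the lazy-parse-and-recheck idea, for QUERY_STRING 
only. This is not the wsgikit code; the cache key example.parsed_query 
and the get_query function are hypothetical, and it uses the standard 
library's parse_qs rather than FieldStorage:

```python
from urllib.parse import parse_qs

# Hypothetical cache key; not part of any existing library.
CACHE_KEY = "example.parsed_query"


def get_query(environ):
    """Return the parsed QUERY_STRING, reparsing only if the string was replaced."""
    source = environ.get("QUERY_STRING", "")
    cached = environ.get(CACHE_KEY)
    # Identity is enough: we are detecting replacement, not comparing values.
    if cached is not None and cached[0] is source:
        return cached[1]
    parsed = parse_qs(source, keep_blank_values=True)
    environ[CACHE_KEY] = (source, parsed)
    return parsed
```

If some middleware assigns an equal but distinct string, the only cost 
is one unnecessary reparse. Handling wsgi.input and POST bodies the same 
way would need more care, since the stream can only be read once.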

> One other thing I've been meaning to ask.. The WSGI specification 
> currently allows no way for an application or middleware components to 
> pass custom information back up the middleware chain so that an 
> application can ask a middleware component not to perform a certain task 
> if it needs to. Communication up the chain can only be provided through 
> status, headers, exc_info and content. There could very easily also be a 
> response dictionary added as another parameter to start_response, 
> similar to environ which sent information up the chain. Was this 
> deliberately avoided so that the system wouldn't get complicated?

I was thinking about this too.  It certainly makes it simpler to make 
the response fairly plain and HTTP-like, but I can imagine lots of 
useful information that doesn't fit well into headers or response codes. 
E.g., if you are sending a 403 error message, maybe you want to pass 
some extra information along about why it happened.  You could write 
that out as the HTML response, but then it becomes somewhat opaque if 
that gets rewritten.  Something like the extension information that gets 
put in the request environment; it's always purely optional, but there 
to allow cooperation between components.  There's no escape mechanism 
like that for the response.

Well... there is a way, actually -- you can add callbacks to the 
request.  For instance, in my session handler I add a callable to the 
request that returns the session object.  If you don't call that at all 
then the session isn't even created, and no session ID is assigned 
(assuming you didn't already have a session).  If you do call it, then 
the middleware modifies the response to add a session ID.  So there's 
really some communication from the application that affects the 
response, but it isn't being expressed as part of the response stream 
(the status, headers, and body).
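
Roughly, the pattern looks like the sketch below. This is not the 
actual session handler; the key example.get_session and the cookie name 
are invented for illustration, and it ignores reading an existing 
session ID from the request:

```python
import uuid

# Hypothetical key name for illustration; not the key WSGIKit actually uses.
SESSION_KEY = "example.get_session"


def session_middleware(application):
    def middleware(environ, start_response):
        state = {"session": None, "id": None}

        def get_session():
            # Create the session on first use; return the same object afterwards.
            if state["session"] is None:
                state["session"] = {}
                state["id"] = uuid.uuid4().hex
            return state["session"]

        environ[SESSION_KEY] = get_session

        def session_start_response(status, headers, exc_info=None):
            # Only touch the response if the application asked for a session.
            if state["id"] is not None:
                cookie = ("Set-Cookie", "session_id=%s" % state["id"])
                headers = list(headers) + [cookie]
            if exc_info is not None:
                return start_response(status, headers, exc_info)
            return start_response(status, headers)

        return application(environ, session_start_response)

    return middleware
```

One limitation of this sketch: the cookie is only added if the 
application asks for the session before it calls start_response.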

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From james at pythonweb.org  Tue Feb  1 17:30:01 2005
From: james at pythonweb.org (James Gardner)
Date: Tue Feb  1 17:30:08 2005
Subject: [Web-SIG] Python Web Modules - Version 0.5.0
In-Reply-To: <41FEC3E8.3000902@colorstudy.com>
References: <41FE7A51.1010205@pythonweb.org> <41FE839C.4030007@colorstudy.com>
	<41FEBC77.7090008@pythonweb.org> <41FEC3E8.3000902@colorstudy.com>
Message-ID: <41FFAE89.6020602@pythonweb.org>

Ian Bicking wrote:

>> Another thing I noticed when writing the error handler is that if an 
>> application or middleware component doesn't form a header or set the 
>> status correctly it can be tricky to track down where the error 
>> occurred. If the application used a special object for headers and 
>> status in the start_response callable which raised an error when it 
>> was set with an invalid value that would make life easier.
>
> I'm not sure I understand you here.  What's the exact situation where 
> you encounter this?

Well, when I was programming the session middleware I appended a tuple 
of the wrong length to the headers used in start_response. This wasn't 
picked up until the error handling module by which time I had no idea 
which piece of middleware had appended the faulty header. If header was 
an object that behaved like a list but only allowed correctly formed 
headers to be appended this error would have been picked up where it 
happened.
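
Something along these lines is what I have in mind. HeaderList is a 
hypothetical name, and the sketch only guards the common mutating 
methods (item assignment and += are left unchecked):

```python
class HeaderList(list):
    """A list that only accepts well-formed (name, value) string pairs."""

    @staticmethod
    def _check(item):
        well_formed = (
            isinstance(item, tuple)
            and len(item) == 2
            and all(isinstance(part, str) for part in item)
        )
        if not well_formed:
            raise ValueError("malformed header: %r" % (item,))
        return item

    def __init__(self, items=()):
        super().__init__(self._check(item) for item in items)

    def append(self, item):
        super().append(self._check(item))

    def extend(self, items):
        super().extend([self._check(item) for item in items])

    def insert(self, index, item):
        super().insert(index, self._check(item))
```

The traceback from the ValueError then points at the component that 
appended the bad header. Since the specification asks for a plain list, 
it is probably safest to pass list(headers) on to start_response.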

>> One other thing I've been meaning to ask.. The WSGI specification 
>> currently allows no way for an application or middleware components 
>> to pass custom information back up the middleware chain so that an 
>> application can ask a middleware component not to perform a certain 
>> task if it needs to. Communication up the chain can only be provided 
>> through status, headers, exc_info and content. There could very 
>> easily also be a response dictionary added as another parameter to 
>> start_response, similar to environ which sent information up the 
>> chain. Was this deliberately avoided so that the system wouldn't get 
>> complicated?
>
> I was thinking about this too.  It certainly makes it simpler to make 
> the response fairly plain and HTTP-like, but I can imagine lots of 
> useful information that doesn't fit well into headers or response 
> codes.  E.g., if you are sending a 403 error message, maybe you want 
> to pass some extra information along about why it happened.  You could 
> write that out as the HTML response, but then it becomes somewhat 
> opaque if that gets rewritten.  Something like the extension 
> information that gets put in the request environment; it's always 
> purely optional, but there to allow cooperation between components.  
> There's no escape mechanism like that for the response.
>
> Well... there is a way, actually -- you can add callbacks to the 
> request.  For instance, in my session handler I add a callable to the 
> request that returns the session object.  If you don't call that at 
> all then the session isn't even created, and no session ID is assigned 
> (assuming you didn't already have a session).  If you do call it, then 
> the middleware modifies the response to add a session ID.  So there's 
> really some communication from the application that affects the 
> response, but it isn't being expressed as part of the response stream 
> (the status, headers, and body).

That's true and useful in the session case. In fact any middleware that 
needed the session store could still call the callable, they'd just need 
to check if it had already been called (or the callable itself could 
keep track of whether it had been called in fact). It does mean that 
other middleware components can't get access to the same information 
though unless they all chain callables down the middleware stack.

It doesn't really work for your first example with the error information 
though since the information should be available to all middleware 
components. In that example though couldn't the application send error 
information with exc_info and the auth middleware catch it or am I 
missing something?

Do you think there is mileage to be gained from adding a response 
dictionary to start_response as that would be a simple way of sending 
information back? It would break if existing WSGI apps didn't pass on 
the response dictionary though.

James
--
http://www.pythonweb.org/

From james at pythonweb.org  Tue Feb  1 17:53:01 2005
From: james at pythonweb.org (James Gardner)
Date: Tue Feb  1 17:53:02 2005
Subject: [Web-SIG] WSGI middleware library
In-Reply-To: <41FEBF0D.2070807@colorstudy.com>
References: <41FEBF0D.2070807@colorstudy.com>
Message-ID: <41FFB3ED.1070505@pythonweb.org>

Ian Bicking wrote:

> What do people think about collaborating on a kind of "standard" 
> library of WSGI middleware?  (Not standard like 
> distributed-with-Python, just well publicized.)  This is what I've 
> tried to put together a little with WSGIKit, though not all parts of 
> it would apply.  And other people are, I think, starting to develop 
> the same things, perhaps with some overlap.  Maybe we can pool our 
> efforts together.

I think this is a good idea. There are sometimes different approaches 
that can be taken to implementing similar functionality within WSGI and 
there is usually a best one. If we share ideas we are more likely to 
come up with the better solutions. There are also a lot of things which 
have only one good solution and there is no point in duplicating work.

> I'd expect everyone involved to have ulterior motives, i.e., they'd 
> all have their own separate pet projects and whatnot, and wouldn't be 
> looking to this library (alone) to solve all their needs.  And that 
> would be good, another part of what would keep this from being Yet 
> Another Framework.  Together this should be attractive to people who 
> like to delete code ;)  (Code deleted is code debugged!)

If middleware components are built as classes it would be easy for 
implementors to derive their own classes from the standard ones to 
implement say session storage for their particular framework so I think 
this would be of benefit to everyone and wouldn't necessarily result in 
a lot of people pulling in different directions. I'd certainly find it 
helpful.

What might also be useful is a guide to writing WSGI middleware and 
applications with examples of all the different ways of doing common 
things incorporating any ideas or tips people have found useful whilst 
writing their implementations. Perhaps we could start this on the wiki?

James
--
http://www.pythonweb.org/
From ianb at colorstudy.com  Tue Feb  1 18:15:47 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue Feb  1 18:17:07 2005
Subject: [Web-SIG] Python Web Modules - Version 0.5.0
In-Reply-To: <41FFAE89.6020602@pythonweb.org>
References: <41FE7A51.1010205@pythonweb.org> <41FE839C.4030007@colorstudy.com>
	<41FEBC77.7090008@pythonweb.org> <41FEC3E8.3000902@colorstudy.com>
	<41FFAE89.6020602@pythonweb.org>
Message-ID: <41FFB943.70001@colorstudy.com>

James Gardner wrote:
> Ian Bicking wrote:
> 
>>> Another thing I noticed when writing the error handler is that if an 
>>> application or middleware component doesn't form a header or set the 
>>> status correctly it can be tricky to track down where the error 
>>> occurred. If the application used a special object for headers and 
>>> status in the start_response callable which raised an error when it 
>>> was set with an invalid value that would make life easier.
>>
>>
>> I'm not sure I understand you here.  What's the exact situation where 
>> you encounter this?
> 
> Well, when I was programming the session middleware I appended a tuple 
> of the wrong length to the headers used in start_response. This wasn't 
> picked up until the error handling module by which time I had no idea 
> which piece of middleware had appended the faulty header. If header was 
> an object that behaved like a list but only allowed correctly formed 
> headers to be appended this error would have been picked up where it 
> happened.

That seems like a complicated way to deal with the problem.  If it's 
just for debugging you can add the wsgikit.lint middleware, and it 
checks for most of these issues without actually affecting the server or 
application.  It does specifically check for the headers being a list of 
tuples of length 2.

>>> One other thing I've been meaning to ask.. The WSGI specification 
>>> currently allows no way for an application or middleware components 
>>> to pass custom information back up the middleware chain so that an 
>>> application can ask a middleware component not to perform a certain 
>>> task if it needs to. Communication up the chain can only be provided 
>>> through status, headers, exc_info and content. There could very 
>>> easily also be a response dictionary added as another parameter to 
>>> start_response, similar to environ which sent information up the 
>>> chain. Was this deliberately avoided so that the system wouldn't get 
>>> complicated?
>>
>>
>> I was thinking about this too.  It certainly makes it simpler to make 
>> the response fairly plain and HTTP-like, but I can imagine lots of 
>> useful information that doesn't fit well into headers or response 
>> codes.  E.g., if you are sending a 403 error message, maybe you want 
>> to pass some extra information along about why it happened.  You could 
>> write that out as the HTML response, but then it becomes somewhat 
>> opaque if that gets rewritten.  Something like the extension 
>> information that gets put in the request environment; it's always 
>> purely optional, but there to allow cooperation between components.  
>> There's no escape mechanism like that for the response.
>>
>> Well... there is a way, actually -- you can add callbacks to the 
>> request.  For instance, in my session handler I add a callable to the 
>> request that returns the session object.  If you don't call that at 
>> all then the session isn't even created, and no session ID is assigned 
>> (assuming you didn't already have a session).  If you do call it, then 
>> the middleware modifies the response to add a session ID.  So there's 
>> really some communication from the application that affects the 
>> response, but it isn't being expressed as part of the response stream 
>> (the status, headers, and body).
> 
> 
> That's true and useful in the session case. In fact any middleware that 
> needed the session store could still call the callable, they'd just need 
> to check if it had already been called (or the callable itself could 
> keep track of whether it had been called in fact). 

Yes, it keeps track, and each time you call the session-creator that's 
in the environment it returns the same session object (but that object 
is created lazily).

> It does mean that 
> other middleware components can't get access to the same information 
> though unless they all chain callables down the middleware stack.

Perhaps instead of it being a callable it could be an object, and could 
support methods to check, for instance, if a session had been created 
without actually creating one.
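
As a sketch of that object (LazySession and its method names are 
hypothetical, not an existing API):

```python
class LazySession:
    """Creates its data dictionary only when first asked for it."""

    def __init__(self):
        self._data = None

    def created(self):
        """Report whether a session exists, without creating one."""
        return self._data is not None

    def get(self):
        """Return the session data, creating it on first use."""
        if self._data is None:
            self._data = {}
        return self._data
```

Other middleware could then call created() to decide whether to act, 
without forcing a session into existence.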

> It doesn't really work for your first example with the error information 
> though since the information should be available to all middleware 
> components. In that example though couldn't the application send error 
> information with exc_info and the auth middleware catch it or am I 
> missing something?

It could work for error information.  E.g.:

import sys

def middleware(application):
    def error_app(environ, start_response):
        # Reuse a checker installed further up the stack, or create one.
        checker = environ.get('wsgikit.errorchecker')
        if checker is None:
            checker = environ['wsgikit.errorchecker'] = ErrorChecker()
        try:
            return application(environ, start_response)
        except:
            # Hand the failure to the shared checker.
            exc_info = sys.exc_info()
            return checker.respond_to_exception(exc_info)
    return error_app

Then ErrorChecker is an instance you could add information to at any 
level of the application.  ErrorChecker, in turn, could add information 
to another component, e.g., the ErrorDocument-like middleware.

> Do you think there is mileage to be gained from adding a response 
> dictionary to start_response as that would be a simple way of sending 
> information back? It would break if existing WSGI apps didn't pass on 
> the response dictionary though.

I think extending the request dictionary like this is sufficient.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From titus at caltech.edu  Tue Feb  1 19:23:29 2005
From: titus at caltech.edu (Titus Brown)
Date: Tue Feb  1 19:23:34 2005
Subject: [Web-SIG] Fun with WSGI -- commenting middleware.
In-Reply-To: <41FE7DC9.3080100@colorstudy.com>
References: <20050130024447.GA10409@caltech.edu>
	<41FE7DC9.3080100@colorstudy.com>
Message-ID: <20050201182329.GA24982@caltech.edu>

-> >I sat down today to hack out a simple commenting system for HTML
-> >articles, and ended up using WSGI to implement a pipe-style solution.
-> >
-> >You can see the results at
-> >
-> >	http://www.idyll.org/~t/articles.cgi/
-> >
-> >This CGI script serves HTML files from a directory hierarchy.  Anyone
-> >can attach a comment to any HTML file served by the script.
-> 
-> Spiffy.  It would be neat to plug this into a WSGI application that 
-> served as a proxy (redisplaying pages fetched from another location). 
-> Then you could point it at the Python documentation and get that 
-> php.net-like commenting that people are always asking for; it would 
-> probably be good to make the commenting more granular, but it's 
-> interesting to be able to develop the different parts so separately.

I thought about this a bit more.  I like the proxy idea (and will
implement it next time I have the urge to do some light coding).  For
the python docs, though, wouldn't it be better to just host the files
on the same machine?

I will probably develop a simple Quixote application to wrap the
commenting code, too; having all this in CGI will get annoying,
if I do anything more complex than what I'm doing now.

-> Actually, I was just going to convert this silly little web-based image 
-> viewer I have to WSGI, and with this I could get a free commenting 
-> system.  Hmm...

The back-end is pretty lousy -- it's just a pickled dictionary of
'Comment' classes -- but that's modular, of course.  I'll spruce
up the commenting middleware itself & document that, and then make
it directly available via DARCS.

I'd be interested in people's opinions on how to format the entries &
safeguard against XSS hacks.  Right now I'm just pushing the exact
HTML they wrote onto the pages, which strikes me as a Bad Idea.

cheers,
--titus
From ianb at colorstudy.com  Tue Feb  1 22:15:58 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue Feb  1 22:17:16 2005
Subject: [Web-SIG] Fun with WSGI -- commenting middleware.
In-Reply-To: <20050201182329.GA24982@caltech.edu>
References: <20050130024447.GA10409@caltech.edu>
	<41FE7DC9.3080100@colorstudy.com>
	<20050201182329.GA24982@caltech.edu>
Message-ID: <41FFF18E.9080902@colorstudy.com>

Titus Brown wrote:
> I thought about this a bit more.  I like the proxy idea (and will
> implement it next time I have the urge to do some light coding).  For
> the python docs, though, wouldn't it be better to just host the files
> on the same machine?

Yes, that's possible too, especially since they are all completely 
static and fully rendered.  Probably easier, and also implemented 
already ;)   I'm sure there's others, but wsgikit.urlparser serves 
static files reasonably well (wsgikit.wsgilib.send_file could use some 
work to be more efficient).

> I will probably develop a simple Quixote application to wrap the
> commenting code, too; having all this in CGI will get annoying,
> if I do anything more complex than what I'm doing now.

At one time I did a lot of this kind of thing where you'd read a page 
then fiddle with the output.  It always had some holes, but it's an 
interesting technique, and one I come back to often.

It would be nice to have a mini-framework for this sort of thing, that 
hides a bit of the WSGI fiddling you have to do.  I.e., the framework 
packages up the request (which contains important information like the 
requested URL) and the response, and it gives it to some hook to munge 
the response (like adding comments).  Another one might run the output 
through tidy and tack errors and warnings at the bottom of the page.
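
A rough sketch of what such a mini-framework could look like. The names
munge_response and add_comment_footer are made up for illustration, and the
sketch assumes the wrapped application returns its body through the iterable
rather than through the legacy write() callable. It buffers the whole
response, hands it to a hook together with the request environ, and fixes up
Content-Length afterwards:

```python
def munge_response(app, hook):
    """Wrap `app` so that `hook` can rewrite the complete response.

    `hook(environ, status, headers, body)` must return a new
    (status, headers, body) triple.  Hypothetical interface.
    """
    def middleware(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Remember what the application wanted to send.
            captured["status"] = status
            captured["headers"] = list(headers)

            def write(data):
                raise NotImplementedError("write() is not supported in this sketch")
            return write

        result = app(environ, capture)
        try:
            body = b"".join(result)  # buffer the whole response
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        status, headers, body = hook(
            environ, captured["status"], captured["headers"], body)
        # The body may have changed size, so recompute Content-Length.
        headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response(status, headers)
        return [body]
    return middleware


def add_comment_footer(environ, status, headers, body):
    """Example hook: tack a placeholder comment area onto HTML pages."""
    content_type = dict((k.lower(), v) for k, v in headers).get("content-type", "")
    if content_type.startswith("text/html"):
        note = "<p>Comments for %s go here.</p>" % environ.get("PATH_INFO", "")
        body += note.encode("utf-8")
    return status, headers, body
```

A tidy-based hook would have the same shape: take the body, run it through
the checker, and append the warnings before returning.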

Some sort of URL escape would also be good -- i.e., if your munging 
middleware is at /comment_system, then maybe you could tell it to 
redirect /comment_system/foo/* to another application, and that 
application would handle the form action for comments.  That's easy to 
imagine as a Quixote app or something; but the munging bit isn't as easy.

It would be easier if there was a function (which there might be) that 
could turn the WSGI request into a Quixote request object without 
bringing the rest of the framework in.  Then the munging portion 
wouldn't be a Quixote application, per se, but it would look quite similar.

Or, you could turn one request into two, sending the output of the first 
application as input to a second application, e.g., as a POST request 
where the body and headers are put into some fields.  Then it could be a 
normal application, but it seems like a complex way to get there. 
Though... maybe it actually is the best way.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From skink at evhr.net  Fri Feb  4 01:15:30 2005
From: skink at evhr.net (Fabien Schwob)
Date: Fri Feb  4 01:15:42 2005
Subject: [Web-SIG] Web Client
Message-ID: <4202BEA2.9000204@evhr.net>

Hello,

I'm currently trying to retrieve a webpage in order to extract 
information from it. The problem is that this page is _behind_ a POST 
form.

Does someone know a module or a tutorial that can help me?

Thanks

-- 
Fabien
From titus at caltech.edu  Fri Feb  4 02:32:52 2005
From: titus at caltech.edu (Titus Brown)
Date: Fri Feb  4 02:32:58 2005
Subject: [Web-SIG] Web Client
In-Reply-To: <4202BEA2.9000204@evhr.net>
References: <4202BEA2.9000204@evhr.net>
Message-ID: <20050204013251.GA31349@caltech.edu>

-> I'm currently trying to retrieve a webpage in order to extract 
-> information from it. The problem is that this page is _behind_ a POST 
-> form.
-> 
-> Does someone know a module or a tutorial that can help me?

There are several: urllib2 is probably the place to start.  Check
out this post, which is full of links ;).

http://mail.python.org/pipermail/python-list/2004-September/238739.html
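
As a starting point, here is a minimal sketch of submitting a form with POST.
It is written against urllib.request, the Python 3 successor to urllib2; the
helper names, the URL and the field names are placeholders, not anything
from the page in question:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_form_post(url, fields):
    """Build a POST request carrying `fields` as a URL-encoded form body."""
    data = urlencode(fields).encode("ascii")
    # Supplying `data` is what turns the request into a POST.
    return Request(
        url,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_behind_form(url, fields, timeout=30):
    """Submit the form and return the raw bytes of the resulting page."""
    with urlopen(build_form_post(url, fields), timeout=timeout) as response:
        return response.read()


# Hypothetical usage (performs a real network request):
# html = fetch_behind_form("http://example.com/search", {"q": "wsgi"})
```

If the site also needs cookies or a login, an opener with a cookie processor
is the usual next step.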

--titus
From ianb at colorstudy.com  Fri Feb  4 05:14:21 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri Feb  4 05:14:08 2005
Subject: [Web-SIG] WSGIKit/Webware/WSGI sprint
Message-ID: <4202F69D.9010004@colorstudy.com>

I've added a WSGIKit/Webware/WSGI sprint to the Wiki, thus officially 
registering us.  If you are interested in coming please sign your name:

   http://python.org/moin/WsgiKitSprint

If you are interested but you aren't sure if you can make it, then sign 
your name and say you aren't sure.  I'm not sure if I'll be able to do 
Saturday and Sunday (I can for sure do Monday and Tuesday) -- I'll see 
what other people can do, and then we can figure out the exact schedule 
later.

So far the interest in the sprint has come from the Webware list, but I 
want to give WSGIKit parity with Webware's features through 
framework-neutral WSGI middleware, so anyone who is interested in Python 
framework development and WSGI is very welcome.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From titus at caltech.edu  Fri Feb  4 18:45:41 2005
From: titus at caltech.edu (Titus Brown)
Date: Fri Feb  4 18:45:44 2005
Subject: [Web-SIG] WSGI middleware library
In-Reply-To: <41FEBF0D.2070807@colorstudy.com>
References: <41FEBF0D.2070807@colorstudy.com>
Message-ID: <20050204174541.GA20145@caltech.edu>

-> What do people think about collaborating on a kind of "standard" library 
-> of WSGI middleware?

Hi, Ian,

ok, here's another response ;).

I slept on it a bit, and I would like to suggest one modification: make
it a cookbook of examples, rather than a library.

This implies that we don't need to have a standard naming scheme or a
common coding style to the components, and there can be redundancy --
multiple examples overlapping in functionality.  It also means that
there is room for "incomplete" solutions, which are IMO of great
value even just as stubs.  Such code can be isolated and used piecemeal,
independently of the rest of the library. And, finally, it means that
code can be designed strictly for functionality rather than for
extensibility.

I make this suggestion for two reasons: first of all, I'd be more
interested in contributing code to a cookbook than to a library,
for the above reasons.  And, secondly, my limited experience with
example code I've posted suggests that people are primarily interested
in a complete, functioning example that's isolated from other
code.

I do think a test harness (to make sure that the middleware is WSGI
compliant) and a documentation standard (reST?  In each directory?
or ...?) would be a good idea.

As immediate candidates for inclusion I suggest:

* a simple wsgi-passthrough middleware, that "handles" the data without
	modifying it.  (The idea is to provide hooks where I/O *can*
	be modified.)  Most of my time in wsgiComment was spent figuring
	out how to get that functionality.

* the CGI server from the PEP.
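
To make the first candidate concrete, here is a rough sketch of a passthrough
with hooks. The class and method names are invented for this example and are
not taken from wsgiComment. By default every hook returns its input
unchanged, and a subclass overrides only the hook it cares about:

```python
class Passthrough:
    """WSGI middleware that forwards everything, with overridable hooks."""

    def __init__(self, app):
        self.app = app

    # Hooks: the defaults change nothing.
    def on_request(self, environ):
        return environ

    def on_headers(self, status, headers):
        return status, headers

    def on_chunk(self, chunk):
        return chunk

    def __call__(self, environ, start_response):
        def wrapped_start(status, headers, exc_info=None):
            status, headers = self.on_headers(status, headers)
            write = start_response(status, headers, exc_info)
            # Route the legacy write() callable through the same hook.
            return lambda data: write(self.on_chunk(data))

        result = self.app(self.on_request(environ), wrapped_start)
        return self._stream(result)

    def _stream(self, result):
        # Pass chunks along one at a time instead of buffering them.
        try:
            for chunk in result:
                yield self.on_chunk(chunk)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()


class Shout(Passthrough):
    """Example subclass: upper-case the body, leave the rest alone."""

    def on_chunk(self, chunk):
        return chunk.upper()
```

Note that a hook which changes the body length would also have to adjust
Content-Length in on_headers; the sketch leaves that to the subclass.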

I can submit nicely formatted versions of these if you're interested in
proceeding immediately; I'd also be happy to host a Darcs repository for
the stuff ;).

cheers,
--titus
From ianb at colorstudy.com  Fri Feb  4 19:04:57 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri Feb  4 19:06:33 2005
Subject: [Web-SIG] WSGI middleware library
In-Reply-To: <20050204174541.GA20145@caltech.edu>
References: <41FEBF0D.2070807@colorstudy.com>
	<20050204174541.GA20145@caltech.edu>
Message-ID: <4203B949.9020100@colorstudy.com>

Titus Brown wrote:
> -> What do people think about collaborating on a kind of "standard" library 
> -> of WSGI middleware?
> 
> Hi, Ian,
> 
> ok, here's another response ;).
> 
> I slept on it a bit, and I would like to suggest one modification: make
> it a cookbook of examples, rather than a library.
> 
> This implies that we don't need to have a standard naming scheme or a
> common coding style to the components, and there can be redundancy --
> multiple examples overlapping in functionality.  It also means that
> there is room for "incomplete" solutions, which are IMO of great
> value even just as stubs.  Such code can be isolated and used piecemeal,
> independently of the rest of the library. And, finally, it means that
> code can be designed strictly for functionality rather than for
> extensibility.

Obviously some of the solutions will be incomplete for a while -- the 
development is a process.  And there's nothing keeping us from having a 
contrib/ directory in the project, which could contain any kind of 
example or tool that might seem useful.  There's no reason to exclude 
anything useful, but putting code in a library implies some commitment 
to the API and functionality, which isn't appropriate for some code. 
That can largely be solved through documentation and other metadata 
(like the directory layout).

As for extensibility... well, hopefully some pieces won't require much 
extensibility besides really obvious hooks that you'd want to include 
anyway.  And hopefully those would stabilize once a few people tried to 
use a piece of middleware and suggested improvements -- part of why I 
want to do this collaboratively is because predicting places for 
extension tends to be inaccurate, while waiting for people to use code 
and find they require a place for extension usually works better.

> I make this suggestion for two reasons: first of all, I'd be more
> interested in contributing code to a cookbook than to a library,
> for the above reasons.  And, secondly, my limited experience with
> example code I've posted suggests that people are primarily interested
> in a complete, functioning example that's isolated from other
> code.

To a degree, I would hope we'd have functioning examples by design -- 
certainly a smaller number of dependencies will make the libraries more 
accessible.  At the same time, though, I want to actually *use* the 
results.  E.g., there's things I'd like to move from WSGIKit to this 
library; but if this isn't a real library then all I can do is copy 
items and maybe keep them in sync in the future, but I can't ever *use* 
them because a cookbook isn't stable or even packaged.

> I do think a test harness (to make sure that the middleware is WSGI
> compliant) and a documentation standard (reST?  In each directory?
> or ...?) would be a good idea.

wsgikit.lint does some compliance testing, when used in conjunction with 
other tests.  There's no general way to poke at middleware or 
applications, so we have to rely on specific code to do the poking while 
another piece of code (lint) makes sure everything goes through properly.

Other parts of a framework would certainly be useful.  Adding a wsgi: 
method to, say, mechanize or urllib2 would be nice; it would save you 
from having to do any server setup and test the WSGI application directly.
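
A sketch of both ideas together: poke at an application in-process, with no
server, while a lint wrapper checks the traffic. wsgikit.lint is not shown
here; the sketch substitutes wsgiref.validate from the current Python
standard library, which plays a similar role. The helper call_directly and
the two sample applications are made up for this example:

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator


def call_directly(app, path="/"):
    """Call `app` in-process through the validator; no server involved."""
    environ = {"PATH_INFO": path, "SCRIPT_NAME": "", "QUERY_STRING": ""}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None

    # validator() raises AssertionError when the app breaks the spec.
    result = validator(app)(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        result.close()  # the validator complains if this is skipped
    return captured["status"], captured["headers"], body


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def sloppy_app(environ, start_response):
    # Headers must be a list of tuples, not a dict; the validator catches this.
    start_response("200 OK", {"Content-Type": "text/plain"})
    return [b"oops\n"]
```

Calling call_directly(hello_app) returns the status, headers and body, while
call_directly(sloppy_app) raises AssertionError from the validator.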

> As immediate candidates for inclusion I suggest:
> 
> * a simple wsgi-passthrough middleware, that "handles" the data without
> 	modifying it.  (The idea is to provide hooks where I/O *can*
> 	be modified.)  Most of my time in wsgiComment was spent figuring
> 	out how to get that functionality.
> 
> * the CGI server from the PEP.
> 
> I can submit nicely formatted versions of these if you're interested in
> proceeding immediately; I'd also be happy to host a Darcs repository for
> the stuff ;).

I was thinking of putting it on svn://w4py.org; I'm a bit partial to a 
centralized repository for this sort of thing, since it encourages 
continuous integration and maybe is a bit more transparent.  And svn is 
pretty common at this point.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From hex-dump at hotmail.com  Sat Feb  5 12:07:48 2005
From: hex-dump at hotmail.com (Mark Rees)
Date: Sat Feb  5 12:09:01 2005
Subject: [Web-SIG] Ann:ISAPI-WSGI 0.4 Beta
Message-ID: <BAY21-F23FC4BE8EFFA8E37256AC49B710@phx.gbl>

Hello everyone,

I am happy to announce the release of ISAPI-WSGI 0.4 beta.

ISAPI-WSGI will (hopefully) allow any WSGI application to run inside a 
Windows web server that supports ISAPI. I believe that it meets the 
requirements of the WSGI PEP. It has only been tested against the examples 
from the PEP, Ian Bicking's echo example from wsgi-webkit, and Titus Brown's 
WSGI-enabled Simple Commenting System under IIS 5.1. Its one major 
limitation is that it is single-threaded. I am currently working on 
a fully threaded version, but wanted to release it now so others could have 
a look at it. I am interested in any feedback, suggestions or bug reports.

See http://isapi-wsgi.python-hosting.com/wiki/DocsPage for info and get the 
python source & some examples at  
http://isapi-wsgi.python-hosting.com/wiki/ISAPISimpleHandler-0.4-beta

Regards

Mark Rees

From titus at caltech.edu  Sun Feb 13 08:51:08 2005
From: titus at caltech.edu (Titus Brown)
Date: Sun Feb 13 08:51:11 2005
Subject: [Web-SIG] wsgiMemcached and wsgiAdvogato.
Message-ID: <20050213075108.GA2246@caltech.edu>

Hi all,

I continued my hobby of implementing simple WSGI apps for fun,
and implemented one piece of middleware, wsgiMemcached, and
one piece of endware, wsgiPullAdvogato.

wsgiMemcached uses the Python API to memcached,

	http://www.danga.com/memcached/

to cache pages by their URL (according to PATH_INFO).

wsgiPullAdvogato uses the XML-RPC API to advogato.org to pull down
diary entries from advogato.org, e.g.

	http://issola.caltech.edu/~t/qwsgi/wsgi-cgi-gw.cgi/titus/50

pulls down the 50th entry from my diary.  Warning -- it's not very
error tolerant ;).

Both are available off of
	
	http://darcs.idyll.org/~t/projects/

under the 'wsgiMisc' project.  See the 'wsgi-cgi-gw.cgi' script
for an example use.

--

The main problem I ran into with caching was determining when a cache
entry was stale.  At the moment I implemented a simple function
'app.fresher_than(path_info, time_val)' that returns True if the
cache entry should be discarded & regenerated.  Unfortunately
this function must then be implemented by the downstream app.  Any
thoughts/suggestions on this?
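
A minimal sketch of that arrangement, not the actual wsgiMemcached code: a
plain dict stands in for memcached, the class name PathCache is made up, and
fresher_than is passed in as a callable rather than looked up on the app.
The sketch also ignores the request method and the legacy write() callable:

```python
import time


class PathCache:
    """Cache complete responses by PATH_INFO (dict stand-in for memcached)."""

    def __init__(self, app, fresher_than):
        self.app = app
        # fresher_than(path_info, time_val) -> True if the cached copy
        # made at time_val should be discarded and regenerated.
        self.fresher_than = fresher_than
        self.store = {}

    def __call__(self, environ, start_response):
        path_info = environ.get("PATH_INFO", "")
        entry = self.store.get(path_info)
        if entry is not None and self.fresher_than(path_info, entry[0]):
            entry = None  # stale: fall through and regenerate
        if entry is None:
            entry = (time.time(),) + self._render(environ)
            self.store[path_info] = entry
        cached_at, status, headers, body = entry
        start_response(status, list(headers))
        return [body]

    def _render(self, environ):
        """Run the downstream app and buffer its whole response."""
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)

            def write(data):
                raise NotImplementedError("write() is not supported in this sketch")
            return write

        result = self.app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()
        return captured["status"], captured["headers"], body
```

The awkward part described above is visible here: only the downstream
application knows what fresher_than should answer.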

It sure is fun to be able to chain applications like this... it's also
nice to be able to switch between CGI and SCGI with no trouble
whatsoever.

cheers,
--titus
From ianb at colorstudy.com  Sun Feb 13 19:20:57 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun Feb 13 19:21:10 2005
Subject: [Web-SIG] wsgiMemcached and wsgiAdvogato.
In-Reply-To: <20050213075108.GA2246@caltech.edu>
References: <20050213075108.GA2246@caltech.edu>
Message-ID: <01A20F74-7DEC-11D9-AFD1-000393985968@colorstudy.com>

On Feb 13, 2005, at 1:51 AM, Titus Brown wrote:
> The main problem I ran into with caching was determining when a cache
> entry was stale.  At the moment I implemented a simple function
> 'app.fresher_than(path_info, time_val)' that returns True if the
> cache entry should be discarded & regenerated.  Unfortunately
> this function must then be implemented by the downstream app.  Any
> thoughts/suggestions on this?

memcached is multiprocess/multiserver, right?  If it is, that 
certainly makes things more complicated.

First, I guess there's all the cache-controlling headers.  I always 
found those a little crude, though, as they are all predictive (you 
have to guess how long the cache is valid).  The Vary header is 
interesting, though, since it allows you to indicate other headers that 
the content is derivative of.  Maybe also interesting if you also 
consider WSGI extension headers -- though if you allow that there's 
other issues, like will you hang onto references of objects, and will 
you compare with is or ==, etc.
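
For the plain HTTP case, honoring Vary could amount to folding the named
request headers into the cache key. A small sketch; the function name and
the key layout are assumptions, and WSGI extension values are left out
entirely:

```python
def vary_cache_key(environ, vary_header):
    """Build a hashable cache key from the path plus the headers named in Vary.

    Returns None for "Vary: *", which means the response cannot be
    matched to any later request.
    """
    names = sorted(
        name.strip().lower() for name in vary_header.split(",") if name.strip()
    )
    if "*" in names:
        return None
    parts = [environ.get("PATH_INFO", "")]
    for name in names:
        # Request headers appear in environ as HTTP_ACCEPT_LANGUAGE etc.
        environ_key = "HTTP_" + name.upper().replace("-", "_")
        parts.append((name, environ.get(environ_key, "")))
    return tuple(parts)
```

Two requests for the same path then share a cache entry only when every
header listed in Vary has the same value in both.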

Maybe a better method would be to emphasize forced expiration of the 
cache.  You could add something to the request that allowed the 
application to expire the value at another URL.  Of course, when you 
add Vary into the mix, you have to allow expiring a URL with specific 
headers.  Or even more complicated -- so there needs to be an interface 
to iterate through some portion of the cache and optionally expire 
things the application encounters.

And of course the expiration may not happen because of a request, it 
might happen outside of WSGI (like some timed task that updates some 
values on the backend), so there needs to be a non-WSGI way to expire 
the cache as well.  Hmm... that all probably makes it much more 
complicated...
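
To give a feel for the interface this implies, here is a sketch with made-up
names. It is a single-process, in-memory stand-in, so it sidesteps the
multiprocess question raised above. Entries are keyed by URL plus the header
values they vary on, and they can be expired one at a time, per URL, or by
iterating with a predicate. Nothing in it depends on WSGI, so a timed
backend task could call it as well:

```python
class ExpirableCache:
    """In-memory cache keyed by (path, varying header values)."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(path, vary_values):
        return (path, tuple(sorted((vary_values or {}).items())))

    def put(self, path, vary_values, response):
        self._entries[self._key(path, vary_values)] = response

    def get(self, path, vary_values=None):
        return self._entries.get(self._key(path, vary_values))

    def expire(self, path, vary_values=None):
        """Expire one variant, or every variant of `path` if none is given."""
        if vary_values is not None:
            self._entries.pop(self._key(path, vary_values), None)
            return
        for key in [k for k in self._entries if k[0] == path]:
            del self._entries[key]

    def expire_where(self, predicate):
        """Expire every entry for which predicate(path, vary_values, response) is true."""
        doomed = [
            key for key, response in self._entries.items()
            if predicate(key[0], dict(key[1]), response)
        ]
        for key in doomed:
            del self._entries[key]
        return len(doomed)
```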

--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

From colin at owlfish.com  Tue Feb 15 21:48:01 2005
From: colin at owlfish.com (Colin Stewart)
Date: Tue Feb 15 21:48:18 2005
Subject: [Web-SIG] ANN: WSGI Utils 0.4
Message-ID: <1108500481.6779.58.camel@roll>

Hi,

I've released a new version of WSGI Utils.  This solves the problem
where the server port number would be given to the WSGI application as a
number instead of a string.  There's also a few other fixes, and the
WSGI Adaptor now supports redirection.

Available from: http://www.owlfish.com/software/wsgiutils/

Colin.

From tsoehnli at gmu.edu  Thu Feb 17 16:28:50 2005
From: tsoehnli at gmu.edu (Timothy Soehnlin)
Date: Thu Feb 17 16:20:28 2005
Subject: [Web-SIG] A new framework: PyTML
Message-ID: <200502171528.50449.tsoehnli@gmu.edu>

Hello All,

 I have created a new web framework that uses XML for data storage and 
Python for content manipulation.  The code and content are completely 
separated, and HTML is rendered by merging different blocks together 
from the XML files.  It is maturing quickly, but as always I would like 
the opinions, thoughts and ideas of other members in the field in order 
to make this project the best it possibly can be.  Thank you.

      Timothy Soehnlin
-- 
I would rather be known as a Christian
 and despised, than to be overlooked,
  and thought of as one of the world.
From tsoehnli at gmu.edu  Thu Feb 17 16:31:23 2005
From: tsoehnli at gmu.edu (Timothy Soehnlin)
Date: Thu Feb 17 16:22:58 2005
Subject: [Web-SIG] A new framework: PyTML(fixed)
Message-ID: <200502171531.23060.tsoehnli@gmu.edu>

Hello All,

  The name of the project is PyTML, and information about it can be found at 
pytml.arcsine.org, or sf.net/projects/pytml.

 I have created a new web framework that uses XML for data storage and 
Python for content manipulation.  The code and content are completely 
separated, and HTML is rendered by merging different blocks together 
from the XML files.  It is maturing quickly, but as always I would like 
the opinions, thoughts and ideas of other members in the field in order 
to make this project the best it possibly can be.  Thank you.

      Timothy Soehnlin
-- 
I would rather be known as a Christian
 and despised, than to be overlooked,
  and thought of as one of the world.
From sridharinfinity at gmail.com  Thu Feb 17 17:00:01 2005
From: sridharinfinity at gmail.com (Sridhar Ratna)
Date: Thu Feb 17 17:00:27 2005
Subject: [Web-SIG] A new framework: PyTML
In-Reply-To: <200502171528.50449.tsoehnli@gmu.edu>
References: <200502171528.50449.tsoehnli@gmu.edu>
Message-ID: <8816fcf805021708003b0a770b@mail.gmail.com>

> 
>  I have currently created a new web framework that works off of xml (for data
> storage), and python for content manipulation.  The code and content are
> completely separated, and html is rendered through merging different blocks
> together from the xml files.  It is growing quickly to maturity, but as
> always I would like the opinions/thoughts/ideas of other members in the field
> in order to make this project the best it possibly can be.  Thank you.
> 

Sounds like http://nevow.com

-- 
Sridhar Ratna - http://srid.bsdnerds.org
From theman at eradman.com  Sun Feb 20 05:23:34 2005
From: theman at eradman.com (Eric Radman)
Date: Sun Feb 20 05:32:17 2005
Subject: [Web-SIG] CGI HTTP Proxy
Message-ID: <20050220042334.GA10388@us270-gl0.eradman.com>

Until mod_wsgi exists, I think we need an efficient way to
proxy HTTP requests and responses through CGI. Is there a small app
written in C that we can use to call a running WSGI server? I found one
old app that would probably do the job if it were updated:

http://www.leerssen.com/cgiproxy.html

He's calling this a CGI proxy, but it's really an HTTP proxy, which is
what we need since the WSGI Server is a true HTTP server itself.

On dedicated servers with multiple IP addresses where I have
administrative control over the web server and DNS I can simply map
hostnames for each WSGI application like this:

 www.mycompany.com (10.0.0.100) > HTTP Server
app1.mycompany.com (10.0.0.101) > WSGI Server <> WSGI App 1
app2.mycompany.com (10.0.0.101) > WSGI Server <> WSGI App 2

But in most virtual-hosting environments there needs to be a way to map
HTTP requests to WSGI servers behind the HTTP Server listening on the
one public IPv4 address:

*.mycompany.com/      (0.0.0.0) > HTTP Server
*.mycompany.com/app1/ (0.0.0.0) > HTTP Server > WSGI Server <> WSGI App 1
*.mycompany.com/app2/ (0.0.0.0) > HTTP Server > WSGI Server <> WSGI App 2

What's the best way to deploy multiple applications alongside an HTTP
server serving static content?
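
For comparison, the path-based mapping in the second table can also be
expressed inside a single WSGI process, without a proxy hop. This is only a
sketch of that one option, not the CGI-to-HTTP proxy described above, and
the name path_dispatcher is made up:

```python
def path_dispatcher(mounts, default_app):
    """Route by URL prefix, e.g. mounts = [("/app1", app1), ("/app2", app2)]."""
    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in mounts:
            if path == prefix or path.startswith(prefix + "/"):
                # Move the matched prefix from PATH_INFO to SCRIPT_NAME so the
                # mounted application sees URLs relative to its own root.
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return app(environ, start_response)
        return default_app(environ, start_response)
    return dispatcher
```

Here default_app would be whatever serves the static content.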

Eric Radman  |  http://eradman.com