From james at pythonweb.org Tue Feb 1 00:17:11 2005
From: james at pythonweb.org (James Gardner)
Date: Tue Feb 1 00:17:31 2005
Subject: [Web-SIG] Python Web Modules - Version 0.5.0
In-Reply-To: <41FE839C.4030007@colorstudy.com>
References: <41FE7A51.1010205@pythonweb.org> <41FE839C.4030007@colorstudy.com>
Message-ID: <41FEBC77.7090008@pythonweb.org>

Ian Bicking wrote:
> web.wsgi.error: one standard I'd like for middleware would be some key you could set that would indicate that some error handler exists, and applications further down the stack shouldn't catch unexpected exceptions (of course expected exceptions are a different matter). Then the best error handler available would eventually get the error, and process it somehow (e.g., mailing a report, displaying an error, starting a debugger, etc). Anyway, something to think about for this.

That could be useful. Presumably the middleware component nearest the server is likely to have the best error handling (as you would put the best error handler in a position to catch the most errors). So this could be as simple as agreeing on a variable name like wsgi.error for the environ dictionary, which the highest middleware component up the chain would set to True; components lower down wouldn't provide error handling if it was already set.

Another thing I noticed when writing the error handler is that if an application or middleware component doesn't form a header or set the status correctly, it can be tricky to track down where the error occurred. If the application used a special object for headers and status in the start_response callable, which raised an error when it was set with an invalid value, that would make life easier.

(Alternatively, if you wanted to change the way things were programmed a bit, you could write your application as middleware and specify a terminator which set the headers and status using these special objects. Probably not necessary though!)
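A minimal sketch of the environ-flag idea in Python 3 WSGI terms. The key name `example.error_handler` is hypothetical (WSGI itself defines the `wsgi.*` keys, so the sketch uses its own made-up prefix), and the `force` parameter is an illustrative local override; neither is taken from web.wsgi.error.

```python
import sys
import traceback

# Hypothetical environ key name (not part of any spec or of web.wsgi.error).
ERROR_KEY = "example.error_handler"


def error_middleware(application, force=False):
    """Catch unexpected exceptions unless an outer handler has claimed them.

    The first (outermost) instance sets ERROR_KEY; inner instances see it and
    let exceptions propagate, unless force=True (a hypothetical local override).
    """

    def wrapped(environ, start_response):
        if environ.get(ERROR_KEY) and not force:
            # A handler nearer the server exists: stay out of the way.
            return application(environ, start_response)
        environ[ERROR_KEY] = True
        try:
            result = application(environ, start_response)
            try:
                # Buffer the body so errors raised during iteration are caught too.
                return list(result)
            finally:
                close = getattr(result, "close", None)
                if close is not None:
                    close()
        except Exception:
            body = traceback.format_exc().encode("utf-8")
            # Passing exc_info lets start_response replace headers already set.
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain; charset=utf-8")],
                           sys.exc_info())
            return [body]

    return wrapped
```

Buffering the body keeps the sketch simple at the cost of streaming; a production handler would need to deal with errors after headers have been sent.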
> web.wsgi.auth: I've been thinking a lot about this as well, particularly about the external interface. REMOTE_USER seems like a reasonable enough place to put the login information. I'd like to keep authorization and authentication separate -- one middleware determines who you are, another (might) determine if you are allowed access. Frequently only the application really knows if you are authorized, based on logic that's beyond any ability to make it generic.

Agreed; the underlying API makes this even more explicit than the web.wsgi.auth module does. I'll split web.wsgi.auth.Auth into two components, one for authentication and one for authorization. The existing web.wsgi.auth.Auth will then just be a chain of the two components, so it will keep the same functionality.

> So I was thinking that status codes should be sufficient to communicate authorization: 401 for login required, 403 for forbidden. If you are doing cookie logins (which I generally prefer from a UI perspective) the middleware can translate the 401 into a redirect to the login page. And the 403 can turn into a nicer error page --

So in a new version, the authentication middleware would display a sign-in box if no user was signed in. The authorization middleware would provide objects for the application to test authorisation, and would also look for headers to determine whether the application thought the user was authorised, displaying a sign-in box if not.

> a piece of middleware for indicating error pages would also be nice (similar to Apache's ErrorDocument directive).

Agreed, I'll write one.

> web.wsgi.session: I'd like to have some sort of standard for these objects, at least some aspects. Not the details of storage, but mostly access; along the lines of web.session.manager and/or .store.
> I'm not sure how I feel about the manager with multiple applications, each of which has a store -- I feel like this should be part of the configuration somehow, which isn't necessarily part of the standard user-visible API.

I've been thinking about how a series of applications can work together, which is what the web.wsgi.environment code is about. Perhaps it would be better to specify the application name in web.wsgi.environment (which is more to do with configuration), so that the web.wsgi.session and web.wsgi.auth objects all use the same application name. The manager then becomes largely redundant, because a store for the particular application has already been created.

> web.wsgi.cgi: is this safe when a piece of middleware changes QUERY_STRING or otherwise rewrites the request? You can test for this by saving the QUERY_STRING that you originally parsed alongside the resulting FieldStorage, and then reparsing if they don't match. You can even test for matching with "is", since you're really checking for modifications instead of equality. The same should be possible for wsgi.input and POST requests.

The web.wsgi.cgi module actually builds the FieldStorage from the environ dictionary, not from QUERY_STRING, so middleware should be able to do what it likes and the underlying middleware and application will respond to the changes. Is this not a good way of doing it?

One other thing I've been meaning to ask: the WSGI specification currently provides no way for an application or middleware component to pass custom information back up the middleware chain, so an application can't ask a middleware component not to perform a certain task. Communication up the chain can only happen through the status, headers, exc_info and content. It would be easy to add a response dictionary as another parameter to start_response, similar to environ but sending information up the chain.
Was this deliberately avoided so that the system wouldn't get complicated?

Thanks again for your comments, Ian; much appreciated.

James
--
James Gardner
http://www.pythonweb.org

From ianb at colorstudy.com Tue Feb 1 00:28:13 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue Feb 1 00:29:26 2005
Subject: [Web-SIG] WSGI middleware library
Message-ID: <41FEBF0D.2070807@colorstudy.com>

What do people think about collaborating on a kind of "standard" library of WSGI middleware? (Not standard like distributed-with-Python, just well publicized.) This is what I've tried to put together a little with WSGIKit, though not all parts of it would apply. And other people are, I think, starting to develop the same things, perhaps with some overlap. Maybe we can pool our efforts.

The criteria I'd consider:

* Should be something we could do Right, i.e., can become "complete". E.g., a proxying WSGI application could be complete. A commenting system can't.
* Shouldn't involve much UI. Mostly because a UI can't be very complete.
* Shouldn't be tied to anything very specific. E.g., if there's a templating middleware (um, don't ask me exactly what that would look like) it shouldn't be bound to any particular templating language. Those kinds of bindings should probably be part of the upstream libraries.
* Provide robust architecture more than a pleasant API. E.g., WSGIKit implements Webware (more or less), but when you use that you see very little of the middleware that WSGIKit uses. And that middleware looks kind of ugly, what with the environment and string keys and the sometimes funny semantics.
* Be really well documented and stable (at least once we come to consensus on an interface), so that people could reliably and easily use these middleware components in their frameworks.
* Testable and tested.
Some candidates I imagine:

* Sessions middleware
* Logging middleware/library (based on the standard library of course)
* Error reporting middleware/library
* Test frameworks (?)
* A file application (handling If-Modified-Since, etc)
* A proxy application
* Libraries for parsing query strings and all that. Most of what is in Phillip's wsgiref.
* Authentication (this would be on the ambitious end)
* URL parsers (several, but maybe we could distill this down to a few primary models for parsing)
* And maybe a few of the more boring servers, like the CGI server, which will otherwise be homeless (or widely repeated).

I'd expect everyone involved to have ulterior motives, i.e., they'd all have their own separate pet projects and whatnot, and wouldn't be looking to this library (alone) to solve all their needs. That would be good: it's another part of what would keep this from being Yet Another Framework. Together this should be attractive to people who like to delete code ;) (Code deleted is code debugged!)

--
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org

From ianb at colorstudy.com Tue Feb 1 00:48:56 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue Feb 1 00:50:14 2005
Subject: [Web-SIG] Python Web Modules - Version 0.5.0
In-Reply-To: <41FEBC77.7090008@pythonweb.org>
References: <41FE7A51.1010205@pythonweb.org> <41FE839C.4030007@colorstudy.com> <41FEBC77.7090008@pythonweb.org>
Message-ID: <41FEC3E8.3000902@colorstudy.com>

James Gardner wrote:
> Ian Bicking wrote:
>> web.wsgi.error: one standard I'd like for middleware would be some key you could set that would indicate that some error handler exists, and applications further down the stack shouldn't catch unexpected exceptions (of course expected exceptions are a different matter). Then the best error handler available would eventually get the error, and process it somehow (e.g., mailing a report, displaying an error, starting a debugger, etc).
>> Anyway, something to think about for this.
>
> That could be useful. Presumably the middleware component nearest the server is likely to have the best error handling (as you would put the best error handler in a position to catch the most errors). So this could be as simple as agreeing on a variable name like wsgi.error for the environ dictionary, which the highest middleware component up the chain would set to True; components lower down wouldn't provide error handling if it was already set.

Right. Except when you don't want that ;) Sometimes you may want to override the error handler locally; e.g., maybe you have a section of the site where you want to use a different error handler that shows exceptions to the browser (e.g., a development section). But presumably you could add an option to the middleware to force it to catch exceptions even when the environment advises otherwise.

> Another thing I noticed when writing the error handler is that if an application or middleware component doesn't form a header or set the status correctly, it can be tricky to track down where the error occurred. If the application used a special object for headers and status in the start_response callable, which raised an error when it was set with an invalid value, that would make life easier.
>
> (Alternatively, if you wanted to change the way things were programmed a bit, you could write your application as middleware and specify a terminator which set the headers and status using these special objects. Probably not necessary though!)

I'm not sure I understand you here. What's the exact situation where you encounter this?

>> So I was thinking that status codes should be sufficient to communicate authorization: 401 for login required, 403 for forbidden. If you are doing cookie logins (which I generally prefer from a UI perspective) the middleware can translate the 401 into a redirect to the login page.
>> And the 403 can turn into a nicer error page --
>
> So in a new version, the authentication middleware would display a sign-in box if no user was signed in. The authorization middleware would provide objects for the application to test authorisation, and would also look for headers to determine whether the application thought the user was authorised, displaying a sign-in box if not.

Basically. If REMOTE_USER wasn't set (or was empty) and the application required login (based on whatever criteria it has), then it should return a 401 code. The authentication middleware doesn't know if login is required, but it would be nice if it could tell whether you are logged in anyway (not possible with HTTP Basic auth, but ignoring that case).

>> web.wsgi.session: I'd like to have some sort of standard for these objects, at least some aspects. Not the details of storage, but mostly access; along the lines of web.session.manager and/or .store. I'm not sure how I feel about the manager with multiple applications, each of which has a store -- I feel like this should be part of the configuration somehow, which isn't necessarily part of the standard user-visible API.
>
> I've been thinking about how a series of applications can work together, which is what the web.wsgi.environment code is about. Perhaps it would be better to specify the application name in web.wsgi.environment (which is more to do with configuration), so that the web.wsgi.session and web.wsgi.auth objects all use the same application name. The manager then becomes largely redundant, because a store for the particular application has already been created.

OK, I was trying to figure out what web.wsgi.environment was about. Is it basically a way of indicating local configuration (like a configuration realm or something)? I still lack a good intuition for how configuration should work.
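A minimal sketch of the cookie-login translation discussed in this exchange: the application signals "login required" with a plain 401, and a separate piece of middleware turns that into a redirect. The `/login` URL and the `came_from` query parameter are hypothetical. The sketch buffers the whole response in order to inspect the status, which is simple but unsuitable for streamed bodies.

```python
from urllib.parse import quote

# Hypothetical location of a cookie-based login form.
LOGIN_URL = "/login"


def login_redirect_middleware(application, login_url=LOGIN_URL):
    """Translate a 401 from the application into a redirect to a login page."""

    def wrapped(environ, start_response):
        captured = []  # [status, headers, exc_info] once the app responds
        body = []      # anything passed to write() is buffered here too

        def capture(status, headers, exc_info=None):
            captured[:] = [status, headers, exc_info]
            return body.append

        result = application(environ, capture)
        try:
            body.extend(result)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        status, headers, exc_info = captured
        if status.startswith("401"):
            # The application says "login required"; send the browser to the form.
            came_from = quote(environ.get("PATH_INFO", "/"))
            start_response("302 Found", [
                ("Location", f"{login_url}?came_from={came_from}"),
                ("Content-Type", "text/plain"),
            ])
            return [b"Login required\n"]
        start_response(status, headers, exc_info)
        return body

    return wrapped
```

A 403 branch that substitutes a friendlier error page could be added the same way, which is roughly the ErrorDocument-like middleware mentioned above.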
>> web.wsgi.cgi: is this safe when a piece of middleware changes QUERY_STRING or otherwise rewrites the request? You can test for this by saving the QUERY_STRING that you originally parsed alongside the resulting FieldStorage, and then reparsing if they don't match. You can even test for matching with "is", since you're really checking for modifications instead of equality. The same should be possible for wsgi.input and POST requests.
>
> The web.wsgi.cgi module actually builds the FieldStorage from the environ dictionary, not from QUERY_STRING, so middleware should be able to do what it likes and the underlying middleware and application will respond to the changes. Is this not a good way of doing it?

Well, FieldStorage looks at particular keys, and I guess the result is derived from all of those. But the keys are fairly limited -- I think it's just QUERY_STRING, REQUEST_METHOD, CONTENT_TYPE, and CONTENT_LENGTH, though this could be confirmed by reading the cgi module. So even though you pass a complete environment, every time you retrieve the value from the environment you want to check that these values haven't changed (along with wsgi.input). If I did it, I'd lazily parse the query string, and then reparse if those keys had changed. I guess wsgikit.wsgilib.get_cookies is an example of this:

http://svn.colorstudy.com/trunk/WSGIKit/wsgikit/wsgilib.py

> One other thing I've been meaning to ask: the WSGI specification currently provides no way for an application or middleware component to pass custom information back up the middleware chain, so an application can't ask a middleware component not to perform a certain task. Communication up the chain can only happen through the status, headers, exc_info and content. It would be easy to add a response dictionary as another parameter to start_response, similar to environ but sending information up the chain.
> Was this deliberately avoided so that the system wouldn't get complicated?

I was thinking about this too. Keeping the response fairly plain and HTTP-like certainly makes things simpler, but I can imagine lots of useful information that doesn't fit well into headers or response codes. E.g., if you are sending a 403 error message, maybe you want to pass along some extra information about why it happened. You could write that out as the HTML response, but then it becomes somewhat opaque if that gets rewritten. What's missing is something like the extension information that gets put in the request environment; it's always purely optional, but it's there to allow cooperation between components. There's no escape mechanism like that for the response.

Well... there is a way, actually -- you can add callbacks to the request. For instance, in my session handler I add a callable to the request that returns the session object. If you don't call that at all then the session isn't even created, and no session ID is assigned (assuming you didn't already have a session). If you do call it, then the middleware modifies the response to add a session ID. So there's really some communication from the application that affects the response, but it isn't being expressed as part of the response stream (the status, headers, and body).
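The lazy parse-and-reparse approach Ian suggests above for web.wsgi.cgi might look roughly like this. It is a sketch under several assumptions: it handles only QUERY_STRING (not POST bodies or wsgi.input), it substitutes `urllib.parse.parse_qs` for `cgi.FieldStorage` (the cgi module has since been removed from the Python standard library), and the cache key name is made up.

```python
from urllib.parse import parse_qs

# Hypothetical environ key for the cached parse.
CACHE_KEY = "example.parsed_query"


def get_query(environ):
    """Return the parsed QUERY_STRING, reparsing only if middleware replaced it."""
    source = environ.get("QUERY_STRING", "")
    cached = environ.get(CACHE_KEY)
    # Compare with "is": we are checking whether the value was replaced,
    # not whether the new value happens to be equal.
    if cached is not None and cached[0] is source:
        return cached[1]
    parsed = parse_qs(source, keep_blank_values=True)
    environ[CACHE_KEY] = (source, parsed)
    return parsed
```

If middleware stores an equal but freshly built string, the identity check simply triggers a harmless reparse.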
--
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org

From james at pythonweb.org Tue Feb 1 17:30:01 2005
From: james at pythonweb.org (James Gardner)
Date: Tue Feb 1 17:30:08 2005
Subject: [Web-SIG] Python Web Modules - Version 0.5.0
In-Reply-To: <41FEC3E8.3000902@colorstudy.com>
References: <41FE7A51.1010205@pythonweb.org> <41FE839C.4030007@colorstudy.com> <41FEBC77.7090008@pythonweb.org> <41FEC3E8.3000902@colorstudy.com>
Message-ID: <41FFAE89.6020602@pythonweb.org>

Ian Bicking wrote:
>> Another thing I noticed when writing the error handler is that if an application or middleware component doesn't form a header or set the status correctly, it can be tricky to track down where the error occurred. If the application used a special object for headers and status in the start_response callable, which raised an error when it was set with an invalid value, that would make life easier.
>
> I'm not sure I understand you here. What's the exact situation where you encounter this?

Well, when I was programming the session middleware I appended a tuple of the wrong length to the headers used in start_response. This wasn't picked up until the error handling module, by which time I had no idea which piece of middleware had appended the faulty header. If the headers were an object that behaved like a list but only allowed correctly formed headers to be appended, this error would have been picked up where it happened.

>> One other thing I've been meaning to ask: the WSGI specification currently provides no way for an application or middleware component to pass custom information back up the middleware chain, so an application can't ask a middleware component not to perform a certain task. Communication up the chain can only happen through the status, headers, exc_info and content.
>> It would be easy to add a response dictionary as another parameter to start_response, similar to environ but sending information up the chain. Was this deliberately avoided so that the system wouldn't get complicated?
>
> I was thinking about this too. Keeping the response fairly plain and HTTP-like certainly makes things simpler, but I can imagine lots of useful information that doesn't fit well into headers or response codes. E.g., if you are sending a 403 error message, maybe you want to pass along some extra information about why it happened. You could write that out as the HTML response, but then it becomes somewhat opaque if that gets rewritten. What's missing is something like the extension information that gets put in the request environment; it's always purely optional, but it's there to allow cooperation between components. There's no escape mechanism like that for the response.
>
> Well... there is a way, actually -- you can add callbacks to the request. For instance, in my session handler I add a callable to the request that returns the session object. If you don't call that at all then the session isn't even created, and no session ID is assigned (assuming you didn't already have a session). If you do call it, then the middleware modifies the response to add a session ID. So there's really some communication from the application that affects the response, but it isn't being expressed as part of the response stream (the status, headers, and body).

That's true and useful in the session case. In fact, any middleware that needed the session store could still call the callable; it would just need to check whether it had already been called (or, in fact, the callable itself could keep track of whether it had been called). It does mean that other middleware components can't get access to the same information, though, unless they all chain callables down the middleware stack.
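A session callable that keeps track of whether it has been called, and lets other middleware check without creating a session, could be sketched as follows. The class name, environ key and cookie name are hypothetical rather than the API of WSGIKit or web.wsgi.session, and the "store" is just an empty dict per request where real middleware would load and save persisted data. One limitation of this simple version: the Set-Cookie header is only added if the application touches the session before it calls start_response.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical names; not the API of wsgikit or web.wsgi.session.
SESSION_KEY = "example.session"
COOKIE_NAME = "session_id"


class LazySession:
    """Callable stored in environ; creates the session only on first call."""

    def __init__(self, session_id=None):
        self.session_id = session_id
        self._data = None

    def __call__(self):
        # Every call returns the same object, created on demand.
        if self._data is None:
            if self.session_id is None:
                self.session_id = secrets.token_hex(16)
            self._data = {}  # a real store would load saved data here
        return self._data

    @property
    def created(self):
        # Lets other middleware check without creating a session.
        return self._data is not None


def session_middleware(application):
    def wrapped(environ, start_response):
        cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        existing = cookie[COOKIE_NAME].value if COOKIE_NAME in cookie else None
        session = environ[SESSION_KEY] = LazySession(existing)

        def session_start_response(status, headers, exc_info=None):
            # Only send a cookie for a brand-new session the app actually used.
            if session.created and existing is None:
                headers = list(headers) + [
                    ("Set-Cookie", f"{COOKIE_NAME}={session.session_id}; Path=/")]
            return start_response(status, headers, exc_info)

        return application(environ, session_start_response)

    return wrapped
```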
It doesn't really work for your first example with the error information, though, since that information should be available to all middleware components. In that example, couldn't the application send the error information with exc_info and the auth middleware catch it, or am I missing something?

Do you think there is mileage to be gained from adding a response dictionary to start_response? That would be a simple way of sending information back. It would break, though, if existing WSGI apps didn't pass on the response dictionary.

James
--
http://www.pythonweb.org/

From james at pythonweb.org Tue Feb 1 17:53:01 2005
From: james at pythonweb.org (James Gardner)
Date: Tue Feb 1 17:53:02 2005
Subject: [Web-SIG] WSGI middleware library
In-Reply-To: <41FEBF0D.2070807@colorstudy.com>
References: <41FEBF0D.2070807@colorstudy.com>
Message-ID: <41FFB3ED.1070505@pythonweb.org>

Ian Bicking wrote:
> What do people think about collaborating on a kind of "standard" library of WSGI middleware? (Not standard like distributed-with-Python, just well publicized.) This is what I've tried to put together a little with WSGIKit, though not all parts of it would apply. And other people are, I think, starting to develop the same things, perhaps with some overlap. Maybe we can pool our efforts.

I think this is a good idea. There are sometimes different approaches that can be taken to implementing similar functionality within WSGI, and there is usually a best one. If we share ideas we are more likely to come up with the better solutions. There are also a lot of things which have only one good solution, and there is no point in duplicating work.

> I'd expect everyone involved to have ulterior motives, i.e., they'd all have their own separate pet projects and whatnot, and wouldn't be looking to this library (alone) to solve all their needs. That would be good: it's another part of what would keep this from being Yet Another Framework.
> Together this should be attractive to people who like to delete code ;) (Code deleted is code debugged!)

If middleware components are built as classes, it would be easy for implementors to derive their own classes from the standard ones to implement, say, session storage for their particular framework. So I think this would benefit everyone and wouldn't necessarily result in a lot of people pulling in different directions. I'd certainly find it helpful.

What might also be useful is a guide to writing WSGI middleware and applications, with examples of all the different ways of doing common things, incorporating any ideas or tips people have found useful while writing their implementations. Perhaps we could start this on the wiki?

James
--
http://www.pythonweb.org/

From ianb at colorstudy.com Tue Feb 1 18:15:47 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue Feb 1 18:17:07 2005
Subject: [Web-SIG] Python Web Modules - Version 0.5.0
In-Reply-To: <41FFAE89.6020602@pythonweb.org>
References: <41FE7A51.1010205@pythonweb.org> <41FE839C.4030007@colorstudy.com> <41FEBC77.7090008@pythonweb.org> <41FEC3E8.3000902@colorstudy.com> <41FFAE89.6020602@pythonweb.org>
Message-ID: <41FFB943.70001@colorstudy.com>

James Gardner wrote:
> Ian Bicking wrote:
>>> Another thing I noticed when writing the error handler is that if an application or middleware component doesn't form a header or set the status correctly, it can be tricky to track down where the error occurred. If the application used a special object for headers and status in the start_response callable, which raised an error when it was set with an invalid value, that would make life easier.
>>
>> I'm not sure I understand you here. What's the exact situation where you encounter this?
>
> Well, when I was programming the session middleware I appended a tuple of the wrong length to the headers used in start_response.
> This wasn't picked up until the error handling module, by which time I had no idea which piece of middleware had appended the faulty header. If the headers were an object that behaved like a list but only allowed correctly formed headers to be appended, this error would have been picked up where it happened.

That seems like a complicated way to deal with the problem. If it's just for debugging, you can add the wsgikit.lint middleware, which checks for most of these issues without actually affecting the server or application. It specifically checks that the headers are a list of tuples of length 2.

>>> One other thing I've been meaning to ask: the WSGI specification currently provides no way for an application or middleware component to pass custom information back up the middleware chain, so an application can't ask a middleware component not to perform a certain task. Communication up the chain can only happen through the status, headers, exc_info and content. It would be easy to add a response dictionary as another parameter to start_response, similar to environ but sending information up the chain. Was this deliberately avoided so that the system wouldn't get complicated?
>>
>> I was thinking about this too. Keeping the response fairly plain and HTTP-like certainly makes things simpler, but I can imagine lots of useful information that doesn't fit well into headers or response codes. E.g., if you are sending a 403 error message, maybe you want to pass along some extra information about why it happened. You could write that out as the HTML response, but then it becomes somewhat opaque if that gets rewritten. What's missing is something like the extension information that gets put in the request environment; it's always purely optional, but it's there to allow cooperation between components. There's no escape mechanism like that for the response.
>>
>> Well...
>> there is a way, actually -- you can add callbacks to the request. For instance, in my session handler I add a callable to the request that returns the session object. If you don't call that at all then the session isn't even created, and no session ID is assigned (assuming you didn't already have a session). If you do call it, then the middleware modifies the response to add a session ID. So there's really some communication from the application that affects the response, but it isn't being expressed as part of the response stream (the status, headers, and body).
>
> That's true and useful in the session case. In fact, any middleware that needed the session store could still call the callable; it would just need to check whether it had already been called (or, in fact, the callable itself could keep track of whether it had been called).

Yes, it keeps track, and each time you call the session-creator that's in the environment it returns the same session object (but that object is created lazily).

> It does mean that other middleware components can't get access to the same information, though, unless they all chain callables down the middleware stack.

Perhaps instead of a callable it could be an object, and could support methods to check, for instance, whether a session had been created without actually creating one.

> It doesn't really work for your first example with the error information, though, since that information should be available to all middleware components. In that example, couldn't the application send the error information with exc_info and the auth middleware catch it, or am I missing something?

It could work for error information.
E.g.:

    import sys

    def middleware(application):
        def error_app(environ, start_response):
            if 'wsgikit.errorchecker' in environ:
                # An outer error handler already exists; let it deal with
                # unexpected exceptions.
                return application(environ, start_response)
            # ErrorChecker is assumed to be defined elsewhere.
            checker = environ['wsgikit.errorchecker'] = ErrorChecker()
            try:
                return application(environ, start_response)
            except Exception:
                exc_info = sys.exc_info()
                return checker.respond_to_exception(exc_info)
        return error_app

Then ErrorChecker is an instance you could add information to at any level of the application. ErrorChecker, in turn, could add information to another component, e.g., the ErrorDocument-like middleware.

> Do you think there is mileage to be gained from adding a response dictionary to start_response? That would be a simple way of sending information back. It would break, though, if existing WSGI apps didn't pass on the response dictionary.

I think extending the request dictionary like this is sufficient.

--
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org

From titus at caltech.edu Tue Feb 1 19:23:29 2005
From: titus at caltech.edu (Titus Brown)
Date: Tue Feb 1 19:23:34 2005
Subject: [Web-SIG] Fun with WSGI -- commenting middleware.
In-Reply-To: <41FE7DC9.3080100@colorstudy.com>
References: <20050130024447.GA10409@caltech.edu> <41FE7DC9.3080100@colorstudy.com>
Message-ID: <20050201182329.GA24982@caltech.edu>

-> >I sat down today to hack out a simple commenting system for HTML articles, and ended up using WSGI to implement a pipe-style solution.
-> >
-> >You can see the results at
-> >
-> >    http://www.idyll.org/~t/articles.cgi/
-> >
-> >This CGI script serves HTML files from a directory hierarchy. Anyone can attach a comment to any HTML file served by the script.
->
-> Spiffy. It would be neat to plug this into a WSGI application that served as a proxy (redisplaying pages fetched from another location).
-> Then you could point it at the Python documentation and get that php.net-like commenting that people are always asking for; it would probably be good to make the commenting more granular, but it's interesting to be able to develop the different parts so separately.

I thought about this a bit more. I like the proxy idea (and will implement it next time I have the urge to do some light coding). For the Python docs, though, wouldn't it be better to just host the files on the same machine?

I will probably develop a simple Quixote application to wrap the commenting code, too; having all of this in CGI will get annoying if I do anything more complex than what I'm doing now.

-> Actually, I was just going to convert this silly little web-based image viewer I have to WSGI, and with this I could get a free commenting system. Hmm...

The back-end is pretty lousy -- it's just a pickled dictionary of 'Comment' classes -- but that's modular, of course. I'll spruce up the commenting middleware itself and document it, and then make it directly available via DARCS.

I'd be interested in people's opinions on how to format the entries and safeguard against XSS hacks. Right now I'm just pushing the exact HTML they wrote onto the pages, which strikes me as a Bad Idea.

cheers,
--titus

From ianb at colorstudy.com Tue Feb 1 22:15:58 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue Feb 1 22:17:16 2005
Subject: [Web-SIG] Fun with WSGI -- commenting middleware.
In-Reply-To: <20050201182329.GA24982@caltech.edu>
References: <20050130024447.GA10409@caltech.edu> <41FE7DC9.3080100@colorstudy.com> <20050201182329.GA24982@caltech.edu>
Message-ID: <41FFF18E.9080902@colorstudy.com>

Titus Brown wrote:
> I thought about this a bit more. I like the proxy idea (and will implement it next time I have the urge to do some light coding). For the Python docs, though, wouldn't it be better to just host the files on the same machine?
Yes, that's possible too, especially since they are all completely static and fully rendered. Probably easier, and also implemented already ;) I'm sure there are others, but wsgikit.urlparser serves static files reasonably well (wsgikit.wsgilib.send_file could use some work to be more efficient).

> I will probably develop a simple Quixote application to wrap the commenting code, too; having all of this in CGI will get annoying if I do anything more complex than what I'm doing now.

At one time I did a lot of this kind of thing, where you'd read a page and then fiddle with the output. It always had some holes, but it's an interesting technique, and one I come back to often. It would be nice to have a mini-framework for this sort of thing that hides some of the WSGI fiddling you have to do. I.e., the framework packages up the request (which contains important information like the requested URL) and the response, and gives them to some hook that munges the response (like adding comments). Another hook might run the output through tidy and tack errors and warnings onto the bottom of the page.

Some sort of URL escape would also be good -- i.e., if your munging middleware is at /comment_system, then maybe you could tell it to redirect /comment_system/foo/* to another application, and that application would handle the form action for comments. That's easy to imagine as a Quixote app or something, but the munging bit isn't as easy. It would be easier if there were a function (which there might be) that could turn the WSGI request into a Quixote request object without bringing in the rest of the framework. Then the munging portion wouldn't be a Quixote application, per se, but it would look quite similar.

Or, you could turn one request into two, sending the output of the first application as input to a second application, e.g., as a POST request where the body and headers are put into some fields. Then it could be a normal application, but it seems like a complex way to get there.
Though... maybe it actually is the best way. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From skink at evhr.net Fri Feb 4 01:15:30 2005 From: skink at evhr.net (Fabien Schwob) Date: Fri Feb 4 01:15:42 2005 Subject: [Web-SIG] Web Client Message-ID: <4202BEA2.9000204@evhr.net> Hello, I'm currently trying to retrieve a webpage in order to extract information from it. The problem is that this page is _behind_ a POST form. Does someone know a module or a tutorial that can help me? Thanks -- Fabien From titus at caltech.edu Fri Feb 4 02:32:52 2005 From: titus at caltech.edu (Titus Brown) Date: Fri Feb 4 02:32:58 2005 Subject: [Web-SIG] Web Client In-Reply-To: <4202BEA2.9000204@evhr.net> References: <4202BEA2.9000204@evhr.net> Message-ID: <20050204013251.GA31349@caltech.edu> -> I'm currently trying to retrieve a webpage in order to extract -> information from it. The problem is that this page is _behind_ a POST -> form. -> -> Does someone know a module or a tutorial that can help me? There are several: urllib2 is probably the place to start. Check out this post, which is full of links ;). http://mail.python.org/pipermail/python-list/2004-September/238739.html --titus From ianb at colorstudy.com Fri Feb 4 05:14:21 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Feb 4 05:14:08 2005 Subject: [Web-SIG] WSGIKit/Webware/WSGI sprint Message-ID: <4202F69D.9010004@colorstudy.com> I've added a WSGIKit/Webware/WSGI sprint to the Wiki, thus officially registering us. If you are interested in coming, please sign your name: http://python.org/moin/WsgiKitSprint If you are interested but you aren't sure if you can make it, then sign your name and say you aren't sure. I'm not sure if I'll be able to do Saturday and Sunday (I can for sure do Monday and Tuesday) -- I'll see what other people can do, and then we can figure out the exact schedule later.
So far the interest in the sprint has come from the Webware list, but I want to give WSGIKit parity with Webware's features through framework-neutral WSGI middleware, so anyone who is interested in Python framework development and WSGI is very welcome. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From titus at caltech.edu Fri Feb 4 18:45:41 2005 From: titus at caltech.edu (Titus Brown) Date: Fri Feb 4 18:45:44 2005 Subject: [Web-SIG] WSGI middleware library In-Reply-To: <41FEBF0D.2070807@colorstudy.com> References: <41FEBF0D.2070807@colorstudy.com> Message-ID: <20050204174541.GA20145@caltech.edu> -> What do people think about collaborating on a kind of "standard" library -> of WSGI middleware? Hi, Ian, ok, here's another response ;). I slept on it a bit, and I would like to suggest one modification: make it a cookbook of examples, rather than a library. This implies that we don't need to have a standard naming scheme or a common coding style to the components, and there can be redundancy -- multiple examples overlapping in functionality. It also means that there is room for "incomplete" solutions, which are IMO of great value even just as stubs. Such code can be isolated and used piecemeal, independently of the rest of the library. And, finally, it means that code can be designed strictly for functionality rather than for extensibility. I make this suggestion for two reasons: first of all, I'd be more interested in contributing code to a cookbook than to a library, for the above reasons. And, secondly, my limited experience with example code I've posted suggests that people are primarily interested in a complete, functioning example that's isolated from other code. I do think a test harness (to make sure that the middleware is WSGI compliant) and a documentation standard (reST? In each directory? or ...?) would be a good idea. 
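Titus's test-harness idea can be sketched with the standard library's `wsgiref.validate.validator` (part of `wsgiref`, which was added to Python later, in 2.5). Like the `wsgikit.lint` Ian mentions, it wraps an application and raises `AssertionError` when either side of the call breaks the WSGI rules. The sketch calls an application in-process with no server at all; `hello_app` and `call_app` are hypothetical names for illustration, not part of WSGIKit.

```python
from io import BytesIO
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator


def hello_app(environ, start_response):
    # A trivial application standing in for the code under test.
    body = b"Hello, world!\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(app, path="/", method="GET", body=b""):
    """Call `app` in-process and return (status, headers, body bytes).

    The app is wrapped in wsgiref's validator, so any WSGI violation by the
    app (or by this fake server) raises AssertionError.
    """
    environ = {}
    setup_testing_defaults(environ)  # SERVER_NAME, wsgi.version, wsgi.errors, ...
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "QUERY_STRING": "",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": BytesIO(body),
    })
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # this simple harness discards write() data

    result = validator(app)(environ, start_response)
    try:
        output = b"".join(result)
    finally:
        result.close()  # the validator insists that close() is called
    return captured["status"], captured["headers"], output
```

A test suite would then call `call_app(hello_app, path="/hello")` and assert on the returned status, headers, and body, with compliance checking happening as a side effect.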
As immediate candidates for inclusion I suggest: * a simple wsgi-passthrough middleware, that "handles" the data without modifying it. (The idea is to provide hooks where I/O *can* be modified.) Most of my time in wsgiComment was spent figuring out how to get that functionality. * the CGI server from the PEP. I can submit nicely formatted versions of these if you're interested in proceeding immediately; I'd also be happy to host a Darcs repository for the stuff ;). cheers, --titus From ianb at colorstudy.com Fri Feb 4 19:04:57 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Fri Feb 4 19:06:33 2005 Subject: [Web-SIG] WSGI middleware library In-Reply-To: <20050204174541.GA20145@caltech.edu> References: <41FEBF0D.2070807@colorstudy.com> <20050204174541.GA20145@caltech.edu> Message-ID: <4203B949.9020100@colorstudy.com> Titus Brown wrote: > -> What do people think about collaborating on a kind of "standard" library > -> of WSGI middleware? > > Hi, Ian, > > ok, here's another response ;). > > I slept on it a bit, and I would like to suggest one modification: make > it a cookbook of examples, rather than a library. > > This implies that we don't need to have a standard naming scheme or a > common coding style to the components, and there can be redundancy -- > multiple examples overlapping in functionality. It also means that > there is room for "incomplete" solutions, which are IMO of great > value even just as stubs. Such code can be isolated and used piecemeal, > independently of the rest of the library. And, finally, it means that > code can be designed strictly for functionality rather than for > extensibility. Obviously some of the solutions will be incomplete for a while -- the development is a process. And there's nothing keeping us from having a contrib/ directory in the project, which could contain any kind of example or tool that might seem useful. 
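The pass-through middleware Titus proposes, combined with the response-munging hook Ian describes earlier in the thread, can be sketched as follows. This is a minimal illustration under stated assumptions, not wsgiComment's or WSGIKit's code: `PassthroughMiddleware`, `filter_request`, `filter_response`, and `FooterMiddleware` are hypothetical names, and the sketch buffers the whole response body so that a hook is able to rewrite it.

```python
class PassthroughMiddleware:
    """Hypothetical pass-through middleware.

    Forwards every request unchanged, but gives subclasses two hooks
    where the request or the response *can* be modified.
    """

    def __init__(self, app):
        self.app = app

    def filter_request(self, environ):
        # Hook: inspect or modify the environ before the wrapped app sees it.
        return environ

    def filter_response(self, environ, status, headers, body):
        # Hook: inspect or modify the complete response. Default: no change.
        return status, headers, body

    def __call__(self, environ, start_response):
        environ = self.filter_request(environ)
        captured = []    # [status, headers] recorded from the wrapped app
        body_parts = []  # body chunks, including any passed to write()

        def capture_start_response(status, headers, exc_info=None):
            # Record instead of sending: headers must stay changeable until
            # filter_response has run. exc_info is not forwarded because
            # nothing has been sent to the real server yet.
            captured[:] = [status, list(headers)]
            return body_parts.append  # serves as the legacy write() callable

        result = self.app(environ, capture_start_response)
        try:
            for chunk in result:
                body_parts.append(chunk)
        finally:
            if hasattr(result, "close"):
                result.close()

        status, headers = captured
        status, headers, body = self.filter_response(
            environ, status, headers, b"".join(body_parts))
        # The body may have changed size, so recompute Content-Length.
        headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response(status, headers)
        return [body]


class FooterMiddleware(PassthroughMiddleware):
    """Hypothetical munging example: insert a fragment (e.g. a comment box
    or tidy warnings) just before </body> in HTML responses."""

    def __init__(self, app, footer):
        super().__init__(app)
        self.footer = footer

    def filter_response(self, environ, status, headers, body):
        content_type = ""
        for name, value in headers:
            if name.lower() == "content-type":
                content_type = value
        if content_type.startswith("text/html"):
            body = body.replace(b"</body>", self.footer + b"</body>", 1)
        return status, headers, body
```

Buffering gives up streaming, so a real version would probably buffer only the content types it intends to rewrite and pass everything else straight through.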
There's no reason to exclude anything useful, but putting code in a library implies some commitment to the API and functionality, which isn't appropriate for some code. That can largely be solved through documentation and other metadata (like the directory layout). As for extensibility... well, hopefully some pieces won't require much extensibility besides really obvious hooks that you'd want to include anyway. And hopefully those would stabilize once a few people tried to use a piece of middleware and suggested improvements -- part of why I want to do this collaboratively is that predicting places for extension tends to be inaccurate, while waiting for people to use code and find they require a place for extension usually works better. > I make this suggestion for two reasons: first of all, I'd be more > interested in contributing code to a cookbook than to a library, > for the above reasons. And, secondly, my limited experience with > example code I've posted suggests that people are primarily interested > in a complete, functioning example that's isolated from other > code. To a degree, I would hope we'd have functioning examples by design -- certainly a smaller number of dependencies will make the libraries more accessible. At the same time, though, I want to actually *use* the results. E.g., there are things I'd like to move from WSGIKit to this library; but if this isn't a real library then all I can do is copy items and maybe keep them in sync in the future, but I can't ever *use* them because a cookbook isn't stable or even packaged. > I do think a test harness (to make sure that the middleware is WSGI > compliant) and a documentation standard (reST? In each directory? > or ...?) would be a good idea. wsgikit.lint does some compliance testing, when used in conjunction with other tests.
There's no general way to poke at middleware or applications, so we have to rely on specific code to do the poking while another piece of code (lint) makes sure everything goes through properly. Other parts of a framework would certainly be useful. Adding a wsgi: method to, say, mechanize or urllib2 would be nice; it would save you from having to do any server setup and test the WSGI application directly. > As immediate candidates for inclusion I suggest: > > * a simple wsgi-passthrough middleware, that "handles" the data without > modifying it. (The idea is to provide hooks where I/O *can* > be modified.) Most of my time in wsgiComment was spent figuring > out how to get that functionality. > > * the CGI server from the PEP. > > I can submit nicely formatted versions of these if you're interested in > proceeding immediately; I'd also be happy to host a Darcs repository for > the stuff ;). I was thinking of putting it on svn://w4py.org; I'm a bit partial to a centralized repository for this sort of thing, since it encourages continuous integration and maybe is a bit more transparent. And svn is pretty common at this point. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From hex-dump at hotmail.com Sat Feb 5 12:07:48 2005 From: hex-dump at hotmail.com (Mark Rees) Date: Sat Feb 5 12:09:01 2005 Subject: [Web-SIG] Ann:ISAPI-WSGI 0.4 Beta Message-ID: Hello everyone, I am happy to announce the release of ISAPI-WSGI 0.4 beta. ISAPI-WSGI will (hopefully) allow any WSGI application to run inside a Windows web server that supports ISAPI. I believe that it meets the requirements of the WSGI PEP. It has only been tested against the examples from the PEP, Ian Bicking's echo example from wsgi-webkit, and Titus Brown's WSGI-enabled Simple Commenting System under IIS 5.1. It has one major limitation: it is only single-threaded. I am currently working on a fully threaded version, but wanted to release it now so others could have a look at it.
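Both the "CGI server from the PEP" that Titus proposes and Mark's ISAPI-WSGI are gateways: they build `environ`, provide `start_response`, and write the application's output back to the web server. The sketch below adapts the CGI example from the Python 3 revision of the WSGI specification (PEP 3333). It makes two small changes: the environment and streams are parameters so the function can be exercised without a real CGI process, and the empty-body case writes `b""`. For real deployments, the standard library's `wsgiref.handlers.CGIHandler` does the same job.

```python
import os
import sys

ENCODING = sys.getfilesystemencoding()


def run_with_cgi(application, cgi_environ=None, stdin=None, stdout=None, stderr=None):
    """Run one request through `application` as a CGI script.

    Environ and streams default to the real process ones; they are parameters
    here (unlike in the PEP) only so the gateway can be tested.
    """
    cgi_environ = os.environ if cgi_environ is None else cgi_environ
    stdin = sys.stdin.buffer if stdin is None else stdin
    stdout = sys.stdout.buffer if stdout is None else stdout
    stderr = sys.stderr if stderr is None else stderr

    # On Python 3, WSGI environ values are "bytes-as-latin-1" native strings.
    environ = {k: v.encode(ENCODING, "surrogateescape").decode("iso-8859-1")
               for k, v in cgi_environ.items()}
    environ["wsgi.input"] = stdin
    environ["wsgi.errors"] = stderr
    environ["wsgi.version"] = (1, 0)
    environ["wsgi.multithread"] = False
    environ["wsgi.multiprocess"] = True
    environ["wsgi.run_once"] = True  # a CGI process serves exactly one request
    https = environ.get("HTTPS", "off") in ("on", "1")
    environ["wsgi.url_scheme"] = "https" if https else "http"

    headers_set = []   # [status, headers] from the latest start_response call
    headers_sent = []  # becomes non-empty once headers are written

    def write(data):
        if not headers_set:
            raise AssertionError("write() before start_response()")
        if not headers_sent:
            # Send the stored headers just before the first body bytes.
            status, response_headers = headers_sent[:] = headers_set
            stdout.write(("Status: %s\r\n" % status).encode("iso-8859-1"))
            for name, value in response_headers:
                stdout.write(("%s: %s\r\n" % (name, value)).encode("iso-8859-1"))
            stdout.write(b"\r\n")
        stdout.write(data)
        stdout.flush()

    def start_response(status, response_headers, exc_info=None):
        if exc_info:
            try:
                if headers_sent:
                    # Too late to change the response: re-raise the app's error.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # avoid a reference cycle
        elif headers_set:
            raise AssertionError("Headers already set!")
        headers_set[:] = [status, response_headers]
        return write

    result = application(environ, start_response)
    try:
        for data in result:
            if data:  # don't send headers until the body has content
                write(data)
        if not headers_sent:
            write(b"")  # send headers now if the body was empty
    finally:
        if hasattr(result, "close"):
            result.close()
```

Because a CGI process handles one request and exits, this gateway is naturally single-threaded; a threaded gateway such as the one Mark describes has to keep this per-request state separate for each concurrent request.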
I am interested in any feedback, suggestions or bug reports. See http://isapi-wsgi.python-hosting.com/wiki/DocsPage for info and get the Python source & some examples at http://isapi-wsgi.python-hosting.com/wiki/ISAPISimpleHandler-0.4-beta Regards Mark Rees From titus at caltech.edu Sun Feb 13 08:51:08 2005 From: titus at caltech.edu (Titus Brown) Date: Sun Feb 13 08:51:11 2005 Subject: [Web-SIG] wsgiMemcached and wsgiAdvogato. Message-ID: <20050213075108.GA2246@caltech.edu> Hi all, I continued my hobby of implementing simple WSGI apps for fun, and implemented one piece of middleware, wsgiMemcached, and one piece of endware, wsgiPullAdvogato. wsgiMemcached uses the Python API to memcached (http://www.danga.com/memcached/) to cache pages by their URL (according to PATH_INFO). wsgiPullAdvogato uses the XML-RPC API to advogato.org to pull down diary entries from advogato.org, e.g. http://issola.caltech.edu/~t/qwsgi/wsgi-cgi-gw.cgi/titus/50 pulls down the 50th entry from my diary. Warning -- it's not very error-tolerant ;). Both are available off of http://darcs.idyll.org/~t/projects/ under the 'wsgiMisc' project. See the 'wsgi-cgi-gw.cgi' script for example usage. -- The main problem I ran into with caching was determining when a cache entry was stale. At the moment I implemented a simple function 'app.fresher_than(path_info, time_val)' that returns True if the cache entry should be discarded & regenerated. Unfortunately this function must then be implemented by the downstream app. Any thoughts/suggestions on this? It sure is fun to be able to chain applications like this... it's also nice to be able to switch between CGI and SCGI with no trouble whatsoever.
cheers, --titus From ianb at colorstudy.com Sun Feb 13 19:20:57 2005 From: ianb at colorstudy.com (Ian Bicking) Date: Sun Feb 13 19:21:10 2005 Subject: [Web-SIG] wsgiMemcached and wsgiAdvogato. In-Reply-To: <20050213075108.GA2246@caltech.edu> References: <20050213075108.GA2246@caltech.edu> Message-ID: <01A20F74-7DEC-11D9-AFD1-000393985968@colorstudy.com> On Feb 13, 2005, at 1:51 AM, Titus Brown wrote: > The main problem I ran into with caching was determining when a cache > entry was stale. At the moment I implemented a simple function > 'app.fresher_than(path_info, time_val)' that returns True if the > cache entry should be discarded & regenerated. Unfortunately > this function must then be implemented by the downstream app. Any > thoughts/suggestions on this? memcached is multiprocess/multiserver, right? If it is, that certainly makes things more complicated. First, I guess there are all the cache-controlling headers. I always found those a little crude, though, as they are all predictive (you have to guess how long the cache is valid). The Vary header is interesting, though, since it allows you to indicate other headers that the content is derivative of. Maybe also interesting if you consider WSGI extension headers -- though if you allow that there are other issues, like will you hang onto references of objects, and will you compare with is or ==, etc. Maybe a better method would be to emphasize forced expiration of the cache. You could add something to the request that allowed the application to expire the value at another URL. Of course, when you add Vary into the mix, you have to allow expiring a URL with specific headers. Or even more complicated -- so there needs to be an interface to iterate through some portion of the cache and optionally expire things the application encounters.
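Titus's `fresher_than` hook and Ian's forced-expiration suggestion can coexist in one piece of middleware. The sketch below is hypothetical throughout. A plain dict stands in for memcached, so it ignores the multiprocess/multiserver issue Ian raises. Only successful `GET` responses are cached, and `Vary` is ignored. `example.cache.expire` is an invented environ key through which an application can expire another URL during any request. The `expire()` method also serves as a non-WSGI way to expire entries outside a request.

```python
import time


class CachingMiddleware:
    """Hypothetical page cache keyed on PATH_INFO.

    A dict stands in for memcached, so this only works within one process.
    """

    def __init__(self, app):
        self.app = app
        self.cache = {}  # path -> (time stored, status, headers, body)

    def expire(self, path):
        # Forced expiration; callable outside WSGI, e.g. from a timed task.
        self.cache.pop(path, None)

    def _is_stale(self, path, stored_at):
        # Titus's approach: ask the wrapped app, if it provides the hook.
        fresher_than = getattr(self.app, "fresher_than", None)
        return fresher_than is not None and fresher_than(path, stored_at)

    def __call__(self, environ, start_response):
        # Ian's approach: let the app expire *other* URLs during any request.
        environ["example.cache.expire"] = self.expire  # invented key name
        if environ.get("REQUEST_METHOD", "GET") != "GET":
            return self.app(environ, start_response)  # only GETs are cached
        path = environ.get("PATH_INFO", "")

        entry = self.cache.get(path)
        if entry is not None and not self._is_stale(path, entry[0]):
            _, status, headers, body = entry
            start_response(status, list(headers))
            return [body]

        captured = []
        body_parts = []

        def capture_start_response(status, headers, exc_info=None):
            captured[:] = [status, list(headers)]
            return body_parts.append  # data passed to write() is kept too

        result = self.app(environ, capture_start_response)
        try:
            for chunk in result:
                body_parts.append(chunk)
        finally:
            if hasattr(result, "close"):
                result.close()

        status, headers = captured
        body = b"".join(body_parts)
        if status.startswith("200"):
            self.cache[path] = (time.time(), status, headers, body)
        start_response(status, list(headers))
        return [body]
```

In this arrangement, a comment form's POST handler could call `environ["example.cache.expire"]("/page")` so the next GET of that page is regenerated, while the `fresher_than` hook stays optional for apps that can judge staleness themselves.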
And of course the expiration may not happen because of a request; it might happen outside of WSGI (like some timed task that updates some values on the backend), so there needs to be a non-WSGI way to expire the cache as well. Hmm... that all probably makes it much more complicated... -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org From colin at owlfish.com Tue Feb 15 21:48:01 2005 From: colin at owlfish.com (Colin Stewart) Date: Tue Feb 15 21:48:18 2005 Subject: [Web-SIG] ANN: WSGI Utils 0.4 Message-ID: <1108500481.6779.58.camel@roll> Hi, I've released a new version of WSGI Utils. This solves the problem where the server port number would be given to the WSGI application as a number instead of a string. There are also a few other fixes, and the WSGI Adaptor now supports redirection. Available from: http://www.owlfish.com/software/wsgiutils/ Colin. From tsoehnli at gmu.edu Thu Feb 17 16:28:50 2005 From: tsoehnli at gmu.edu (Timothy Soehnlin) Date: Thu Feb 17 16:20:28 2005 Subject: [Web-SIG] A new framework: PyTML Message-ID: <200502171528.50449.tsoehnli@gmu.edu> Hello All, I have created a new web framework that works off of XML (for data storage) and Python for content manipulation. The code and content are completely separated, and HTML is rendered through merging different blocks together from the XML files. It is growing quickly to maturity, but as always I would like the opinions/thoughts/ideas of other members in the field in order to make this project the best it possibly can be. Thank you. Timothy Soehnlin -- I would rather be known as a Christian and despised, than to be overlooked, and thought of as one of the world.
From tsoehnli at gmu.edu Thu Feb 17 16:31:23 2005 From: tsoehnli at gmu.edu (Timothy Soehnlin) Date: Thu Feb 17 16:22:58 2005 Subject: [Web-SIG] A new framework: PyTML(fixed) Message-ID: <200502171531.23060.tsoehnli@gmu.edu> Hello All, The name of the project is PyTML, and information about it can be found at pytml.arcsine.org, or sf.net/projects/pytml. I have created a new web framework that works off of XML (for data storage) and Python for content manipulation. The code and content are completely separated, and HTML is rendered through merging different blocks together from the XML files. It is growing quickly to maturity, but as always I would like the opinions/thoughts/ideas of other members in the field in order to make this project the best it possibly can be. Thank you. Timothy Soehnlin -- I would rather be known as a Christian and despised, than to be overlooked, and thought of as one of the world. From sridharinfinity at gmail.com Thu Feb 17 17:00:01 2005 From: sridharinfinity at gmail.com (Sridhar Ratna) Date: Thu Feb 17 17:00:27 2005 Subject: [Web-SIG] A new framework: PyTML In-Reply-To: <200502171528.50449.tsoehnli@gmu.edu> References: <200502171528.50449.tsoehnli@gmu.edu> Message-ID: <8816fcf805021708003b0a770b@mail.gmail.com> > > I have created a new web framework that works off of XML (for data > storage) and Python for content manipulation. The code and content are > completely separated, and HTML is rendered through merging different blocks > together from the XML files. It is growing quickly to maturity, but as > always I would like the opinions/thoughts/ideas of other members in the field > in order to make this project the best it possibly can be. Thank you.
> Sounds like http://nevow.com -- Sridhar Ratna - http://srid.bsdnerds.org From theman at eradman.com Sun Feb 20 05:23:34 2005 From: theman at eradman.com (Eric Radman) Date: Sun Feb 20 05:32:17 2005 Subject: [Web-SIG] CGI HTTP Proxy Message-ID: <20050220042334.GA10388@us270-gl0.eradman.com> Until mod_wsgi exists, I think it's necessary to have an efficient way to proxy HTTP requests and responses through CGI. Is there a small app written in C that we can use to call a running WSGI Server? I found one old app that would probably do the job if it were updated: http://www.leerssen.com/cgiproxy.html He's calling this a CGI proxy, but it's really an HTTP proxy, which is what we need since the WSGI Server is a true HTTP server itself. On dedicated servers with multiple IP addresses, where I have administrative control over the web server and DNS, I can simply map hostnames for each WSGI application like this:

  www.mycompany.com  (10.0.0.100) > HTTP Server
  app1.mycompany.com (10.0.0.101) > WSGI Server <> WSGI App 1
  app2.mycompany.com (10.0.0.101) > WSGI Server <> WSGI App 2

But in most virtual-hosting environments there needs to be a way to map HTTP requests to WSGI servers behind the HTTP Server listening on the one public IPv4 address:

  *.mycompany.com/      (0.0.0.0) > HTTP Server
  *.mycompany.com/app1/ (0.0.0.0) > HTTP Server > WSGI Server <> WSGI App 1
  *.mycompany.com/app2/ (0.0.0.0) > HTTP Server > WSGI Server <> WSGI App 2

What's the best way to deploy multiple applications alongside an HTTP server serving static content? Eric Radman | http://eradman.com
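One possible arrangement for Eric's path-based layout is this: the front-end HTTP server keeps serving static content and forwards `/app1/` and `/app2/` to a single WSGI server, which dispatches on the first path segment. The sketch below shows only that dispatching step. It uses the standard library's `wsgiref.util.shift_path_info` (from `wsgiref`, added to Python after this thread) to move the matched segment from `PATH_INFO` to `SCRIPT_NAME`. `make_prefix_dispatcher` and `not_found` are hypothetical names, and the sketch assumes the front-end proxying is configured separately.

```python
from wsgiref.util import shift_path_info


def not_found(environ, start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found\n"]


def make_prefix_dispatcher(apps, default=not_found):
    """Hypothetical dispatcher mounting each app in `apps` under "/<name>".

    Assuming the dispatcher itself is mounted at the root, a request for
    "/app1/foo" reaches app1 with SCRIPT_NAME "/app1" and PATH_INFO "/foo".
    """

    def dispatcher(environ, start_response):
        # Shift on a copy so the original environ is untouched on a miss.
        shifted = dict(environ)
        name = shift_path_info(shifted)  # None if PATH_INFO is empty
        app = apps.get(name)
        if app is None:
            return default(environ, start_response)
        return app(shifted, start_response)

    return dispatcher
```

Because each application sees `SCRIPT_NAME` set to its mount point, it can build correct links without knowing where it is mounted. The resulting dispatcher could be served with any WSGI server, for example `wsgiref.simple_server.make_server`.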