From janssen at parc.com Wed Sep 1 03:16:19 2004
From: janssen at parc.com (Bill Janssen)
Date: Wed Sep 1 03:16:45 2004
Subject: [Web-SIG] SIG charter
In-Reply-To: Your message of "Fri, 27 Aug 2004 10:51:08 PDT." <20040827175108.GA29376@rogue.amk.ca>
Message-ID: <04Aug31.181620pdt."58612"@synergy1.parc.xerox.com>

> I think the charter was written by Bill Janssen, who doesn't seem to
> be actively participating on the list any more. The charter doesn't
> necessarily bear any relevance to what the individuals in the SIG are
> actually doing.

Oh, I'm here, but I've been on vacation the last couple of weeks.

I'd say, keep the current charter, and let's keep up the great conversation that's been going on.

Bill

From pje at telecommunity.com Wed Sep 1 04:39:26 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 1 04:38:59 2004
Subject: [Web-SIG] Pending modifications to PEP 333
In-Reply-To: <5.1.1.6.0.20040831173043.023a93e0@mail.telecommunity.com>
References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040831223709.02368cf0@mail.telecommunity.com>

At 05:56 PM 8/31/04 -0400, Phillip J. Eby wrote:
>I'm just about to check in a major update to the PEP, per the details
>below. It will be a while before it shows up in the HTML version of the
>PEP or the sourceforge ViewCVS, though.

FYI: these changes have now propagated to the HTML version at:

http://www.python.org/peps/pep-0333.html

and the CVS history at:

http://cvs.sourceforge.net/viewcvs.py/python/python/nondist/peps/pep-0333.txt

From andrew at andreweland.org Wed Sep 1 11:17:05 2004
From: andrew at andreweland.org (Andrew Eland)
Date: Wed Sep 1 11:27:38 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4134CB04.2010803@xhaus.com>
References: <4134CB04.2010803@xhaus.com>
Message-ID: <41359391.5000108@andreweland.org>

Alan Kennedy wrote:
> Problem is that jython doesn't support file descriptors, or the fileno()
> method.
> If you invoke fileno() on an org.python.core.PyFile, you get an
> Py.IOError("fileno() is not supported in jpython") exception.

I guess the fileno() method could be renamed something like os_file() or os_stream(). CPython could return a file descriptor, Jython could return something like a java.nio.Channel, IronPython could return a System.IO.Stream, or something like that.

--
Andrew (http://www.andreweland.org)

From py-web-sig at xhaus.com Wed Sep 1 12:52:02 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep 1 12:47:28 2004
Subject: [Web-SIG] Pending modifications to PEP 333
In-Reply-To: <5.1.1.6.0.20040831173043.023a93e0@mail.telecommunity.com>
References: <5.1.1.6.0.20040831173043.023a93e0@mail.telecommunity.com>
Message-ID: <4135A9D2.9060800@xhaus.com>

[Phillip J. Eby]
> I'm just about to check in a major update to the PEP, per the details
> below.

Phillip,

Thanks for all your hard work: I think you're doing a great job, and I think that the WSGI initiative is the best thing ever to happen to python web APIs.

But I do have one problem :-(

[Phillip J. Eby]
> I've also clarified that 'fileno()', if present, *must* be an OS file
> descriptor, and is only relevant to servers on platforms where file
> descriptors exist.

This will break portability across jython and IronPython, and any other platforms that don't have the concept of file descriptor tables: thus it prevents WSGI applications from returning file-like objects on these platforms.

The requirement, as is, can only work on platforms that use file descriptor tables, i.e. where every process has an array of open files/file-likes, where the "fileno()" is an integer index into that table.

Granted, all *nixes, Windows, MacOS, etc, etc, all have per-process file descriptor tables, thus betraying their C/unix heritage. Neither jython nor ironpython have file descriptor tables.
Since the concept of file descriptor tables is platform specific, both the JVM and the .Net CLR eliminated them, and modelled all file-like objects as specific object classes, e.g. java.io.OutputStream, java.nio.SelectableChannel, System.IO.*, etc. If you want to create a file-like object, you must use one of the platform-supplied classes: there is no global table of such file-like objects. You can no longer pass around "file descriptors", i.e. indexes into a table of file objects, because the semantics of what you can do with various file-like objects varies between those objects.

Some pythonistas don't like this object specialization for file-like objects, and prefer the *nix file descriptor approach, since it is comparable to python's late-binding approach to datatypes. However, lack of file descriptor tables is an unavoidable reality on the JVM and CLR: the two most widespread virtual-machines in the world.

Insisting on the "fileno()" method returning a file descriptor makes it impossible to return a file-like object to a jython or ironpython implemented WSGI container.

IMHO, the correct approach is for the application to return an actual file-like object, e.g. one with a read() method, and for the server/framework to then map that file-like object to whatever high-performance byte-stream-type object is appropriate on the platform.

On java, for example, this could be a java.nio.FileChannel. Once one of these had been obtained from the returned file-like object, the high performance FileChannel.transferTo() could then be used to transfer the file contents to the socket return stream.

http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/FileChannel.html#transferTo(long,%20long,%20java.nio.channels.WritableByteChannel)

So, please can we have WSGI require the return of a file-like object, which the WSGI server/framework is then free to map to a high-performance channel in whatever way is appropriate?

The "must return a file descriptor" approach is broken.
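A minimal sketch of the server-side fallback Alan proposes, written for modern Python 3 rather than the Jython 2.1 of the thread: the server relies only on read(), so any object with that method can be copied to the client on any platform. The function name copy_filelike, the write callable, and the 8192-byte block size are illustrative assumptions, not part of WSGI. Phillip's replies make clear that the spec at this point does not accept file-like return values, so this illustrates the proposal, not the spec.

```python
import io

BLOCK_SIZE = 8192  # arbitrary transfer block size (assumption)


def copy_filelike(filelike, write, block_size=BLOCK_SIZE):
    """Copy filelike's contents, from its current position, via write().

    Only read() is used, so any object with a read() method works,
    whatever the platform. Returns the number of bytes (or characters)
    copied.
    """
    total = 0
    while True:
        block = filelike.read(block_size)
        if not block:  # an empty read signals end of file
            break
        write(block)
        total += len(block)
    return total


if __name__ == "__main__":
    chunks = []
    src = io.BytesIO(b"x" * 20000)
    src.seek(100)  # copying starts at the current file position
    print(copy_filelike(src, chunks.append))  # 19900
    print(len(chunks))  # 3 blocks: 8192 + 8192 + 3516
```

A platform-specific server could try its fast path first (a descriptor on CPython, a java.nio channel on Jython) and fall back to this loop only when no fast path applies.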
Kind regards,

Alan.

From py-web-sig at xhaus.com Wed Sep 1 12:59:33 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep 1 12:54:58 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <41359391.5000108@andreweland.org>
References: <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org>
Message-ID: <4135AB95.8040108@xhaus.com>

[Alan Kennedy]
>> Problem is that jython doesn't support file descriptors, or the
>> fileno() method. If you invoke fileno() on an org.python.core.PyFile,
>> you get an Py.IOError("fileno() is not supported in jpython") exception.

[Andrew Eland]
> I guess the fileno() method could be renamed something like os_file() or
> os_stream(). CPython could return a file descriptor, Jython could return
> something like a java.nio.Channel, IronPython could return a
> System.IO.Stream, or something like that.

Hmm, I'm not sure I understand what you are saying here Andrew.

The use-case we're trying to cover is where the application wants to return a file-like object to the WSGI server/framework. The application's intention should be that the contents of the file-like object, from the current file-pointer onwards, should be transferred to the return socket for the HTTP request.

On jython, and I'm guessing on ironpython, file-like objects don't have a fileno() method, or an os_file() method or an os_stream() method. They just have file-like methods, e.g. read(), readline(), write(), etc.

What we need is a way for the application to return a file-like object, in a platform-independent way, so that whatever platform/framework the application is running in can:

1. Simply read the file contents and transfer that back to the user
2. Possibly do so using a high-performance channel or stream.

Regards,

Alan.

From andrew at andreweland.org Wed Sep 1 13:30:47 2004
From: andrew at andreweland.org (Andrew Eland)
Date: Wed Sep 1 13:41:15 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4135AB95.8040108@xhaus.com>
References: <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com>
Message-ID: <4135B2E7.5060708@andreweland.org>

Alan Kennedy wrote:
> Hmm, I'm not sure I understand what you are saying here Andrew.
> The use-case we're trying to cover is where the application wants to
> return a file-like object to the WSGI server/framework. The application's
> intention should be that the contents of the file-like object, from the
> current file-pointer onwards, should be transferred to the return socket
> for the HTTP request.

The intent, I think, is to special-case the sending of static files, allowing a server to use the most efficient method of transferring data from a file to a socket that the platform provides.

Under CPython, the server could use something like sendfile() or epoll() to transfer data, if it has access to the underlying file descriptor. Under Jython, with a server written in Java, it would be nice to allow the use of the most efficient Java mechanism to transfer data from the file to the client, which I guess is the functionality under java.nio. To do this, the server would need to access the underlying Java object representing the file, a java.nio.Channel or similar.

--
Andrew (http://www.andreweland.org)

From py-web-sig at xhaus.com Wed Sep 1 14:39:46 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep 1 14:35:11 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4135B2E7.5060708@andreweland.org>
References: <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com> <4135B2E7.5060708@andreweland.org>
Message-ID: <4135C312.2060009@xhaus.com>

[Alan Kennedy]
>> Hmm, I'm not sure I understand what you are saying here Andrew. The
>> use-case we're trying to cover is where the application wants to
>> return a file-like object to the WSGI server/framework.
>> The application's intention should be that the contents of the file-like
>> object, from the current file-pointer onwards, should be transferred
>> to the return socket for the HTTP request.

[Andrew Eland]
> The intent, I think, is to special-case the sending of static files,
> allowing a server to use the most efficient method of transferring data
> from a file to a socket that the platform provides.

Agreed that special-casing static files for performance reasons is a good thing.

But we also need to consider what happens when the application returns, for example, a StringIO.StringIO, or a gzip.GzipFile. I'm trying to come up with a scheme whereby applications can do those things transparently across cpython, jython and ironpython.

So when I said "I'm not sure I understand", I should have said "I don't understand how your proposed os_file() or os_stream() approach would work, without forcing application authors to detect the platform they are running on and alter their application's behaviour accordingly".

> Under CPython, the server could use something like sendfile() or epoll()
> to transfer data, if it has access to the underlying file descriptor.
> Under Jython, with a server written in Java, it would be nice to allow
> the use of the most efficient Java mechanism to transfer data from the file
> to the client, which I guess is the functionality under java.nio. To do
> this, the server would need to access the underlying Java object
> representing the file, a java.nio.Channel or similar.

Precisely: maximizing efficiency is high on my priority list.

As a datapoint, using java.nio.Channel would currently not be possible under most existing J2EE containers, since they tend to use the old java.net APIs for socket creation. Such java.net-created sockets don't have java.nio.Channels: you have to use the java.nio APIs to get java.nio.Channels.

Which will be a breeze for pythonistas when I'm finished my jynio modules, e.g. non-blocking support for jython: e.g.
select, asyncore, etc, which is completely based on java.nio. Hopefully we will then see the cpython asynch frameworks, e.g. Medusa, Twisted, etc, running on java as well. I would then expect to see some serious performance competition between cpython and jython, especially since jython is not restricted by a GIL.

Regards,

Alan.

From pje at telecommunity.com Wed Sep 1 15:17:13 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 1 15:16:56 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4135C312.2060009@xhaus.com>
References: <4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com> <4135B2E7.5060708@andreweland.org>
Message-ID: <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>

At 01:39 PM 9/1/04 +0100, Alan Kennedy wrote:
>[Alan Kennedy]
>>>Hmm, I'm not sure I understand what you are saying here Andrew. The
>>>use-case we're trying to cover is where the application wants to return
>>>a file-like object to the WSGI server/framework. The application's
>>>intention should be that the contents of the file-like object, from the
>>>current file-pointer onwards, should be transferred to the return socket
>>>for the HTTP request.
>
>[Andrew Eland]
>>The intent, I think, is to special-case the sending of static files,
>>allowing a server to use the most efficient method of transferring data
>>from a file to a socket that the platform provides.
>
>Agreed that special-casing static files for performance reasons is a good
>thing.
>
>But we also need to consider what happens when the application returns,
>for example, a StringIO.StringIO, or a gzip.GzipFile.

No, we don't. WSGI does not support that. You must return an *iterable*. As Andrew says, 'fileno()' was added to allow special-casing operating system file descriptors on platforms that have them, and have APIs like 'sendfile()' that can copy data directly from one descriptor to another.
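For the descriptor special case Andrew and Phillip describe, a CPython server on a Unix-like system could hand the copy to the kernel with os.sendfile(), which is available in Python 3.3 and later on Linux, macOS and other Unixes, but not on Windows. This is a sketch under those assumptions; send_with_sendfile is a hypothetical helper name, and it sends from the file's current position to the end of the file.

```python
import os
import socket
import tempfile


def send_with_sendfile(fileobj, sock):
    """Send fileobj's bytes, from its current position to EOF, over sock.

    The kernel copies directly from the file descriptor to the socket,
    so the data never passes through Python-level buffers.
    Returns the number of bytes sent.
    """
    in_fd = fileobj.fileno()
    offset = fileobj.tell()
    remaining = os.fstat(in_fd).st_size - offset
    sent_total = 0
    while remaining > 0:
        sent = os.sendfile(sock.fileno(), in_fd, offset, remaining)
        if sent == 0:  # EOF reached earlier than expected
            break
        offset += sent
        remaining -= sent
        sent_total += sent
    return sent_total


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    with tempfile.TemporaryFile() as f:
        f.write(b"hello, sendfile")
        f.flush()
        f.seek(7)  # start sending after "hello, "
        print(send_with_sendfile(f, server_side))  # 8
    server_side.close()
    print(client_side.recv(1024))
    client_side.close()
```

Passing an explicit offset means the file object's own position is left untouched; a server that wants to mirror "from the current file-pointer onwards" reads it once with tell() and lets the kernel do the rest.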
If you would like to support special Java stuff, or CLR stuff, you can always have your server look for some other attribute name and support that as a platform-specific, optional extension for higher performance. But that's *all* the 'fileno()' support is: a *platform-specific* *optional extension* to boost performance in certain cases. The server isn't even required to *check* for a fileno attribute, and the application certainly isn't required to provide it.

The application is required to return an iterable. That's the protocol. If you want to return a "file-like" object, you *must* wrap it in an iterable of some kind. For example:

    return [some_io.getvalue()]

is a perfectly reasonable way to return a StringIO.

From pje at telecommunity.com Wed Sep 1 15:19:50 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 1 15:19:30 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <41359391.5000108@andreweland.org>
References: <4134CB04.2010803@xhaus.com> <4134CB04.2010803@xhaus.com>
Message-ID: <5.1.1.6.0.20040901091738.03116190@mail.telecommunity.com>

At 10:17 AM 9/1/04 +0100, Andrew Eland wrote:
>Alan Kennedy wrote:
>
>>Problem is that jython doesn't support file descriptors, or the fileno()
>>method. If you invoke fileno() on an org.python.core.PyFile, you get an
>>Py.IOError("fileno() is not supported in jpython") exception.
>
>I guess the fileno() method could be renamed something like os_file() or
>os_stream(). CPython could return a file descriptor, Jython could return
>something like a java.nio.Channel, IronPython could return a
>System.IO.Stream, or something like that.

No; if developers on those platforms want to support optional platform-specific performance boosting, they should define platform-specific names for the attribute.
This improves the ease of portability for applications: they just provide what they know how to provide, and the server only invokes the attribute appropriate to the platform, if it invokes any attribute at all.

From py-web-sig at xhaus.com Wed Sep 1 15:40:00 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep 1 15:35:24 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
References: <4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com> <4135B2E7.5060708@andreweland.org> <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
Message-ID: <4135D130.4090108@xhaus.com>

[Alan Kennedy]
>> But we also need to consider what happens when the application
>> returns, for example, a StringIO.StringIO, or a gzip.GzipFile.

[Phillip J. Eby]
> No, we don't. WSGI does not support that. You must return an
> *iterable*. As Andrew says, 'fileno()' was added to allow
> special-casing operating system file descriptors on platforms that have
> them, and have APIs like 'sendfile()' that can copy data directly from
> one descriptor to another.
>
> If you would like to support special Java stuff, or CLR stuff, you can
> always have your server look for some other attribute name and support
> that as a platform-specific, optional extension for higher performance.

But that is explicitly forbidden: "Finally, servers must not directly use any other attributes of the iterable returned by the application. For example, it[sic] the iterable is a file object, it may have a read() method, but the server must not utilize it. Only attributes specified here, or accessed via e.g. the PEP 234 iteration APIs are acceptable."

> But that's *all* the 'fileno()' support is: a *platform-specific*
> *optional extension* to boost performance in certain cases.
> The server isn't even required to *check* for a fileno attribute, and
> the application certainly isn't required to provide it.

Fair enough, it is good to support recognition of file-like objects on platforms that have file descriptor tables.

But I don't see any WSGI compliant way in jython that I can take a static file object returned by a WSGI application and do anything with it at all.

For example, if the application works like this, which I'd imagine is a common expected usage pattern, then I can do nothing

    def app_object(environ, start_response):
        start_response("200 OK", [ ('content-type', 'image/jpg') ])
        return open("%s.jpg" % environ['PATH_INFO'], 'rb')

This will work on cpython, of course, because of implicit fileno() method on the (cpython) file object. But will fail on jython, which will confuse the hell out of application authors.

Regards,

Alan.

From pje at telecommunity.com Wed Sep 1 15:54:06 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 1 15:53:46 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4135D130.4090108@xhaus.com>
References: <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com> <4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com> <4135B2E7.5060708@andreweland.org> <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>

At 02:40 PM 9/1/04 +0100, Alan Kennedy wrote:
>[Alan Kennedy]
>>>But we also need to consider what happens when the application returns,
>>>for example, a StringIO.StringIO, or a gzip.GzipFile.
>
>[Phillip J. Eby]
>>No, we don't. WSGI does not support that. You must return an
>>*iterable*. As Andrew says, 'fileno()' was added to allow special-casing
>>operating system file descriptors on platforms that have them, and have
>>APIs like 'sendfile()' that can copy data directly from one descriptor to
>>another.
>>If you would like to support special Java stuff, or CLR stuff, you can
>>always have your server look for some other attribute name and support
>>that as a platform-specific, optional extension for higher performance.
>
>But that is explicitly forbidden: "Finally, servers must not directly use
>any other attributes of the iterable returned by the application. For
>example, it[sic] the iterable is a file object, it may have a read()
>method, but the server must not utilize it. Only attributes specified
>here, or accessed via e.g. the PEP 234 iteration APIs are acceptable."

I've changed the spec now to allow authors to define a platform-specific special method name for this purpose.

>But I don't see any WSGI compliant way in jython that I can take a static
>file object returned by a WSGI application and do anything with it at all.
>
>For example, if the application works like this, which I'd imagine is a
>common expected usage pattern, then I can do nothing
>
>    def app_object(environ, start_response):
>        start_response("200 OK", [ ('content-type', 'image/jpg') ])
>        return open("%s.jpg" % environ['PATH_INFO'], 'rb')
>
>This will work on cpython, of course, because of implicit fileno() method
>on the (cpython) file object. But will fail on jython, which will confuse
>the hell out of application authors.

If they want to support Python versions prior to 2.2, they can't return a file object. The above code simply isn't portable to Python 2.1.

But, since your use case is, "try to allow 2.2 code to run anyway", it's also reasonable for you to hack in support for objects of type 'file' (and whatever type Jython uses for pipes) and pretend they're iterables. You're specifically trying to support some 2.2 idioms rather than deal with 2.1 limitations, so this is just another one for you. Don't let the spec stop you from supporting your use case.
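The hack Phillip suggests, recognising genuine file objects and treating them as iterables, might look like the sketch below. It assumes Python 3, where the closest equivalent of the old types.FileType check is an isinstance() test against the io module's base classes; that test is broader than FileType, since it also matches in-memory streams such as io.BytesIO. The name as_iterable and the default block size are hypothetical.

```python
import io


def as_iterable(app_return, block_size=8192):
    """Return something a server can iterate over.

    Stream objects are turned into an iterator of fixed-size blocks;
    anything else is returned unchanged and is assumed to be an
    iterable already, as WSGI requires.
    """
    if isinstance(app_return, io.IOBase):
        # Text streams signal EOF with "", binary streams with b""
        sentinel = "" if isinstance(app_return, io.TextIOBase) else b""
        return iter(lambda: app_return.read(block_size), sentinel)
    return app_return


if __name__ == "__main__":
    print(list(as_iterable(io.BytesIO(b"a" * 10), block_size=4)))
    # [b'aaaa', b'aaaa', b'aa']
    body = [b"already", b"iterable"]
    print(as_iterable(body) is body)  # True
```

In Python 3, file objects already iterate line by line; converting them to fixed-size blocks mainly avoids arbitrarily long or short "lines" when the data is binary.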
The problem is simply that your use case is outside the spec's scope, and I don't want to expand the spec's scope to make *everybody else* have to implement the extras you're implementing. I don't want to force everybody else to try to support 2.2 features in a 2.1 Python.

Does that make sense?

From py-web-sig at xhaus.com Wed Sep 1 16:50:52 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep 1 16:46:16 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>
References: <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com> <4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com> <4135B2E7.5060708@andreweland.org> <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com> <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>
Message-ID: <4135E1CC.1060605@xhaus.com>

[Phillip J. Eby]
> Does that make sense?

Phillip, sorry to be such a PITA, but no, it doesn't.

[Phillip J. Eby]
>>> If you would like to support special Java stuff, or CLR stuff, you
>>> can always have your server look for some other attribute name and
>>> support that as a platform-specific, optional extension for higher
>>> performance.

[Alan Kennedy]
>> But that is explicitly forbidden

[Phillip J. Eby]
> I've changed the spec now to allow authors to define a platform-specific
> special method name for this purpose.

But there is no special method name or attribute on file-like objects that I can look for: file methods such as read() are the only options. Jython file objects have an identical interface to cpython file objects, except that they don't have fileno() methods.

Though I suppose I could check the class of the returned object, e.g.
    if isinstance(app_return, types.FileType):
        # Attempt high-performance stuff

[Alan Kennedy]
>> For example, if the application works like this, which I'd imagine is
>> a common expected usage pattern, then I can do nothing
>>
>>     def app_object(environ, start_response):
>>         start_response("200 OK", [ ('content-type', 'image/jpg') ])
>>         return open("%s.jpg" % environ['PATH_INFO'], 'rb')
>>
>> This will work on cpython, of course, because of implicit fileno()
>> method on the (cpython) file object. But will fail on jython, which
>> will confuse the hell out of application authors.

[Phillip J. Eby]
> If they want to support Python versions prior to 2.2, they can't return
> a file object. The above code simply isn't portable to Python 2.1.

A couple of points to make here:

1. I see nothing 2.2 specific in my above code sample: it works on all pythons. I don't see what differs between 2.1 vs. 2.2 in this case.

2. The spec, as is, explicitly permits authors of cpython applications to return file-like objects, due to the cpython-specific special case "your application object may have a fileno()". Of course, most application authors won't know that the reason why their file return is succeeding is because the file object has a fileno() method, and then wonder why their app doesn't work on jython.

> But, since your use case is, "try to allow 2.2 code to run anyway", it's
> also reasonable for you to hack in support for objects of type 'file'
> (and whatever type Jython uses for pipes) and pretend they're
> iterables. You're specifically trying to support some 2.2 idioms rather
> than deal with 2.1 limitations, so this is just another one for you.

Sorry, I'm confused: what 2.2 idioms do you mean?

> Don't let the spec stop you from supporting your use case. The problem
> is simply that your use case is outside the spec's scope, and I don't
> want to expand the spec's scope to make *everybody else* have to
> implement the extras you're implementing.
> I don't want to force everybody else to try to support 2.2 features in
> a 2.1 Python.

Sorry, Phillip, I'm confused. I don't see that this has anything to do with 2.1 vs. 2.2: it's got to do with how to recognise the case where the application returns a file-like object, which can then be treated specially, e.g. for high-performance reasons.

I think we should explicitly allow return of a file-like object, and thus freedom to use the read() method, etc. That's the platform-independent way to solve this problem. Then each server/framework author can map that to a high-performance descriptor/stream/channel in whatever way is appropriate for their platform. Or not bother with high-performance, and just read() all the file contents and transmit that.

Is there a specific reason, perhaps relating to python 2.2, that you want to prevent application authors from returning files?

Regards,

Alan.

From pje at telecommunity.com Wed Sep 1 17:59:30 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 1 17:59:15 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4135E1CC.1060605@xhaus.com>
References: <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com> <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com> <4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com> <4135B2E7.5060708@andreweland.org> <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com> <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com>

At 03:50 PM 9/1/04 +0100, Alan Kennedy wrote:
>But there is no special method name or attribute on file-like objects that
>I can look for: file methods such as read() are the only options. Jython
>file objects have an identical interface to cpython file objects, except
>that they don't have fileno() methods.

"File-like" is a complete red herring: the spec has never supported them (and IMO never will).

What the spec calls for is an *iterable*: an object that can be used in a "for" loop. In Python 2.2 and up, file objects are iterable. In older versions of Python, they are not.

Thus, an application that returns a file object implicitly requires Python 2.2 or up. (This issue is mentioned in the spec, where it warns that if you are using an older version of Python, you may *not* return a file object.)

WSGI does not support returning files or file-like objects: it is simply an artifact of Python 2.2 and up that returning a file works at all!

Appealing to the 'fileno()' case as supporting file-like objects is also a red herring: the object must *still* be an iterable, because not every server or gateway will support 'fileno()'. Thus, code that relies on an object that has a 'fileno()' but isn't iterable, is in violation of the spec and is inherently non-portable. But, because file objects are iterable in 2.2 or up, if the application doesn't care about older versions, it is free to return file objects.

The fact that you would like such code to run in a Jython 2.1 server doesn't mean that the spec should expand its scope to cover even *file objects*, let alone "file-like" objects. It simply means that you'll have to deal with the special cases that entails, until Jython 2.2 is ready for prime-time.

>1. I see nothing 2.2 specific in my above code sample: it works on all
>pythons. I don't see what differs between 2.1 vs. 2.2 in this case.

2.1 doesn't allow iteration over file objects.

>2. The spec, as is, explicitly permits authors of cpython applications to
>return file-like objects,

Only if they are *iterable*, which is only true of the 'file' object in 2.2 and up.

>due to the cpython-specific special case "your application object may have
>a fileno()".

You misunderstand: in CPython 2.1 returning a file is *not* acceptable under the spec.
It is purely coincidental that it will happen to work if the server checks for 'fileno()' and supports doing something with it. But it's not portable behavior for 2.1 and the spec has said that as soon as the "Supporting Older Versions" section was added.

>Of course, most application authors won't know that the reason why their
>file return is succeeding is because the file object has a fileno()
>method, and then wonder why their app doesn't work on jython.

You're effectively arguing for removing the 'fileno()' special case altogether, or else adding language to require the server to *first* check for iterability and raise an error if the return isn't iterable, so that running a 2.2 app in a 2.1 server won't "accidentally" succeed when the 2.1 server supports 'fileno()'.

However, this is such an obscure use case as to be ludicrous to worry about. So far, yours is the only server that has suggested that supporting 2.2 apps under Python 2.1 is anything even approaching a good idea. I find it hard to imagine any reason to do that, other than the lack of availability of a Python 2.2 implementation. Other than Jython, I'm not aware of any other platforms where this is the case.

I applaud your bravery in trying to make it work for Jython, but changing the spec to allow other kinds of objects isn't going to decrease the amount of work you have to do, only increase it for other people who *aren't* trying to support 2.2 apps in a server running under Python 2.1.

> > Don't let the spec stop you from supporting your use case. The problem
> > is simply that your use case is outside the spec's scope, and I don't
> > want to expand the spec's scope to make *everybody else* have to
> > implement the extras you're implementing. I don't want to force
> > everybody else to try to support 2.2 features in a 2.1 Python.
>
>Sorry, Phillip, I'm confused. I don't see that this has anything to do
>with 2.1 vs.
>2.2: it's got to do with how to recognise the case where the
>application returns a file-like object, which can then be treated
>specially, e.g. for high-performance reasons.

It's not about "file-like" objects, only *actual* file objects. Returning a "file-like" object offers no meaningful performance boost, and it is *not* supported -- and never was.

>I think we should explicitly allow return of a file-like object, and thus
>freedom to use the read() method, etc.

I disagree. In 2.2, you can return a file-like object thus:

    return iter(lambda: filelike.read(bufsize), "")

In 2.1 and prior, you can do this:

    class Reader:
        def __init__(self, filelike, bufsize=4096):
            self.stream = filelike
            self.bufsize = bufsize
            if hasattr(filelike, 'fileno'):
                self.fileno = filelike.fileno

        def __getitem__(self, ind):
            data = self.stream.read(self.bufsize)
            if data:
                return data
            raise IndexError

    return Reader(filelike)

or even:

    return xreadlines.xreadlines(filelike)

Any of these approaches results in a spec-compliant iterable for the applicable or higher version of Python.

You are trying to let 2.2 code run in a 2.1 Python. But you don't need to support "file-like" objects to do that. You need only special case for an *actual* file object, because such an object *would* be iterable under 2.2. The issue there isn't high-performance, it's merely that file objects are unacceptable return values in Python 2.1, but code written for 2.2 will expect that returning a file object is valid.

There are other objects that technically would need to be special-cased for this. For example, a dictionary object is iterable in 2.2 but not in 2.1. In practice, it would be silly to bother since nobody in their right mind is going to use a dictionary as a WSGI return value... unless of course it had only one key.

The point is that trying to run 2.2 code in a 2.1 Python is necessarily a collection of special case hacks.
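For readers on Python 3, Phillip's first idiom needs one adjustment: binary streams return b"" at end of file, so the iter() sentinel must be b"" rather than "". The sketch below shows that form alongside the return [some_io.getvalue()] pattern from earlier in the thread. The names iter_blocks and stringio_app are hypothetical, and the application assumes a PEP 3333-style environ, where request data such as PATH_INFO is stored as latin-1-decoded str.

```python
import io


def iter_blocks(filelike, bufsize=4096):
    # Python 3 form of iter(lambda: filelike.read(bufsize), ""):
    # binary streams return b"" at EOF, so that is the sentinel here
    # (assumes filelike was opened in binary mode).
    return iter(lambda: filelike.read(bufsize), b"")


def stringio_app(environ, start_response):
    # Build the body in an in-memory stream, then wrap the finished
    # bytes in a one-element list, which is a valid WSGI iterable.
    buf = io.BytesIO()
    buf.write(b"You asked for ")
    # environ values are latin-1-decoded str, so encoding back recovers the bytes
    buf.write(environ.get("PATH_INFO", "/").encode("latin-1"))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [buf.getvalue()]


if __name__ == "__main__":
    print(list(iter_blocks(io.BytesIO(b"abcdefghij"), 4)))
    # [b'abcd', b'efgh', b'ij']
    print(stringio_app({"PATH_INFO": "/x"}, lambda status, headers: None))
    # [b'You asked for /x']
```

The two-argument iter() form stops as soon as read() returns a value equal to the sentinel, which is why the sentinel type has to match the stream's mode.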
The spec calls for *iterability*, and 2.2 code may return objects of built-in types that are iterable in 2.2, but not in 2.1. That is why this is a Python versioning issue, and specific to your attempt to run 2.2 code in a 2.1 Python. It has absolutely nothing to do with accepting "file-like" objects in the spec, which never accepted them, nor is it intended to ever do so. Is this getting any clearer? From py-web-sig at xhaus.com Wed Sep 1 19:08:46 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Wed Sep 1 19:04:11 2004 Subject: [Web-SIG] Returned application object and fileno. In-Reply-To: <5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com> References: <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com> <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com> <4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com> <4135B2E7.5060708@andreweland.org> <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com> <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com> <5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com> Message-ID: <4136021E.6070907@xhaus.com> Phillip, I'm fairly sure I understand your position now. But I think I don't agree with it ;-) [Phillip J. Eby] > "File-like" is a complete red herring: the spec has never supported them > (and IMO never will). > > What the spec calls for is an *iterable*: an object that can be used in > a "for" loop. In Python 2.2 and up, file objects are iterable. In > older versions of Python, they are not. > > Thus, an application that returns a file object implicitly requires > Python 2.2 or up. (This issue is mentioned in the spec, where it warns > that if you are using an older version of Python, you may *not* return a > file object.) > > WSGI does not support returning files or file-like objects: it is simply > an artifact of Python 2.2 and up that returning a file works at all! 
My position here is that the iterator-ness of a returned file object is secondary when the returned object has a fileno() method: most cpython framework code is going to do this:

    if hasattr(app_object, 'fileno') and callable(app_object.fileno):
        send_file(app_object)
    else:
        treat_app_object_as_iterable(app_object)

I would summarise the position of the current spec as "you must return an iterable, except when you want to return a file object, which will work fine under cpython 2.2+, because files are iterable under cpython 2.2+, even though they don't need to be iterable when they have fileno()".

> The fact that you would like such code to run in a Jython 2.1 server
> doesn't mean that the spec should expand its scope to cover even *file
> objects*, let alone "file-like" objects. It simply means that you'll
> have to deal with the special cases that entails, until Jython 2.2 is
> ready for prime-time.

Call me old-fashioned, but I'm a great believer in "practicality beats purity". I think we should be seeking to be as inclusive as possible, which means supporting as wide a software base as possible.

I'm just afraid that people will steam ahead writing WSGI middleware applications which return file-objects, which will fail on jython simply because putting the following lines in my code is a violation of the spec

    if type(app_return) is types.FileType:
        do_file_stuff(app_return)

[Alan Kennedy]
>> 2. The spec, as is, explicitly permits authors of cpython applications
>> to return file-like objects,

[Phillip J. Eby]
> Only if they are *iterable*, which is only true of the 'file' object in
> 2.2 and up.

Which seems to me an arbitrary criterion, especially in the light that the iterator nature of the file object will possibly (likely) not be actually used, as described in the snippet above.
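Here is a minimal Python 3 sketch of the fileno-first dispatch Alan describes. It adds one safeguard his snippet lacks: `hasattr(obj, 'fileno')` does not prove that a descriptor exists, because some objects define `fileno()` only to raise. The standard `io.BytesIO` does this, much as Jython's file object raises IOError. `send_body` and `send_from_descriptor` are hypothetical names. The "fast path" here simply calls `os.read()`, where a real server might use something like `os.sendfile()`.

```python
import io
import os


def send_body(result, write):
    """Use a working fileno() if there is one, else iterate the result."""
    fileno = getattr(result, "fileno", None)
    if callable(fileno):
        try:
            fd = fileno()
        except (OSError, ValueError):
            # e.g. io.BytesIO.fileno() raises io.UnsupportedOperation,
            # which subclasses both OSError and ValueError.
            fd = None
        if fd is not None:
            return send_from_descriptor(fd, write)
    for chunk in result:
        write(chunk)
    return "iterated"


def send_from_descriptor(fd, write, bufsize=8192):
    """Hypothetical fast path: copy straight from the OS descriptor."""
    while True:
        data = os.read(fd, bufsize)
        if not data:
            break
        write(data)
    return "fileno"
```

The fallback to plain iteration is what keeps this safe on platforms without file descriptors.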
> You're effectively arguing for removing the 'fileno()' special case
> altogether, or else adding language to require the server to *first*
> check for iterability and raise an error if the return isn't iterable,
> so that running a 2.2 app in a 2.1 server won't "accidentally" succeed
> when the 2.1 server supports 'fileno()'.

Not at all. I'm arguing for us to be practical about applications returning file objects.

1. It's a very common use case
2. It's trivial to deal with
3. There are no python version dependency issues

In cpython frameworks, the code would look like this

    if hasattr(app_object, 'fileno'):
        do_file_stuff(app_object.fileno())
    else:
        do_iterator_stuff(app_object)

On jython

    if type(app_object) is types.FileType:
        do_file_stuff(app_object)
    else:
        do_iterator_stuff(app_object)

Is that so difficult to accept?

[Phillip J. Eby]
> I applaud your bravery in trying to make it work for Jython, but
> changing the spec to allow other kinds of objects isn't going to
> decrease the amount of work you have to do, only increase it for other
> people who *aren't* trying to support 2.2 apps in a server running under
> Python 2.1.

It's not really about bravery, it's about wanting to maximize portability between available python platforms. I hope to achieve that through the application of a little pythonic simplicity. After all, we're just trying to move byte streams from one place to another: do we have to be this complex about it?

> It's not about "file-like" objects, only *actual* file objects.
> Returning a "file-like" object offers no meaningful performance boost,
> and it is *not* supported -- and never was.

Except when it is supported, for whatever complicated reasons, e.g. iterable objects with fileno()s.

[Alan Kennedy]
>> I think we should explicitly allow return of a file-like object, and
>> thus freedom to use the read() method, etc.

[Phillip J. Eby]
> You are trying to let 2.2 code run in a 2.1 Python.
Well, I see it as WSGI forcing me to jump through hoops in order to support the notion of iterability, even when that notion is NOT universally applicable, as the fileno() exception proves.

> That is why this is a Python versioning issue, and specific to your
> attempt to run 2.2 code in a 2.1 Python. It has absolutely nothing to
> do with accepting "file-like" objects in the spec, which never accepted
> them, nor is it intended to ever do so.
>
> Is this getting any clearer?

Crystal.

However, I think the absolute insistence on return objects being iterable is slightly arbitrary and unnecessarily constraining.

I understand your desire to keep the spec clean and simple, and also your desire to use modern python facilities to do it. But those modern python facilities are not universally available, and, strictly speaking, not absolutely required.

I suppose I'm just pleading for a little pythonic practicality.

Maybe I'm just wasting my time? Maybe I'm the only one who is interested in seeing a jython WSGI server into which users can drop universal WSGI components and have them just work? Is anyone else interested in such a jython WSGI container? Or should I just toddle off back to J2EE servlets?

Lastly, since the spec is still potentially a moving target, I've translated as much of my java as possible into jython, which will greatly speed up the prototyping process. Once the spec is finalized, I may translate it back to java, if there is a sufficient performance/other requirement for that. (I should have prototyped it in jython from the start, and saved myself a load of time.)

Kind regards,

Alan.

From pje at telecommunity.com Wed Sep 1 19:42:04 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 1 19:41:53 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4136021E.6070907@xhaus.com>
References: <5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com> <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com> <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com> <4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com> <4135B2E7.5060708@andreweland.org> <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com> <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com> <5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040901131431.0329baf0@mail.telecommunity.com>

At 06:08 PM 9/1/04 +0100, Alan Kennedy wrote:
>I would summarise the position of the current spec as "you must return an
>iterable, except when you want to return a file object, which will work
>fine under cpython 2.2+, because files are iterable under cpython 2.2+,
>even though they don't need to be iterable when they have fileno()".

No, it's just that you must return an iterable, *period*. The fact that 2.2 allows this to be a file object is irrelevant, as is the fact that 2.1 doesn't allow this to be a file object. There are thousands of classes out there for both 2.1 and 2.2 that either are, or aren't, iterable, and that's equally irrelevant.

>I'm just afraid that people will steam ahead writing WSGI middleware
>applications which return file-objects,

The part you keep leaving out is that such middleware is thereby targeted at Python 2.2, not 2.1. The spec explicitly mentions this.

>which will fail on jython simply because putting the following lines in my
>code is a violation of the spec
>
>if type(app_return) is types.FileType:
>    do_file_stuff(app_return)

What you're doing here isn't a "violation", IMO, merely "out of scope". It's not up to the spec to explain how to make Python 2.1 support 2.2 features; IMO, that's all you're doing here, and it doesn't hurt anybody.

>[Alan Kennedy]
> >> 2.
The spec, as is, explicitly permits authors of cpython applications
> >> to return file-like objects,
>
>[Phillip J. Eby]
> > Only if they are *iterable*, which is only true of the 'file' object in
> > 2.2 and up.
>
>Which seems to me an arbitrary criterion, especially in the light that the
>iterator nature of the file object will possibly (likely) not be actually
>used, as described in the snippet above.

The CGI runner won't use fileno(), and neither will many other servers. I don't see how the "iterable" criterion is arbitrary because some objects are iterable and others aren't. Any criterion we choose will by definition include some objects and not others.

>In cpython frameworks, the code would look like this
>
>if hasattr(app_object, 'fileno'):
>    do_file_stuff(app_object.fileno())
>else:
>    do_iterator_stuff(app_object)
>
>On jython
>
>if type(app_object) is types.FileType:
>    do_file_stuff(app_object)
>else:
>    do_iterator_stuff(app_object)
>
>Is that so difficult to accept?

But that's exactly what the spec says to do *now*, except that it doesn't explicitly bless the type check. If you really want to have that blessing written into the spec, so be it. I just don't see it as a matter that's in scope of the spec, because it's not only Jython-specific, but specific to your server as well. Document your extension as you would any such extension. There's no law against being more *permissive* than the spec requires. I do not see any reason to burden *other* server authors by requiring them to support your extension, because no use cases have been presented for this for any situation *other* than a Jython 2.1 server trying to run a Python 2.2 application.

>[Alan Kennedy]
> >> I think we should explicitly allow return of a file-like object, and
> >> thus freedom to use the read() method, etc.
>
>[Phillip J. Eby]
> > You are trying to let 2.2 code run in a 2.1 Python.
>
>Well, I see it as WSGI forcing me to jump through hoops in order to
>support the notion of iterability, even when that notion is NOT
>universally applicable, as the fileno() exception proves.

It's stretching a 2.2 spec to work with older versions of Python, largely intended for your benefit, as you were the first person who presented a strong use case for supporting *any* pre-2.2 version of Python.

>However, I think the absolute insistence on return objects being iterable
>is slightly arbitrary and unnecessarily constraining.

For whom? I've given numerous examples of how trivial it is for code targeted to 2.1 or earlier to support making files and even file-like objects into iterables. This is a small burden for those who want their code to be portable to such versions. Similarly, it's not an unreasonable burden for your server to support extensions to 2.1 behavior in order to accommodate code not written for Python 2.1 compatibility. It *is* unreasonable to expand the spec to place those burdens on people who don't care about supporting 2.1, or who don't care about supporting 2.2 code under 2.1.

>Maybe I'm just wasting my time? Maybe I'm the only one who is interested
>in seeing a jython WSGI server into which users can drop universal WSGI
>components and have them just work? Is anyone else interested in such a
>jython WSGI container? Or should I just toddle off back to J2EE servlets?

I agree with your intentions; I just don't agree that *other* server authors should be forced to duplicate your efforts if they don't have that use case. Iterability is the single simplest protocol that is universally accessible in any Python used in the last several years. It doesn't require any introspection. Currently, the common case code for a server looks like this:

    result = application(environ, start_response)
    try:
        for data in result:
            write(data)
    finally:
        if hasattr(result,'close'):
            result.close()

This is a perfectly valid implementation under the spec.
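Here is the common-case loop above, expanded into something runnable. This is a Python 3 sketch that assumes bytes chunks and a caller-supplied `write` function. `run_once` and `hello_app` are illustrative names, not part of any server. As in the final PEP 333, `start_response` returns a `write` callable, although this toy application never uses it. The sketch also ignores actually transmitting the status and headers.

```python
def run_once(application, environ, write):
    """Run one request through an iterable-only server loop.

    'write' pushes bytes toward the client; status and headers are just
    recorded and returned so the caller can inspect them.
    """
    sent = {}

    def start_response(status, headers):
        sent["status"] = status
        sent["headers"] = headers
        return write  # the final PEP 333 also hands back a write() callable

    result = application(environ, start_response)
    try:
        for data in result:
            write(data)
    finally:
        # Call close() even if iteration raised, as the loop above does.
        if hasattr(result, "close"):
            result.close()
    return sent


def hello_app(environ, start_response):
    """A trivial application that returns a list, the simplest iterable."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, ", b"WSGI!"]
```

The loop needs no type checks at all. That is the simplicity Phillip is defending for servers that target 2.2 and up.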
Changing the spec to allow the application to return anything *but* iterables means complicating *every* server, for the sole benefit of applications that want to use 2.2 idioms under Python 2.1. If I were a server author targeting 2.2 and up (and I will be), I would rightly object to adding extra introspection to the above, when it will not benefit me or any user in my target audience. If my server requires 2.2, then obviously applications running under it can safely use 2.2 idioms. And if they're written for 2.1 they also work. So here's the resolution: I will slightly expand the section on supporting older versions of Python, to explicitly allow a 2.1 server to "forward-compatibly" check for 2.2 idioms such as returning a file object. I'd prefer not to do that, but not because I dislike the approach. We're in "violent agreement" on what *your* server should do about this, and I encourage you to implement it. Our disagreement (as I understand it) is that: 1. I think this is a "server-specific extension" that's outside the spec's scope to rule on the validity of, and 2. I don't think that requiring others to do what your code will be doing is a good idea, because they don't *need* to, unless they're trying to run 2.2 code on a 2.1 Python, which should *definitely* not be a requirement of the spec. From py-web-sig at xhaus.com Wed Sep 1 20:19:27 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Wed Sep 1 20:14:51 2004 Subject: [Web-SIG] Returned application object and fileno. 
In-Reply-To: <5.1.1.6.0.20040901131431.0329baf0@mail.telecommunity.com> References: <5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com> <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com> <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com> <4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com> <4135B2E7.5060708@andreweland.org> <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com> <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com> <5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com> <5.1.1.6.0.20040901131431.0329baf0@mail.telecommunity.com> Message-ID: <413612AF.8040605@xhaus.com> Phillip, [Phillip J. Eby] > So here's the resolution: I will slightly expand the section on > supporting older versions of Python, to explicitly allow a 2.1 server to > "forward-compatibly" check for 2.2 idioms such as returning a file object. > > I'd prefer not to do that, but not because I dislike the approach. > We're in "violent agreement" on what *your* server should do about this, > and I encourage you to implement it. Our disagreement (as I understand > it) is that: > > 1. I think this is a "server-specific extension" that's outside the > spec's scope to rule on the validity of, and > > 2. I don't think that requiring others to do what your code will be > doing is a good idea, because they don't *need* to, unless they're > trying to run 2.2 code on a 2.1 Python, which should *definitely* not be > a requirement of the spec. That solution works for me. Although it may seem that we're in disagreement, I like to see that as a necessary part of moving forward :-) Possibly the reason why our points are slipping by each other somewhat is because you're making technical arguments and I'm making a primarily community/social argument: supporting the most up-to-date jython available (which is sadly out-of-date wrt cpython). 
And I've got to say you've got a much better and cleaner handle on the technics than I: I'm just a simple implementer who wants to make his framework as useful as possible to as wide an audience as possible. Just a last few points 1. It was never my intention to force complication of other people's frameworks, but I see now that that would be unavoidable if returning a file object was a part of the spec, and that would be a bad thing. 2. This whole problem *will* finally go away when jython 2.(2|3|4) appears (which I believe it will, though to do this properly will require Sun to open their chequebook). If I had the time or resources, I'd be putting all my efforts into getting jython 2.2 out the door. But I don't have that time or resource, so I'm falling back to doing the best that I can with what's available. And jython 2.1 is *rock-solid*, and in use all over the place: people trust it. 3. Your solution allows me to address the most common case that I believe would cause problems: that of framework authors returning a file-object (without realising that cpython 2.2+ was creating an iterator for them behind the scenes). I think this is going to be a very common design paradigm for WSGI middleware. 4. I'll be doing my level best to get all python code to run under modjy, regardless of the version it was written for. There might be a lot of frantic paddling going on underneath the surface, but above the waterline hopefully everything will be calm and serene ..... Thanks again for this initiative: I believe that WSGI is *definitely* the future for python web servers. Great job! Kind regards, Alan. From janssen at parc.com Thu Sep 2 03:07:19 2004 From: janssen at parc.com (Bill Janssen) Date: Thu Sep 2 03:07:40 2004 Subject: [Web-SIG] Bill's comments on WSGI draft 1.4 Message-ID: <04Sep1.180724pdt."58612"@synergy1.parc.xerox.com> Well, thanks to Andrew's comment about my non-participation, I've finally read PEP 333, version 1.4, and have a few comments. 
Phillip, great job, nice reasoning. I like the general design. I think the project as a whole is quite useful. I've been using a custom framework together with Medusa, and as I read I tried to imagine how my framework could be implemented under WSGI. There seem to be no show-stoppers, though I have yet to try it. A meta comment on commenting on PEP drafts: Without numbered sections, paragraphs, and lines, there's no effective way to point back to specific wording in the draft without quoting it. A few nits about WSGI: 1. The "environ" parameter must be a Python dict: I think subclasses should be allowed. A true subclass supports all methods of its ancestors, so the rationale presented in the back of the PEP for excluding them doesn't hold water. I think the appropriate check would be to see if the returned class is a subclass of the "dict" class. That is, "isinstance(e, dict)" should return True. 2. The "fileno" attribute on the returned iterable. I'm a bit concerned about using operating system file descriptors, due to resource constraints; I think a better check would be to see if the returned iterable is a subclass of the "file" class. That is, "isinstance(f, file)" should return true. 3. Comments about "The [status-line] string must be 7-bit ASCII...containing no control characters." That's overly restrictive; I think it would be better to simply refer to RFC 2616 and say that it should follow the rules defined there for "Reason-Phrase". 4. Similarly, the rules about header values are more restrictive than HTTP; they therefore prevent perfectly valid HTTP header values from being returned. That's bad. Again, I think the PEP should simply refer to RFC 2616 and say, "Use those rules". 5. The phrase about "if a server or gateway discards or overrides any application header for any reason, it must record this in a log"; that should be "should" instead of "must". 
Otherwise you'll have your log cluttered with innocuous header re-write messages, and no way to turn that off. 6. The "write()" callable is important; it should not be deprecated or in some other way made a poor stepchild of the iterable. 7. If an application returns an iterable after calling write(), are the strings produced by iteration written after those written by calls to write? 8. The note on Unicode: Unfortunately, Web standards like HTTP rely on using proper character sets. By *not* using Unicode strings, and by *not* specifying the character set encoding of the "raw" byte strings, we open the door for disastrous misunderstandings. The safest thing to do would be to require the framework to traffic in Unicode strings for things like header values, which the WSGI middleware would translate to or from the various required encodings used by the server and external protocols. At least with Unicode strings you know what encoding is being used. A riskier, more error-prone option would be to require the byte strings to be in particular encodings. The content strings, those written to the "write()" calls, or returned by the iterable, should in fact be byte vectors, exactly as they are currently specified. 9. There should be a non-optional way of indicating the URL scheme, whether it is "http", "https", or "ftp". I'd suggest "wsgi.scheme" in the environ. Bill From fumanchu at amor.org Thu Sep 2 04:12:12 2004 From: fumanchu at amor.org (Robert Brewer) Date: Thu Sep 2 04:17:50 2004 Subject: [Web-SIG] Bill's comments on WSGI draft 1.4 Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net> Bill Janssen wrote: > ... > 6. The "write()" callable is important; it should not be deprecated > or in some other way made a poor stepchild of the iterable. That's been my only question so far. I'd like to at least hear the rationale behind favoring iterables so heavily over write(). 
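Robert's question about favoring iterables over write() can be made concrete with a toy scheduler. The Python 3 sketch below drives several response iterables from a single thread by pulling one chunk at a time. `round_robin` is a hypothetical name, and real servers are far more involved. This interleaving is only possible because control returns to the server between chunks. An application that pushes output through `write()` keeps control until it returns, so it could not be paused this way without parking its thread.

```python
from collections import deque


def round_robin(bodies):
    """Interleave several response iterables on a single thread.

    'bodies' maps a connection label to a response iterable. Each pass
    pulls at most one chunk from one response and then moves on, so no
    single response holds the thread while the others wait.
    """
    queue = deque((label, iter(body)) for label, body in bodies.items())
    sent = []
    while queue:
        label, chunks = queue.popleft()
        try:
            chunk = next(chunks)
        except StopIteration:
            continue  # this response is finished; drop it from the queue
        sent.append((label, chunk))
        queue.append((label, chunks))  # back of the line for its next chunk
    return sent
```

A real server would pull from a response only when its socket is writable, but the principle is the same.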
Robert Brewer
MIS
Amor Ministries
fumanchu@amor.org

From pje at telecommunity.com Thu Sep 2 05:25:56 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 2 05:25:53 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4
In-Reply-To: <04Sep1.180724pdt."58612"@synergy1.parc.xerox.com>
Message-ID: <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>

At 06:07 PM 9/1/04 -0700, Bill Janssen wrote:
>1. The "environ" parameter must be a Python dict: I think subclasses
>should be allowed. A true subclass supports all methods of its
>ancestors, so the rationale presented in the back of the PEP for
>excluding them doesn't hold water. I think the appropriate check
>would be to see if the returned class is a subclass of the "dict"
>class. That is, "isinstance(e, dict)" should return True.

Paradoxically, allowing subclasses eliminates the usefulness of allowing subclasses. Presumably, the purpose of using a subclass is to provide some extended behavior, e.g. as an attribute/method, or as a byproduct of requesting particular keys or values. In both cases, these extended behaviors would be destroyed the minute that a piece of middleware decides to use its *own* dictionary subclass.

This also ignores the issue that creating a dictionary subclass that *consistently* enforces some extended behavior (e.g. lazy evaluation of a key) is intrinsically difficult and fragile, because new versions of Python often introduce new dictionary methods that are not implemented in terms of other existing methods, thus breaking a previously "perfect" subclass when a new Python version is released.

These are "practicality beats purity" arguments, so I need to see some *practical* applications of dictionary subclasses that would be useful enough to outweigh both of the above issues.

>2. The "fileno" attribute on the returned iterable.
I'm a bit >concerned about using operating system file descriptors, due to >resource constraints; I think a better check would be to see if the >returned iterable is a subclass of the "file" class. That is, >"isinstance(f, file)" should return true. The purpose of 'fileno' is specifically to allow the use of operating system APIs that copy data from one file descriptor to another. Many Python objects have valid 'fileno' attributes besides files, including sockets and pipes. Many non-stdlib objects in common use have 'fileno' attributes that serve this purpose. 'select.select' takes objects with 'fileno', and so on. Because 'file' has a 'fileno' attribute, 'isinstance(f,file)' implies 'hasattr(f,"fileno")'. Therefore, the latter is the preferred behavior here, because it doesn't unnecessarily exclude other valid wrappers of file descriptors. >3. Comments about "The [status-line] string must be 7-bit >ASCII...containing no control characters." That's overly restrictive; >I think it would be better to simply refer to RFC 2616 and say that it >should follow the rules defined there for "Reason-Phrase". > >4. Similarly, the rules about header values are more restrictive than >HTTP; they therefore prevent perfectly valid HTTP header values from >being returned. That's bad. Again, I think the PEP should simply >refer to RFC 2616 and say, "Use those rules". These restrictions are intended to simplify servers and middleware; nobody has yet presented an example of a scenario where this imposed any practical limitation. The fallback position would be that the status string and headers must not be CR or CRLF terminated. But, I'd prefer to stick with a "no embedded control characters" approach, mainly to avoid situations where people embed '\n' and think that will be correct. 
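Phillip's preferred rule (7-bit ASCII, no embedded control characters) is easy to check mechanically. The Python 3 sketch below assumes the server receives the status as a `str` and the headers as a list of `(name, value)` pairs. `is_clean` and `validate_response` are hypothetical helpers. It rejects tab as well, which is stricter than RFC 2616's TEXT rule, since TEXT permits linear whitespace.

```python
import re

# ASCII control characters (0x00-0x1F) plus DEL (0x7F); this includes tab.
_CONTROL = re.compile(r"[\x00-\x1f\x7f]")


def is_clean(text):
    """True if text is 7-bit ASCII and contains no control characters."""
    return text.isascii() and _CONTROL.search(text) is None


def validate_response(status, headers):
    """Raise ValueError if the status line or any header breaks the rule."""
    # Expect "NNN reason": three digits, a space, then clean text.
    if not (is_clean(status) and status[:3].isdigit() and status[3:4] == " "):
        raise ValueError(f"bad status line: {status!r}")
    for name, value in headers:
        if not (is_clean(name) and is_clean(value)):
            raise ValueError(f"bad header {name!r}: {value!r}")
```

Rejecting any embedded newline also stops the "'\n' instead of '\r\n'" mistake from smuggling an extra header into the response.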
Here's what RFC 2616 has to say about TEXT, which is the format of the status message and of header values:

   The TEXT rule is only used for descriptive field contents and values
   that are not intended to be interpreted by the message parser. Words
   of *TEXT MAY contain characters from character sets other than
   ISO-8859-1 [22] only when encoded according to the rules of RFC 2047
   [14].

       TEXT           = <any OCTET except CTLs,
                        but including LWS>

   A CRLF is allowed in the definition of TEXT only as part of a header
   field continuation. It is expected that the folding LWS will be
   replaced with a single SP before interpretation of the TEXT value.

In other words, no control characters except for folding, and 7-bit ASCII with optional ISO-8859-1. In practice, however, RFC 2047 allows for encoding ISO-8859-1 *in* 7-bit ASCII as well. So, the only actual limitation being imposed by the PEP is on folding, and on the necessary encoding of non-ASCII characters.

Again, this is a practicality v. purity issue. Are you aware of any applications that currently fold their headers, or transmit ISO-8859-1 characters without using the encoding prescribed by RFC 2047? Is there a practical use case for either one? I'm willing to listen on this point, but at the moment I find it hard to imagine what the use case for either of these features is. By contrast, I do have very specific use cases in mind where supporting those features causes problems:

 * Applications creating broken headers (e.g. with '\n' instead of '\r\n') or broken folds

 * Applications mistakenly transmitting Unicode without considering encoding issues

 * Middleware and servers forgetting to factor out folds when parsing data for interpretation

 * In order to ensure safe interpretation, smart middleware and server developers will have to write routines to *unfold* potentially-folded headers; why not just disallow folding to begin with?

>5.
The phrase about "if a server or gateway discards or overrides any >application header for any reason, it must record this in a log"; that >should be "should" instead of "must". Otherwise you'll have your log >cluttered with innocuous header re-write messages, and no way to turn >that off. How about "must provide the *option*" and "must be enabled by default"? Or, leave it as is, but add something like, "may provide the user with the option of suppressing this output, so that users who cannot fix a broken application are not forced to bear the pain of its error." >6. The "write()" callable is important; it should not be deprecated >or in some other way made a poor stepchild of the iterable. But it *is* one. The presence of the 'write()' facility significantly increases the implementation complexity for middleware and server authors. If it weren't necessary to support existing streaming APIs, it wouldn't exist. Earlier drafts treated it as a peer, which led to people making bad assumptions about its proper use. Making it a "poor stepchild" encourages people to investigate it only if they really need it, and only a very few applications actually need it. >7. If an application returns an iterable after calling write(), are >the strings produced by iteration written after those written by calls >to write? Yes. This is implicit in the way 'write()' and the iterable are defined, because the server must transmit a block yielded or passed to write() before returning control to the application. The only way to meet this constraint is for them to occur in sequence. However, the language should perhaps be clarified to be explicit about this point, and to address what happens if code *within* the iterator calls 'write()'. (I don't think it should be allowed to, but I'm open to arguments either way.) >8. The note on Unicode: Unfortunately, Web standards like HTTP rely >on using proper character sets. 
By *not* using Unicode strings, and >by *not* specifying the character set encoding of the "raw" byte >strings, we open the door for disastrous misunderstandings. The >safest thing to do would be to require the framework to traffic in >Unicode strings for things like header values, which the WSGI >middleware would translate to or from the various required encodings >used by the server and external protocols. At least with Unicode >strings you know what encoding is being used. This seems at odds with your previous desire to use RFC 2616, which is pretty clear that it's ISO-8859-1 or RFC 2047. PEP 333 goes further and says, it's ASCII, dammit, and use MIME header encodings (RFC 2047) if you need to do something special, because God help you if you're trying to mess with non-ASCII in HTTP headers and you don't know how to deal with that stuff. Granted, that part could be more explicit in the PEP, so I'll work on that. :) (Maybe not this week; I expect to spend tomorrow putting hurricane panels on my house, just ahead of Frances' arrival...) >A riskier, more error-prone option would be to require the byte >strings to be in particular encodings. That's actually what's required, it's merely implied by the PEP rather than explicitly stated. But it's a fully RFC-compliant way to do it. >The content strings, those written to the "write()" calls, or returned >by the iterable, should in fact be byte vectors, exactly as they are >currently specified. Glad there was something you liked. ;) (j/k) >9. There should be a non-optional way of indicating the URL scheme, >whether it is "http", "https", or "ftp". I'd suggest "wsgi.scheme" in >the environ. I rather like this, although I don't at all see how FTP gets into this. What the heck would CGI variables for FTP look like, I wonder? Anyway, it's handy for "http" and "https" at the very least. I'd prefer "wsgi.url_scheme" for the name, though, as it's otherwise a somewhat ambiguous name. 
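The sketch below shows how an application might rebuild its request URL once a `wsgi.url_scheme` key exists. It is written in Python 3 and closely follows the URL-reconstruction recipe in the final text of PEP 333. `reconstruct_url` is an illustrative name. It assumes the usual CGI keys (`HTTP_HOST`, `SERVER_NAME`, `SERVER_PORT`, `SCRIPT_NAME`, `PATH_INFO`, `QUERY_STRING`) hold plain strings.

```python
from urllib.parse import quote


def reconstruct_url(environ):
    """Rebuild the request URL from a WSGI/CGI-style environ dict."""
    scheme = environ["wsgi.url_scheme"]
    url = scheme + "://"
    if environ.get("HTTP_HOST"):
        url += environ["HTTP_HOST"]  # the Host header carries its own port, if any
    else:
        url += environ["SERVER_NAME"]
        default_port = "443" if scheme == "https" else "80"
        if environ["SERVER_PORT"] != default_port:
            url += ":" + environ["SERVER_PORT"]
    # quote() leaves "/" alone by default, so path separators survive.
    url += quote(environ.get("SCRIPT_NAME", ""))
    url += quote(environ.get("PATH_INFO", ""))
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]
    return url
```

Without the scheme key, the application would have to guess between "http" and "https" from server-specific hints such as an `HTTPS` variable.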
From pje at telecommunity.com Thu Sep 2 05:32:12 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 2 05:32:10 2004 Subject: [Web-SIG] Bill's comments on WSGI draft 1.4 In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amo rhq.net> Message-ID: <5.1.1.6.0.20040901232754.02323050@mail.telecommunity.com> At 07:12 PM 9/1/04 -0700, Robert Brewer wrote: >Bill Janssen wrote: > > ... > > 6. The "write()" callable is important; it should not be deprecated > > or in some other way made a poor stepchild of the iterable. > >That's been my only question so far. I'd like to at least hear the >rationale behind favoring iterables so heavily over write(). One important reason: the server can suspend an iterable's execution without tying up a thread. It can therefore potentially use a much smaller thread pool to handle a given number of connections, because the threads are only tied up while they're executing an iterator 'next()' call. By contrast, 'write()' occurs *within* the application execution, so the only way to suspend execution is to suspend the thread (e.g. waiting for a lock). From ianb at colorstudy.com Thu Sep 2 07:48:56 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Sep 2 07:49:04 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> References: <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> Message-ID: <4136B448.8070707@colorstudy.com> Phillip J. Eby wrote: > At 06:07 PM 9/1/04 -0700, Bill Janssen wrote: > >> 1. The "environ" parameter must be a Python dict: I think subclasses >> should be allowed. A true subclass supports all methods of its >> ancestors, so the rationale presented in the back of the PEP for >> excluding them doesn't hold water. I think the appropriate check >> would be to see if the returned class is a subclass of the "dict" >> class. That is, "isinstance(e, dict)" should return True. 
> > > Paradoxically, allowing subclasses eliminates the usefulness of allowing > subclasses. Presumably, the purpose of using a subclass is to provide > some extended behavior, e.g. as an attribute/method, or as a byproduct > of requesting particular keys or values. In both cases, these extended > behaviors would be destroyed the minute that a piece of middleware > decides to use its *own* dictionary subclass. I agree strongly with you on this. Subclassing built-in types is almost only useful for showing off clever tricks and distracting people who want to change the language. Code constantly contains assumptions that you can recreate built-in types from their components, and then you lose the subclass. I also don't see any advantage, beyond the theoretical. Any attempt to leverage a subclass is just as likely to cause problems as to help. >> 9. There should be a non-optional way of indicating the URL scheme, >> whether it is "http", "https", or "ftp". I'd suggest "wsgi.scheme" in >> the environ. > > > I rather like this, although I don't at all see how FTP gets into this. > What the heck would CGI variables for FTP look like, I wonder? Anyway, > it's handy for "http" and "https" at the very least. I'd prefer > "wsgi.url_scheme" for the name, though, as it's otherwise a somewhat > ambiguous name. This sounds good to me too. I wanted HTTPS=on to be required, but wsgi.url_scheme would be more general anyway. It's pretty easy to imagine translating FTP to CGI variables, really. The requested URL (SCRIPT_NAME+PATH_INFO) is the file you are getting or putting, and the REQUEST_METHOD is maybe GET or PUT (or maybe RETR and STOR, but GET and PUT would be more natural). Most of the other commands map to WebDAV methods. Obviously the server has to keep track of some state, but typically that state is boring to the application anyway. But that's all an aside. I can imagine mailto as well, when you pipe email to your application.
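The claim that a subclass gets lost is easy to demonstrate. A middleware that copies environ before changing it hands the application a plain `dict`, because `dict.copy()` returns a `dict`, not the subclass. The sketch below uses hypothetical names (`RequestEnviron`, `path_prefix_middleware`, `report_type_app`) for illustration only.

```python
class RequestEnviron(dict):
    """Hypothetical server-supplied environ with an extra convenience method."""

    def remote_user(self):
        return self.get('REMOTE_USER', 'anonymous')


def path_prefix_middleware(application, prefix='/mounted'):
    """Hypothetical middleware that copies environ before rewriting PATH_INFO."""
    def wrapper(environ, start_response):
        new_environ = environ.copy()  # dict.copy() always returns a plain dict
        new_environ['PATH_INFO'] = prefix + environ.get('PATH_INFO', '')
        return application(new_environ, start_response)
    return wrapper


def report_type_app(environ, start_response):
    """Application that reports which environ type actually reached it."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [type(environ).__name__.encode('ascii')]


def noop_start_response(status, headers):
    pass


environ = RequestEnviron(REMOTE_USER='bill', PATH_INFO='/index')
print(report_type_app(environ, noop_start_response))  # [b'RequestEnviron']
print(path_prefix_middleware(report_type_app)(environ, noop_start_response))  # [b'dict']
```

An application that called `environ.remote_user()` would work when served directly. It would fail with `AttributeError` as soon as this ordinary middleware was inserted in front of it.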
-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Thu Sep 2 08:24:55 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Sep 2 08:25:02 2004 Subject: [Web-SIG] Status code, status header In-Reply-To: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com> References: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com> Message-ID: <4136BCB7.8090309@colorstudy.com> Phillip J. Eby wrote: > At 10:01 PM 8/30/04 -0500, Ian Bicking wrote: > >> After a little thought, I'm -1 on a status header, even with >> email.Message. > > > I think email.Message is also dead, due to its absence in Python > versions prior to 2.2. > > >> I'm also +1 on turning status into an integer. I think it makes >> things a little simpler, and those message strings are just a >> distraction. The final server can put that string in ("200 OK", etc) >> if it wants to, but if it doesn't it doesn't matter. > > > I'm still -1 on this, for the reasons stated previously. I might be > convinced if you can show me that a significant number of popular > servers already have the necessary table(s) to do this with; e.g. > Twisted, ZServer, Apache (CGI/FastCGI), mod_python, etc. * Twisted does, in twisted.protocols.http * mod_python must have one somewhere; I don't think it allows you to provide a reason, you can only provide an integer code. * Zope does, in ZPublisher.HTTPResponse * Apache does not add the reason string to CGI scripts that provide an explicit Status header but no reason. But it provides reasons for any status that it generates. I don't know about FastCGI. Part of why I think it's not useful is that in many cases the reason string is hard-coded. In that case the reason string is synonymous with the code, and cannot be changed. Nor is anyone paying attention if you do change it, and there's nothing constructive that can be done with that string. > In theory, the "reason-phrase" can be null. In practice, I wonder.
> Also, I don't think the message strings are "just a distraction": they > clarify the intent of the code that contains them. No one would ever pay attention to the string when there's that pleasant integer code to parse out. Plus the spec says not to. The names are fine, but the code and the reason string are redundant. The names are better represented with Python names, not a string that gets tacked on. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Thu Sep 2 08:28:11 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Sep 2 08:33:02 2004 Subject: [Web-SIG] wsgi.fatal_errors In-Reply-To: <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com> References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com> Message-ID: <4136BD7B.90308@colorstudy.com> Phillip J. Eby wrote: > At 11:15 PM 8/30/04 -0700, tony@lownds.com wrote: > >> > Here are some changes I've proposed in the last few days to resolve >> issues >> > people brought up, but which I haven't gotten much feedback on: >> > >> > * 'wsgi.fatal_errors' key for exceptions that apps and middleware >> > shouldn't >> > trap >> > >> >> What about defining an exception class that applications can raise >> with an >> HTML payload, which servers are supposed to send the to the client? >> Middleware should be free to alter the payload as much as they like. The >> server should not send the payload when content-type is not html. >> >> By using exceptions as a backchannel, the application and middleware do >> not have to keep track of the state to sanely handle an error. > > > Interesting. But I think you've just given me an idea for a possibly > simpler way to do this, with some other advantages. > > Suppose that instead of 'start_response(status,headers)' we had > 'set_response(status,headers,body=None)'.
And the difference would be > that our 'set_response' does nothing until/unless you call write() or > yield a result from the return iterable. Therefore, you could call > 'set_response' multiple times, with only the last such call taking > effect. (If you supply a non-None 'body', then calling write() or > returning an iterable is an error.) This seems pretty reasonable. How necessary is that optional body argument? Couldn't you just use the write argument or return an iterable? > Now consider error handling middleware: it simply calls > 'set_response(error_status,error_headers,error_body)', and returns None. > > At this point, we've isolated the complexity to exist only for streaming > responses once the first body chunk has been generated. We can handle > this by making a call to 'set_response()' a fatal error if a body chunk > has been generated. Thus, no special handling is needed by an exception > handler: it just tries to do 'set_response()', and allows the fatal > error (if any) to propagate. Now, the server can catch the fatal error > and deal with it. > > I think this will let us keep all of the complications in the server, > where they always have to exist, no matter what else we do. > Exception-handling middleware is then delightfully simple. > > On the other hand, output-transforming middleware becomes somewhat more > complex, as it would now have three output sources to transform (body > param to set_response(), write(), and output iterable). > > This is a fairly significant change to the spec, that introduces lots of > new angles to cover. But, I think it could be an "exceptionally" clean > solution to the problem. ;) It sounded good until then; now I don't know. I think I'm -1 on that pun. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From neel at mediapulse.com Thu Sep 2 15:19:27 2004 From: neel at mediapulse.com (Michael C. Neel) Date: Thu Sep 2 15:19:04 2004 Subject: 2 cents on file objects... 
WAS: RE: [Web-SIG] Bill's comments on WSGI draft 1.4 In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net> References: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net> Message-ID: <1094131167.4727.27.camel@mike.mediapulse.com> Well, I've seen a lot of back and forth on file objects, write(), etc. I think it's a minor issue myself; it's not that hard to return an interface that supports both methods. Let the programmer working on the middleware/application weigh the tradeoffs of one method against the other. In the framework I use, I've actually altered it to allow its context object (which is connected to the output stream, among other things) to be used as a file object. The first need for this was to allow me to pass the object off to a csv.writer object, which I then called with the result of a DB-API 2.0 fetchall(); that made a "Download as CSV" button work in no more than 4 lines of code. I could also see doing this with XML classes for a WSDL/SOAP system. More off the wall, you could do this with the logging module, and send your logging statements to another server. I suppose with any of these I could grab the StringIO module and add a few extra lines to my code. Then again, a WSGI system could also do that in its implementation and even offer me the option of buffered or non-buffered output. As has been said here before, adoption by frameworks and servers is going to be critical to WSGI. So I'd opt for more choice and flexibility; we're all smart guys here and I don't think we would turn down a good idea because of complexity. Mike From pje at telecommunity.com Thu Sep 2 15:31:45 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 2 15:31:45 2004 Subject: 2 cents on file objects...
WAS: RE: [Web-SIG] Bill's comments on WSGI draft 1.4 In-Reply-To: <1094131167.4727.27.camel@mike.mediapulse.com> References: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net> <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net> Message-ID: <5.1.1.6.0.20040902092500.033742b0@mail.telecommunity.com> At 09:19 AM 9/2/04 -0400, Michael C. Neel wrote: >Well, I've seen alot of back and forth on file objects, write(), etc. I >think it's of little issue myself, not that hard to return an interface >that will support both methods. Let the programming working on the >middlware/application decide against the tradeoffs from one method to >another. > >In the framework I use, I've actually altered it to allow it's context >object (which is connected to the output stream, among other things) to >be used as a file object. The first need for this was to allow me to >pass the object off to a cvs.writer object, when I then called with the >result of a DB-API 2.0 fetchall(); and made a "Download as CSV" button >work in no more than 4 lines of code. I could also see doing this with >XML classes for a WSDL/SOAP system. Really off the wall, you could do >this with the logging module, and send your logging statments to another >server. > >I suppose with any of these I could grab the StringIO module and add a >few extra lines to my code. Then again, a WSGI system could also do >that in it's implementation and ever offer me the options of buffered or >non-buffered output. Sorry, I've read through the above a few times and I haven't been able to figure out exactly what it is that you're proposing, or if you're proposing something at all. :( >As it's been said here before, adoption of the frameworks and server is >going to be critical to WSGI. So I'd opt for more choice and >flexibility; we're all smart guys here and I don't think we would turn >down a good idea because of complexity. 
These sentences seem diametrically opposed to me; choice and flexibility is precisely what we *don't* want in WSGI, as it dramatically increases the opportunity for breaking interoperability. Right now, it's still possible to write "dirt simple" implementations, because the requirements are minimal even though there are some options for improved performance. There's a *big* difference between an option and a choice. Choices double the work for everybody, while options only affect people who want to use them. To the greatest extent possible, we should eliminate choices, and keep the number of options reasonable. (For example, a few revisions ago, we dropped the "choice" of not returning an iterable.) From pje at telecommunity.com Thu Sep 2 15:39:12 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 2 15:39:10 2004 Subject: [Web-SIG] Status code, status header In-Reply-To: <4136BCB7.8090309@colorstudy.com> References: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com> <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040902093214.033888f0@mail.telecommunity.com> At 01:24 AM 9/2/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>At 10:01 PM 8/30/04 -0500, Ian Bicking wrote: >>>I'm also +1 on turning status into an integer. I think it makes things >>>a little simpler, and those message strings are just a distraction. The >>>final server can put that string in ("200 OK", etc) if it wants to, but >>>if it doesn't it doesn't matter. >> >>I'm still -1 on this, for the reasons stated previously. I might be >>convinced if you can show me that a significant number of popular servers >>already have the necessary table(s) to do this with; e.g. Twisted, >>ZServer, Apache (CGI/FastCGI), mod_python, etc. > >* Twisted does, in twisted.protocols.http >* mod_python must somewhere; I don't think it allows you to provide a >reason, you can only provide an integer code. 
>* Zope does in ZPublisher.HTTPResponse Technically, ZPublisher is part of the *application* side, not the server side, which is a point in favor of the application side setting the reason. >* Apache does not add the reason string to CGI scripts that provide an >explicit Status header but no reason. So, a CGI gateway would have to have a table, or else generate messages like "502 Dude, this is whack!". :) >>In theory, the "reason-phrase" can be null. In practice, I wonder. >>Also, I don't think the message strings are "just a distraction": they >>clarify the intent of the code that contains them. > >No one would ever pay attention to the string when there's that pleasant >integer code to parser out. Plus the spec says not to. Huh? Are you saying that: start_response(405,headers) is more readable than: start_response("405 Method Not Allowed",headers) ???? From pje at telecommunity.com Thu Sep 2 15:42:51 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 2 15:42:48 2004 Subject: [Web-SIG] wsgi.fatal_errors In-Reply-To: <4136BD7B.90308@colorstudy.com> References: <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com> <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040902093923.03386ec0@mail.telecommunity.com> At 01:28 AM 9/2/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>At 11:15 PM 8/30/04 -0700, tony@lownds.com wrote: >>>What about defining an exception class that applications can raise with an >>>HTML payload, which servers are supposed to send the to the client? >>>Middleware should be free to alter the payload as much as they like. The >>>server should not send the payload when content-type is not html. >>> >>>By using exceptions as a backchannel, the application and middleware do >>>not have to keep track of the state to sanely handle an error. >> >>Interesting. 
But I think you've just given me an idea for a possibly >>simpler way to do this, with some other advantages. >>Suppose that instead of 'start_response(status,headers)' we had >>'set_response(status,headers,body=None)'. And the difference would be >>that our 'set_response' does nothing until/unless you call write() or >>yield a result from the return iterable. Therefore, you could call >>'set_response' multiple times, with only the last such call taking >>effect. (If you supply a non-None 'body', then calling write() or >>returning an iterable is an error.) > >This seems pretty reasonable. How necessary is that optional body >argument? Couldn't you just use the write argument or return an iterable? The idea was to use it as a way to bypass non-exception middleware, without raising a fatal error. OTOH, maybe Tony's approach is actually better. >>This is a fairly significant change to the spec, that introduces lots of >>new angles to cover. But, I think it could be an "exceptionally" clean >>solution to the problem. ;) > >It sounded good until then; now I don't know. I think I'm -1 on that pun. I get the humor of the second sentence; is the first sentence also humor, or is it serious? From neel at mediapulse.com Thu Sep 2 15:55:48 2004 From: neel at mediapulse.com (Michael C. Neel) Date: Thu Sep 2 15:55:27 2004 Subject: 2 cents on file objects... WAS: RE: [Web-SIG] Bill's comments on WSGI draft 1.4 In-Reply-To: <5.1.1.6.0.20040902092500.033742b0@mail.telecommunity.com> References: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net> <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net> <5.1.1.6.0.20040902092500.033742b0@mail.telecommunity.com> Message-ID: <1094133348.4727.45.camel@mike.mediapulse.com> On Thu, 2004-09-02 at 09:31, Phillip J. Eby wrote: > Sorry, I've read through the above a few times and I haven't been able to > figure out exactly what it is that you're proposing, or if you're proposing > something at all. 
:( Sorry, I guess I'm not being clear, but I was making a case for file objects based upon my past use of them. > These sentences seem diametrically opposed to me; choice and flexibility is > precisely what we *don't* want in WSGI, as it dramatically increases the > opportunity for breaking interoperability. Right now, it's still possible > to write "dirt simple" implementations, because the requirements are > minimal even though there are some options for improved performance. At the risk of angering a mob: what's on the table isn't a Perl-level 'there is more than one way to do it'; it's an object that supports two interfaces. Python's standard lib is full of objects that are file-like, so I don't even see this as a stretch from the norm. > There's a *big* difference between an option and a choice. Choices double > the work for everybody, while options only affect people who want to use > them. To the greatest extent possible, we should eliminate choices, and > keep the number of options reasonable. (For example, a few revisions ago, > we dropped the "choice" of not returning an iterable.) Again, I don't see how this is a lot of work, or enough work that it prevents anyone from using it. The WSGI spec can simply state that the return value can be used both as a file object and as an iterable (though isn't that a bit redundant? I'll have to check, but file objects are iterable, correct?) I think this is the only open issue with the PEP, at least the only major one judging from the number of posts. Allowing both interfaces would be acceptable here, I think, and solves the problem. Also, this is just a pre-PEP on a SIG at the moment; from PEPs I've followed in the past, things are going to get worse when it's before the Python community, and you'll really want the support of your SIG to help keep your sanity through the process, lol.
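File objects are iterable in Python 2.2 and later, but iterating over one yields lines, and for binary content a single "line" can be arbitrarily large. A common alternative is to wrap the file in a generator that yields fixed-size blocks, so the server can pull output when it is ready. The sketch below is in modern Python. `file_blocks` and `serve_file` are hypothetical helpers, and the 8192-byte block size is an arbitrary choice.

```python
import io


def file_blocks(fileobj, block_size=8192):
    """Yield fixed-size byte blocks from a binary file-like object until EOF.

    The 'with' statement closes the file once the blocks are exhausted, or,
    if iteration has already started, when the server calls close() on the
    generator to abandon the response early.
    """
    with fileobj:
        while True:
            block = fileobj.read(block_size)
            if not block:  # b'' signals end of file
                break
            yield block


def serve_file(path, content_type='application/octet-stream'):
    """Hypothetical factory: a WSGI app that streams one file in blocks."""
    def app(environ, start_response):
        fileobj = open(path, 'rb')
        start_response('200 OK', [('Content-Type', content_type)])
        return file_blocks(fileobj)
    return app


print(list(file_blocks(io.BytesIO(b'abcdefghij'), block_size=4)))
# [b'abcd', b'efgh', b'ij']
```

A generator was chosen because it has a `close()` method. The finalized PEP 333 asks servers to call `close()` on the returned iterable when one is present.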
Mike From py-web-sig at xhaus.com Thu Sep 2 16:33:26 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Thu Sep 2 16:28:46 2004 Subject: 2 cents on file objects... WAS: RE: [Web-SIG] Bill's comments on WSGI draft 1.4 In-Reply-To: <1094133348.4727.45.camel@mike.mediapulse.com> References: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net> <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net> <5.1.1.6.0.20040902092500.033742b0@mail.telecommunity.com> <1094133348.4727.45.camel@mike.mediapulse.com> Message-ID: <41372F36.2020806@xhaus.com> [Michael C. Neel] > The WSGI can simple state that the > return can be used both as a file object and an iterable (which isn't > that a bit redundant, I'll have to check but file objects are iterable > correct?) I spent yesterday discussing this with Phillip, and now that I understand his design decision, I think it's the right one. Having frameworks and *all* middleware components deal with both files and iterables is an extra and unnecessary complication. And under Python 2.2+, it's irrelevant anyway, because files *are* iterables. A problem only arises on <= 2.1 interpreters, which don't support iterators nearly as well as 2.2. And that's only a problem because of Jython being 2.1 only: a problem I seem determined to make my own ;-) The strength of returning an iterable is that the framework can then control *when* the output is generated and sent. This fits perfectly with Python's greatest strength in the web arena: its simple and powerful mechanisms for event-driven processing. Robert Oschler asked earlier about the write callable vs. returning an iterator. I was going to reply, but Phillip got there before me. I would only add the following to his excellent explanation. 1.
The write callable is only there to support "push" applications, where the application generates output and then pushes it through a channel set up by the server/framework, thus relegating the framework to a kind of dumb switchboard. This sort of design is usually used in threaded servers, which can present scalability problems. 2. The main focus on iterators is the right one because it not only supports "push", as described above, but it also supports "pull", i.e. where the framework "pulls" output from the application when the time is right. This is a good thing because the framework is in the best position to know when the client is ready to actually receive the output, through the use of events/readiness-notification on the client socket. The output is only transiently created when required and transmitted immediately to the user (potentially with no copying or buffering at all!): you don't have large lumps of output hanging around, consuming memory. If you want to create an architecture that works for both "push" and "pull", iterators are the way to go. I do find it interesting that we've had no comments from the Zope or Twisted people. Glad to see Medusa people here, though :-) Kind regards, Alan. P.S. Phillip, I hope you're not affected by that hurricane! I have friends in Tampa who counted themselves lucky to have escaped Charley: now here comes another one! It appears on the surface that the frequency of hurricanes in the Gulf is increasing. From pje at telecommunity.com Thu Sep 2 16:36:28 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 2 16:36:29 2004 Subject: 2 cents on file objects...
WAS: RE: [Web-SIG] Bill's comments on WSGI draft 1.4 In-Reply-To: <41372F36.2020806@xhaus.com> References: <1094133348.4727.45.camel@mike.mediapulse.com> <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net> <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net> <5.1.1.6.0.20040902092500.033742b0@mail.telecommunity.com> <1094133348.4727.45.camel@mike.mediapulse.com> Message-ID: <5.1.1.6.0.20040902103439.0244d5f0@mail.telecommunity.com> At 03:33 PM 9/2/04 +0100, Alan Kennedy wrote: >The strength of returning an iterable is that the framework can then >control *when* the output is generated and sent. This fits perfectly with >python's greatest strength in the web arena: it's simple and powerful >mechanisms for event-driven processing. For clarity's sake, please don't call gateways and servers "frameworks"; we're reserving that term for the application side. >P.S. Phillip, I hope you're not affected by that hurricane! I'm directly in its path, and have still not yet obtained anything to cover my windows with, which is why I'm now signing off discussion indefinitely. :( From ianb at colorstudy.com Thu Sep 2 17:47:10 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Sep 2 17:47:38 2004 Subject: [Web-SIG] wsgi.fatal_errors In-Reply-To: <5.1.1.6.0.20040902093923.03386ec0@mail.telecommunity.com> References: <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com> <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com> <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com> <5.1.1.6.0.20040902093923.03386ec0@mail.telecommunity.com> Message-ID: <4137407E.7060308@colorstudy.com> Phillip J. Eby wrote: >>> This is a fairly significant change to the spec, that introduces lots >>> of new angles to cover. But, I think it could be an "exceptionally" >>> clean solution to the problem. ;) >> >> It sounded good until then; now I don't know. I think I'm -1 on that >> pun. 
> > I get the humor of the second sentence; is the first sentence also > humor, or is it serious? No, I was just commenting on the pun ;) -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Thu Sep 2 17:58:47 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Sep 2 17:59:21 2004 Subject: 2 cents on file objects... WAS: RE: [Web-SIG] Bill's comments on WSGI draft 1.4 In-Reply-To: <1094131167.4727.27.camel@mike.mediapulse.com> References: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net> <1094131167.4727.27.camel@mike.mediapulse.com> Message-ID: <41374337.4090807@colorstudy.com> Michael C. Neel wrote: > Well, I've seen alot of back and forth on file objects, write(), etc. I > think it's of little issue myself, not that hard to return an interface > that will support both methods. Let the programming working on the > middlware/application decide against the tradeoffs from one method to > another. > > In the framework I use, I've actually altered it to allow it's context > object (which is connected to the output stream, among other things) to > be used as a file object. The first need for this was to allow me to > pass the object off to a cvs.writer object, when I then called with the > result of a DB-API 2.0 fetchall(); and made a "Download as CSV" button > work in no more than 4 lines of code. I could also see doing this with > XML classes for a WSDL/SOAP system. Really off the wall, you could do > this with the logging module, and send your logging statments to another > server. FWIW, using WSGI I've handled it like this:

    class FakeFile:
        pass

    write = start_response(status, headers)
    f = FakeFile()
    f.write = write
    # now f is my file-like object...

Or, as was suggested:

    start_response(status, headers)
    lst = []
    f = FakeFile()
    f.write = lst.append
    # use f...
    return lst

This way you are missing a couple of methods that files typically have (writelines, I guess); then again, you could add those to FakeFile easily enough. It feels a little hackish, but I think it should be reliable enough. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From ianb at colorstudy.com Thu Sep 2 18:19:11 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Sep 2 18:19:39 2004 Subject: [Web-SIG] Status code, status header In-Reply-To: <5.1.1.6.0.20040902093214.033888f0@mail.telecommunity.com> References: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com> <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com> <5.1.1.6.0.20040902093214.033888f0@mail.telecommunity.com> Message-ID: <413747FF.4030803@colorstudy.com> Phillip J. Eby wrote: > At 01:24 AM 9/2/04 -0500, Ian Bicking wrote: > >> Phillip J. Eby wrote: >> >>> At 10:01 PM 8/30/04 -0500, Ian Bicking wrote: >>> >>>> I'm also +1 on turning status into an integer. I think it makes >>>> things a little simpler, and those message strings are just a >>>> distraction. The final server can put that string in ("200 OK", >>>> etc) if it wants to, but if it doesn't it doesn't matter. >>> >>> >>> I'm still -1 on this, for the reasons stated previously. I might be >>> convinced if you can show me that a significant number of popular >>> servers already have the necessary table(s) to do this with; e.g. >>> Twisted, ZServer, Apache (CGI/FastCGI), mod_python, etc. >> >> >> * Twisted does, in twisted.protocols.http >> * mod_python must somewhere; I don't think it allows you to provide a >> reason, you can only provide an integer code. >> * Zope does in ZPublisher.HTTPResponse > > Technically, ZPublisher is part of the *application* side, not the > server side, which is a point in favor of the application side setting > the reason. > > >> * Apache does not add the reason string to CGI scripts that provide an >> explicit Status header but no reason.
> > > So, a CGI gateway would have to have a table, or else generate messages > like "502 Dude, this is whack!". :) It could generate no message, which would work just fine. Or it could include the table, which is finite and known. >>> In theory, the "reason-phrase" can be null. In practice, I wonder. >>> Also, I don't think the message strings are "just a distraction": >>> they clarify the intent of the code that contains them. >> >> >> No one would ever pay attention to the string when there's that >> pleasant integer code to parser out. Plus the spec says not to. > > > Huh? Are you saying that: > > start_response(405,headers) > > is more readable than: > > start_response("405 Method Not Allowed",headers) I would say that start_response(http.METHOD_NOT_ALLOWED, headers) is more readable. Or: start_response(405, headers) # method not allowed is just as readable. "Method not allowed" is just a comment, it isn't program information. Why propagate a comment through the system? Especially a comment that's assumed to be fixed and derivative? Or if it's not derivative, then you are just messing with people's heads. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From py-web-sig at xhaus.com Thu Sep 2 19:15:39 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Thu Sep 2 19:20:28 2004 Subject: [Web-SIG] Integer status codes. Message-ID: <4137553B.5000208@xhaus.com> Dear Web-Sig, Just a datapoint on status codes about J2EE. J2EE uses integer status codes, with human readable constants available in the javax.servlet.http.HttpServletRequest class, which works well. http://java.sun.com/j2ee/1.4/docs/api/index.html But I suppose that since WSGI has no classes to hang such constants on, it cannot use that tidy approach. Perhaps an environ variable called "wsgi.status"? Which could be a dictionary mapping integers to status strings? E.G. 
applications would write code like this

def handler(environ, start_response):
    start_response(environ['wsgi.status']['NOT_FOUND'], [] )

Or maybe just a simple object containing integer constants?

def handler(environ, start_response):
    start_response(environ['wsgi.status'].NOT_FOUND, [] )

I don't think I'd find the management of such a table/mapping that onerous. After all, there are only a few tens of status codes, and they don't change very often. And the code to implement it would be universal, i.e. easily copyable and pastable. If I can paste it into email, is it that much of a code management hassle?

#----------------------------------------
status_to_int = {
    'CONTINUE' : 100,
    'SWITCHING_PROTOCOLS' : 101,
    'OK' : 200,
    'CREATED' : 201,
    'ACCEPTED' : 202,
    'NON_AUTHORITATIVE_INFORMATION' : 203,
    'NO_CONTENT' : 204,
    'RESET_CONTENT' : 205,
    'PARTIAL_CONTENT' : 206,
    'MULTIPLE_CHOICES' : 300,
    'MOVED_PERMANENTLY' : 301,
    'MOVED_TEMPORARILY' : 302,
    'SEE_OTHER' : 303,
    'NOT_MODIFIED' : 304,
    'USE_PROXY' : 305,
    'TEMPORARY_REDIRECT' : 307,
    'BAD_REQUEST' : 400,
    'UNAUTHORIZED' : 401,
    'PAYMENT_REQUIRED' : 402,
    'FORBIDDEN' : 403,
    'NOT_FOUND' : 404,
    'METHOD_NOT_ALLOWED' : 405,
    'NOT_ACCEPTABLE' : 406,
    'PROXY_AUTHENTICATION_REQUIRED' : 407,
    'REQUEST_TIMEOUT' : 408,
    'CONFLICT' : 409,
    'GONE' : 410,
    'LENGTH_REQUIRED' : 411,
    'PRECONDITION_FAILED' : 412,
    'REQUEST_ENTITY_TOO_LARGE' : 413,
    'REQUEST_URI_TOO_LONG' : 414,
    'UNSUPPORTED_MEDIA_TYPE' : 415,
    'REQUESTED_RANGE_NOT_SATISFIABLE' : 416,
    'EXPECTATION_FAILED' : 417,
    'INTERNAL_SERVER_ERROR' : 500,
    'NOT_IMPLEMENTED' : 501,
    'BAD_GATEWAY' : 502,
    'SERVICE_UNAVAILABLE' : 503,
    'GATEWAY_TIMEOUT' : 504,
    'HTTP_VERSION_NOT_SUPPORTED' : 505,
}
#----------------------------------------

I'm also happy to see things remain as they are. Having a human readable version of the code is handy for code self-documentation purposes.
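A minimal sketch of the gateway side of this idea, assuming applications hand the server integer codes: the hypothetical `REASONS` table (only a few entries shown) lets a server expand an integer into a full status line, and the hypothetical `status_code` recovers the integer from any status string, however playful its reason phrase. Neither function is part of any WSGI draft.

```python
# Hypothetical gateway helpers; a real table would list every code.
REASONS = {
    200: "OK",
    301: "Moved Permanently",
    404: "Not Found",
    405: "Method Not Allowed",
    500: "Internal Server Error",
}

def status_line(code):
    """Expand an integer code into a status string such as '404 Not Found'.

    Unknown codes get an empty reason phrase; the separating space is kept
    because RFC 2616 allows an empty Reason-Phrase but not a missing SP.
    """
    if not isinstance(code, int) or not 100 <= code <= 599:
        raise ValueError("invalid HTTP status code: %r" % (code,))
    return "%d %s" % (code, REASONS.get(code, ""))

def status_code(status):
    """Return the integer code from a status string like '410 Gone'."""
    return int(status.split(" ", 1)[0])
```

The reverse direction is the part every server needs anyway: whatever text follows the number, only the leading integer carries program information.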
So I suppose it works just as well for authors to write the following examples in their own code start_response("200 Au quay!", [] ) start_response("200 Cool", [] ) start_response("200 That hoopy frood knows where his towel is", [] ) As long as the integer bit actually evaluates to an integer, it wouldn't be a problem. I sort of like the ability to play with these strings: think of the (Monty) pythonisms we could use in our middleware! start_response("404 Defenestrated", [] ) start_response("410 It's pining for the fjords!", [] ) start_response("414 Who are you calling big-nose, big-nose?", [] ) start_response("417 But I thought this was a cheese shop?", [] ) Regards, Alan. From py-web-sig at xhaus.com Thu Sep 2 19:33:02 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Thu Sep 2 19:28:20 2004 Subject: [Web-SIG] Integer status codes. In-Reply-To: <4137553B.5000208@xhaus.com> References: <4137553B.5000208@xhaus.com> Message-ID: <4137594E.3000109@xhaus.com> [Alan Kennedy] > J2EE uses integer status codes, with human readable constants available > in the javax.servlet.http.HttpServletRequest class, which works well. > > http://java.sun.com/j2ee/1.4/docs/api/index.html D'oh! That link should have been http://java.sun.com/j2ee/1.4/docs/api/javax/servlet/http/HttpServletResponse.html Regards, Alan. From janssen at parc.com Thu Sep 2 21:47:03 2004 From: janssen at parc.com (Bill Janssen) Date: Thu Sep 2 21:47:59 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: Your message of "Wed, 01 Sep 2004 20:25:56 PDT." <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> Message-ID: <04Sep2.124712pdt."58612"@synergy1.parc.xerox.com> I think we need some terminology that I don't remember seeing. There are two sides to WSGI, the server side, which I'll call the "socket", and the framework side, which I'll call the "plug". If there are other terms already in use, please let me know. Let me ask first, has anyone written a "socket" layer for Medusa? 
> >1. The "environ" parameter must be a Python dict: I think subclasses > >should be allowed. > [...various reasons why this might be a bad idea are introduced...] > These are "practicality beats purity" arguments, so I need to see some > *practical* applications of dictionary subclasses that would be useful > enough to outweigh both of the above issues. Phillip, these are good engineering reasons for socket developers not to use subclasses, but that restriction doesn't belong in WSGI. They may have other reasons for using subclasses that we haven't thought of (perhaps because they're using these dicts for additional purposes besides WSGI), and they should be allowed to use them. You don't want to try to fix things out of scope of this work. > Because 'file' has a 'fileno' attribute, 'isinstance(f,file)' implies > 'hasattr(f,"fileno")'. Therefore, the latter is the preferred behavior > here, because it doesn't unnecessarily exclude other valid wrappers of file > descriptors. I'm not familiar with all the ins and outs of files on Python and Jython and IronPython, so I'll just say, reasonable enough. Though I'd prefer to say, a file-like object (whatever that means). > These restrictions are intended to simplify servers and middleware; nobody > has yet presented an example of a scenario where this imposed any practical > limitation. Here's a scenario for you: I want to return a valid HTTP header that your WSGI layer doesn't allow! For example, accented Latin-1 characters, which are valid in the Reason-Phrase. Or for another example, a multi-line header value, which I actually use quite a bit, and which is perfectly valid in HTTP, and which your prohibition on control characters in header values breaks. > The fallback position would be that the status string and headers must not > be CR or CRLF terminated. The fallback position would be fine.
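A minimal sketch of both positions, with hypothetical function names: `check_header_value` enforces the fallback rule (no trailing CR or LF), and `unfold` is the normalisation routine middleware would need if folded values were allowed. The unfolding rule assumed here follows RFC 2616's linear white space: a line break followed by spaces or tabs, which a recipient may replace with a single space.

```python
import re

# A fold: an optional CR, a LF, then at least one space or tab.
_FOLD = re.compile(r"\r?\n[ \t]+")

def check_header_value(name, value):
    """Enforce the fallback rule: a value must not end in CR or LF."""
    if value.endswith(("\r", "\n")):
        raise ValueError("header %s must not be CR or CRLF terminated" % name)
    return value

def unfold(value):
    """Replace each fold (line break plus leading whitespace) with one space."""
    return _FOLD.sub(" ", value)
```

A bare line break that is not followed by whitespace is not a fold, so `unfold` leaves it alone; rejecting such values would be a separate check.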
> Are you aware of any > applications that currently fold their headers, or transmit ISO-8859-1 > characters without using the encoding prescribed by RFC 2047? Is there a > practical use case for either one? Whether or not our limited group currently knows of such a case is immaterial. This is an overly restrictive limitation with nothing, I'm afraid, but religion for its justification. Aside from clueless implementors (against which the gods themselves strive in vain), why would allowing any valid header value be a problem? > * In order to ensure safe interpretation, smart middleware and server > developers will have to write routines to *unfold* potentially-folded > headers; why not just disallow folding to begin with? Because it's allowed in the HTTP spec, and this is a general-purpose HTTP framework layer. > How about "must provide the *option*" and "must be enabled by default"? Or, > leave it as is, but add something like, "may provide the user with the > option of suppressing this output, so that users who cannot fix a broken > application are not forced to bear the pain of its error." That's fine with me. > >6. The "write()" callable is important; it should not be deprecated > >or in some other way made a poor stepchild of the iterable. > > But it *is* one. The presence of the 'write()' facility significantly > increases the implementation complexity for middleware and server > authors. If it weren't necessary to support existing streaming APIs, it > wouldn't exist. But supporting streaming APIs is an important consideration, from the point of view of authors actually writing code against a framework. It should be a peer methodology (or completely removed). Again, WSGI is a very general mechanism, which should provide mechanism, not enforce policy. That's the only way to get it widely accepted in all the server and framework projects. If you don't like the streaming model, write editorials about it, but don't try to cripple other people's software. 
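To make the two output paths concrete, here is a minimal sketch of a blocking server loop that accepts both: the application may push data through the `write` callable returned by `start_response`, or return an iterable for the server to pull from. `run_app`, `push_app` and `pull_app` are hypothetical names; output is collected in a list instead of a socket, and header transmission and error handling are left out.

```python
def run_app(app, environ):
    """Drive a WSGI-style application; return (status, headers, chunks)."""
    sent = []        # stands in for the client connection
    state = {}

    def write(data):
        # Push path: the application sends data during its own execution.
        sent.append(data)

    def start_response(status, headers):
        state["status"], state["headers"] = status, headers
        return write

    result = app(environ, start_response)
    try:
        for chunk in result:   # Pull path: the server asks for each chunk.
            sent.append(chunk)
    finally:
        if hasattr(result, "close"):
            result.close()
    return state["status"], state["headers"], sent

def push_app(environ, start_response):
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    write(b"Hello, ")
    write(b"world")
    return []        # everything already went through write()

def pull_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, ", b"world"]
```

From the client's point of view the two applications are indistinguishable; the difference is only in who controls when each chunk is produced.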
> However, the language should perhaps be clarified to be explicit about this > point Yes. > and to address what happens if code *within* the iterator calls > 'write()'. (I don't think it should be allowed to, but I'm open to > arguments either way.) Good point. I tend to agree with you here. > This seems at odds with your previous desire to use RFC 2616, which is > pretty clear that it's ISO-8859-1 or RFC 2047. PEP 333 goes further and > says, it's ASCII, dammit, and use MIME header encodings (RFC 2047) if you > need to do something special, because God help you if you're trying to mess > with non-ASCII in HTTP headers and you don't know how to deal with that stuff. My problem here is not with PEP 333, but with Python strings in general. The only string type which carries an associated charset tag is Unicode. The byte strings are *some* string encoded in *some* character set encoding, but no one knows which encoding, for any given byte string. I meant to say that the characters used should be restricted to those specified in RFC 2616, but those characters should be passed in Unicode strings, so that we can safely apply the .encode() method to them. But simply specifying that the byte strings conform to RFC 2616 would be OK with me. As I say, with the current Python, our options are limited. > Glad there was something you liked. ;) (j/k) Hey, there was lots I liked! Most of my suggestions were about removing restrictions on areas outside of WSGI, I think. > I rather like this, although I don't at all see how FTP gets into > this. What the heck would CGI variables for FTP look like, I > wonder? Anyway, it's handy for "http" and "https" at the very least. I'd > prefer "wsgi.url_scheme" for the name, though, as it's otherwise a somewhat > ambiguous name. Sure, that's fine with me. As for "ftp", I was thinking of Medusa, which supports serving a number of protocols with the same framework. 
Bill From janssen at parc.com Thu Sep 2 21:52:46 2004 From: janssen at parc.com (Bill Janssen) Date: Thu Sep 2 21:53:43 2004 Subject: 2 cents on file objects... WAS: RE: [Web-SIG] Bill's comments on WSGI draft 1.4 In-Reply-To: Your message of "Thu, 02 Sep 2004 07:33:26 PDT." <41372F36.2020806@xhaus.com> Message-ID: <04Sep2.125247pdt."58612"@synergy1.parc.xerox.com> > 1. The write callable is only there to support "push" applications, > where the application generates output and then pushes it through a > channel set-up by the server/framework, thus relegating the framework to > a kind of dumb switchboard. This sort of design is usually used in > threaded servers, which can present scalability problems. It's also heavily used in CGI scripts. Bill From janssen at parc.com Thu Sep 2 21:55:38 2004 From: janssen at parc.com (Bill Janssen) Date: Thu Sep 2 21:56:49 2004 Subject: [Web-SIG] Status code, status header In-Reply-To: Your message of "Thu, 02 Sep 2004 09:19:11 PDT." <413747FF.4030803@colorstudy.com> Message-ID: <04Sep2.125542pdt."58612"@synergy1.parc.xerox.com> > I would say that start_response(http.METHOD_NOT_ALLOWED, headers) is > more readable. Or: > start_response(405, headers) # method not allowed > is just as readable. "Method not allowed" is just a comment, it isn't > program information. Why propagate a comment through the system? While I tend to prefer the integer codes, I could just point out that http.METHOD_NOT_ALLOWED could map to "405 Method Not Allowed" as easily as to 405. Bill From ianb at colorstudy.com Thu Sep 2 22:38:31 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Thu Sep 2 22:39:08 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: <04Sep2.124712pdt."58612"@synergy1.parc.xerox.com> References: <04Sep2.124712pdt."58612"@synergy1.parc.xerox.com> Message-ID: <413784C7.9040708@colorstudy.com> Bill Janssen wrote: > I think we need some terminology that I don't remember seeing. 
There > are two sides to WSGI, the server side, which I'll call the "socket", > and the framework side, which I'll call the "plug". If there are > other terms already in use, please let me know. Generally we're using the terms "server" and "application". And "middleware" is both a server and application. > Let me ask first, has anyone written a "socket" layer for Medusa? > > >>>1. The "environ" parameter must be a Python dict: I think subclasses >>>should be allowed. >> >>[...various reasons why this might be a bad idea are introduced...] >>These are "practicality beats purity" arguments, so I need to see some >>*practical* applications of dictionary subclasses that would be useful >>enough to outweigh both of the above issues. > > > Phillip, these are good engineering reasons for socket developers not > to use subclasses, but that restriction doesn't belong in WSGI. They > may have other reasons for using subclasses that we haven't thought of > (perhaps because they're using these dicts for additional purposes > besides WSGI), and they should be allowed to use them. You don't want > to try to fix things out of scope of this work. The restriction is kind of there for the benefit of middleware, so that middleware can rewrite the environment without having to worry about losing anything (except parts it explicitly leaves out). By requiring it to be a dictionary, you can be sure that there are no side effects, no unusual requirements, it's consistent, and you can recreate a completely equivalent object. It means the environment is required to be a dumb container. The restriction that isinstance(environ, dict) be true isn't much of a requirement at all, because subclasses of dictionaries can override pretty much everything they care to. If isinstance was the only requirement, it might as well be required that the environment has a dictionary interface.
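A minimal sketch of the kind of middleware Ian describes, assuming `environ` is a plain dict: the hypothetical `PrefixStripper` takes a shallow copy, moves a URL prefix from `PATH_INFO` to `SCRIPT_NAME`, and passes the copy on. The caller's dictionary is untouched, and because the original is a plain dict, the copy is a complete equivalent of it.

```python
class PrefixStripper:
    """Hypothetical middleware: move a URL prefix into SCRIPT_NAME."""

    def __init__(self, app, prefix):
        self.app = app
        self.prefix = prefix.rstrip("/")

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path == self.prefix or path.startswith(self.prefix + "/"):
            # A plain dict copies completely: no hidden state or side effects.
            environ = dict(environ)
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + self.prefix
            environ["PATH_INFO"] = path[len(self.prefix):]
        return self.app(environ, start_response)
```

With a dict subclass, `dict(environ)` would silently drop whatever behaviour the subclass added, which is the loss the plain-dict rule is meant to rule out.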
>>These restrictions are intended to simplify servers and middleware; nobody >>has yet presented an example of a scenario where this imposed any practical >>limitation. > > > Here's a scenario for you: I want to return a valid HTTP header that > your WSGI layer doesn't allow! For example, accented Latin-1 > characters, which are valid in the Reason-Phrase. Or for another > example, a multi-line header value, which I actually use quite a bit, > and which is perfectly valid in HTTP, and which your prohibition on > control characters in header values breaks. Is an accented Latin-1 character a control character? I would have thought a control character meant a character with a code less than 32. >>Are you aware of any >>applications that currently fold their headers, or transmit ISO-8859-1 >>characters without using the encoding prescribed by RFC 2047? Is there a >>practical use case for either one? > > > Whether or not our limited group currently knows of such a case is > immaterial. This is an overly restrictive limitation with nothing, > I'm afraid, but religion for its justification. Aside from clueless > implementors (against which the gods themselves strive in vain), why > would allowing any valid header value be a problem? Because it requires more work to parse and manipulate a more permissive standard. You have to worry about corner cases. >>* In order to ensure safe interpretation, smart middleware and server >>developers will have to write routines to *unfold* potentially-folded >>headers; why not just disallow folding to begin with? > > > Because it's allowed in the HTTP spec, and this is a general-purpose > HTTP framework layer. But it doesn't *matter*. And the HTTP spec very clearly *says* that it doesn't matter. Folded headers are allowed, but they don't *add* any functionality. So why allow it?
In those cases where you are interfacing with something that allows folded headers, they would have to be normalized; but most Python frameworks don't allow folded headers (at least intentionally). I don't know if it would make a big difference if headers could be folded. But there should be *some* use case for it if it were allowed. >>>6. The "write()" callable is important; it should not be deprecated >>>or in some other way made a poor stepchild of the iterable. >> >>But it *is* one. The presence of the 'write()' facility significantly >>increases the implementation complexity for middleware and server >>authors. If it weren't necessary to support existing streaming APIs, it >>wouldn't exist. > > > But supporting streaming APIs is an important consideration, from the > point of view of authors actually writing code against a framework. > It should be a peer methodology (or completely removed). It effectively is a peer methodology. It's part of the standard and it will work with any server; it's not optional. The language Phillip wants to use is simply to encourage authors to prefer the iterable if that is an option. > Again, WSGI is a very general mechanism, which should provide > mechanism, not enforce policy. That's the only way to get it widely > accepted in all the server and framework projects. If you don't like > the streaming model, write editorials about it, but don't try to > cripple other people's software. There's no crippling, it is specifically allowed for. It's not the primary interface that frameworks require, so Phillip wants to encourage those frameworks to use the iterable when they can. For instance, in Webware the response object has a flush method. When that is called, the accumulated response will have to be written out via the write method. But in most cases a response is never flushed, it is cached completely until the request is over, and the whole page is sent at once.
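A minimal sketch of the Webware-style case, using a hypothetical `BufferedResponse` class rather than Webware's real API: page output is buffered, an explicit `flush()` sends what has accumulated through WSGI's `write()` callable, and whatever remains is returned as a one-element list, which in the common never-flushed case is the whole page.

```python
class BufferedResponse:
    """Hypothetical framework response object that buffers page output."""

    def __init__(self, start_response, status="200 OK", headers=None):
        self._start_response = start_response
        self.status = status
        self.headers = headers or [("Content-Type", "text/html")]
        self._buffer = []
        self._write = None            # set once start_response is called

    def out(self, data):
        self._buffer.append(data)

    def _take(self):
        # Return everything buffered so far and start a fresh buffer.
        data, self._buffer = b"".join(self._buffer), []
        return data

    def flush(self):
        # Streaming case: start the response now and push via write().
        if self._write is None:
            self._write = self._start_response(self.status, self.headers)
        self._write(self._take())

    def finish(self):
        # Return the iterable the WSGI application hands back to the server.
        if self._write is None:
            self._write = self._start_response(self.status, self.headers)
        return [self._take()]

def page_app(environ, start_response):
    response = BufferedResponse(start_response)
    response.out(b"<html><body>")
    if environ.get("QUERY_STRING") == "stream":
        response.flush()              # early chunk goes through write()
    response.out(b"Hello</body></html>")
    return response.finish()
```

Only the rare flushing page touches `write()`; every other page reaches the server through the iterable.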
The language is there to encourage someone to go to the extra length to return an iterable in the common case, instead of doing the easier thing and always using write. Note that streaming can be implemented with the iterator interface. It's just a different streaming that wouldn't be compatible with all current frameworks. If you aren't streaming then there's no real difference between the two, except that the iterator gives the server more leeway in implementation. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From fumanchu at amor.org Thu Sep 2 22:36:32 2004 From: fumanchu at amor.org (Robert Brewer) Date: Thu Sep 2 22:42:15 2004 Subject: [Web-SIG] Bill's comments on WSGI draft 1.4 Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022EB0@exchange.hqamor.amorhq.net> Phillip J. Eby wrote: > > I'd like to at least hear the rationale behind > > favoring iterables so heavily over write(). > > One important reason: the server can suspend an iterable's execution > without tying up a thread. It can therefore potentially use > a much smaller thread pool to handle a given number of connections, > because the threads are only tied up while they're executing an > iterator 'next()' call. > > By contrast, 'write()' occurs *within* the application execution, > so the only way to suspend execution is to suspend the thread (e.g. > waiting for a lock). Hmm. I still don't get it--why would the server not simply "suspend execution" of the framework within the write() call? In my naive estimation, it would be the difference between: for chunk in framework.data: output(chunk) do_out_of_band_stuff() ...and: def write(chunk): output(chunk) do_out_of_band_stuff() ...and in fact, I see most existing servers having to do both when they grow WSGI interfaces, since both are allowed in the WSGI spec (even if one is deprecated). Maybe you could add a line or two of pseudocode to help me understand...? 
(Assuming you're not fleeing for your life from hurricanes, that is ;) Stay safe, Robert Brewer MIS Amor Ministries fumanchu@amor.org From janssen at parc.com Fri Sep 3 01:15:09 2004 From: janssen at parc.com (Bill Janssen) Date: Fri Sep 3 01:15:34 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: Your message of "Thu, 02 Sep 2004 13:38:31 PDT." <413784C7.9040708@colorstudy.com> Message-ID: <04Sep2.161513pdt."58612"@synergy1.parc.xerox.com> > The restriction that isinstance(environ, dict) be true isn't much of a > requirement at all, because subclasses of dictionaries can override > pretty much everything they care to. If isinstance was the only > requirement, it might as well be required that the environment has a > dictionary interface. Except that "a dictionary interface" is very poorly defined, while the isinstance check is very well defined. But this is a small point; I won't argue it further. > > Here's a scenario for you: I want to return a valid HTTP header that > > your WSGI layer doesn't allow! For example, accented Latin-1 > > characters, which are valid in the Reason-Phrase. Or for another > > example, a multi-line header value, which I actually use quite a bit, > > and which is perfectly valid in HTTP, and which your prohibition on > > control characters in header values breaks. > > Is an accented Latin-1 character a control character? I would have > though a control character meant a character with a code less than 32. You're right. I was confusing the requirements on headers with the "status" argument, which is unnecessarily restricted to ASCII. > Because it requires more work to parse and manipulate a more permissive > standard. You have to worry about corner cases. How much more work? Why is this restriction in particular a good one? > There's no crippling, it [streaming] is specifically allowed for. 
It's not the > primary interface that frameworks require, so Phillip wants to encourage > those framework to use the iterable when they can. Why? Why is an editorial opinion in the technology spec? And, which frameworks are you talking about? Isn't this on the "server" or "socket" side of things, rather than the "application" or "plug" or "framework" side of things? Bill From andrew at andreweland.org Fri Sep 3 11:44:50 2004 From: andrew at andreweland.org (Andrew Eland) Date: Fri Sep 3 11:55:52 2004 Subject: [Web-SIG] Integer status codes. In-Reply-To: <4137553B.5000208@xhaus.com> References: <4137553B.5000208@xhaus.com> Message-ID: <41383D12.6030601@andreweland.org> Alan Kennedy wrote: > But I suppose that since WSGI has no classes to hang such constants on, > it cannot use that tidy approach. Maybe we could try to have the constants added to another module in the standard library. httplib would be an obvious choice. -- Andrew (http://www.andreweland.org) From py-web-sig at xhaus.com Fri Sep 3 14:07:12 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Fri Sep 3 14:02:30 2004 Subject: [Web-SIG] Iterators, generators and threads. Message-ID: <41385E70.20507@xhaus.com> Dear Sig, With the focus on iterables in WSGI, I think we may need to put something into the WSGI spec about generators and threading. As I'm sure you're all aware, generators are an excellent mechanism for generating content on demand: a perfect fit for memory efficient WSGI "pull" processing and for event driven servers. However, generator-iterators are different from other iterables, in that they cannot be resumed/iterated simultaneously from multiple threads (without external locking anyway). Pep 255 is specific on the topic: "Restriction: A generator cannot be resumed while it is actively running". Which effectively means that a generator cannot be used from multiple threads without some form of external synchronization/locking. 
Offhand, I can't think of scenarios where a WSGI server or application would *need* to iterate over an iterable across multiple threads. But I can certainly think of multiple server architectures where the request and its related response will pass through multiple threads before completion. Whether or not it would make sense for such architectures to iterate an iterable from multiple threads: well, I don't know: is it possible some server designer might attempt something like this? Which would probably work as long as the iterable is not a generator. But if it is: *boom*, the generator could be resumed simultaneously from multiple threads, thus resulting in a ValueError. Perhaps we need to describe this problem in the PEP? Or are Python programmers supposed to be big and old enough to know these things? I find myself wondering: is this a CPython-specific thing? Does resuming a generator from multiple threads have any meaning? Obviously, calling a standard function/method from different threads works because each thread gets an independent stack frame, i.e. local variables, etc. So if there is no (unsynchronized) shared state between the threads, everything will work fine. Since a generator is a single resumable stack frame, resuming it multiple times simultaneously from multiple threads won't work, from an isolation point-of-view. Or am I misunderstanding it? Is the restriction somehow related to CPython's GIL? Obviously, resuming general iterators from multiple threads is related. PEP 234 makes no statements about threads (well, one unrelated reference to modifying dictionaries while they are being iterated). So I take this to mean that iterating iterables from multiple threads is acceptable. Regards, Alan. P.S. I hope Phillip is OK. He said yesterday that he was right in the Frances path, although obviously that path will have a significant margin for error. But Frances is *huge*: see this stunning picture from NASA.
http://antwrp.gsfc.nasa.gov/apod/ap040903.html From janssen at parc.com Fri Sep 3 21:34:15 2004 From: janssen at parc.com (Bill Janssen) Date: Fri Sep 3 21:34:41 2004 Subject: [Web-SIG] Integer status codes. In-Reply-To: Your message of "Fri, 03 Sep 2004 02:44:50 PDT." <41383D12.6030601@andreweland.org> Message-ID: <04Sep3.123418pdt."58612"@synergy1.parc.xerox.com> I think submitting a bug report ("httplib doesn't define constants for standard HTTP status messages"), plus a patch, would probably get it done. Bill > Alan Kennedy wrote: > > > But I suppose that since WSGI has no classes to hang such constants on, > > it cannot use that tidy approach. > > Maybe we could try to have the constants added to another module in the > standard library. httplib would be an obvious choice. > > -- Andrew (http://www.andreweland.org) From jjl at pobox.com Sat Sep 4 18:38:45 2004 From: jjl at pobox.com (John J Lee) Date: Sat Sep 4 18:38:18 2004 Subject: [Web-SIG] Integer status codes. In-Reply-To: <04Sep3.123418pdt."58612"@synergy1.parc.xerox.com> References: <04Sep3.123418pdt."58612"@synergy1.parc.xerox.com> Message-ID: [Andrew] > > Maybe we could try to have the constants added to another module in the > > standard library. httplib would be an obvious choice. [Bill Janssen] > I think submitting a bug report ("httplib doesn't define constants for > standard HTTP status messages"), plus a patch, would probably get it > done. +1 John From py-web-sig at xhaus.com Sun Sep 5 23:56:10 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Sun Sep 5 23:51:16 2004 Subject: [Web-SIG] Standardised configuration and temporary directories. Message-ID: <413B8B7A.4090401@xhaus.com> Dear Sig, While thinking about writing middleware, two issues occurred to me that may need to be addressed in the WSGI spec. 1. Temporary storage/scratch directory. It is common in servers and frameworks to provide a particular location for applications to store temporary files, etc: a temporary directory. 
This prevents applications from picking their own temporary directories, which provides platform independence, security and isolation. I think that this is such a common thing that it may be worth requiring a WSGI environment variable for it, e.g. environ['wsgi.temp_dir']. I realise that this could be considered a server-specific thing, but server-specific variables mean lack of portability. Perhaps some containers will not be able to provide the temporary area: in that case it is better for the application or middleware to check for environ['wsgi.temp_dir'] is None than to check for perhaps a dozen or more possible server variables. 2. Standardised parameter configuration and specification. When I am plugging middleware into a server, it often has need of its own configuration. For example, session handling middleware may need to retrieve the name of a file system directory to persist session files into, or connection details for an RDBMS, etc. Obviously, such configuration values need to be configured somewhere. 1. It could be done in the middleware source file itself, e.g. in global variables. However, I really don't like this, since it would mean changing source files, instead of leaving a standard versioned distribution untouched and read-only. 2. The session middleware could have its own configuration mechanism. It would define a standard way for it, and it alone, to be configured, e.g. it names the location of its configuration file. I think that this also is problematic, primarily because lots of different middleware authors will pick lots of different ways of configuring their stuff, leading to platform-specific errors, need for debugging, code rewriting, etc. And I think that the purpose of WSGI is to help prevent this kind of wheel re-invention. A more promising place to put it is in the WSGI environment. The next two methods are different ways of doing that. 3.
It could perhaps be set by another middleware component that is prior to the session handler in the middleware stack: some form of general configuration component for example. I like this more than the above options, because it concentrates configuration into one place. Or rather two places, because there is also the server-specific configuration file, whose contents actually configure how the server drives the request through the middleware stack. In my case, that is a Tomcat server.xml file, where I have several parameters which configure my WSGI servlet. 4. It could be configured in the server configuration file, e.g. the Tomcat server.xml with modjy, the Apache httpd.conf with mod_python, environment variables with CGI, etc, etc. I like this one the most because it means that there is only one configuration environment to manage. So, as an example, let's say my session middleware is looking for the following variables

my_fancy_sessions.cookies
my_fancy_sessions.storage_dir

Ideally, it would be nice to be able to have a standardised way of specifying these variables in a centralised location. Why? Because when the middleware authors are writing documentation for their module, they could write something like

"""
Make sure to set values for the following WSGI variables, in whatever way is appropriate for your chosen WSGI server.

my_fancy_sessions.cookies = True
my_fancy_sessions.storage_dir = '/var/modjy/session_dir'
"""

So, if I was configuring it to run under modjy, my servlet description would look something like this

<servlet>
  <servlet-name>modjy</servlet-name>
  <servlet-class>com.xhaus.wsgi.Modjy</servlet-class>
  <init-param>
    <param-name>python.home</param-name>
    <param-value>C:/jython21</param-value>
  </init-param>
  <init-param>
    <param-name>my_fancy_sessions.cookies</param-name>
    <param-value>True</param-value>
  </init-param>
  <init-param>
    <param-name>my_fancy_sessions.storage_dir</param-name>
    <param-value>/var/modjy/session_dir</param-value>
  </init-param>
</servlet>

A CGI implementation could examine the contents of say a WSGI_ENVIRON os.environ variable, which might contain

"""
my_fancy_sessions.cookies = True
my_fancy_sessions.storage_dir = '/var/modjy/session_dir'
"""

Etc, etc.
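A minimal sketch of the CGI variant, assuming `WSGI_ENVIRON` holds one `name = value` pair per line as in the example above. The parsing rules (blank lines and `#` comments skipped, matching surrounding quotes removed, values otherwise left as strings) and the choice that real CGI variables take precedence over user settings are assumptions of this sketch, not part of any specification; `parse_wsgi_environ` and `cgi_environ` are hypothetical names.

```python
import os

def parse_wsgi_environ(text):
    """Parse 'name = value' lines into a dict of string values."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")   # split at the first '='
        if not sep:
            raise ValueError("expected 'name = value', got %r" % raw)
        value = value.strip()
        # Assumption: a value wrapped in matching quotes is unquoted.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        settings[name.strip()] = value
    return settings

def cgi_environ(os_environ=None):
    """Build an environ from CGI variables plus any WSGI_ENVIRON settings."""
    source = os.environ if os_environ is None else os_environ
    environ = dict(source)
    for name, value in parse_wsgi_environ(source.get("WSGI_ENVIRON", "")).items():
        # Assumption: server-provided CGI variables win over user settings.
        environ.setdefault(name, value)
    return environ
```

Values such as `True` stay strings, in keeping with the idea of passing name/value pairs through without transformation; interpreting them is left to the middleware that reads them.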
I'm still not sure about having such a standard configuration mechanism, or how such a thing would be presented inside the WSGI environment. But it does seem to me to be an area that needs addressing. Perhaps a simple solution would be to add wording like the following to the PEP: """ WSGI compliant servers must provide a simple mechanism for users to place name/value pairs in the WSGI environment, without modification or transformation. This is to make it easy for users to gather all middleware (i.e. server-independent) configuration under one centralized configuration mechanism. """ Or maybe I'm off base. Maybe session handling middleware is not the sort of thing that is meant to be universally portable? Regards, Alan. From paul.boddie at ementor.no Mon Sep 6 10:16:29 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Mon Sep 6 10:16:37 2004 Subject: [Web-SIG] Standardised configuration and temporary directories. Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18C9E5@100nooslmsg005.common.alpharoot.net> Alan Kennedy wrote: > > While thinking about writing middleware, two issues occurred to me that > may need to be addressed in the WSGI spec. > > 1. Temporary storage/scratch directory. I've been thinking about this at the level above frameworks, and I do wonder how far up in the applications stack this information would remain useful. If you consider something like Zope, I think the only place where this kind of thing is exposed to applications is in the machinery around file uploads, but I don't necessarily think you'd want applications directly interfering with such directories. That said, for both applications and frameworks, it is interesting to define concepts such as shared and private storage, and at a low enough level I can imagine that things like temporary directories are relevant. (It is almost shocking to see what cgi.FieldStorage does with temporary files, I might add.) [...] > 2. Standardised parameter configuration and specification. 
As you've said, various frameworks provide mechanisms for specifying parameters, yet this means that there isn't a single method of administration for developers or users who don't care enough about those frameworks to know how to deal with them all. I'm inclined to think that better tools could be the answer here - if you have a simple configuration file reminiscent of Webware's .config files (which are Python modules with simple dictionaries or attributes) then different tools could produce Apache .conf files or Java Servlet web.xml files, for example. [...] > I'm still not sure about having such a standard configuration mechanism, > or how such a thing would be presented inside the WSGI environment. But > it does seem to me to be an area that needs addressing. I've avoided this issue with WebStack so far, mostly because the configuration done at the adapter level (the glue code between frameworks and WebStack applications/frameworks) mainly covers things like the server port number and other things that aren't particularly interesting at higher levels. Moreover, applications can often be configured through things like modules acting as configuration files, and such things are clearly separate from issues of framework configuration. Paul From py-web-sig at xhaus.com Mon Sep 6 14:02:45 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Sep 6 13:57:51 2004 Subject: [Web-SIG] Standardised configuration and temporary directories. In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D18C9E5@100nooslmsg005.common.alpharoot.net> References: <89DE0F3E9781C048A14DC88C06D9F93D18C9E5@100nooslmsg005.common.alpharoot.net> Message-ID: <413C51E5.2090107@xhaus.com> [Alan Kennedy] >>2. Standardised parameter configuration and specification. 
[Paul Boddie] > As you've said, various frameworks provide mechanisms for specifying > parameters, yet this means that there isn't a single method of > administration for developers or users who don't care enough about > those frameworks to know how to deal with them all. I'm inclined to > think that better tools could be the answer here - if you have a > simple configuration file reminiscent of Webware's .config files > (which are Python modules with simple dictionaries or attributes) > then different tools could produce Apache .conf files or Java > Servlet web.xml files, for example. Paul, thanks for taking the time to reply. On thinking about the configuration issue further on the way into work, I've changed my mind :-) The original two options I presented for configuration were A: By a specialised middleware component. B: In the server configuration file. (I will now call this the "platform configuration file"). I originally thought that option B was the best, but now I think differently. And from what I read from your post, Paul, I think we're in agreement. Configuring the middleware stack is really the entire purpose of a python WSGI server. The platform in which the server and application reside, e.g. Apache, CGI, Tomcat, etc, should not be relevant. Instead, in an ideal scenario, the entire python application, i.e. server + middleware + configuration, should be portable to another platform(+WSGI layer). If this is to be the case, then the middleware and its configuration would be best kept under centralised python control, which would facilitate maximum portability between platforms. Conversely, as little as possible should be kept in the platform configuration file: ideally platforms should be the thinnest possible layer required to deliver WSGI requests to the python WSGI server. Which leads to the question of how best to configure middleware, in the server configuration. Taking the example of the session handling middleware:- 1. 
The server configuration specifies the middleware stack to be constructed for responding to requests. Parameters for specific pieces of middleware could be specified as parameters to the constructors for each component. For example, configuring session handling could go like this

    middleware_stack.append(
        my_fancy_session_handler(cookies=True, storage_dir='/var/session_dir')
    )

2. Or there could be some standardised way for a server to specify config values to middleware components, e.g.

    middleware_config['my_fancy_sessions.cookies'] = True
    middleware_config['my_fancy_sessions.storage_dir'] = '/var/session_dir'
    middleware_stack.append(my_fancy_session_handler())

And there are probably a few other ways to do it as well. Although I know we're firmly in the realm of server-specific configuration here, an area where WSGI may need to remain agnostic, it would be nice to standardise these configuration issues, in order to maximize portability of servers, middleware and configuration. Regards, Alan. From paul.boddie at ementor.no Mon Sep 6 14:14:19 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Mon Sep 6 14:14:30 2004 Subject: [Web-SIG] Standardised configuration and temporary directories. Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CAA5@100nooslmsg005.common.alpharoot.net> Alan Kennedy wrote: > > On thinking about the configuration issue further on the way into work, > I've changed my mind :-) > > The original two options I presented for configuration were > > A: By a specialised middleware component. > > B: In the server configuration file. (I will now call this the "platform > configuration file"). > > I originally thought that option B was the best, but now I think > differently. And from what I read from your post, Paul, I think we're in > agreement. Are we? ;-) Certain things like sessions are most likely to be configured in the server environment.
In Tomcat, for example, that would be in one of the XML configuration files, but for something like Apache/mod_python it would be nicest to use httpd.conf or a related file, and Webware and Zope store sessions in their own particular way - note that Zope uses its own special mechanisms which might not correspond exactly with the conceptual model you envisage. > Configuring the middleware stack is really the entire purpose of a > python WSGI server. The platform in which the server and application > reside, e.g. Apache, CGI, Tomcat, etc, should not be relevant. Instead, > in an ideal scenario, the entire python application, i.e. server + > middleware + configuration, should be portable to another platform(+WSGI > layer). That's what WebStack is about: the same code runs on the seven supported frameworks without any changes. Currently, the only server configuration required consists of the following kinds of activities: * Add directives to Apache's httpd.conf (for anything using Apache). * Add context definitions to Webware's configuration. * Prepare a .war file for a Java servlet container. * Add a product to a Zope 2 instance. Some interaction with the server configuration is clearly going to be necessary. > If this is to be the case, then the middleware and its configuration > would be best kept under centralised python control, which would > facilitate maximum portability between platforms. I think what we agree on is that much of an application's configuration can be done at a fairly high level. An application which stores stuff in the filesystem or which uses a database system doesn't necessarily need to have that kind of configuration entered into web.xml or httpd.conf, and it should be possible to keep that configuration portable, although I can imagine complications with things like Tomcat which define JDBC connections in the XML configuration files. 
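Alan's two options for configuring session middleware (constructor arguments versus standardised configuration keys) need not be exclusive. The sketch below is a minimal illustration under stated assumptions: `FancySessionMiddleware`, the `my_fancy_sessions.*` keys and the `effective` key are hypothetical. The middleware wraps the application in the usual WSGI way rather than being appended to a `middleware_stack` list. Explicit constructor arguments win, then keys found in the WSGI environ, then defaults.

```python
class FancySessionMiddleware:
    """Hypothetical session middleware accepting both configuration styles.

    Option 1: pass settings to the constructor.
    Option 2: omit them and let the server place 'my_fancy_sessions.*'
    keys in the WSGI environ.
    """

    PREFIX = 'my_fancy_sessions.'
    DEFAULTS = {'cookies': True, 'storage_dir': None}

    def __init__(self, app, **settings):
        unknown = set(settings) - set(self.DEFAULTS)
        if unknown:
            raise TypeError('unknown settings: %s' % ', '.join(sorted(unknown)))
        self.app = app
        self.settings = settings  # only the explicitly passed values

    def setting(self, environ, name):
        # Constructor arguments win, then environ keys, then defaults.
        if name in self.settings:
            return self.settings[name]
        return environ.get(self.PREFIX + name, self.DEFAULTS[name])

    def __call__(self, environ, start_response):
        # A real implementation would load and save sessions here; this
        # sketch only records the effective configuration for the wrapped app.
        environ[self.PREFIX + 'effective'] = {
            name: self.setting(environ, name) for name in self.DEFAULTS
        }
        return self.app(environ, start_response)
```

For example, `FancySessionMiddleware(app, storage_dir='/var/session_dir')` corresponds to option 1. `FancySessionMiddleware(app)` behind a server that injects `my_fancy_sessions.storage_dir` into the environ corresponds to option 2.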
Paul From py-web-sig at xhaus.com Mon Sep 6 14:46:13 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Sep 6 14:41:19 2004 Subject: [Web-SIG] Standardising containment. In-Reply-To: <413B8B7A.4090401@xhaus.com> References: <413B8B7A.4090401@xhaus.com> Message-ID: <413C5C15.6030003@xhaus.com> [Alan Kennedy] >>1. Temporary storage/scratch directory. On thinking further about the temp directory issue, I see now that it is but one example of a class of problems relating to accessing physical resources on the local machine. The other main one that springs to mind is how WSGI applications discover the file-system path name that corresponds to an URI. CGI defines a "PATH_TRANSLATED" variable for this purpose, but "PATH_TRANSLATED" is a poor solution to the problem, IMHO. In order to explain what I mean, I'm going to go through an example. Say I have an Apache installation, running CGI scripts. Assume that my cgi-bin directory is at the root level of my document root, so my document root looks like this (I'm using DOS path names, to illustrate a point) DOCUMENT_ROOT = "c:\\htdocs\\" CGI_BIN = "c:\\htdocs\\cgi-bin\\" Now, say I receive a request for the following URI http://localhost/cgi-bin/my_application.py/images/stars.jpg The CGI variables for this request would be set as follows:- SCRIPT_NAME = "/cgi-bin/my_application.py" PATH_INFO = "/images/stars.jpg" PATH_TRANSLATED = "c:\\htdocs\\images\\stars.jpg" And I want to introduce another variable, giving the path to the actual script CONTEXT_PATH = "c:\\htdocs\\cgi-bin\\my_application.py" There are a few points to make here 1. The contents of the PATH_TRANSLATED variable are not necessarily what I want. The standard definition for PATH_TRANSLATED is PATH_TRANSLATED = DOCUMENT_ROOT + PATH_INFO, i.e. PATH_TRANSLATED = 'c:\\htdocs\\' + '/images/stars.jpg', i.e.
PATH_TRANSLATED = 'c:\\htdocs\\images\\stars.jpg' But what happens if I really want the path translated to a point relative to the directory containing my cgi script, for example, not relative to the document root, i.e. what I really want is PATH_TRANSLATED = os.path.dirname(CONTEXT_PATH) + PATH_INFO, i.e. PATH_TRANSLATED = 'c:\\htdocs\\cgi-bin\\' + '/images/stars.jpg', i.e. PATH_TRANSLATED = 'c:\\htdocs\\cgi-bin\\images\\stars.jpg' 2. Because of the platform (i.e. windoze, *nix) specific path names returned for PATH_TRANSLATED, it is a hassle to write path manipulation functions which will reliably deliver the final path name that I am seeking. I could take the content of the PATH_TRANSLATED variable, subtract PATH_INFO from it again (being careful to deal correctly with "\" vs. "/"), and then work out my own path to the physical resource. But this is just going to cause all kinds of portability problems. Therefore I propose that WSGI somehow attempt to standardise access to local resources on the disk. This could be done, perhaps, by providing a function which resolves a logical URI to a physical resource. J2EE has just such a function (surprise ;-), called ServletContext.getRealPath(), which returns a file-system path name which is relative to the CONTEXT_PATH mentioned above. Without WSGI providing such local mapping functions, I don't see how WSGI applications/middleware can map URIs to files, without undertaking platform-specific tricks. I know it might look like I'm trying to drag WSGI into being more container-oriented, more like J2EE for example. But I think the above issues are sufficiently commonplace/universal that it is worth dealing with them in a standardised way. Regards, Alan. From brsizer at kylotan.eidosnet.co.uk Mon Sep 6 15:23:18 2004 From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer) Date: Mon Sep 6 15:21:01 2004 Subject: [Web-SIG] Standardising containment.
In-Reply-To: <413C5C15.6030003@xhaus.com> References: <413B8B7A.4090401@xhaus.com> <413C5C15.6030003@xhaus.com> Message-ID: <413C64C6.2020408@kylotan.eidosnet.co.uk> Alan Kennedy wrote: > The other main one that springs to mind is how WSGI applications > discover the file-system path name that corresponds to an URI. I thought that one of the major features of most of these Python web frameworks is that a URI doesn't map to a file but to an object or a function, several of which might be in one physical file. Since WSGI seems to be promoted as a minimal system that applies equally to almost any system, I'd think that such a mapping falls entirely out of its scope. I agree that it might be useful to have this functionality. I think a standard way to map URIs to Python files would be beneficial for Python web development. I just don't see it fitting into what people here have told me about WSGI. -- Ben Sizer. From pje at telecommunity.com Mon Sep 6 15:23:59 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Sep 6 15:23:19 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: <04Sep2.124712pdt."58612"@synergy1.parc.xerox.com> References: Message-ID: <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> [skipping stuff that Ian answered] At 12:47 PM 9/2/04 -0700, Bill Janssen wrote: >I'm not familiar with all the ins and outs of files on Python and >Jython and IronPython, so I'll just say, reasonable enough. Though >I'd prefer to say, a file-like object (whatever that means). 
File-like is out of scope; there were only ever two kinds of objects intended to be returnable: 1) Iterables (the initial scope) 2) Objects that map to an operating system file descriptor, as an optional special case to increase performance (added later per user request) I think that perhaps because files (under 2.2+ at least) meet *both* of these criteria, some folks have construed this to mean that we really should allow any file-like object, when "file-like" never had anything to do with anything. It's a total red herring that has nothing to do with the spec's intent. I will add something to the Q&A section about this. > > These restrictions are intended to simplify servers and middleware; nobody > > has yet presented an example of a scenario where this imposed any > practical > > limitation. > >Here's a scenario for you: I want to return a valid HTTP header that >your WSGI layer doesn't allow! For example, accented Latin-1 >characters, which are valid in the Reason-Phrase. Technically, you could use the MIME header encoding support to put them in, encoded in 7-bit ASCII, as is allowed by RFC 2616. OTOH, I could see allowing 8-bit strings in ISO-8859-1 encoding as per RFC 2616, and don't see significant practical problems in doing so. > Or for another >example, a multi-line header value, which I actually use quite a bit, >and which is perfectly valid in HTTP, and which your prohibition on >control characters in header values breaks. > > > The fallback position would be that the status string and headers must not > > be CR or CRLF terminated. > >The fallback position would be fine. I'm currently still strongly -1 on allowing folding; the only thing that's going to budge me is use cases. I only accept "on general principle" arguments when they *simplify* compliance and make the spec more robust, not when they make compliance more difficult. 
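Phillip's position (no control characters, and therefore no folded header values, while 8-bit ISO-8859-1 text is possibly acceptable) is cheap to enforce in a server or in validating middleware. The sketch below is a minimal checker under those assumptions. `check_status` and `check_headers` are hypothetical names, not functions from the PEP.

```python
import re

# ASCII control characters other than horizontal tab. Rejecting CR and LF
# also rules out folded (multi-line) header values.
_CONTROL = re.compile(r'[\x00-\x08\x0a-\x1f\x7f]')
# Header names: no control characters, no space or tab, no colon.
_BAD_NAME = re.compile(r'[\x00-\x20\x7f:]')
# Three-digit code followed by a space, e.g. "200 OK"; the reason may be empty.
_STATUS = re.compile(r'[0-9]{3} ')


def check_status(status):
    if _CONTROL.search(status) or not _STATUS.match(status):
        raise ValueError('bad status line: %r' % status)


def check_headers(headers):
    for name, value in headers:
        if not name or _BAD_NAME.search(name):
            raise ValueError('bad header name: %r' % name)
        if _CONTROL.search(value):
            raise ValueError('control character in %s: %r' % (name, value))
        try:
            value.encode('iso-8859-1')  # allow 8-bit Latin-1 text only
        except UnicodeEncodeError:
            raise ValueError('%s is not ISO-8859-1: %r' % (name, value)) from None
```

Middleware that trusts these checks can then process headers as simple `(name, value)` pairs, with no unfolding step. That is the simplification Phillip is arguing for.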
Header folding adds repetitive boilerplate processing to all middleware that processes headers: boilerplate that can and will be written incorrectly or sometimes omitted because somebody forgot that headers are allowed to be folded. Before too long, the practical advice to WSGI application authors will be, "don't fold headers because it breaks a lot of middleware", and we'll be right back where we could've been in the first place if we just banned folding from the get-go. Meanwhile, the people who will have paid the price for this are all the conscientious implementors who tried to write code that would work properly with header folding. From py-web-sig at xhaus.com Mon Sep 6 15:30:00 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Sep 6 15:25:16 2004 Subject: [Web-SIG] Standardised configuration. In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D18CAA5@100nooslmsg005.common.alpharoot.net> References: <89DE0F3E9781C048A14DC88C06D9F93D18CAA5@100nooslmsg005.common.alpharoot.net> Message-ID: <413C6658.6070007@xhaus.com> [Alan Kennedy] >> I originally thought that option B was the best, but now I think >> differently. And from what I read from your post, Paul, I think >> we're in agreement. [Paul Boddie] > Are we? ;-) Certain things like sessions are most likely to be > configured in the server environment. In Tomcat, for example, that > would be in one of the XML configuration files, but for something > like Apache/mod_python it would be nicest to use httpd.conf or a > related file, and Webware and Zope store sessions in their own > particular way - note that Zope uses its own special mechanisms > which might not correspond exactly with the conceptual model you > envisage. Ah, now we're getting somewhere. I think that session handling is an excellent example against which to have this discussion. Note however that I am *not* advocating standardising session management under WSGI.
J2EE session handling is generally a huge PITA, primarily because the base unit of session management is the servlet context, i.e. every servlet context gets its own "session space". For example '/forms' may map to one session space, while '/news' may map to a different session space. Any given user may have multiple sessions on a server, depending on the number of servlets they have interacted with. It is generally not possible, except using container-specific methods, to have a single "uber-session" which concentrates all user session data into a single object. This "hierarchy problem" makes it difficult, and extremely container-specific, to manage a single set of users across a set of J2EE servlets. Most J2EE containers support both cookies and URL rewriting for session management, i.e. if the user-agent has cookies disabled, then all urls are rewritten to contain session IDs. Which means that the url rewriting algorithm has to be aware of multiple servlet contexts, and rewrite local urls to contain the session ID which is specific to the target context/servlet. Some J2EE containers support a "Single Sign On" facility, where the container manages the multiple session objects on the application's behalf, and makes it easy for the user (but not the programmer) by only making them sign on to a server once. Tomcat does this using an extra cookie, the SSO cookie, which is transmitted to user-agents *as well as* the per-servlet cookie, i.e. the user-agent receives two cookies from the container. Worse, the Tomcat Single-Sign-On facility does not support URL rewriting: the user-agent *must* have cookies enabled in order for single sign on to work. Which sucks. I think that if WSGI applications were to rely on the local platform/container session management facilities, it is extremely unlikely that they would be portable.
It's difficult enough to get coherent cross-servlet session-handling working on J2EE when writing in Java, as these pages show http://jakarta.apache.org/tomcat/tomcat-5.0-doc/config/host.html#Single%20Sign%20On http://jakarta.apache.org/tomcat/tomcat-5.0-doc/config/valve.html#Single%20Sign%20On%20Valve http://www.fwd.at/tomcat/sharing-session-data-howto.html Imagine the complications if the application code were originally written to work with, say, Webware under CPython? To me, session handling is one of those things that is done in so many different ways by so many different platforms/containers that it is impractical to achieve application portability once a particular methodology has been chosen. So, IMHO, session handling is one of those "should be simple" areas of web programming that gets horrifically complicated when trying to move applications between platforms/containers: in fact I'd go so far as to say the multiple session handling techniques are one of the primary reasons why the python web world is currently so fragmented: every framework author thinks they know best: although some do it much better than others. I like Webware's method of using URL path parameters, with an auto-refresh if a request is received that doesn't contain a session ID. But IIRC, this method is quite Apache-specific, and requires modification of the Apache httpd.conf to get working. I could be wrong though. It is important to note that I am *not* advocating standardising session management under WSGI: far from it. But what I am advocating is trying to make it as easy as possible for session-handling middleware components to be as portable between WSGI servers as possible. WSGI, as it currently stands, makes it far easier to do this than any other approach: I'm just trying to foresee and eliminate the last few percent that stand in the way of 100% portability. Regards, Alan.
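Returning to the containment thread, a getRealPath()-style mapping can avoid the string surgery on PATH_TRANSLATED that Alan describes. It does this by splitting PATH_INFO on '/', the URL separator on every platform, and joining the pieces with the platform's path module. The sketch below is minimal and rests on assumptions: `real_path` is a hypothetical helper, and `context_dir` is taken to be the directory containing the script (Alan's CONTEXT_PATH minus the file name). The `..` and separator checks are a basic guard, not a complete security review.

```python
import ntpath
import posixpath


def real_path(context_dir, path_info, pathmod=posixpath):
    """Map a URL path onto a file under context_dir (hypothetical helper).

    context_dir -- directory the application treats as its root, e.g. the
                   directory containing the CGI script (an assumption).
    path_info   -- URL-style path with '/' separators, e.g. '/images/stars.jpg'.
    pathmod     -- posixpath or ntpath; a real server would simply use os.path.
    """
    # '/' is the URL separator everywhere, so split on it explicitly instead
    # of subtracting PATH_INFO from a platform-specific PATH_TRANSLATED.
    parts = [p for p in path_info.split('/') if p not in ('', '.')]
    for part in parts:
        if part == '..' or pathmod.sep in part or (
                pathmod.altsep and pathmod.altsep in part):
            raise ValueError('refusing to leave %r: %r' % (context_dir, path_info))
    return pathmod.join(context_dir, *parts)
```

With Alan's DOS example, `real_path(ntpath.dirname('c:\\htdocs\\cgi-bin\\my_application.py'), '/images/stars.jpg', ntpath)` yields `'c:\\htdocs\\cgi-bin\\images\\stars.jpg'`, which is the path he says he actually wants.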
From paul.boddie at ementor.no Mon Sep 6 15:32:39 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Mon Sep 6 15:32:42 2004 Subject: [Web-SIG] Standardising containment. Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CAF4@100nooslmsg005.common.alpharoot.net> Ben Sizer wrote: > > Alan Kennedy wrote: > > The other main one that springs to mind is how WSGI applications > > discover the file-system path name that corresponds to an URI. > > I thought that one of the major features of most of these Python web > frameworks is that a URI doesn't map to a file but to an object or a > function, several of which might be in one physical file. Since WSGI > seems to be promoted as a minimal system that applies equally to almost > any system, I'd think that such a mapping falls entirely out of its scope. It probably does for WSGI, although I wonder how such issues (and the many others out there) can be simultaneously avoided and yet anticipated by the specification in order to avoid incompatibilities later on. > I agree that it might be useful to have this functionality. I think a > standard way to map URIs to Python files would be beneficial for Python > web development. I just don't see it fitting into what people here have > told me about WSGI. I suppose that Alan is moving slowly up the stack. It's an interesting issue that existing frameworks have addressed in their own ways (the getRealPath that Alan mentioned, Webware's getServerSidePath, and so on), and although one can wonder whether application data (which the image example could almost be considered as being) should be configured within or with reference to the server environment or not, if you consider having to specify the filenames of resources within an application, it's much nicer to be able to make those filenames relative to some deployment variable (eg. 
where the application ends up when deployed) and to keep those resources bundled with the application than to have to manually configure the application to use absolute paths before/during/after deployment. I hope that made sense. ;-) Paul From pje at telecommunity.com Mon Sep 6 15:38:13 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Sep 6 15:37:27 2004 Subject: [Web-SIG] Bill's comments on WSGI draft 1.4 In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3022EB0@exchange.hqamor.amorhq.net> Message-ID: <5.1.1.6.0.20040906092441.02e8a610@mail.telecommunity.com> At 01:36 PM 9/2/04 -0700, Robert Brewer wrote: >Phillip J. Eby wrote: > > > I'd like to at least hear the rationale behind > > > favoring iterables so heavily over write(). > > > > One important reason: the server can suspend an iterable's execution > > without tying up a thread. It can therefore potentially use > > a much smaller thread pool to handle a given number of connections, > > because the threads are only tied up while they're executing an > > iterator 'next()' call. > > > > By contrast, 'write()' occurs *within* the application execution, > > so the only way to suspend execution is to suspend the thread (e.g. > > waiting for a lock). > >Hmm. I still don't get it--why would the server not simply "suspend >execution" of the framework within the write() call? In my naive >estimation, it would be the difference between:
>
>for chunk in framework.data:
>    output(chunk)
>    do_out_of_band_stuff()
>
>..and:
>
>def write(chunk):
>    output(chunk)
>    do_out_of_band_stuff()

Because now you've moved the server code into the application thread; many Python web servers (pretty much all of the async ones including Medusa, Twisted, and ZServer) have a single thread for all I/O operations, distinct from the threads that run application requests. So, if you want to perform I/O from an app thread, you need lock synchronization code that didn't exist before... and the design rapidly becomes more complicated.
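The synchronization Phillip refers to, and the queue-based `write()` he goes on to describe, can be made concrete with a small sketch. It rests on several assumptions: `QueueWriter` is hypothetical, a single I/O thread drains a bounded `queue.Queue`, and a plain callable stands in for the client socket. Because the queue is bounded, `write()` blocks the application thread whenever the consumer falls behind. That blocking is the thread tie-up he contrasts with the iterable approach.

```python
import queue
import threading


class QueueWriter:
    """Hypothetical write() for a server with a separate I/O thread.

    The application thread calls write(); one I/O thread drains the queue
    and hands each chunk to `sink` (a stand-in for the client socket).
    """

    _DONE = object()  # sentinel marking the end of the response

    def __init__(self, sink, max_chunks=4):
        self.output_queue = queue.Queue(maxsize=max_chunks)
        self.sink = sink
        self.io_thread = threading.Thread(target=self._drain, daemon=True)
        self.io_thread.start()

    def write(self, data):
        # Blocks the *application* thread while the queue is full.
        self.output_queue.put(data)

    def close(self):
        self.output_queue.put(self._DONE)
        self.io_thread.join()

    def _drain(self):
        while True:
            chunk = self.output_queue.get()
            if chunk is self._DONE:
                return
            self.sink(chunk)
```

With an iterable, the server would instead call `next()` only when the socket can accept more data, so no application thread sits blocked in `put()`.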
Anyway, such servers' write() methods will probably look more like:

    def write(self, data):
        self.output_queue.put(data)

and they'll then return to the caller. However, this has new issues of its own: specifically, if a program transmits a large file, it will consume lots of memory if it produces data faster than the client can accept it. (Because the output queue will back up.) Of course, one can throttle the output queue to some set maximum size, but then you end up right where I began this discussion: the application thread has to hang, tying up that thread's availability until the app's execution is complete, and thus reducing the concurrent request throughput of the server. If, however, the application is structured as an iterable, these problems all go away. Application threads are only tied up for computation, not for waiting on I/O; output a client isn't going to receive is never produced; large memory buffers aren't needed; and so on. So, on purely technical grounds, the iterable approach is immensely superior; it should be used wherever practical to do so. >..and in fact, I see most existing servers having to do both when they >grow WSGI interfaces, since both are allowed in the WSGI spec (even if >one is deprecated). Yes, servers will have to support both; but it should be understood that, for many important servers (especially ones written in Python), applications using 'write()' may have detrimental effects on the server's overall throughput, even if the application seems to run quite well on, say, a local connection to an unloaded server. So, that's why people should be discouraged from using 'write()' outside of necessity. >Maybe you could add a line or two of pseudocode to >help me understand...? (Assuming you're not fleeing for your life from >hurricanes, that is ;) Hurricane's past me now; I just got power and 'net back this morning. From pje at telecommunity.com Mon Sep 6 15:43:24 2004 From: pje at telecommunity.com (Phillip J.
Eby) Date: Mon Sep 6 15:42:39 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: <04Sep2.161513pdt."58612"@synergy1.parc.xerox.com> References: Message-ID: <5.1.1.6.0.20040906093835.02e804e0@mail.telecommunity.com> At 04:15 PM 9/2/04 -0700, Bill Janssen wrote: >[Ian Bicking] > > There's no crippling, it [streaming] is specifically allowed for. It's > not the > > primary interface that frameworks require, so Phillip wants to encourage > > those framework to use the iterable when they can. > >Why? Why is an editorial opinion in the technology spec? Why do you think it's a technology spec? I thought I was previously quite clear on this list that PEP 333 is "an attempt at market manipulation by social engineering mind control" (or something to that general effect), so that puts editorial opinion well within its scope, IMO. :) > And, which >frameworks are you talking about? Isn't this on the "server" or >"socket" side of things, rather than the "application" or "plug" or >"framework" side of things? Ian was speaking of application frameworks. Specifically, we wish to discourage use of 'write()' because it's "bad citizenship" for an application to hog the thread it's running in. Being iterable allows the server to control multitasking better, and thus improve the server's overall throughput. While 'write()' has to be available to support legacy streaming API's, it's not at all efficient for the typical asynchronous web server written in Python. From pje at telecommunity.com Mon Sep 6 15:47:03 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Sep 6 15:46:18 2004 Subject: [Web-SIG] Iterators, generators and threads. In-Reply-To: <41385E70.20507@xhaus.com> Message-ID: <5.1.1.6.0.20040906094347.02e8cec0@mail.telecommunity.com> At 01:07 PM 9/3/04 +0100, Alan Kennedy wrote: >Offhand, I can't think of scenarios where a WSGI server or application >would *need* to iterate over an iterable across multiple threads. 
But I >can certainly think of multiple server architectures where the request and >its related response will pass through multiple threads before completion. >Whether or not it would make sense for such architectures to iterate an >iterable from multiple threads: well, I don't know: is it possible some >server designer might attempt something like this? > >Which would probably work as long as the iterable is not a generator. But >if it is: *boom*, the generator could be resumed simultaneously from >multiple threads, thus resulting in a ValueError. Generators don't actually add a new problem here. Pretend we're talking about a list object instead. If you were to "resume it simultaneously from multiple threads", what would happen? Well, you'd send items twice, or out of order. So, obviously, you can't iterate over *any* iterable returned by a WSGI app from multiple threads unless you serialize the access. From paul.boddie at ementor.no Mon Sep 6 15:48:37 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Mon Sep 6 15:48:41 2004 Subject: [Web-SIG] Standardised configuration. Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CB05@100nooslmsg005.common.alpharoot.net> Alan Kennedy wrote: > > I think that session handling is an excellent example against which to > have this discussion. Note however that I am *not* advocating > standardising session management under WSGI. There will be plenty of other places to standardise it, I'm sure. ;-) > J2EE session handling is generally a huge PITA, primarily because the > base unit of session management is the servlet context, i.e. every > servlet context gets its own "session space". For example > > '/forms' may map to one session space, while > '/news' may map to a different session space. > > Any given user may have multiple sessions on a server, depending on the > number of servlets they have interacted with. 
It is generally not > possible, except using container-specific methods, to have a single > "uber-session" which concentrates all user session data into a single > object. This "hierarchy problem" makes it difficult, and extremely > container-specific, to manage a single set of users across a set of J2EE > servlets. Session sharing sounds like a great idea, and I've seen some pretty unfortunate workarounds to achieve such things, but then overreliance on such mechanisms can be very restrictive if you change the "topology" of your system architecture (ie. relocate one application to another server). > Most J2EE containers support both cookies and URL rewriting for session > management, i.e. if the user-agent has cookies disabled, then all urls > are rewritten to contain session IDs. Which means that the url > rewriting algorithm has to be aware of multiple servlet contexts, and > rewrite local urls to contain the session ID which is specific to the > target context/servlet. This is a pretty nasty problem that WSGI and other technologies could handle relatively cleanly for once. > Some J2EE containers support a "Single Sign On" facility, where the > container manages the multiple session objects on the application's > behalf, and makes it easy for the user (but not the programmer) by only > making them sign on to a server once. Tomcat does this using an extra > cookie, the SSO cookie, which is transmitted to user-agents *as well as* > the per-servlet cookie, i.e. the user-agent receives two cookies from > the container. Worse, the Tomcat Single-Sign-On facility does not > support URL rewriting: the user-agent *must* have cookies enabled in > order for single sign on to work. Which sucks. I guess you haven't seen other SSO solutions, then, or are too polite to mention them. ;-) > I think that if WSGI applications were to rely on the local > platform/container session management facilities, it is extremely > unlikely that they would be portable.
It's difficult enough to get > coherent cross-servlet session-handling working on J2EE when writing in > Java, as these pages show > > http://jakarta.apache.org/tomcat/tomcat-5.0-doc/config/host.html#Single%20Sign%20On > http://jakarta.apache.org/tomcat/tomcat-5.0-doc/config/valve.html#Single%20Sign%20On%20Valve > http://www.fwd.at/tomcat/sharing-session-data-howto.html > > Imagine the complications if the application code were originally > written to work with, say, Webware under CPython? Sharing sessions between completely different framework implementations (eg. Webware and mod_python) within some kind of WSGI infrastructure is going to be an extremely difficult thing to achieve, mostly because the session store implementations are probably not interoperable - I haven't checked, but the chances of interoperability are fairly low, I would think. My opinion is that as soon as you're sharing session information, you're moving towards some kind of shared database situation, anyway. > To me, session handling is one of those things that is done in so many > different ways by so many different platforms/containers that it is > impractical to achieve application portability once a particular > methodology has been chosen. You'll have to clarify that. I've been working on WebStack functionality which at least allows applications to treat sessions in the same way, and it shouldn't be surprising that this is possible given the narrow range of operations that most session implementations expose. Of course, were it possible for an application running on Webware to suddenly, between HTTP requests, find itself "migrated" to Twisted, it would be a bit much to expect that application to find its sessions intact after the move.
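Paul's observation that most session implementations expose only a narrow range of operations suggests a small common interface. The sketch below assumes that interface is create/load/save/delete. `MemorySessionStore` is a hypothetical in-memory backend, not WebStack's API. A file- or database-backed store could expose the same four methods, though, as Paul notes, sharing the interface does not make two stores share their data.

```python
import secrets
import threading


class MemorySessionStore:
    """Hypothetical minimal session store: create, load, save, delete.

    Sessions are plain dicts keyed by a random ID. load() and save() copy
    the dict (shallowly) so callers never mutate stored state by accident.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def create(self):
        session_id = secrets.token_urlsafe(16)
        with self._lock:
            self._sessions[session_id] = {}
        return session_id

    def load(self, session_id):
        with self._lock:
            data = self._sessions.get(session_id)
        return None if data is None else dict(data)

    def save(self, session_id, data):
        with self._lock:
            self._sessions[session_id] = dict(data)

    def delete(self, session_id):
        with self._lock:
            self._sessions.pop(session_id, None)
```

Session middleware written against these four methods could swap backends without changing application code. How the session ID travels (cookie or URL rewriting) remains a separate concern.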
> So, IMHO, session handling is one of those "should be simple" areas of > web programming that gets horrifically complicated when trying to move > applications between platforms/containers: in fact I'd go so far as to > say the multiplicity of session handling techniques is one of the primary > reasons why the Python web world is currently so fragmented: every > framework author thinks they know best: although some do it much better > than others. I like Webware's method of using URL path parameters, with an > auto-refresh if a request is received that doesn't contain a session ID. > But IIRC, this method is quite Apache-specific, and requires > modification of the Apache httpd.conf to get working. I could be wrong > though. Are you advocating a common session manager? I can see some major benefits with something like that. Paul From pje at telecommunity.com Mon Sep 6 15:53:27 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Sep 6 15:52:41 2004 Subject: [Web-SIG] Standardised configuration and temporary directories. In-Reply-To: <413B8B7A.4090401@xhaus.com> Message-ID: <5.1.1.6.0.20040906094818.02e8a4d0@mail.telecommunity.com> At 10:56 PM 9/5/04 +0100, Alan Kennedy wrote: >2. Standardised parameter configuration and specification. >[snip] > >Perhaps a simple solution would be to add wording like the following to >the PEP: > >""" >WSGI compliant servers must provide a simple mechanism for users to place >name/value pairs in the WSGI environment, without modification or >transformation. This is to make it easy for users to gather all middleware >(i.e. server-independent) configuration under one centralized >configuration mechanism. >""" I could go for something like this as a *should*, as long as it was explained that the simplest possible implementation is to simply include operating system environment variables in 'environ'. (And at that point, your desire for temporary directory info might be met by simply using 'environ["TMP"]'!)
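A minimal sketch of the "simplest possible implementation" described above, assuming a hypothetical server helper named `build_environ` (not part of PEP 333): the server seeds `environ` with the operating system environment and then lets the request's CGI variables override it, so an application can read `environ["TMP"]` and fall back to the standard library's `tempfile.gettempdir()` when the variable is unset.

```python
import os
import tempfile

def build_environ(request_vars, os_env=None):
    """Hypothetical server helper: OS environment first, request variables on top.

    A real server would also add the required wsgi.* keys; they are omitted here.
    """
    environ = dict(os.environ if os_env is None else os_env)
    # Request-specific CGI variables win over same-named OS variables.
    environ.update(request_vars)
    return environ

def scratch_dir(environ):
    """Application side: prefer a configured TMP, else the platform default."""
    return environ.get("TMP") or tempfile.gettempdir()

env = build_environ({"REQUEST_METHOD": "GET", "PATH_INFO": "/"},
                    os_env={"TMP": "/var/tmp/myapp"})
print(scratch_dir(env))  # /var/tmp/myapp
```

Letting request variables win over OS variables is a choice made for this sketch; a server could just as well refuse to copy OS variables whose names collide with CGI names.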
From pje at telecommunity.com Mon Sep 6 16:00:40 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Sep 6 15:59:54 2004 Subject: [Web-SIG] Standardised configuration and temporary directories. In-Reply-To: <413C51E5.2090107@xhaus.com> References: <89DE0F3E9781C048A14DC88C06D9F93D18C9E5@100nooslmsg005.common.alpharoot.net> <89DE0F3E9781C048A14DC88C06D9F93D18C9E5@100nooslmsg005.common.alpharoot.net> Message-ID: <5.1.1.6.0.20040906095537.02e8e020@mail.telecommunity.com> At 01:02 PM 9/6/04 +0100, Alan Kennedy wrote: >Configuring the middleware stack is really the entire purpose of a python >WSGI server. The platform in which the server and application reside, e.g. >Apache, CGI, Tomcat, etc, should not be relevant. Instead, in an ideal >scenario, the entire python application, i.e. server + middleware + >configuration, should be portable to another platform(+WSGI layer). > >If this is to be the case, then the middleware and its configuration would >be best kept under centralised python control, which would facilitate >maximum portability between platforms. [snip] This is starting to get into the area of portable deployment standards for WSGI developers, which is mostly out of scope for the current PEP. I'd like to see us get some field experience in various ways to do it before we choose the "one obvious" way to do it. That being said, don't let me stop y'all from discussing various ways to do it, because if nobody does that we'll never get to the "one way" part. :) I just don't expect the discussion to yield anything that would convince me to "bless" a single option before PEP 333 is finalized. From pje at telecommunity.com Mon Sep 6 16:11:02 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Sep 6 16:10:18 2004 Subject: [Web-SIG] Standardising containment. 
In-Reply-To: <413C5C15.6030003@xhaus.com> References: <413B8B7A.4090401@xhaus.com> <413B8B7A.4090401@xhaus.com> Message-ID: <5.1.1.6.0.20040906100050.02e8f840@mail.telecommunity.com> At 01:46 PM 9/6/04 +0100, Alan Kennedy wrote: >[Alan Kennedy] > >>1. Temporary storage/scratch directory. > >On thinking further about the temp directory issue, I see now that it is >but one example of a class of problems relating to accessing physical >resources on the local machine. > >The other main one that springs to mind is how WSGI applications discover >the file-system path name that corresponds to an URI. *boggle* Why do you think that URIs have anything to do with file paths? In the general case, they are entirely unrelated. >[snip] >Therefore I propose that WSGI somehow attempt to standardise access to >local resources on the disk. This could be done, perhaps, by providing a >function which resolves a logical URI to a physical resource. J2EE has >just such a function (surprise ;-), called ServletContext.getRealPath(), >which returns a file-system path name which is relative to the >CONTEXT_PATH mentioned above. > >Without WSGI providing such local mapping functions, I don't see how WSGI >applications/middleware can map URIs to files, without undertaking >platform specific tricks. Well-written Python applications make this sort of thing part of their configuration today already, because in the general case (e.g. mod_rewrite) this stuff just plain isn't guessable. Also, if you need access to local resources, relative to some Python module, just grab the '__file__' attribute/variable of that module, and then use 'os.path' functions to portably manipulate it. E.g.:

    my_dir = os.path.dirname(__file__)
    target = os.path.join(os.path.join(my_dir,"images"),"stars.jpg")

This is simple and portable. If you need something more complex, you should probably have configuration specific to the application that spells out what it needs to know.
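The `__file__` recipe above can be packaged as a small helper. This is a sketch, and `resource_path` is a hypothetical name. Note that `os.path.join()` accepts any number of components, so the nested calls are not needed, and `os.path.abspath()` makes the result independent of whether `__file__` happens to be a relative path.

```python
import os

def resource_path(module_file, *parts):
    """Return an absolute path to a resource stored alongside the given module file.

    Typical call from inside an application module:
        target = resource_path(__file__, "images", "stars.jpg")
    """
    base_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(base_dir, *parts)
```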
From paul.boddie at ementor.no Mon Sep 6 16:12:00 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Mon Sep 6 16:12:04 2004 Subject: [Web-SIG] Standardised configuration and temporary directories. Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CB0F@100nooslmsg005.common.alpharoot.net> Phillip J. Eby wrote: > > This is starting to get into the area of portable deployment standards for > WSGI developers, which is mostly out of scope for the current PEP. I'd > like to see us get some field experience in various ways to do it before > we choose the "one obvious" way to do it. > > That being said, don't let me stop y'all from discussing various ways to > do it, because if nobody does that we'll never get to the "one way" part. > :) I just don't expect the discussion to yield anything that would > convince me to "bless" a single option before PEP 333 is finalized. Well, this is the Web-SIG mailing list (as opposed to the WSGI mailing list), so there will hopefully be a bit more discussion, some experimentation and eventually some results to point to by the time any other PEPs get written. It has been a while since there was this much focus on Python Web standardisation on any mailing list. Paul From pje at telecommunity.com Mon Sep 6 16:21:27 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Sep 6 16:20:40 2004 Subject: [Web-SIG] Standardised configuration. In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D18CB05@100nooslmsg005.comm on.alpharoot.net> Message-ID: <5.1.1.6.0.20040906101509.02e8f670@mail.telecommunity.com> At 03:48 PM 9/6/04 +0200, Paul Boddie wrote: >Alan Kennedy wrote: > > > > I think that session handling is an excellent example against which to > > > have this discussion. Note however that I am *not* advocating > > standardising session management under WSGI. > >There will be plenty of other places to standardise it, I'm sure. 
;-) > > > J2EE session handling is generally a huge PITA, primarily because the > > base unit of session management is the servlet context, i.e. every > > servlet context gets its own "session space". For example > > > > '/forms' may map to one session space, while > > '/news' may map to a different session space. > > > > Any given user may have multiple sessions on a server, depending on >the > > number of servlets they have interacted with. It is generally not > > possible, except using container specific methods, to have a single > > "uber-session" which concentrates all user session data into a single > > object. This "hierarchy problem" makes it difficult, and extremely > > container-specific, to manage a single set of users across a set of >J2EE > > servlets. > >Session sharing sounds like a great idea, and I've seen some pretty >unfortunate workarounds to achieve such things, but then overreliance on >such mechanisms can be very restrictive if you change the "topology" of >your >system architecture (ie. relocate one application to another server). Just to throw another thought in here, keep in mind that one could write a "cookie consolidator" WSGI component that would send its own session-management cookie to the client after removing application-sent cookies from the responses and saving them somewhere locally. When a request comes in, the "cookie consolidator" would read its own cookie from HTTP_COOKIE, and then add the stored cookie data before passing it on to the application. So, from the app's point of view, it's as if all the cookies are going to the client, but in reality there's only one, with the rest of the data stored server-side. One could presumably also extend this cookie consolidator to manage other kinds of session keys as well, such as ones embedded in the URL. Or, for that matter, you could write one that embeds its session key in the URL instead of in a cookie, but still makes it look to the application as if cookies are being used. 
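The "cookie consolidator" idea lends itself to a compact middleware sketch. Everything here is an assumption made for illustration: the class name, the `wsgi_sid` cookie name, and the in-memory dict used as the server-side store are all hypothetical. The sketch also ignores cookie expiry, `Path`/`Domain` scoping and deletion, all of which a real component would have to honour.

```python
import secrets
from http.cookies import SimpleCookie

class CookieConsolidator:
    """Hypothetical middleware: sends one session cookie, keeps the rest server-side."""

    def __init__(self, app, cookie_name="wsgi_sid", store=None):
        self.app = app
        self.cookie_name = cookie_name
        # Maps session id -> {cookie name: value} set by the wrapped application.
        self.store = {} if store is None else store

    def __call__(self, environ, start_response):
        incoming = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        sid = incoming[self.cookie_name].value if self.cookie_name in incoming else None
        if sid not in self.store:
            # Unknown or missing session id: start a fresh server-side session.
            sid = secrets.token_hex(16)
            self.store[sid] = {}
        saved = self.store[sid]

        # Rebuild HTTP_COOKIE as the application expects to see it: the client's
        # other cookies plus everything the application set on earlier responses.
        visible = {name: m.value for name, m in incoming.items() if name != self.cookie_name}
        visible.update(saved)
        environ = dict(environ)
        environ["HTTP_COOKIE"] = "; ".join(f"{k}={v}" for k, v in visible.items())

        def consolidating_start_response(status, headers, exc_info=None):
            kept = []
            for name, value in headers:
                if name.lower() == "set-cookie":
                    # Store the cookie's value instead of forwarding the header.
                    for key, morsel in SimpleCookie(value).items():
                        saved[key] = morsel.value
                else:
                    kept.append((name, value))
            kept.append(("Set-Cookie", f"{self.cookie_name}={sid}; Path=/; HttpOnly"))
            return start_response(status, kept, exc_info)

        return self.app(environ, consolidating_start_response)
```

With this in place, wrapping an application is just `application = CookieConsolidator(existing_app)`, where `existing_app` stands for any WSGI application. The URL-embedded variant mentioned above would replace the final `Set-Cookie` step with URL rewriting, which this sketch does not attempt.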
From david at sundayta.com Mon Sep 6 16:22:44 2004 From: david at sundayta.com (David Warnock) Date: Mon Sep 6 16:22:54 2004 Subject: [Web-SIG] Standardised configuration. In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D18CB05@100nooslmsg005.common.alpharoot.net> References: <89DE0F3E9781C048A14DC88C06D9F93D18CB05@100nooslmsg005.common.alpharoot.net> Message-ID: <413C72B4.3020507@sundayta.com> Paul, > You'll have to clarify that. I've been working on WebStack functionality > which at least allows applications to treat sessions in the same way, > and it > shouldn't be surprising that this is possible given the narrow range of > operations that most session implementations expose. Of course, were it > possible for an application running on Webware to suddenly, between HTTP > requests, find itself "migrated" to Twisted, it would be a bit much to > expect that application to find its sessions intact after the move. But I for 1 can certainly imagine an "application" consisting of multiple servers, so that parts of the "application/site" are webware, part twisted, part quixote. If all these supported wsgi and if there were a wsgi session add-on then surely this heads towards the possible, and that makes lots of things much easier to assemble/develop/extend. Dave From py-web-sig at xhaus.com Mon Sep 6 16:38:26 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Sep 6 16:33:31 2004 Subject: [Web-SIG] Standardised configuration. In-Reply-To: <5.1.1.6.0.20040906101509.02e8f670@mail.telecommunity.com> References: <5.1.1.6.0.20040906101509.02e8f670@mail.telecommunity.com> Message-ID: <413C7662.8010800@xhaus.com> [Phillip J. Eby] > Just to throw another thought in here, keep in mind that one could write > a "cookie consolidator" WSGI component that would send its own > session-management cookie to the client after removing application-sent > cookies from the responses and saving them somewhere locally. 
When a > request comes in, the "cookie consolidator" would read its own cookie > from HTTP_COOKIE, and then add the stored cookie data before passing it > on to the application. So, from the app's point of view, it's as if all > the cookies are going to the client, but in reality there's only one, > with the rest of the data stored server-side. > > One could presumably also extend this cookie consolidator to manage > other kinds of session keys as well, such as ones embedded in the URL. > Or, for that matter, you could write one that embeds its session key in > the URL instead of in a cookie, but still makes it look to the > application as if cookies are being used. That's an excellent idea, and could solve the problem of multiple session handling techniques very well, and in a portable manner. However, it would only work for WSGI middleware components that are above that session component in the middleware stack. If the application administrator had configured session management in the platform configuration file, e.g. tomcat server.xml, then that session management would be run *after* the entire WSGI middleware stack had completed. But that's not a problem according to my view of things: avoiding platform managed sessions is the whole point. Were I running a WSGI middleware stack inside Apache or Tomcat, I'd want to disable the "native" session handling completely, and instead take care of it entirely within the WSGI stack. Regards, Alan. From paul.boddie at ementor.no Mon Sep 6 16:33:58 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Mon Sep 6 16:34:01 2004 Subject: [Web-SIG] Standardised configuration. Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CB17@100nooslmsg005.common.alpharoot.net> David Warnock wrote: > > But I for 1 can certainly imagine an "application" consisting of multiple > servers, so that parts of the "application/site" are webware, part > twisted, part quixote. 
If all these supported wsgi and if there were a > wsgi session add-on then surely this heads towards the possible, and that > makes lots of things much easier to assemble/develop/extend. Yes, once you've discarded the Webware session mechanisms (or most likely swapped them out within Webware itself), and once you've done the same with Quixote and Twisted (or quite probably added sessions to Twisted unless it comes with session support these days), you could have a session manager of some kind under the applications. It might even have to happen under the frameworks, since I suppose you would need to define how best to make these servers co-exist and then add this session manager so that all server environments are affected in the same way. What I've done so far with certain WebStack examples is to provide a resource which deals with authentication and then to add the actual application functionality as a resource within that resource. I imagine that the chaining of WSGI components would be done in a similar fashion, although WebStack doesn't address the issue of dispatching through different server environments, whereas your example situation would have to tackle that issue. Paul From py-web-sig at xhaus.com Mon Sep 6 16:56:33 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Sep 6 16:51:38 2004 Subject: [Web-SIG] Standardising containment. In-Reply-To: <5.1.1.6.0.20040906100050.02e8f840@mail.telecommunity.com> References: <413B8B7A.4090401@xhaus.com> <413B8B7A.4090401@xhaus.com> <5.1.1.6.0.20040906100050.02e8f840@mail.telecommunity.com> Message-ID: <413C7AA1.8010702@xhaus.com> [Alan Kennedy] >> The other main one that springs to mind is how WSGI applications >> discover the file-system path name that corresponds to an URI. [Phillip J. Eby] > *boggle* Why do you think that URIs have anything to do with file > paths? In the general case, they are entirely unrelated. 
Well, perhaps it's just that pretty much every web server/harness/framework I ever used has support for mapping URIs to files. How silly of me to try to apply my experience of other web systems to WSGI. In the *general* case, yes, such a mapping has no meaning. But there are specific cases, e.g. static file serving, where it is required. [Phillip J. Eby] > Well-written Python applications make this sort of thing part of their > configuration today already, because in the general case (e.g. > mod_rewrite) this stuff just plain isn't guessable. It doesn't even have to be guessable: it could be standardised. [Phillip J. Eby] > Also, if you need access to local resources, relative to some Python > module, just grab the '__file__' attribute/variable of that module, and > then use 'os.path' functions to portably manipulate it. E.g.: > > my_dir = os.path.dirname(__file__) > target = os.path.join(os.path.join(my_dir,"images"),"stars.jpg") > > This is simple and portable. If you need something more complex, you > should probably have configuration specific to the application that > spells out what it needs to know. And that is a nice (python-specific) solution to the problem. Perhaps it's worth adding something to the Q&A about how to map URIs to files in the local file system, based on the above pythonic, i.e. module.__file__, approach? Alan. From david at sundayta.com Mon Sep 6 16:59:03 2004 From: david at sundayta.com (David Warnock) Date: Mon Sep 6 16:59:10 2004 Subject: [Web-SIG] wsgi layers Message-ID: <413C7B37.7020305@sundayta.com> Hi, Is my understanding correct in terms of layers A web browser sends requests to a WSGI enabled web server (eg mod_python under apache, or medusa or twisted) which passes them through installed WSGI middleware layers (eg session management, gzip, cookie consolidator etc) to an application hosted inside a WSGI enabled application framework (eg quixote). 
So the intention is that the application is written within the features of a specific WSGI-enabled application framework while it can be hosted (via the way its framework is WSGI compliant) in any WSGI server environment. If all this is so, then I am confused about which projects are currently implementing/planning to implement WSGI as servers and as application frameworks. My assumption is that the servers being pluggable don't need to be my first concern as long as there is something that can be used for testing. But the application framework is the critical one for application developers. What is the state of play here? Thanks Dave -- David Warnock, Sundayta Ltd. http://www.sundayta.com iDocSys for Document Management. VisibleResults for Fundraising. Development and Hosting of Web Applications and Sites. From paul.boddie at ementor.no Mon Sep 6 17:00:09 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Mon Sep 6 17:00:13 2004 Subject: [Web-SIG] Standardising containment. Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CB1F@100nooslmsg005.common.alpharoot.net> Alan Kennedy wrote: > > [Phillip J. Eby] > > *boggle* Why do you think that URIs have anything to do with file > > paths? In the general case, they are entirely unrelated. > > Well, perhaps it's just that pretty much every web > server/harness/framework I ever used has support for mapping URIs to > files. How silly of me to try to apply my experience of other web systems > to WSGI. Perhaps WSGI is too "low-level" for such considerations. I don't know. > In the *general* case, yes, such a mapping has no meaning. > > But there are specific cases, e.g. static file serving, where it is > required. Coming from a J2EE background, as I guess you are, there's a fairly strong tradition that resources are sort of "mounted" within the context of the application, isn't there?
In other words, if my application refers to somedir/somefile, the framework will have done the necessary directory changing such that the reference translates to $CONTEXT/somedir/somefile. It actually doesn't matter what the URL is and whether you're mapping that or something else to a filename, or whether you're mapping anything to a filename at all. It could just be a nice idea to define the behaviour when some component uses a non-absolute path in order to access some resource. [__file__] > And that is a nice (python-specific) solution to the problem. > > Perhaps it's worth adding something to the Q&A about how to map URIs to > files in the local file system, based on the above pythonic, i.e. > module.__file__, approach? I've seen some strange stuff with __file__ in my time, however. Moreover, how does all this map to things like Zope where resources aren't necessarily related to the filesystem? Paul From py-web-sig at xhaus.com Mon Sep 6 17:26:08 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Sep 6 17:21:12 2004 Subject: [Web-SIG] Standardising containment. In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D18CAF4@100nooslmsg005.common.alpharoot.net> References: <89DE0F3E9781C048A14DC88C06D9F93D18CAF4@100nooslmsg005.common.alpharoot.net> Message-ID: <413C8190.8090902@xhaus.com> [Alan Kennedy] >>>The other main one that springs to mind is how WSGI applications >>>discover the file-system path name that corresponds to an URI. [Ben Sizer] >>I thought that one of the major features of most of these Python web >>frameworks is that a URI doesn't map to a file but to an object or a >>function, several of which might be in one physical file. Since WSGI >>seems to be promoted as a minimal system that applies equally to >>almost any system, I'd think that such a mapping falls entirely out >>of its scope. 
[Paul Boddie] > It probably does for WSGI, although I wonder how such issues (and the > many others out there) can be simultaneously avoided and yet > anticipated by the specification in order to avoid incompatibilities > later on. And avoiding incompatibility is what I am trying to do. [Ben Sizer] >>I agree that it might be useful to have this functionality. I think a >>standard way to map URIs to Python files would be beneficial for Python >>web development. I just don't see it fitting into what people here >> have told me about WSGI. [Paul Boddie] > I suppose that Alan is moving slowly up the stack. I'm sorry if I appear not to be as au fait with these matters as you. I see that you've been addressing all of these problems for years with WebStack. > It's an interesting issue that existing frameworks have addressed > in their own ways (the getRealPath that Alan mentioned, Webware's > getServerSidePath, and so on), and although one can wonder whether > application data (which the image example could almost be considered > as being) should be configured within or with reference to the server > environment or not, if you consider having to specify the filenames > of resources within an application, it's much nicer to be able to > make those filenames relative to some deployment variable (eg. where > the application ends up when deployed) and to keep those resources > bundled with the application than to have to manually configure the > application to use absolute paths before/during/after deployment. > > I hope that made sense. ;-) Yes, it does make sense. To summarise: it is *sometimes* the case that static resources and the functionality that renders them are *deployed* together, i.e. in the directory structure, which can make for simplicity of deployment and administration. And as Phillip has suggested, the python module.__file__ attribute can be used to support location of such resources. Regards, Alan. 
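Where static resources are deployed alongside the application, a `ServletContext.getRealPath()`-style lookup needs little more than a root directory (found, for instance, with the `__file__` approach) and a check that the mapped path stays under that root, because `PATH_INFO` is supplied by the client. This is a sketch under those assumptions. `real_path` and `static_app` are hypothetical names, and PEP 333 defines nothing like them.

```python
import mimetypes
import os

def real_path(root, url_path):
    """Map a URL path onto a file under root; return None if it would escape root."""
    root = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    # Reject ".." components and symlinks that resolve to somewhere outside root.
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate

def static_app(root):
    """Return a minimal WSGI application serving files below root."""
    def app(environ, start_response):
        path = real_path(root, environ.get("PATH_INFO", ""))
        if path is None or not os.path.isfile(path):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not found"]
        ctype = mimetypes.guess_type(path)[0] or "application/octet-stream"
        start_response("200 OK", [("Content-Type", ctype)])
        with open(path, "rb") as f:
            return [f.read()]
    return app
```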
From pythonTutor at venix.com Mon Sep 6 17:55:00 2004 From: pythonTutor at venix.com (Lloyd Kvam) Date: Mon Sep 6 17:55:04 2004 Subject: [Web-SIG] Making HEAD request using urllib2 module Message-ID: <1094486100.3107.26.camel@laptop.venix.com> I wrote a URL checker to verify that a website is up and responding. For this, a HEAD request rather than GET seems better. The urllib2 module provides a Request class with a get_method method. I derived my HeadRequest class overriding get_method to return HEAD if there was no POST data. Then I discovered that AbstractHTTPHandler.do_open did not use Request.get_method, but simply used GET if there was no POST data. I changed do_open to use Request.get_method and that's working for me. Should I be reporting a bug and offering a patch? Or am I missing the boat on other issues? Is anyone making changes to urllib2? Lloyd Kvam Venix Corp. 1 Court Street, Suite 378 Lebanon, NH 03766-1358 voice: 603-653-8139 fax: 320-210-3409 (changed Aug 26, 2004) From floydophone at gmail.com Mon Sep 6 18:35:13 2004 From: floydophone at gmail.com (Peter Hunt) Date: Mon Sep 6 18:35:21 2004 Subject: [Web-SIG] RE: Standardised configuration. Message-ID: <6654eac404090609351a5ebd83@mail.gmail.com> I was actually thinking about this a week ago. First of all, I think the configuration should be implemented as middleware. It will read a configuration file or resource and stick it into environ["config"]. This way, we can have pluggable middleware which could, perhaps, take their configuration from a remote server, local file, or other data source. The configuration middleware should be extensible and allow each middleware to be configured (i.e. environ["config"]["mymodule.gzip_middleware"]). Now I'm not a huge fan of XML, but I think this would work okay: Finally, I see a need for at least two different types of configuration files. One has to be a "gateway" configuration. It sets up general settings used by all applications on the server. 
This is analogous to an httpd.conf file. For example, this is needed so shared webhosting providers can set up generic services, such as storing sessions on their RDBMS for speed purposes. There also needs to be an "application" configuration file, for those who want to set up application-specific services, such as gzip encoding. My simple XML configuration format allows both configuration of middleware, AND picking which middleware will be installed for a request. We also have to remember that applications may not have a working directory. They might simply exist as Python functions inside of BaseHTTPServer. Thus, the _gateway_ must instantiate the configuration middleware for the gateway, AND it must instantiate the configuration middleware for the application (if it exists). For example, mod_python would pick the gateway configuration file as the one installed in the mod_python directory, and it would pick the application configuration file as the one in the working directory of the current script. What do you think? From tony at lownds.com Mon Sep 6 18:44:56 2004 From: tony at lownds.com (tony@lownds.com) Date: Mon Sep 6 19:05:24 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> References: <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> Message-ID: <54134.67.127.185.114.1094489096.squirrel@*> > [skipping stuff that Ian answered] > > At 12:47 PM 9/2/04 -0700, Bill Janssen wrote: > >>I'm not familiar with all the ins and outs of files on Python and >>Jython and IronPython, so I'll just say, reasonable enough. Though >>I'd prefer to say, a file-like object (whatever that means).
> > File-like is out of scope; there were only ever two kinds of objects > intended to be returnable: > > 1) Iterables (the initial scope) > > 2) Objects that map to an operating system file descriptor, as an optional > special case to increase performance (added later per user request) > But using a file object as an iterable is going to give terrible performance, and fileno() isn't good enough for Jython and IronPython. I don't see why allowing a file-like object is unreasonable. If an application returns a file-like object, it should render the same data whether accessed through read(), or fileno(), or next(). -Tony From ianb at colorstudy.com Mon Sep 6 19:22:19 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Mon Sep 6 19:22:24 2004 Subject: [Web-SIG] Standardising containment. In-Reply-To: <413C7AA1.8010702@xhaus.com> References: <413B8B7A.4090401@xhaus.com> <413B8B7A.4090401@xhaus.com> <5.1.1.6.0.20040906100050.02e8f840@mail.telecommunity.com> <413C7AA1.8010702@xhaus.com> Message-ID: <413C9CCB.1070005@colorstudy.com> Alan Kennedy wrote: > [Alan Kennedy] > >>> The other main one that springs to mind is how WSGI applications >>> discover the file-system path name that corresponds to an URI. > > > [Phillip J. Eby] > >> *boggle* Why do you think that URIs have anything to do with file >> paths? In the general case, they are entirely unrelated. > > > Well, perhaps it's just that pretty much every web > server/harness/framework I ever used has support for mapping URIs to > files. How silly of me to try to apply my experience of other web > systems to WSGI. I guess it depends how you're looking at it. Zope, for instance, is exactly the opposite -- files are an extension, not a native concept (with respect to URLs). Quixote and Twisted both prominently feature ways to parse the URL to find a resource, which is not a file. At some level, most frameworks allow for this kind of URL manipulation. And I would assume the same is true in Java, somehow...? 
At least among Python frameworks, URIs cannot generally be mapped to URLs. Of course, there is an issue -- if not a file, it would be nice to find the terminal application for a particular URL. But that's very vague, and something that WSGI does not facilitate. If we have a bunch of middleware, is there any way to say "give me the last one"? Is that even meaningful, as the middleware is not necessarily pass-through? So maybe if you think you need the terminal application, it might be better to reconsider and refactor the problem. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Mon Sep 6 20:14:31 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Sep 6 20:13:51 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: <54134.67.127.185.114.1094489096.squirrel@*> References: <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> At 09:44 AM 9/6/04 -0700, tony@lownds.com wrote: >But using a file object as an iterable is going to give terrible >performance, and fileno() isn't good enough for Jython and IronPython. I >don't see why allowing a file-like object is unreasonable. Because explicit is better than implicit. Returning a "file-like" object can mean, "read all the data and send it as one block", or "read the data in arbitrary-size blocks and send them". The application should say what it means! Either:

    return [filelike.read()]

or:

    yield filelike.read()

or:

    return iter(lambda: filelike.read(bufsize), '')

or something else, according to the results it intends. The server shouldn't have to *guess* which of these is meant.
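The third recipe relies on the two-argument form of the built-in `iter()`, which keeps calling a function until it returns a sentinel value. Below is a sketch of an application built on it. It assumes a binary file, so the sentinel is `b""` rather than `''`, and a hypothetical block size; `FileBlocks` and `file_app` are illustrative names only. The iterator is wrapped in a class with a `close()` method so that the server can release the file, since PEP 333 requires servers to call `close()` on the returned iterable when it has one.

```python
import io

BLOCK_SIZE = 8192  # hypothetical block size; tune for the server and platform

class FileBlocks:
    """Iterable over a binary file-like object in fixed-size blocks."""

    def __init__(self, filelike, block_size=BLOCK_SIZE):
        self.filelike = filelike
        self.block_size = block_size

    def __iter__(self):
        # iter(callable, sentinel) calls read() until it returns b"" (end of file).
        return iter(lambda: self.filelike.read(self.block_size), b"")

    def close(self):
        # The server calls this after the response has been sent.
        self.filelike.close()

def file_app(open_file):
    """Build a WSGI application; open_file is any zero-argument callable returning a binary file."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "application/octet-stream")])
        return FileBlocks(open_file())
    return app

# Example: application = file_app(lambda: open("report.pdf", "rb"))
demo = FileBlocks(io.BytesIO(b"abcdefghij"), block_size=4)
print(list(demo))  # [b'abcd', b'efgh', b'ij']
```

Unlike returning the file object itself, this makes the block size an explicit decision on the application's side, which is the distinction being drawn above.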
From tony at lownds.com Mon Sep 6 20:41:43 2004 From: tony at lownds.com (tony@lownds.com) Date: Mon Sep 6 21:02:15 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> References: <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> Message-ID: <54588.67.127.185.114.1094496103.squirrel@*> > Because explicit is better than implicit. Returning a "file-like" object > can mean, "read all the data and send it as one block", or "read the data > in arbitrary-size blocks and send them". The application should say what > it means! Either: > > return [filelike.read()] > > or: > yield filelike.read() > > or: > return iter(lambda: filelike.read(bufsize), '') > > or something else, according to the results it intends. The server > shouldn't have to *guess* which of these is meant. > Wouldn't servers be better equipped to send a file efficiently, rather than the application? The recipe for sending decent-sized chunks instead of line-sized chunks just obliterated the fileno() optimization. I'm specifically advocating that servers be required to use read() if they can't use fileno(). When an application returns an open file object, servers that send it out line by line (i.e., as an iterator) would be far, far slower than servers that use fileno(). So that technique wouldn't really be portable across WSGI implementations. Using read() would make returning an open file a viable technique on all WSGI servers. -Tony From py-web-sig at xhaus.com Mon Sep 6 21:10:03 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Mon Sep 6 21:05:09 2004 Subject: [Web-SIG] Standardising containment.
In-Reply-To: <413C9CCB.1070005@colorstudy.com> References: <413B8B7A.4090401@xhaus.com> <413B8B7A.4090401@xhaus.com> <5.1.1.6.0.20040906100050.02e8f840@mail.telecommunity.com> <413C7AA1.8010702@xhaus.com> <413C9CCB.1070005@colorstudy.com> Message-ID: <413CB60B.6090504@xhaus.com> [Alan Kennedy] >>>> The other main one that springs to mind is how WSGI applications >>>> discover the file-system path name that corresponds to an URI. [Phillip J. Eby] >>> *boggle* Why do you think that URIs have anything to do with file >>> paths? In the general case, they are entirely unrelated. [Alan Kennedy] >> Well, perhaps it's just that pretty much every web >> server/harness/framework I ever used has support for mapping URIs to >> files. How silly of me to try to apply my experience of other web >> systems to WSGI. [Ian Bicking] > I guess it depends how you're looking at it. Zope, for instance, is > exactly the opposite -- files are an extension, not a native concept > (with respect to URLs). Quixote and Twisted both prominently feature > ways to parse the URL to find a resource, which is not a file. At some > level, most frameworks allow for this kind of URL manipulation. And I > would assume the same is true in Java, somehow...? At least among > Python frameworks, URIs cannot generally be mapped to URLs. Just a couple of quick points. 1. I am fully aware that there is not necessarily a mapping from URLs to files. It's just that sometimes it does have a meaning, with serving static files being the obvious example, and I think we need to keep that in mind. Though perhaps it should remain a server-specific thing. Perhaps it's worth adding a note to the spec to explain why such facilities are *not* available. 2. Phillip has already proposed a pythonic solution: the python module.__file__ attribute. 3. I am *not* holding up J2EE as the be-all-and-end-all of models for web development: it has substantial problems, IMO. 
It's just that A: I happen to be implementing a WSGI server on J2EE at the moment, B: it is a very mature web architecture that provides a lot of useful facilities. I think WSGI should at least be informed by as many such architectures as possible, and C: I've used J2EE often enough to know reasonably well what it can and can't do. 4. J2EE does not provide particularly good facilities for incrementally mapping URL sub-components to application objects, although it does provide all the information required should one desire to do so oneself. > Of course, there is an issue -- if not a file, it would be nice to find > the terminal application for a particular URL. But that's very vague, > and something that WSGI does not facilitate. If we have a bunch of > middleware, is there any way to say "give me the last one"? Is that > even meaningful, as the middleware is not necessary pass-through? So > maybe if you think you need the terminal application, it might be better > to reconsider and refactor the problem. I'm not sure I see a direct connection between the terminal application and uri->file mapping. Another example that springs to mind is a middleware component that takes care of, say, "media downgrading", i.e. removing image references for aural/tactile/textual user-agents, and replacing them with a textual/metadata equivalent. Such a component may not live at the top of the middleware stack. Quite possibly some higher up component will be generating some form of markup, which contains image references. The rendering component, further down the stack, would rewrite those references in the markup to contain whatever textual equivalent is appropriate. Now, when the downgrading component is doing its job, simply knowing a URI reference to each image may not be enough. If it is going to transform a reference to an image, it may need to actually find, open and parse that image, in order to extract its metadata, e.g. width, height, textual description, etc.
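As an illustration of what "find, open and parse that image" might involve, here is a sketch that reads the width and height from a PNG header using only the standard library. It assumes the image is a PNG and that the component already holds a readable binary file object; other formats would need their own parsers, and `png_dimensions` is a hypothetical helper.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(fileobj):
    """Return (width, height) from a PNG header, or raise ValueError."""
    header = fileobj.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Bytes 8-12 are the first chunk's length; bytes 12-16 its type.
    if header[12:16] != b"IHDR":
        raise ValueError("missing IHDR chunk")
    # IHDR data starts with width and height as big-endian 32-bit ints.
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

In the scenario above, the open file would come from whatever URI-to-path lookup the platform offers; that lookup is the part WSGI itself does not define.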
Let's further assume that requests for the image URIs are *not* handled by a WSGI component. Let's say for example instead that URIs for such static asset files are served by the platform (e.g. Apache) directly, for (perhaps dubious, perhaps valid) performance reasons. So how does the component actually get its mitts on the physical image when it is needed? All it has is a URI for the image. It could crank up httplib, make an HTTP request to the platform for the image, and examine the returned contents. But that's significantly more expensive than asking the platform to construct a file-system pathname for the image file, based solely on its URI, and then accessing it through the filesystem. This example is perhaps overly contrived, but I'm trying to give examples of why I think it is sometimes necessary to refer to the platform in order to find physical locations of other content served by that platform. This other content may not be under the control of WSGI applications. Either way, I think it's a good thing for us to thrash all of these issues out. It's better that we sort it out as much as possible now rather than after the WSGI PEP has been finalised. Maybe my approach has been wrong over the last few days. I've been writing to the SIG about issues that I have seen during my implementation phase. When I write about a particular issue, or feature of another language/framework, that doesn't mean that I'm demanding that such be added to WSGI. It just means "Hey Folks, here's something that occurred to me that may need some consideration for WSGI". And judging by many of the responses to my posts, e.g. along the lines of "I see what you're saying, *but* .... ", and "Well I think it's outside the spec, but yes you're right, it would be really nice to standardise X", I seem to be identifying the boundaries of WSGI pretty well.
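One way a platform (or a server extension) could turn a URI path into a file-system pathname, as described above, is to join it onto a configured document root and refuse anything that resolves outside that root. `uri_to_path` and its `document_root` argument are hypothetical; neither is part of WSGI or of any particular server.

```python
import os
import posixpath
from urllib.parse import unquote


def uri_to_path(document_root, uri_path):
    """Map a URI path such as '/images/logo.png' onto document_root.

    Returns an absolute path, or None if the result would lie outside
    document_root (for example through a symlink).
    """
    # Decode %-escapes, then collapse '.' and '..' using URL (slash) rules,
    # so '..' segments are clamped at the root rather than escaping it.
    decoded = posixpath.normpath("/" + unquote(uri_path))
    parts = [p for p in decoded.split("/") if p not in ("", ".", "..")]
    root = os.path.realpath(document_root)
    candidate = os.path.realpath(os.path.join(root, *parts))
    try:
        inside = os.path.commonpath([root, candidate]) == root
    except ValueError:  # e.g. different drives on Windows
        inside = False
    return candidate if inside else None
```

A URI such as `/../../etc/passwd` is clamped to the root rather than rejected; the `commonpath` check exists for symlinks that point outside it.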
I'm happy to be shot down by good arguments: we're all trying to achieve the same thing here: the best possible pythonic web architecture. And I'll never be too old to learn ;-) Regards, Alan. From jjl at pobox.com Mon Sep 6 23:44:59 2004 From: jjl at pobox.com (John J Lee) Date: Mon Sep 6 23:45:04 2004 Subject: [Web-SIG] Making HEAD request using urllib2 module In-Reply-To: <1094486100.3107.26.camel@laptop.venix.com> References: <1094486100.3107.26.camel@laptop.venix.com> Message-ID: On Mon, 6 Sep 2004, Lloyd Kvam wrote: > I wrote a URL checker to verify that a website is up and responding. > For this, a HEAD request rather than GET seems better. The urllib2 > module provides a Request class with a get_method method. I derived my > HeadRequest class overriding get_method to return HEAD if there was no > POST data. Then I discovered that AbstractHTTPHandler.do_open did not > use Request.get_method, but simply used GET if there was no POST data. > > I changed do_open to use Request.get_method and that's working for me. > Should I be reporting a bug and offering a patch? Or am I missing the > boat on other issues? Don't think it's a bug -- it's simply not implemented. But do go ahead and upload it as a patch to the SF patch tracker! I don't like the idea of a subclass just for HEAD requests (but then I don't much like the Request class at all). How about an additional optional arg to the Request constructor, named 'method', instead? > Is anyone making changes to urllib2? Yes, me. John From pje at telecommunity.com Tue Sep 7 02:29:32 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue Sep 7 02:28:49 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: <54588.67.127.185.114.1094496103.squirrel@*> References: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> At 11:41 AM 9/6/04 -0700, tony@lownds.com wrote: >I'm specifically advocating that servers be required to use read() if they >can't use fileno(). But with what block size? If the block size is the whole file, why not just use: return [filelike.read()] If it's some other block size, why not be explicit? > When an application returns an open file object, >servers that send it out line by line (ie, as an interator) would be far >far slower than servers that use fileno(). So that technique wouldn't >really be portable across WSGI implementations. Using read() would make >returning an open file a viable technique on all WSGI servers. Okay, you've convinced me: the fileno() optimization (as it's currently specified) needs to be removed, and I need to strip out all mention of returning files from the application. (Except maybe to mention that it's a bad idea!) Instead of using 'fileno' as an extension attribute on the iterable, we'll add a 'wsgi.file_wrapper' key, usable as follows by an application: return environ['wsgi.file_wrapper'](something,blksize) The 'file_wrapper' may introspect "something" in order to do a fileno() check, or other "I know how to send this kind of object quickly" optimizations. It must return an iterable, that the application may return back to the server. 
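A sketch of the application side of the proposed `wsgi.file_wrapper` key, written for Python 3. Because not every server may supply the key, the example falls back to a generator that yields fixed-size blocks and closes the file; that fallback, the helper name `_blocks`, and the 8192-byte block size are assumptions, not part of the proposal.

```python
import io

BLOCKSIZE = 8192  # assumed; a server's wrapper may ignore it


def _blocks(filelike, blocksize):
    """Fallback generator: fixed-size blocks, closing the file when done."""
    try:
        while True:
            block = filelike.read(blocksize)
            if not block:
                break
            yield block
    finally:
        filelike.close()


def serve_file_app(environ, start_response):
    # io.BytesIO stands in for something like open(path, "rb").
    filelike = io.BytesIO(b"static content\n")
    start_response("200 OK", [("Content-Type", "text/plain")])
    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        # Hand the object to the server so it can recognise it and optimise.
        return wrapper(filelike, BLOCKSIZE)
    return _blocks(filelike, BLOCKSIZE)
```

Because the fallback is a generator, a server that calls `close()` on the returned iterable early still triggers the `finally` clause.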
The server *must not* assume that the application *will* return the iterable; it is perfectly legal to do something like this: an_iter = environ['wsgi.file_wrapper'](something,blksize) for block in an_iter: yield block.replace('\n', '\r\n') In this case, the application iterates over the file, but the original iterator's contents are not yielded. In the same way, middleware may transform or ignore data yielded by the iterator. So, in effect 'file_wrapper' should just wrap the original file-like object in an iterator that the server can recognize and perform an optimization on, in the event that it *actually* is returned by the application. Here's the simplest possible conforming implementation of 'file_wrapper', that works for any modern (1.5.2+) Python: class file_wrapper: def __init__(self,readable,blocksize=8192): self.readable, self.blocksize = readable, blocksize self.close = readable.close def __getitem__(self,index): data = self.readable.read(self.blocksize) if data: return data raise IndexError environ['wsgi.file_wrapper'] = file_wrapper result = application(environ, start_response) if isinstance(result, file_wrapper): # check result.readable for fileno() or other optimizations else: # do normal iteration over 'result' Unfortunately, this is a lot more boilerplate than I'd like to impose on server authors. But, if we don't, then the same boilerplate is effectively imposed on all application/framework/middleware authors who want to return file-like objects. The other hassle here is going to be adjusting the PEP's presentation sequence so that this complication doesn't obscure the simplicity of the "CGI Gateway" example. :( The other alternative is to check for a 'read()' method as an alternative to iterability, but it leaves open the question of appropriate block size. I suppose we could say that this is up to the server. But, no matter how the introspection works, it's going to work strongly against the appearance of simplicity in the examples. 
:( From ianb at colorstudy.com Tue Sep 7 02:31:52 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Tue Sep 7 02:31:57 2004 Subject: [Web-SIG] Standardising containment. In-Reply-To: <413CB60B.6090504@xhaus.com> References: <413B8B7A.4090401@xhaus.com> <413B8B7A.4090401@xhaus.com> <5.1.1.6.0.20040906100050.02e8f840@mail.telecommunity.com> <413C7AA1.8010702@xhaus.com> <413C9CCB.1070005@colorstudy.com> <413CB60B.6090504@xhaus.com> Message-ID: <413D0178.4010406@colorstudy.com> Alan Kennedy wrote: > > Of course, there is an issue -- if not a file, it would be nice to find > > the terminal application for a particular URL. But that's very vague, > > and something that WSGI does not facilitate. If we have a bunch of > > middleware, is there any way to say "give me the last one"? Is that > > even meaningful, as the middleware is not necessary pass-through? So > > maybe if you think you need the terminal application, it might be better > > to reconsider and refactor the problem. > > I'm not sure I see a direct connection between the terminal application > and uri->file mapping. > > Another example that springs to mind is a middleware component that > takes care of, say "media downgrading", i.e. removing image references > for aural/tactile/textual user-agents, and replacing it with a > textual/metadata equivalent. > > Such a component may not live at the top of the middleware stack. Quite > possibly some higher up component will be generating some form of > markup, which contains image references. The rendering component, > further down the stack, would rewrite those references in the markup to > contain whatever textual equivalent is appropriate. > > Now, when the downgrading component is doing it's job, simply knowing a > URI reference to each image may not be enough. If it is going to > transform a reference to an image, it may need to actually find, open > and parse that image, in order to extract it's metadata, e.g. width, > height, textual description, etc. 
This is reasonable. My initial suggestion would be to create an artificial request; creating a new environ and re-calling the application, fetching the object at that location. Then, if it is a file object you can find it on disk (file objects have some attribute, I forget what), or if not you can read the data in and find its width and such. But that might not work... > Let's further assume that requests for the images URIs are *not* handled > by a WSGI component. Let's say for example instead that URIs for such > static asset files are served by the platform (e.g. Apache) directly, > for (perhaps dubious, perhaps valid) performance reasons. Obviously, this is much more complex, as the middleware can't call its application, since the application doesn't actually have access to the object, rather some parent server handles the object. If you wanted to do the same sort of recursive request, a server could provide an extension to allow this. Presumably you would get back another iterable, which may be a file object, which would contain the necessary information. But, in both cases, there's a limit to what you can do -- you only get access to the public information stored in that particular image. Maybe there's text files alongside the image, which mean that you need access to the filename. E.g., image.jpg and image.jpg.desc, in the same directory. If you get back the original file object, you can do this -- but it seems likely in many circumstances that you won't get back the file object at all, you'll get some wrapped version, and you won't be able to find the filename. This is also where it would be nice if the response had more structure (or at least potential for structure) than what we currently have in WSGI. If there were an (optional) attribute .fileobj (or something) wrappers could use this to expose the underlying file object, useful when you want to do this kind of server introspection. 
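The "artificial request" idea can be sketched as a helper that copies the environ, points `PATH_INFO` at another resource, calls the same application with a capturing `start_response`, and collects the body. `subrequest` is a hypothetical helper; it buffers the whole response and replaces the request body with an empty stream.

```python
import io


def subrequest(app, environ, path_info):
    """Re-enter a WSGI app for another path; return (status, headers, body)."""
    sub_environ = dict(environ)  # shallow copy so the caller's environ is untouched
    sub_environ.update({
        "PATH_INFO": path_info,
        "REQUEST_METHOD": "GET",
        "QUERY_STRING": "",
        "CONTENT_LENGTH": "0",
        "wsgi.input": io.BytesIO(b""),  # don't share the outer request body
    })
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Nothing is sent until the body is collected, so a later call
        # (e.g. one carrying exc_info) may simply replace status and headers.
        captured["status"], captured["headers"] = status, list(headers)
        return captured.setdefault("chunks", []).append  # minimal write()

    result = app(sub_environ, start_response)
    try:
        body = b"".join(result)
    finally:
        if hasattr(result, "close"):
            result.close()
    # Data passed to write() precedes the iterable's output.
    written = b"".join(captured.get("chunks", []))
    return captured.get("status"), captured.get("headers"), written + body
```

As noted above, this only helps when the resource is served by the same WSGI application; content served directly by the platform needs a server extension instead.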
It's not impossible that the application iterator could have these methods, but it's not an extension that WSGI really talks about. Maybe it should. Another extension that a server could implement is a URL resolver; if the server actually resolved URLs, to applications/resources/files, then it might expose this. But as an extension it's not uniform, and I don't think it could be very uniform. But I think there's a genuine need there, as I encounter things like this myself. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From tony at lownds.com Tue Sep 7 05:21:23 2004 From: tony at lownds.com (tony@lownds.com) Date: Tue Sep 7 05:41:52 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> References: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com><5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com><5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com><5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com><5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> Message-ID: <55542.67.127.185.114.1094527283.squirrel@*> > The other alternative is to check for a 'read()' method as an alternative > to iterability, but it leaves open the question of appropriate block size. I suppose we could say that this is up to the server. > Yes, much simpler, and servers can come up with a good block size. > But, no matter how the introspection works, it's going to work strongly against the appearance of simplicity in the examples. :( > > Here's the tail end of the CGI example. 
result = application(environ, start_response) try: if hasattr(result, 'read'): result = iter(lambda: result.read(BLOCKSIZE), '') for data in result: write(data) finally: if hasattr(result,'close'): result.close() -Tony From paul.boddie at ementor.no Tue Sep 7 12:45:07 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Tue Sep 7 12:45:12 2004 Subject: [Web-SIG] Standardising containment. Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CC2B@100nooslmsg005.common.alpharoot.net> Alan Kennedy wrote: > > Perhaps it's worth adding something to the Q&A about how to map URIs to > files in the local file system, based on the above pythonic, i.e. > module.__file__, approach? Another note about this here: http://www.google.com/groups?selm=ur7pfb6uz.fsf%40fitlinxx.com I've been most interested in having applications represented by modules or packages which get imported by various adapters, but in schemes where applications are just executed programs, you might run into an issue with __file__ and older Python releases. Paul P.S. Although __file__ is supposedly Pythonic, it's quite possible that the resources associated with an application don't always reside in an easily discoverable location relative to the application's modules - i.e. they get installed in some opaquely-named directory which might vary with the framework being used, even if it is located relative to those modules in the filesystem. Perhaps an explicit resource path (or context path) needs defining somewhere.
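A sketch combining the two approaches discussed here: use an explicitly configured resource directory when one is supplied, and otherwise fall back to the directory of the application's module, found through `module.__file__`. The environ key `myapp.resource_root` is a made-up, application-specific name, not a WSGI key, and `resource_root` is a hypothetical helper.

```python
import os
import sys


def resource_root(environ, module_name=__name__):
    """Return the directory an application should load its resources from."""
    explicit = environ.get("myapp.resource_root")  # hypothetical config key
    if explicit:
        return os.path.abspath(explicit)
    module = sys.modules[module_name]
    module_file = getattr(module, "__file__", None)
    if module_file is None:
        # Built-in modules (and some executed scripts) have no __file__.
        raise LookupError("no resource root configured for %s" % module_name)
    return os.path.dirname(os.path.abspath(module_file))
```

The explicit setting wins, which covers frameworks that install resources somewhere unrelated to the module's location.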
From py-web-sig at xhaus.com Tue Sep 7 13:06:20 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Tue Sep 7 13:01:22 2004 Subject: [Web-SIG] wsgi layers In-Reply-To: <413C7B37.7020305@sundayta.com> References: <413C7B37.7020305@sundayta.com> Message-ID: <413D962C.3020906@xhaus.com> [David Warnock] > Is my understanding correct in terms of layers > > A web browser sends requests to a WSGI enabled web server (eg > mod_python under apache, or medusa or twisted) which passes them > through installed WSGI middleware layers (eg session management, > gzip, cookie consolidator etc) to an application hosted inside > a WSGI enabled application framework (eg quixote). I understand it differently, though perhaps wrongly. I see that the request arrives at a web server, which is either a pure python server, or a native-code server with a python interpreter and a very thin WSGI adapter. This transforms the request into a WSGI-compatible request, and calls a single python application callable with it, as specified under WSGI. Possible server+adapter combinations would be:
Apache + mod_python_wsgi
AnyServer + CGI + wsgi.py
SimpleHttpServerWSGI + Very little
Tomcat + modjy
Factored-out Medusa request dispatcher
PyWx+, FastCGI+, SCGI+, etc, etc, etc.
The single python application callable, I see as *being* the python framework, e.g. WebWare. So all those server+adapter combinations listed above basically become the bootstrap process by which HTTP requests are fed to WSGI frameworks. Hence, a fully-refactored-for-WSGI WebWare would then be portable to all of the above server+adapter combinations (python 2.2+ accepted). The WSGIWebWare application would then be responsible for driving the request through a stack (more likely a tree) of middleware components, based on its configuration.
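The picture above, in which a framework drives each request through a stack of middleware, can be sketched with plain callables: each middleware wraps the next application, and the entry point the server sees is simply the outermost wrapper. The two middleware classes and `build_stack` are illustrative inventions, not components of WebWare or any other framework.

```python
class AddHeaderMiddleware:
    """Adds a fixed response header to whatever the wrapped app returns."""

    def __init__(self, app, name, value):
        self.app, self.header = app, (name, value)

    def __call__(self, environ, start_response):
        def custom_start_response(status, headers, exc_info=None):
            return start_response(status, list(headers) + [self.header], exc_info)
        return self.app(environ, custom_start_response)


class UpperCaseMiddleware:
    """Transforms the body, so it is not a pure pass-through."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        result = self.app(environ, start_response)
        try:
            return [chunk.upper() for chunk in result]
        finally:
            if hasattr(result, "close"):
                result.close()


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def build_stack(app, middleware_factories):
    """Wrap app in each factory in turn; the last factory ends up outermost."""
    for factory in middleware_factories:
        app = factory(app)
    return app
```

How the list of factories is configured is exactly the framework-specific part; the wrapping itself is uniform.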
So I suppose that I see middleware stacks/trees as the generic class of python frameworks, and individual frameworks as instances of that class, each with their own specific mechanisms for specifying configuration of the stack/tree of middleware components. To me, the portability of middleware would be ideally between frameworks. For example, I could take the WebWare session management middleware component and plug it into a Snakelets middleware stack. Or more appropriately: don't make the Snakelets guy have to bend his brain about session management and all of its horrors: just borrow and reuse an existing quality and field-tested component. So, when I write about middleware portability, this is what I mean, although that seems to conflict with your picture of middleware happening outside the framework. The difference between your picture and mine is that I don't see where the middleware configuration happens in your processing model, i.e. how is the stack of middleware components before the framework configured? In the case of twisted or zope, I have to say that I'm not familiar enough with the structure of either to know how exactly they would fit in. But I know that an asynchronous WSGI server could be fairly easily put together simply using asyncore. In this case, the application callable could then be a simple dispatcher that sends WSGI requests down queues into processing objects in other threads (which have been created by the application callable at initialization time). The other-thread objects receiving those requests from the queues could themselves drive the requests through a stack of WSGI middleware. So the queues down which requests are sent would simply be a mechanism for extending middleware trees/stacks across thread boundaries (and potentially processor boundaries in jython and ironpython). 
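A minimal sketch of the thread hand-off described above: the application callable that the server sees only puts each request on a `queue.Queue`, and worker threads created up front run it through the real (possibly middleware-wrapped) application and hand back the complete response. It buffers the whole body, ignores `exc_info`, does not support the `write()` callable, and is not an asynchronous server; `QueueDispatcher` is a hypothetical name.

```python
import queue
import threading


class QueueDispatcher:
    """WSGI callable that runs the wrapped app in a pool of worker threads."""

    def __init__(self, app, workers=2):
        self.app = app
        self.requests = queue.Queue()
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            environ, reply = self.requests.get()
            captured = {}

            def start_response(status, headers, exc_info=None):
                captured["status"], captured["headers"] = status, list(headers)

            try:
                result = self.app(environ, start_response)
                try:
                    body = b"".join(result)
                finally:
                    if hasattr(result, "close"):
                        result.close()
                reply.put((captured["status"], captured["headers"], [body]))
            except Exception as exc:  # hand failures back to the caller
                reply.put(exc)

    def __call__(self, environ, start_response):
        reply = queue.Queue(maxsize=1)
        self.requests.put((environ, reply))
        outcome = reply.get()  # blocks the calling thread until a worker answers
        if isinstance(outcome, Exception):
            raise outcome
        status, headers, body = outcome
        start_response(status, headers)
        return body
```

The queue is the only shared state, which is what lets the middleware stack live on the other side of a thread boundary.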
> So the intention is that the application is written within the > features of a specific WSGI enabled application framework while > it can be hosted (via the way it's framework is WSGI compliant) > in any WSGI server environment. > > If all this is so, then I am confused about which projects are > currently implementing/planning to implement wsgi as servers and > as application frameworks. My assumption is that the servers being > pluggable don't need to be my first concern as long as there is > something that can be used for testing. But the application > framework is the critical one for application developers. What > is the state of play here? The above is my outline view of the topic. I think it would be great if we could standardize on some terminology for discussing these matters. I found myself considering replacing the word "framework" with "WebWare-like" up above, because the "f-word" is potentially inappropriately used. From the middleware component's point of view, the framework is the WSGI server. From the server point of view, e.g. mod_python, the framework is the WSGI application. Regards, Alan. From pythonTutor at venix.com Tue Sep 7 14:50:00 2004 From: pythonTutor at venix.com (Lloyd Kvam) Date: Tue Sep 7 14:50:27 2004 Subject: [Web-SIG] Making HEAD request using urllib2 module In-Reply-To: References: <1094486100.3107.26.camel@laptop.venix.com> Message-ID: <1094561400.4811.9.camel@laptop.venix.com> Thanks for the response. On Mon, 2004-09-06 at 17:44, John J Lee wrote: > On Mon, 6 Sep 2004, Lloyd Kvam wrote: > > > I wrote a URL checker to verify that a website is up and responding. > > For this, a HEAD request rather than GET seems better. The urllib2 > > module provides a Request class with a get_method method. I derived my > > HeadRequest class overriding get_method to return HEAD if there was no > > POST data. Then I discovered that AbstractHTTPHandler.do_open did not > > use Request.get_method, but simply used GET if there was no POST data.
> > > > I changed do_open to use Request.get_method and that's working for me. > > Should I be reporting a bug and offering a patch? Or am I missing the > > boat on other issues? > > Don't think it's a bug -- it's simply not implemented. But do go ahead > and upload it as a patch to the SF patch tracker! > > I don't like the idea of a subclass just for HEAD requests (but then I > don't much like the Request class at all). How about an additional > optional arg to the Request constructor, named 'method', instead? I started down the subclass path on the assumption that overriding get_method was all that was necessary and I could avoid changing the urllib2 module. > > > > Is anyone making changes to urllib2? > > Yes, me. > > > John > _______________________________________________ > Web-SIG mailing list > Web-SIG@python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/pythontutor%40venix.com -- Lloyd Kvam Venix Corp From janssen at parc.com Tue Sep 7 20:31:16 2004 From: janssen at parc.com (Bill Janssen) Date: Tue Sep 7 20:32:38 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: Your message of "Mon, 06 Sep 2004 17:29:32 PDT." <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> Message-ID: <04Sep7.113120pdt."58613"@synergy1.parc.xerox.com> > But, no matter how the introspection works, it's going to work strongly > against the appearance of simplicity in the examples. :( Luckily, the WSGI spec is for server and framework implementors, who are used to a lack of simplicity :-). 
Bill From py-web-sig at xhaus.com Tue Sep 7 21:12:27 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Tue Sep 7 21:07:27 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> References: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> Message-ID: <413E081B.70609@xhaus.com> [Phillip J. Eby] > Instead of using 'fileno' as an extension attribute on the iterable, > we'll add a 'wsgi.file_wrapper' key, usable as follows by an > application: > > return environ['wsgi.file_wrapper'](something,blksize) > > The 'file_wrapper' may introspect "something" in order to do a > fileno() check, or other "I know how to send this kind of object > quickly" optimizations. It must return an iterable, that the > application may return back to the server. [tony@lownds.com] > Here's the tail end of the CGI example. > > result = application(environ, start_response) > try: > if hasattr(result, 'read'): > result = iter(lambda: result.read(BLOCKSIZE), '') > for data in result: > write(data) > finally: > if hasattr(result,'close'): > result.close() Since I am just about to implement "wsgi.file_wrapper", I just wanted to check that my understanding of it is correct. I think Tony's example above is not correct: the hasattr(result, 'read') should not be necessary, since the 'file_wrapper' class should implement its own iterator? I think it should read simply result = application(environ, start_response) try: for data in result: write(data) finally: if hasattr(result,'close'): result.close() Only the application has to change in this case, to return any file like object, wrapped in a 'file_wrapper'? 
Is this correct? Regards, Alan. From pje at telecommunity.com Tue Sep 7 21:12:04 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Sep 7 21:11:17 2004 Subject: [Web-SIG] Standardising containment. In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D18CC2B@100nooslmsg005.comm on.alpharoot.net> Message-ID: <5.1.1.6.0.20040907150800.028698c0@mail.telecommunity.com> At 12:45 PM 9/7/04 +0200, Paul Boddie wrote: >P.S. Although __file__ is supposedly Pythonic, it's quite possible that >the >resources associated with an application don't always reside in an >easily >discoverable location relative to the application's modules - ie. they >get >installed in some opaquely-named directory which might vary with the >framework being used, even it is located relative to those modules in >the >filesystem. Perhaps an explicit resource path (or context path) needs >defining somewhere. Note that existing frameworks and applications already have lots of ways to handle this. For example, applications like Roundup, MoinMoin, and Pyblosxom either have prescribed layouts or use configuration files that indicate where things are. Thus, this is an area where adding a facility to WSGI is just creating "choice N+1" instead of actually reducing unnecessary choice. From tony at lownds.com Tue Sep 7 22:16:50 2004 From: tony at lownds.com (tony@lownds.com) Date: Tue Sep 7 22:37:27 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: <413E081B.70609@xhaus.com> References: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com><5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com><5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com><5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com><5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com><5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <413E081B.70609@xhaus.com> Message-ID: <64786.204.162.121.54.1094588210.squirrel@*> > [Phillip J. 
Eby] > > Instead of using 'fileno' as an extension attribute on the iterable, > > we'll add a 'wsgi.file_wrapper' key, usable as follows by an > > application: > > > > return environ['wsgi.file_wrapper'](something,blksize) > > > > The 'file_wrapper' may introspect "something" in order to do a > > fileno() check, or other "I know how to send this kind of object > > quickly" optimizations. It must return an iterable, that the > > application may return back to the server. > > [tony@lownds.com] > > Here's the tail end of the CGI example. > > > > result = application(environ, start_response) > > try: > > if hasattr(result, 'read'): > > result = iter(lambda: result.read(BLOCKSIZE), '') > > for data in result: > > write(data) > > finally: > > if hasattr(result,'close'): > > result.close() > > Since I am just about to implement "wsgi.file_wrapper", I just wanted to > check that my understanding of it is correct. > > I think Tony's example above is not correct: the hasattr(result, 'read') > should not be necessary, since the 'file_wrapper' class should implement > its own iterator? My change is not correct, wrt using a file_wrapper. I was showing the change needed for WSGI server to simply use a file-like object. Sorry for any confusion. Which do you think is better? That servers should understand file objects as return values, or that applications should be careful to wrap files? 
-Tony From py-web-sig at xhaus.com Tue Sep 7 23:35:05 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Tue Sep 7 23:30:04 2004 Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 In-Reply-To: <64786.204.162.121.54.1094588210.squirrel@*> References: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com><5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com><5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com><5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com><5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com><5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <413E081B.70609@xhaus.com> <64786.204.162.121.54.1094588210.squirrel@*> Message-ID: <413E2989.3030100@xhaus.com> [tony@lownds.com] > Which do you think is better? That servers should understand file objects > as return values, or that applications should be careful to wrap files? I really like the wsgi.file_wrapper solution, because it is neither of the above. I see it as the server telling the application how files should be wrapped, but in a platform-independent way. I think that Phillip's posted definition of the FileWrapper class should be included in the spec, as an example of what is expected. Many server authors can just drop that standard FileWrapper definition into their code, and all will be well. Although the definition of the file_wrapper may need to vary between servers, the overhead is not large. And any server author who really needs to get fancy with file_wrappers will probably have a very good idea of what they are doing anyway. From the efficiency point of view, it is important to note that the server is free to implement the FileWrapper class in whatever way it sees fit, e.g. ignoring the buffer size parameter, or supplying its own optimal default value for the parameter, etc, etc. Phillip, am I off-base by requesting that there be a 'pathname' attribute on file_wrapper instances?
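A Python 3 rendering of the `file_wrapper` class Phillip posted earlier in the thread (his version used the older `__getitem__` iteration protocol), plus the server-side `isinstance` check. The `pathname` attribute is the proposal made here, not part of Phillip's class; in this sketch it is taken from the wrapped object's `name` attribute when that is a string. `choose_strategy` is hypothetical server code, and a real server would pass the descriptor to something like `os.sendfile()`.

```python
import io
import os


class FileWrapper:
    """Iterable a server could expose as environ['wsgi.file_wrapper']."""

    def __init__(self, filelike, blocksize=8192):
        self.filelike = filelike
        self.blocksize = blocksize
        if hasattr(filelike, "close"):
            self.close = filelike.close
        # Proposed 'pathname': only set for objects named by a real path.
        name = getattr(filelike, "name", None)
        self.pathname = name if isinstance(name, str) else None

    def __iter__(self):
        return self

    def __next__(self):
        data = self.filelike.read(self.blocksize)
        if data:
            return data
        raise StopIteration


def choose_strategy(result):
    """Return ('sendfile', fd) if the server could copy an OS descriptor
    directly, otherwise ('iterate', result)."""
    if isinstance(result, FileWrapper):
        try:
            return ("sendfile", result.filelike.fileno())
        except (AttributeError, OSError):  # io.UnsupportedOperation is an OSError
            pass
    return ("iterate", result)
```

On platforms without file descriptors (Jython, IronPython) the same `isinstance` check is where a server would plug in its own fast path instead.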
Fair enough if the file_wrapper gets hidden by some component of the middleware stack: in that case the pathname loses its meaning anyway because the component has obviously transformed the content of the file in some way. In cases where the file_wrapper does not wrap an OS file, e.g. sockets, pipes, etc, the pathname could be set/defaulted to None. One use case for this is, for example, a page templating middleware component. While parsing the text of a page template (wrapped in a file_wrapper) passed down from higher up the stack, it could use the pathname as a starting point to resolve relative pathnames in the page template source, e.g. include files, etc. Though it could perhaps be argued that the higher-up component should be responsible for resolving such relative references, because it is the component which actually knows where the template file came from? Regards, Alan. From pje at telecommunity.com Wed Sep 8 00:55:08 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 8 00:54:24 2004 Subject: [Web-SIG] Use cases for file-like objects (was Re: Bill's comments on WSGI draft 1.4) In-Reply-To: <413E081B.70609@xhaus.com> References: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> At 08:12 PM 9/7/04 +0100, Alan Kennedy wrote: >[Phillip J. 
Eby]
> > Instead of using 'fileno' as an extension attribute on the iterable,
> > we'll add a 'wsgi.file_wrapper' key, usable as follows by an
> > application:
> >
> >     return environ['wsgi.file_wrapper'](something,blksize)
> >
> > The 'file_wrapper' may introspect "something" in order to do a
> > fileno() check, or other "I know how to send this kind of object
> > quickly" optimizations. It must return an iterable, that the
> > application may return back to the server.
>
>[tony@lownds.com]
> > Here's the tail end of the CGI example.
> >
> >     result = application(environ, start_response)
> >     try:
> >         if hasattr(result, 'read'):
> >             result = iter(lambda: result.read(BLOCKSIZE), '')
> >         for data in result:
> >             write(data)
> >     finally:
> >         if hasattr(result,'close'):
> >             result.close()
>
>Since I am just about to implement "wsgi.file_wrapper", I just wanted to
>check that my understanding of it is correct.

Before you implement it, I should warn you that I'm thinking 'file_wrapper' was a bad idea, and that there's a better way to do all this.

As I understand them, the current use cases for file-like objects are:

1. sendfile(fileno()) for fast file-descriptor copying (Unix-like OSes only, and only single-thread synchronous servers like Apache 1.x or CGI)

2. Convenience in returning an open file or pipe

3. Convenience in returning a StringIO or other "file-like" object

By the way, as far as I know, none of these use cases are especially common in today's existing web frameworks.

Anyway, use cases 2 and 3 can be grouped into cases where the object is "large", "small", or "pipe-like":

    "Small" case:     return [filelike.read()]
    "Large" case:     return iter(lambda: filelike.read(SIZE), '')
    "Pipe-like" case: return iter(filelike.read, '')

These are all very simple, one-line solutions (at least for 2.2+) and have the advantage of being explicit, and refusing the temptation to guess. The application is in total control of how the resource will be transmitted.
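The three idioms above, written out as complete applications. This is a minimal sketch assuming modern Python, where file-like objects yield bytes and the sentinel is therefore `b''` rather than `''`; the factory names, `SIZE`, and the fixed `text/plain` header are illustrative:

```python
SIZE = 8192  # assumed block size for the "large" case


def small_app(filelike):
    """Small object: read it all at once and return a one-item list."""
    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [filelike.read()]
    return app


def large_app(filelike):
    """Large object: yield fixed-size blocks until read() returns b''."""
    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return iter(lambda: filelike.read(SIZE), b'')
    return app


def pipe_app(filelike):
    """Pipe-like object: call read() repeatedly until it returns b''."""
    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return iter(filelike.read, b'')
    return app
```

None of these closes the underlying file; that concern has to be handled separately.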
That leaves only use case 1, which is a fairly limited use case and isn't even applicable to most web servers written in Python, as most such servers are asynchronous and can't take advantage of the 'sendfile()' system call (which Python doesn't expose as an 'os' facility anyway). Therefore, my current thinking is to relegate use case 1 to a WSGI extension, 'wsgi.fd_wrapper', which can be used like this (if the application is returning an object with a working 'fileno()' method):

    if 'wsgi.fd_wrapper' in environ:
        return environ['wsgi.fd_wrapper'](fd.fileno())
    else:
        # return a normal iterable

In other words, 'wsgi.fd_wrapper' would be sort of like my earlier 'wsgi.file_wrapper', but it would be *optional* to implement and use. (Meaning it can be relegated to an application note, instead of having to be introduced in-line.) For Alan's attempt to support Jython 2.1, he could write an 'iter' function or class and put it in __builtin__, so that programs written to this idiom would still work. After thinking about the 'file_wrapper' idea some more, I'm thinking that this way works better for everything but the issue of closing files. However, my example 'file_wrapper' class should maybe be included in the PEP under an application note about sending files and file-like objects. From jjl at pobox.com Wed Sep 8 11:09:46 2004 From: jjl at pobox.com (John J Lee) Date: Wed Sep 8 11:06:12 2004 Subject: [Web-SIG] Making HEAD request using urllib2 module In-Reply-To: <1094561400.4811.9.camel@laptop.venix.com> References: <1094486100.3107.26.camel@laptop.venix.com> <1094561400.4811.9.camel@laptop.venix.com> Message-ID: On Tue, 7 Sep 2004, Lloyd Kvam wrote: > On Mon, 2004-09-06 at 17:44, John J Lee wrote: [...] > > I don't like the idea of a subclass just for HEAD requests (but then I > > don't much like the Request class at all). How about an additional > > optional arg to the Request constructor, named 'method', instead?
> > I started down the subclass path on the assumption that overriding > get_method was all that was necessary and I could avoid changing the > urllib2 module. [...] Right. So, are you going to upload a modified version, then? :-) John From py-web-sig at xhaus.com Wed Sep 8 13:56:54 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Wed Sep 8 13:51:53 2004 Subject: [Web-SIG] Modjy status. Message-ID: <413EF386.3060407@xhaus.com> Dear Sig, I just wanted to quickly let ye know the current status of modjy: my j2ee implementation of a WSGI server. Since I reduced the amount of java used to a minimum, and rewrote most of the code in jython, things have gone much quicker. Basically, about

    95% of the code
    75% of the documentation
    50% of the testing

is now complete. On the code front, I have to add one or two more features, but modjy already does pretty much all that it needs to. On the testing side, I'm currently only focussed on testing the WSGI compliance: I've not put much effort into testing the "server" as a whole, because it is likely to change shape significantly over time. However, I will endeavour to test modjy as much as possible. Lastly: documentation. I had originally said in this forum that I would publish modjy last weekend, no matter what state it was in. But I couldn't bring myself to publish it without some decent documentation to support it: users would find it hard to work with, and just be confused and disappointed. I've written most of the configuration, installation, etc. documentation. I still have to work on documenting the WSGI compliance, and a few other bits and pieces. For the next few days, other work has to take higher priority than modjy. But I will get back to it at the weekend. Presuming that I can get all of the above finished on Sunday, I'll hopefully be releasing it on Sunday evening. Just wanted to keep y'all informed. Kind regards, Alan.
From exarkun at divmod.com Wed Sep 8 15:28:29 2004 From: exarkun at divmod.com (Jp Calderone) Date: Wed Sep 8 15:28:33 2004 Subject: [Web-SIG] Use cases for file-like objects (was Re: Bill's comments on WSGI draft 1.4) In-Reply-To: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> References: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> Message-ID: <413F08FD.2090805@divmod.com> Phillip J. Eby wrote: > [snip] > > Before you implement it, I should warn you that I'm thinking > 'file_wrapper' was a bad idea, and that there's a better way to do all > this. > > As I understand them, the current use cases for file-like objects are: > > 1. sendfile(fileno()) for fast file-descriptor copying (Unix-like OSes > only, and only single-thread synchronous servers like Apache 1.x or CGI) FWIW, there's a non-zero probability Twisted will support this at some point in the future. A (horrible, hackish) proof of concept already exists. 
Jp From py-web-sig at xhaus.com Wed Sep 8 16:25:12 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Wed Sep 8 16:21:17 2004 Subject: [Web-SIG] Use cases for file-like objects (was Re: Bill's comments on WSGI draft 1.4) In-Reply-To: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> References: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> Message-ID: <413F1648.3040303@xhaus.com> [Phillip J. Eby] >>> Instead of using 'fileno' as an extension attribute on the iterable, >>> we'll add a 'wsgi.file_wrapper' key, usable as follows by an >>> application: >>> >>> return environ['wsgi.file_wrapper'](something,blksize) >>> >>> The 'file_wrapper' may introspect "something" in order to do a >>> fileno() check, or other "I know how to send this kind of object >>> quickly" optimizations. It must return an iterable, that the >>> application may return back to the server. and > [...] I should warn you that I'm thinking > 'file_wrapper' was a bad idea, and that there's a better way to do all > this. > > As I understand them, the current use cases for file-like objects are: > > 1. sendfile(fileno()) for fast file-descriptor copying (Unix-like > OSes only, and only single-thread synchronous servers like Apache 1.x > or CGI) Well, I see sendfile functionality as being much more widespread than that. Java.nio, for example, has excellent support for fast "channel transfers" between file channels and other writable channel types.
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/FileChannel.html This support goes right down to the level of allocating "direct buffers", which use DMA to bypass the CPU when transferring the bytestream to/from the destination channel. On OSes where such DMA facilities are not supported, the exact same code still works, but just isn't as fast. For an excellent discussion of how these facilities work in java.nio, and more importantly why they work and are high-performance, I recommend Ron Hitchens's comprehensive book "Java NIO" http://www.oreilly.com/catalog/javanio/ And I'd be surprised if the .Net CLR doesn't soon develop such functionality, if it isn't already supported. > [other cases snipped] > > These are all very simple, one-line solutions (at least for 2.2+) and > have the advantage of being explicit, and refusing the temptation to > guess. The application is in total control of how the resource will > be transmitted. Well, I suppose the key question here is "should the application be in total control of how the resource is transmitted"? Can we rely on all WSGI applications behaving correctly across all server platforms? Should the server not have some say in how the resource can be optimally transmitted, in its environment? > That leaves only use case 1, which is a fairly limited use case and > isn't even applicable to most web servers written in Python, as most > such servers are asynchronous and can't take advantage of the > 'sendfile()' system call (which Python doesn't expose as an 'os' > facility anyway). A pity that cpython doesn't implement sendfile as a native C method that is layered on top of a native OS implementation if available, or a generic C implementation if not. The current lack of the call means that people tend to implement their own sendfile in pure python, meaning that they end up acquiring and releasing the GIL between every chunk sent.
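To make the contrast concrete: the pure-Python approach described here is a read/send loop in which every chunk passes through the interpreter. For reference, Python 3.5 later added `socket.socket.sendfile()`, which uses `os.sendfile()` where the platform supports it and falls back to a `send()` loop otherwise. A minimal sketch of both; the helper names and the 64 KiB chunk size are assumptions:

```python
import socket

CHUNK = 64 * 1024  # assumed chunk size


def copy_in_python(fileobj, sock):
    """Pure-Python 'sendfile': every chunk passes through the interpreter."""
    total = 0
    while True:
        chunk = fileobj.read(CHUNK)
        if not chunk:
            return total
        sock.sendall(chunk)
        total += len(chunk)


def copy_with_sendfile(fileobj, sock):
    """Let the standard library choose os.sendfile() or a send() fallback."""
    # Python 3.5+; fileobj must be opened in binary mode and sock must be a
    # blocking SOCK_STREAM socket. Returns the number of bytes sent.
    return sock.sendfile(fileobj)
```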
Also, I don't think we should restrict ourselves to thinking solely in terms of single-threaded asynchronous architectures. When I think about asynchronous, high-performance and high-throughput server architectures, I tend to think in terms of hybrid asynchronous/threaded architectures, of the type described by Welsh et al. in the excellent and readable 14-page overview paper "A Design Framework for Highly Concurrent Systems" (highly recommended reading, for those who might be interested) http://www.eecs.harvard.edu/~mdw/papers/events.pdf More details on Welsh's work can be obtained from his publications page. http://www.eecs.harvard.edu/~mdw/pubs.html Welsh describes the use of thread-pools of a fixed "width" to service particular request types, with requests shunted between those (otherwise isolated) thread pools using queues. For example, if the server hardware is capable of processing 50 disk requests simultaneously, then the "width" of the thread pool serving resources from disk should be 50: any more is a waste, any less will underperform the theoretical maximum. It is important to note that those 50 threads would be threads which continually block while waiting for disk read completions. When the disk I/O has completed, they could either "sendfile" the data back to the client, or more likely pass it on to a dedicated thread-pool that does nothing but transfer disk byte streams to client sockets. This means they need some way to record/represent the fact that the bytestream is coming from a file. This file->socket transfer could also conceivably be done by a single thread, which continually watches the readiness status of large sets of both socket and file channels/descriptors, and transfers blocks between them as appropriate. And "blocks" is the key word here. Data comes from disks in fixed-size chunks, whose size is optimised for maximum throughput at all levels of the OS.
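A minimal sketch of one fixed-width stage, using only the standard library: N worker threads pull requests from an input queue, perform a blocking operation, and pass each result to the next stage's queue. The `Stage` class, its parameters, and the stand-in handler are illustrative assumptions, not code from Welsh's paper:

```python
import queue
import threading


class Stage:
    """A fixed-width pool of worker threads fed by a queue."""

    def __init__(self, handler, width, downstream):
        self.inbox = queue.Queue()
        self.handler = handler          # blocking work, e.g. a disk read
        self.downstream = downstream    # the next stage's input queue
        self.threads = [threading.Thread(target=self._run, daemon=True)
                        for _ in range(width)]
        for t in self.threads:
            t.start()

    def _run(self):
        while True:
            item = self.inbox.get()
            if item is None:            # shutdown sentinel
                break
            self.downstream.put(self.handler(item))

    def shutdown(self):
        # One sentinel per thread; FIFO order means queued work finishes first.
        for _ in self.threads:
            self.inbox.put(None)
        for t in self.threads:
            t.join()
```

With `width=50` and a handler that performs a blocking disk read, this corresponds to the disk-serving pool in the example above; its `downstream` queue could feed a separate file-to-socket stage.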
Many modern operating systems come with specialised high-performance support for transferring data from one channel/descriptor to another. Such support can radically increase throughput on a server.

So I suppose my real concern is that by relegating disk-originating byte streams to being second-class citizens under WSGI, we might hinder the portability of some highly desirable server architectural approaches.

> Therefore, my current thinking is to relegate use case 1 to a WSGI
> extension, 'wsgi.fd_wrapper', which can be used like this (if the
> application is returning an object with a working 'fileno()' method):
>
>     if 'wsgi.fd_wrapper' in environ:
>         return environ['wsgi.fd_wrapper'](fd.fileno())
>     else:
>         # return a normal iterable
>
> In other words, 'wsgi.fd_wrapper' would be sort of like my earlier
> 'wsgi.file_wrapper', but it would be *optional* to implement and use.
> (Meaning it can be relegated to an application note, instead of having
> to be introduced in-line.)

Well, I suppose that that makes sense too. After all, all of this talk of "highly-concurrent" architectures doesn't really apply to Apache + CGI/WSGI, for example.

> For Alan's attempt to support Jython 2.1, he could write an 'iter'
> function or class and put it in __builtin__, so that programs written
> to this idiom would still work.
>
> After thinking about the 'file_wrapper' idea some more, I'm thinking
> that this way works better for everything but the issue of closing
> files. However, my example 'file_wrapper' class should maybe be
> included in the PEP under an application note about sending files and
> file-like objects.

Perhaps a "finalise" method might be appropriate?

Just thinking through some scenarios here:

What happens if the server is just about to start serving a multi-megabyte PDF file back to a client socket, and then the client closes the socket, i.e. the user cancels their request? What should the server do in that case?
Should it continue to iterate through the iterable right until the end, discarding the results? Or should it just drop the iterable on the floor, to be sorted out by GC (and thus potentially wasting file-descriptors)? Or should it attempt to finalise the iterable, so that all related resources are freed? Do these considerations also apply when the bytestream being transferred is not "physical", i.e. not coming from a file-descriptor/channel? What if the bytestream is coming from an iterable yielding several megabytes of python strings, from a page rendering component, for example? How does the server tell the application to stop, because the client is no longer interested? Does it simply drop the iterable on the floor and forget about it? Might the application have a need to know that the client aborted the request, for example in e-commerce scenarios? If the application did need to know, how could the server inform the application? Kind regards, Alan. From neel at mediapulse.com Wed Sep 8 17:12:20 2004 From: neel at mediapulse.com (Michael C. Neel) Date: Wed Sep 8 17:24:21 2004 Subject: Matt Welsh WAS: Re: [Web-SIG] Use cases for file-like objects (was Re: Bill's comments on WSGI draft 1.4) In-Reply-To: <413F1648.3040303@xhaus.com> References: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> <413F1648.3040303@xhaus.com> Message-ID: <1094656340.29095.5.camel@mike.mediapulse.com> > Also, I don't think we should restrict ourselves to thinking solely in > terms of single-threaded asynchronous architectures.
When I think about > asynchronous, high-performance and high-throughput server architectures, > I tend to think in terms of hybrid asynchronous/threaded architectures, > of the type described by Welsh et al. in the excellent and readable > 14-page overview paper "A Design Framework for Highly Concurrent > Systems" (highly recommended reading, for those who might be interested) > > http://www.eecs.harvard.edu/~mdw/papers/events.pdf > > More details on Welsh's work can be obtained from his publications page. > > http://www.eecs.harvard.edu/~mdw/pubs.html I just grabbed this and I've only started it, but this looks to be a very interesting read; thank you for the reference. I also agree on its relevance to WSGI, and encourage others on this list to take a moment and read it as well. Mike From pje at telecommunity.com Wed Sep 8 17:55:20 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 8 17:54:38 2004 Subject: [Web-SIG] Use cases for file-like objects (was Re: Bill's comments on WSGI draft 1.4) In-Reply-To: <413F1648.3040303@xhaus.com> References: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040908113837.03058c20@mail.telecommunity.com> At 03:25 PM 9/8/04 +0100, Alan Kennedy wrote: >Well, I see sendfile functionality as being much more widespread than >that. Java.nio, for example, has excellent support for fast "channel >transfers" between file channels and other writable channel types. Well, I see a few options here, then.
We can use 'wsgi.file_wrapper' to wrap Python 'file' objects, allowing each platform to dig into the file object and get at the file descriptor, nio, or what-have-you in a platform specific way. As long as it remains an optional extension, I'm fine with that. Another option is to have separate 'wsgi.nio_wrapper', 'wsgi.fd_wrapper', and so on, for different physical backend types. > > [other cases snipped] > > > > These are all very simple, one-line solutions (at least for 2.2+) and > > have the advantage of being explicit, and refusing the temptation to > > guess. The application is in total control of how the resource will > > be transmitted. > >Well, I suppose the key question here is "should the application be in >total control of how the resource is transmitted"? Yes, because of the need for backward compatibility. I realize that most people discussing WSGI here on the Web-SIG seem more interested in new applications than old, but backward compatibility is critical and that means apps must have control that's comparable to what they have today. >Welsh describes the use of thread-pools of a fixed "width" to service >particular request types, with requests shunted between those (otherwise >isolated) thread pools using queues. The description you use here sounds exactly like typical Python async servers today: they have fixed-size threadpools for running "application" code, and another fixed size thread pool (width=1) for I/O. >So I suppose my real concern is that by relegating disk-originating byte >streams to being second-class citizens under WSGI, we might hinder the >portability of some highly-desirable server architectural approaches. We're not; we're simply requiring that any functionality more sophisticated than an iterable be treated as an optional extension, that the application has to check for and opt to use. The application developer is motivated to do this because of the promise of extra performance when run on platforms that support the boost. 
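On the application side, "check for and opt to use" amounts to a dictionary lookup with a plain-iterable fallback. A minimal sketch, assuming the extension is exposed as `environ['wsgi.file_wrapper']` with the `(filelike, blksize)` call signature proposed earlier in this thread; `make_file_app`, `_blocks`, and `BLKSIZE` are hypothetical names. The fallback is a generator, and generator objects have had a `close()` method since Python 2.5 (PEP 342), so a server that calls `close()` after iteration has begun also releases the file:

```python
BLKSIZE = 8192  # assumed block size


def _blocks(f, size):
    """Portable fallback: yield blocks of f. The with-block closes f when
    iteration finishes, or when close() is called after iteration began."""
    with f:
        while True:
            data = f.read(size)
            if not data:
                return
            yield data


def make_file_app(path):
    """Hypothetical application serving one file, preferring the server's
    optional 'wsgi.file_wrapper' extension when it is present."""
    def app(environ, start_response):
        f = open(path, 'rb')
        start_response('200 OK', [('Content-Type', 'application/octet-stream')])
        wrapper = environ.get('wsgi.file_wrapper')
        if wrapper is not None:
            # Optional fast path: the server decides how to send f
            # (sendfile, a Java NIO channel, or plain iteration).
            return wrapper(f, BLKSIZE)
        return _blocks(f, BLKSIZE)
    return app
```

Middleware that does not care about the fast path still sees an ordinary iterable either way.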
But middleware developers don't have to think about it because they always have access to the data in iterable form. > > After thinking about the 'file_wrapper' idea some more, I'm thinking > > that this way works better for everything but the issue of closing > > files. However, my example 'file_wrapper' class should maybe be > > included in the PEP under an application note about sending files and > > file-like objects. > >Perhaps a "finalise" method might be appropriate? > >Just thinking through some scenarios here: > >What happens if the server is just about to start serving a multi-megabyte >PDF file back to a client socket, and then the client closes the socket, >i.e. the user cancels their request? What should the server do in that >case? Should it continue to iterate through the iterable right until the >end, discarding the results? Or should it just drop the iterable on the >floor, to be sorted out by GC (and thus potentially wasting >file-descriptors)? Or should it attempt to finalise the iterable, so that >all related resources are freed? The current spec requires that the iterable's 'close()' method be called at the termination of the request, whether the iterator was exhausted or not. So, the server is free to cancel iteration when a client connection is lost. >Do these considerations also apply when the bytestream being transferred >is not "physical", i.e. not coming from a file-descriptor/channel? What if the >bytestream is coming from an iterable yielding several megabytes of python >strings, from a page rendering component, for example? How does the server >tell the application to stop, because the client is no longer interested? >Does it simply drop the iterable on the floor and forget about it? > >Might the application have a need to know that the client aborted the >request, for example in e-commerce scenarios? If the application did need >to know, how could the server inform the application?
By calling 'close()' on the iterable, as the spec requires. Until PEP 325 is implemented, though, generators have to be wrapped in a custom iterable in order to support this functionality, e.g.:

    class MyApp:
        def __init__(self, environ, start_response):
            # setup code here
            self.environ = environ
            self.start_response = start_response

        def __iter__(self):
            # generator yielding results
            self.start_response('200 OK', [('Content-Type', 'text/plain')])
            yield 'Hello world!\n'

        def close(self):
            # cleanup code here (called by the server, even on early exit)
            pass

There are of course other ways to do the same basic thing, such as my file_wrapper example class. But, once PEP 325 is implemented, you'll be able to use try/finally in the generator body, and the finally block will be executed when close() is called or the generator is garbage collected. (PEP 325 was written by Samuele Pedroni, so I assume he intends to implement it in Jython, too.) From py-web-sig at xhaus.com Wed Sep 8 18:29:59 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Wed Sep 8 18:24:55 2004 Subject: [Web-SIG] Asynchronous architectures, abstract and concrete. In-Reply-To: <5.1.1.6.0.20040908113837.03058c20@mail.telecommunity.com> References: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> <5.1.1.6.0.20040908113837.03058c20@mail.telecommunity.com> Message-ID: <413F3387.9040908@xhaus.com> [Phillip J. Eby] > [lots of excellent stuff snipped] Thanks for the great explanations Phillip, and I agree with your positions on these issues. There is just one area that I wanted to address.
[Alan Kennedy]
>> Welsh describes the use of thread-pools of a fixed "width" to service
>> particular request types, with requests shunted between those
>> (otherwise isolated) thread pools using queues.

[Phillip J. Eby]
> The description you use here sounds exactly like typical Python async
> servers today: they have fixed-size threadpools for running
> "application" code, and another fixed size thread pool (width=1) for I/O.

In reply:

1. Welsh's architecture is much more abstract and high-level, in that it discusses clustering, multiply redundant hardware pools, failover, isolation, load-balancing, etc., rather than any specific implementation technology.

2. The existing cpython frameworks are all still limited by the cpython GIL, which gives all the more reason for pushing as much as possible down closer to the operating system, and outside of pure python.

3. Welsh's architecture discusses isolation of multiple IO subsystems into different thread groups. For example, there could be a thread group holding a pool of (blocking) database connections, which would be the appropriate "width" to process as many requests as can be concurrently supported by the RDBMS. Since there are blocking sockets/pipes/fifos between the application and the database, such database operations also count as a form of IO, which has to be managed. It could potentially be managed in an asynchronous fashion. Do any of the cpython frameworks support an asynchronous database API?

Just some thoughts. I really think Welsh's paper is worth a read. In fact, it's been 6 months since I read it: I'm going to read it again now, in light of my newly gained WSGI knowledge. Should only take 30 to 40 mins to read it again. Regards, Alan. From exarkun at divmod.com Wed Sep 8 19:39:18 2004 From: exarkun at divmod.com (Jp Calderone) Date: Wed Sep 8 19:39:23 2004 Subject: [Web-SIG] Asynchronous architectures, abstract and concrete.
In-Reply-To: <413F3387.9040908@xhaus.com> References: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> <5.1.1.6.0.20040908113837.03058c20@mail.telecommunity.com> <413F3387.9040908@xhaus.com> Message-ID: <413F43C6.3080900@divmod.com> Alan Kennedy wrote: > [snip] > > 3. Welsh's architecture discusses isolation of multiple IO subsystems > into different thread groups. For example, there could be a thread group > holding a pool of (blocking) database connections, which would be the > appropriate "width" to process as many requests as can be concurrently > supported by the RDBMS. Since there are blocking sockets/pipes/fifos > between the application and the database, such database operations also > count as a form of IO, which has to be managed. It could potentially be > managed in an asynchronous fashion. Do any of the cpython frameworks > support an asynchronous database API? Yes, http://twistedmatrix.com/documents/current/howto/enterprise Jp From janssen at parc.com Thu Sep 9 01:40:48 2004 From: janssen at parc.com (Bill Janssen) Date: Thu Sep 9 01:41:20 2004 Subject: [Web-SIG] Use cases for file-like objects (was Re: Bill's comments on WSGI draft 1.4) In-Reply-To: Your message of "Wed, 08 Sep 2004 07:25:12 PDT." <413F1648.3040303@xhaus.com> Message-ID: <04Sep8.164056pdt."58612"@synergy1.parc.xerox.com> > A pity that cpython doesn't implement sendfile as a native C method > that is layered on top of a native OS implementation if available, or a > generic C implementation if not.
The current lack of the call means that > people tend to implement their own sendfile in pure python, meaning that > they end up acquiring and releasing the GIL between every chunk sent. Is this something that should be added to the standard library (probably as part of the socket module)? Bill From pje at telecommunity.com Thu Sep 9 02:43:09 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 9 02:42:37 2004 Subject: [Web-SIG] Reviewing WSGI open issues, again... Message-ID: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>

* HTTP_AUTHENTICATION -- I haven't seen a concrete proposal for this yet, and I don't personally consider it a high priority. If something is to go in for this, somebody needs to put together a proposal, preferably in the form of a patch to the PEP.

* Byte strings: so far the only discussion here has centered on character sets required by HTTP RFCs. I'm going to loosen up the ASCII status/header requirement slightly, to indicate that ISO-8859-1 is an acceptable encoding, per RFC 2616. Any other comments regarding byte string issues?

* Error handling -- I'm assuming the SIG consensus is +1 on the 'wsgi.fatal_errors' key, but haven't seen any feedback on my ideas for 'start_response', except that I seem to recall someone saying they didn't want the body passed to start_response. Taking that part out, we end up with something like this:

  'start_response()' doesn't actually transmit the status or headers until the first write() call occurs or the first string is yielded from the returned iterable. 'start_response' simply stores the status and headers for future use, and may therefore be called more than once. However, calling 'start_response()' *after* a write(), or after the first string is yielded, is a fatal error.

  Top-level servers/gateways should log detailed information about errors that occur after a partial result is transmitted.
They may also attempt to send error information to the client if the content type is text (e.g. text/html, text/xml, text/plain). Feedback, anyone? * File-like objects -- I think anything we offer for file-like objects should be optional. The big question is whether to offer a single, introspection-based extension for all file-like things, or whether to use separate extensions for different sorts of things, like 'wsgi.fd_wrapper' for file descriptors and 'wsgi.nio_wrapper' for Java NIO objects, etc. Does anybody have any arguments/use cases one way or the other? * Configuration -- I'm going to mention that servers *should* provide an easy way to configure name-value pairs to be supplied to an application's 'environ', and that one way to do that is simply to include OS environment variables in 'environ'. Am I missing anything else that's been discussed recently? (E.g. just before I went into hiding from the hurricane...) From py-web-sig at xhaus.com Thu Sep 9 13:20:51 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Thu Sep 9 13:15:46 2004 Subject: [Web-SIG] CPU cache locality. In-Reply-To: <413F43C6.3080900@divmod.com> References: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> <5.1.1.6.0.20040908113837.03058c20@mail.telecommunity.com> <413F3387.9040908@xhaus.com> <413F43C6.3080900@divmod.com> Message-ID: <41403C93.8080509@xhaus.com> [Alan Kennedy] >> 3. Welsh's architecture discusses isolation of multiple IO subsystems >> into different thread groups. 
For example, there could be a thread >> group holding a pool of (blocking) database connections, which would >> be the appropriate "width" to process as many requests as can be >> concurrently supported by the RDBMS. Since there are blocking >> sockets/pipes/fifos between the application and the database, such >> database operations also count as a form of IO, which has to be >> managed. It could potentially be managed in an asynchronous fashion. >> Do any of the cpython frameworks support an asynchronous database API? [Jp Calderone] > Yes, http://twistedmatrix.com/documents/current/howto/enterprise Thanks for the reply, Jp. I've been thinking further about multi-threading, CPU cache locality and iterators. While I was thinking about it in relation to twisted enterprise at first, it's really an issue that applies to WSGI as well. But let's take twisted enterprise as an example. I'm not intimately familiar with Twisted, so please forgive me if I get something wrong. So twisted has a pool of threads which carries out synchronous database operations on behalf of clients, but in an asynchronous manner from the client's perspective. This is done by receiving the "database requests" from a queue, processing each synchronously using blocking DB-API calls, and then returning the result to the client asynchronously, either using a callback function or sending the results back on a queue. Is this how twisted "deferred"s work? So, for the sake of argument, let's say that a similar structure is in place in a WSGI framework. Further, let's say that database "results", i.e. strings, ints, blobs, etc., from database columns will be yielded as iterable data by some middleware component. These values will be processed further down the middleware stack by some other component, which, for example, is generating HTML pages containing the data. Let's assume that there is a single I/O thread which is responsible for communicating final results back to the user, i.e.
through the client socket. Due to the on-demand nature of the iterator which middleware uses to return values, it is possible that the I/O thread could end up executing database code. For example, say that the database data is accessed through a python descriptor, meaning that accessing the data may cause execution of python code in whatever python object retrieved the data from the database. This will be detrimental to CPU cache locality. Because the I/O thread will potentially execute code from every component in the middleware stack, its thread of execution could meander all over several megabytes of python bytecode, which is pretty much guaranteed to eliminate any benefit that may be provided by CPU caches. In the worst case, this could cause significant cache "thrashing", as lots of different pieces of bytecode clash and "fight" for space in the CPU cache. Welsh[1] states the problem like this: "In a thread-per-task system, the instruction cache tends to take many misses as the thread's control passes through many unrelated code modules to process the task. In addition, whenever a context switch occurs (due to thread preemption or blocking I/O call, say), other threads will invariably push the waiting thread's state out of the cache. When the original thread resumes execution, it will need to take many cache misses in order to bring its code and state back into the cache. In this situation, all of the threads in the system are competing for limited cache space." The solution to this problem is for middleware components to only return references to passive data, and never to return iterators that cause the execution of python code. I notice that Phillip has included a statement in PEP-0333 which states in the section under "Buffering and Streaming": """ Generally speaking, applications will achieve the best throughput by buffering their (modestly-sized) output and sending it all at once.
When this is the case, applications should simply return a single-element iterable containing their entire output as a single string. [snip] For large files, however, or for specialized uses of HTTP streaming (such as multipart "server push"), an application may need to provide output in smaller blocks (e.g. to avoid loading a large file into memory). It's also sometimes the case that part of a response may be time-consuming to produce, but it would be useful to send ahead the portion of the response that precedes it. """ Phillip, when you wrote about "performance" here, did you have CPU caches in mind? Regards, Alan. 1. A Design Framework for Highly Concurrent Systems http://www.eecs.harvard.edu/~mdw/papers/events.pdf From pje at telecommunity.com Thu Sep 9 15:24:06 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 9 15:23:43 2004 Subject: [Web-SIG] CPU cache locality. In-Reply-To: <41403C93.8080509@xhaus.com> References: <413F43C6.3080900@divmod.com> <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com> <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com> <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com> <5.1.1.6.0.20040908113837.03058c20@mail.telecommunity.com> <413F3387.9040908@xhaus.com> <413F43C6.3080900@divmod.com> Message-ID: <5.1.1.6.0.20040909091926.020cb020@mail.telecommunity.com> At 12:20 PM 9/9/04 +0100, Alan Kennedy wrote: >I notice that Phillip has include a statement in PEP-0333 which states in >the section under "Buffering and Streaming": > >""" >Generally speaking, applications will achieve the best throughput by >buffering their (modestly-sized) output and
sending it all at once. When >this is the case, applications should simply return a single-element >iterable containing their entire output as a single string. > >[snip] > >For large files, however, or for specialized uses of HTTP streaming (such >as multipart "server push"), an application may need to provide output in >smaller blocks (e.g. to avoid loading a large file into memory). It's also >sometimes the case that part of a response may be time-consuming to >produce, but it would be useful to send ahead the portion of the response >that precedes it. >""" > >Phillip, when you wrote about "performance" here, did you have CPU cache's >in mind? Actually, the word "performance" doesn't appear anywhere in the above; I referred only to "throughput". Performance can affect throughput, but not really the other way around. The reason that returning a single-element iterable improves throughput in async architectures like Twisted and ZServer is that they use a thread pool for application code. If the application object returns an iterable containing the whole response body, then the application thread is now free to run a new application instance. This allows greater "throughput" at the application level, because more requests can be run in a given period of time than if an application thread had to continue to be used. From py-web-sig at xhaus.com Thu Sep 9 18:01:51 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Thu Sep 9 17:57:28 2004 Subject: [Web-SIG] Reviewing WSGI open issues, again... In-Reply-To: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> Message-ID: <41407E6F.4050809@xhaus.com> [Phillip J. Eby] > * File-like objects -- I think anything we offer for file-like objects > should be optional. 
The big question is whether to offer a single, > introspection-based extension for all file-like things, or whether to > use separate extensions for different sorts of things, like > 'wsgi.fd_wrapper' for file descriptors and 'wsgi.nio_wrapper' for Java > NIO objects, etc. Does anybody have any arguments/use cases one way > or the other? Optionality is fine by me. But I don't understand what reasons there might be to have separate class names per platform. It's always been my understanding that the intention for this capability is so that applications can give "hints", to servers that support high-performance methods of file transmission, that the resource being returned is a candidate for bulk transfer. So, as an application author, I'll surely want that hinting process to work on as many servers as possible, regardless of the platform. So, if there is a choice of multiple such hinting processes, and I have to look for each one of them at runtime, my code is longer and less efficient than it could be, e.g.

def app_object(environ, start_response):
    start_response('200 AuQuay', [('content-type', 'x-humungous-pdf')])
    result = open('humungous.pdf', 'rb')
    # Probe every platform-specific wrapper key in turn
    for cname in ['fd', 'nio', 'dotnet', 'stackless', 'pypy', 'smalltalk']:
        try:
            return environ['wsgi.%s_wrapper' % cname](result)
        except KeyError:
            pass
    return result

Instead, if a single class is used, the definition of which is different per server, then I have only to look at that one class.

def app_object(environ, start_response):
    start_response('200 AuQuay', [('content-type', 'x-humungous-pdf')])
    result = open('humungous.pdf', 'rb')
    # One key to check, whatever the server's platform
    if 'wsgi.file_wrapper' in environ:
        return environ['wsgi.file_wrapper'](result)
    return result

One reason I can see for having multiple classes is if they really represent fundamentally different concepts. For example, there are possibly more types of optimisations available, e.g.
return a stream of bytes from a shared memory partition, if the platform supported DMA access to that shared memory, which would then be bulk-transferable, i.e. bypassing the CPU. Since shared memory is a concept whose implementation varies subtly between platforms, should we be trying to abstract that concept into one class with a single interface, whose implementation differs between platforms, or into separate classes, one for each platform? What about an optimised transfer from an RDBMS, say a BLOB stored in a database row? Should that be wrapped with a file_wrapper (because it's really coming from a file descriptor?), or with a special db_blob_wrapper class? Would these db_blob_wrappers differ between different database platforms? Because it is quite possible that the RDBMS data is also coming through the network subsystem, this bulk transfer could potentially be arranged at the network level, conceivably on a sophisticated network-card/router/etc, and thus never even reach the bus on the serving machine. OK, that's a bit wild and unlikely :-), but I'm just trying to foresee as many scenarios for bulk transfers as I can, to see if the proposed WSGI model fits. I suppose it's about recording enough meta-information for the server to recognise such optimisable scenarios. So the question has to be asked: how portable do we need these optimisations to be between servers? Is medusa likely to have its own middleware component dedicated to sendfile, for example? And might twisted have its own thread-pool-based implementation? In which case portability of, say, the sendfile optimisation becomes an issue of server configuration, not support classes. Or might it be that we need to facilitate the application at two levels in the server? Take the example of shared memory:
1. In the middleware stack, a component maps a certain URL space into the shared memory partition, and returns a specialised wrapper class that contains a shared memory reference, i.e.
a handle, start/end/len, etc.
2. The application also needs to plug into the server, below the middleware stack, so that it can implement the actual bulk transfer from the shared memory (assuming that the shared_memory_wrapper wasn't obscured by some component below it in the stack). Since shared memory support, and probably DMA support, would vary between platforms, this is where the platform-specific element comes in: there would be different versions of that "server plug-in" for different platforms/servers.
Lastly, I should also point out that, with the current jython I/O subsystem, the sendfile/transferTo optimisation is not currently possible, at least inside most existing J2EE containers. This is because sockets created using the old java.net APIs do not by default have nio.channels associated with them. Most existing J2EE containers, which must support blocking servlets by definition, don't bother to handle sockets using java.nio, because it's more work, not necessary, and not portable to older versions of the platform. So it's not possible to use the sockets they create for bulk transfers. A container could be redesigned to use the java.nio APIs, completely in a blocking fashion, if desired. That still wouldn't be any use in existing jython, because jython's current socket modules are entirely based on old java.net classes, which means that jython code couldn't access the channel nature of the sockets, even if those sockets supported it, without modification of the standard library. I have a (~60% complete) side-project to develop asynchronous socket support on jython 2.1, by porting the socket, select and (maybe) asyncore modules to java.nio. When that is complete (timescale==months, v busy), I hope to see experimentation, from myself and others, on running python asynchronous models on jython. Here is what the jython file_wrapper code might look like.
class jython_file_wrapper:

    def __init__(self, wrapped):
        # 'wrapped' is assumed to be the underlying java.io.FileInputStream
        self.wrapped = wrapped

    def sendfile(self, jynio_socket):
        if hasattr(self.wrapped, 'getChannel'):
            # java >= 1.4: FileChannel.transferTo(position, count, target);
            # the jynio socket is assumed to act as a WritableByteChannel
            channel = self.wrapped.getChannel()
            start = channel.position()
            channel.transferTo(start, channel.size() - start, jynio_socket)
        else:
            # java < 1.4: copy block by block (helper not shown)
            self.send_in_chunks_instead(jynio_socket)

Regards, Alan. From pje at telecommunity.com Thu Sep 9 18:31:27 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 9 18:31:16 2004 Subject: [Web-SIG] Reviewing WSGI open issues, again... In-Reply-To: <41407E6F.4050809@xhaus.com> References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040909122054.02e63570@mail.telecommunity.com> At 05:01 PM 9/9/04 +0100, Alan Kennedy wrote: >[Phillip J. Eby] > > * File-like objects -- I think anything we offer for file-like objects > > should be optional. The big question is whether to offer a single, > > introspection-based extension for all file-like things, or whether to > > use separate extensions for different sorts of things, like > > 'wsgi.fd_wrapper' for file descriptors and 'wsgi.nio_wrapper' for Java > > NIO objects, etc. Does anybody have any arguments/use cases one way > > or the other? > >Optionality is fine by me. > >But I don't understand what reasons there might be to have separate class >names per platform? > >It's always been my understanding that the intention for this capability >is so that applications can give "hints", to servers that support >high-performance methods of file transmission, that the resource being >returned is a candidate for bulk transfer. So, as an application author, >I'll surely want that hinting process to work on as many servers as >possible, regardless of the platform. You may want that, but it's going to be platform-dependent whether you can do that. A trivial example: Java doesn't have file descriptors, so you're not going to be able to use 'sendfile()' in Java. So, what's the point of having 'fd_wrapper' available there?
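The distinction being drawn here can be made concrete for CPython. The following is a minimal sketch, not code from the PEP or from any server in this thread: `FileWrapper` and `send_body` are hypothetical names, and it assumes Python 3 on a Unix-like platform where `os.sendfile()` exists and the server owns a connected socket. A single wrapper type is always usable as a plain iterable of blocks; a server that recognises it may additionally try the descriptor-based fast path, starting from the file's current `tell()` position, and simply skips that path when no OS descriptor exists (as on Jython, or for an in-memory file).

```python
import os


class FileWrapper:
    """Hypothetical single wrapper: always iterable, optionally optimisable."""

    def __init__(self, filelike, blksize=8192):
        self.filelike = filelike
        self.blksize = blksize
        if hasattr(filelike, "close"):
            # Expose close() so the server can release the file afterwards.
            self.close = filelike.close

    def __iter__(self):
        # Portable path: plain block-by-block reads until EOF.
        while True:
            block = self.filelike.read(self.blksize)
            if not block:
                break
            yield block


def send_body(result, sock):
    """Send a WSGI result over a connected socket.

    Returns "sendfile" if the descriptor fast path was used,
    otherwise "iterate".
    """
    if isinstance(result, FileWrapper) and hasattr(os, "sendfile"):
        try:
            in_fd = result.filelike.fileno()
        except (AttributeError, OSError):
            in_fd = None  # no OS descriptor, e.g. io.BytesIO
        if in_fd is not None:
            # Start at the current position, honouring any earlier seek().
            offset = result.filelike.tell()
            remaining = os.fstat(in_fd).st_size - offset
            while remaining > 0:
                sent = os.sendfile(sock.fileno(), in_fd, offset, remaining)
                if sent == 0:  # file shrank underneath us
                    break
                offset += sent
                remaining -= sent
            return "sendfile"
    # Fallback: any iterable of byte strings.
    for block in result:
        sock.sendall(block)
    return "iterate"
```

In this sketch the optimisation stays a server-side decision: the application only hands over the wrapper, and middleware that does not understand it can still iterate over it.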
>So, if there is a choice of multiple such hinting processes, and I have to >look for each one of them at runtime, my code is longer and less efficient >than it could be, e.g. > >def app_object(environ, start_response): > start_response('200 AuQuay', [ ('content-type', 'x-humungous-pdf') ] ) > result = open('humungous.pdf') > for cname in ['fd','nio','dotnet','stackless','pypy','smalltalk']: > try: > return environ['wsgi.%s_wrapper' % cname](result): > except KeyError: > pass > return result > >Instead, if a single class is used, the definition of which is different >per server, then I have only to look at that one class. An object that works with 'fd' isn't going to work with 'nio', or vice versa is it? Or am I missing something about how nio works? I suppose the alternative is to specify 'wsgi.file_wrapper' such that it's required to always return *something* usable, even if it can't figure out any way to optimize it. Objects passed to 'file_wrapper' would have to have a 'read', optionally a 'close', and optionally 'fileno'. (A Jython WSGI server would ignore fileno, of course.) From tony at lownds.com Thu Sep 9 20:09:16 2004 From: tony at lownds.com (tony@lownds.com) Date: Thu Sep 9 20:30:25 2004 Subject: [Web-SIG] Reviewing WSGI open issues, again... In-Reply-To: <5.1.1.6.0.20040909122054.02e63570@mail.telecommunity.com> References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com><5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> <5.1.1.6.0.20040909122054.02e63570@mail.telecommunity.com> Message-ID: <60305.204.162.121.54.1094753356.squirrel@*> [Phillip] > I suppose the alternative is to specify 'wsgi.file_wrapper' such that it's > required to always return *something* usable, even if it can't figure out > any way to optimize it. Objects passed to 'file_wrapper' would have to > have a 'read', optionally a 'close', and optionally 'fileno'. (A Jython > WSGI server would ignore fileno, of course.) > I like this option. 
As long as the file_wrapper does not initiate any actions until the server gets it, the results of file_wrapper can be opaque to middleware. Other methods might be useful too, for instance tell(): if an application passes a file that has been seeked to a certain point, that's where reading of data should start. I'm assuming the new "combined" wsgi.file_wrapper key would be optional. This puts a burden on applications that need to send back data from files, because they'd need fallback logic if the wsgi.file_wrapper key isn't present. But that seems better on the whole than putting the burden on servers all the time. -Tony From tony at lownds.com Thu Sep 9 20:32:10 2004 From: tony at lownds.com (tony@lownds.com) Date: Thu Sep 9 20:53:17 2004 Subject: [Web-SIG] Reviewing WSGI open issues, again... In-Reply-To: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> Message-ID: <60470.204.162.121.54.1094754730.squirrel@*> > * Error handling -- I'm assuming the SIG consensus is +1 on the > 'wsgi.fatal_errors' key, but haven't seen any feedback on my ideas for > 'start_response', except that I seem to recall someone saying they didn't > want the body passed to start_response. Taking that part out, we end up > with something like this: > > 'start_response()' doesn't actually transmit the status or headers until > the first write() call occurs or the first string is yielded from the > returned iterable. 'start_response' simply stores the status or headers > for future use, and may therefore be called more than once. However, > calling 'start_response()' *after* a write(), or after the first string is > yielded, is a fatal error. Top-level servers/gateways should log detailed > information about errors that occur after a partial result is > transmitted. They may also attempt to send error information to the > client > if the content type is text (e.g. text/html, text/xml, text/plain).
> > Feedback, anyone? > I still like the idea of having an exception that servers will always catch and send back to the user. If an application doesn't know whether a server can display an error page, it will tend to include its own error-displaying logic (made simpler by the start_response() above). But if applications take care of displaying those exceptions, exception-catching middleware won't really be useful for those applications. As long as exceptions get logged, I think it is fine for there to be no requirement about sending error data back to the client after the response is started. Without some other way for applications to send errors, the additional requirements on start_response do make sense, even though they complicate some pretty tricky logic. How does wsgi.fatal_errors help servers? Wouldn't servers have to make up specialized exceptions for inclusion in wsgi.fatal_errors, in order to avoid interfering with catching other exceptions? Now write() and start_response() need more logic, to throw only errors in wsgi.fatal_errors. And servers can't rely on applications adhering to the rules in the specs. -Tony From py-web-sig at xhaus.com Thu Sep 9 21:05:06 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Thu Sep 9 21:00:25 2004 Subject: [Web-SIG] Reviewing WSGI open issues, again... In-Reply-To: <5.1.1.6.0.20040909122054.02e63570@mail.telecommunity.com> References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> <5.1.1.6.0.20040909122054.02e63570@mail.telecommunity.com> Message-ID: <4140A962.2050602@xhaus.com> [Phillip J. Eby] > Java doesn't have file descriptors, so > you're not going to be able to use 'sendfile()' in Java. So, what's > the point of having 'fd_wrapper' available there? and > An object that works with 'fd' isn't going to work with 'nio', or vice > versa is it? Or am I missing something about how nio works?
> > I suppose the alternative is to specify 'wsgi.file_wrapper' such that > it's required to always return *something* usable, even if it can't > figure out any way to optimize it. Objects passed to 'file_wrapper' > would have to have a 'read', optionally a 'close', and optionally > 'fileno'. (A Jython WSGI server would ignore fileno, of course.) Ah, I think I see where the confusion lies. Perhaps I should have taken more time to explain a certain issue earlier than this. Jython files *may* have the local analogue of a file descriptor, i.e. a channel, but only when the jython code is running on a jvm that supports java.nio, which means 1.4 or greater. I could define the fileno method of jython files like this:

class file:
    def fileno(self):
        if hasattr(self.java_file, 'getChannel'):
            # java >= 1.4 behaviour: hand back the NIO channel
            return self.java_file.getChannel()
        else:
            # java < 1.4 behaviour: no descriptor equivalent exists
            raise IOError("fileno() is not supported")

The current jython 2.1 library only raises the exception, because java.nio didn't exist when it was written. Now, I could suggest a patch to the jython runtime to redefine fileno as above, but that's not a safe thing to do: existing python code that is expecting a cpython file descriptor will almost certainly break if it gets passed a java.nio.channels.FileChannel instead. Unless, that is, the entire I/O subsystem has been rewritten, as I am doing for jynio sockets, which *do* have a useful fileno() method which each of the new modules knows how to use properly. The returned object has the same semantics as cpython file descriptors when passed to jynio socket modules. For example, when jynio is finished, this code will run identically on cpython and jython:

import select
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
fd = s.fileno()
po = select.poll()
po.register(fd)

http://www.xhaus.com/alan/python/jynio/socket.html#socketvschannel A very similar set of operations can be carried out on both file descriptors and channels: selectability/event notification, bulk transfers, etc.
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/SelectableChannel.html
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/InterruptibleChannel.html
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/WritableByteChannel.html
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/GatheringByteChannel.html
So, when running on java 1.4+, I *can* get the local equivalent of a file descriptor, and do meaningful things with it, in terms of bulk transfer, etc. But that doesn't work on older JVMs where only java.io is available: there is no other way (without examining private file object data, i.e. the java.io.FileInputStream encapsulated in the jython file) to determine whether something is file-like, other than to do this:

import types

if type(app_object) is types.FileType:
    do_high_performance_file_stuff(app_object)

That's why I pushed for permitting return of file-likes from applications: because it's the only "safe" way to recognise files on pre-1.4 jvms. And it's also portable to all other python platforms. It would still definitely be useful to recognise the optimisation on older VMs, because I could still have a fast native-java loop-while-sending-blocks implementation of "sendfile", which would be substantially faster than a jython one, because it would avoid the unnecessary transformation of the file data into jython data structures (i.e. binary strings) and then back again. But your solution of a single server-provided file_wrapper class solves the problem nicely. Because the application has hinted that the application object is a file, I now have a simple way of checking that works across all JVMs. So I can now very simply provide the bulk transfer optimisation, and implement it differently, depending on the availability of the java.nio classes, e.g.
try:
    import java.nio
    class file_wrapper:
        # (constructor omitted; use_nio_transfer_to is a placeholder)
        def send_file(self, dest):
            # java >= 1.4: bulk transfer via FileChannel.transferTo
            use_nio_transfer_to(dest)
except ImportError:
    import java.io
    class file_wrapper:
        # (constructor omitted; use_looping_sendfile is a placeholder)
        def send_file(self, dest):
            # java < 1.4: native-java loop that sends block by block
            use_looping_sendfile(dest)

Also, the 'file_wrapper' solution alleviates the need for me to look at private data inside jython file objects, to see if the underlying java.io.FileInputStream has a getChannel method. So it's definitely the cleanest solution. As for the 'file_wrapper' class name across platforms, as you can see from the above, having different class names for each platform would not change the above considerations one bit: it would just make the application author's life more difficult. I hope that makes the situation clearer! Regards, Alan. From pje at telecommunity.com Thu Sep 9 21:30:27 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 9 21:30:25 2004 Subject: [Web-SIG] Reviewing WSGI open issues, again... In-Reply-To: <60470.204.162.121.54.1094754730.squirrel@*> References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com> At 11:32 AM 9/9/04 -0700, tony@lownds.com wrote: >I still like the idea of having an exception that servers will always >catch and send back to the user. Currently, isn't that *every* exception? I'm making the assumption that the server will want to log and display every non-fatal error. (Except those occurring after the headers are sent, which can only be logged in the general case.) >If an application doesn't know whether a >server can display an error page, it will tend to include it's own >error-displaying logic (made simpler by the start_response() above). But, >if applications take care of displaying those exceptions, then exception >catching middleware won't really be useful for those applications.
This seems circular to me: if the application throws an error that's actually an application-defined error message, then why is middleware going to be *useful* here? You must have some other use case in mind besides the middleware presenting a friendly message, since presumably the application can produce a friendlier message (at least in the sense of being specific to the app and looking like the app). Could you elaborate on your use case? >As long as exceptions get logged, I think it is fine for there to be no >requirement about sending error data back to the client, after the >response is started. > >Without some other way for applications to send errors, then the >additional requirements on start_response do make sense, even though it >complicates some pretty tricky logic. I'm not sure it's *that* bad...

headers_set = []
headers_sent = []

def write(data):
    if not headers_set:
        raise AssertionError("write() before start_response()")
    elif not headers_sent:
        # First write: emit the stored status and headers. The nested
        # write() calls see headers_sent already set, so they fall
        # straight through to the actual output code below.
        status, headers = headers_sent[:] = headers_set
        write(status + '\r\n')
        for header in headers:
            write('%s: %s\r\n' % header)
        write('\r\n')
    # actual write() code goes here...

def start_response(status, headers):
    if headers_sent:
        raise AssertionError("Headers already sent!")
    headers_set[:] = [status, headers]
    return write

# ...

result = application(environ, start_response)
try:
    try:
        for data in result:
            write(data)
        if not headers_sent:
            write('')   # force headers to be sent
    except:
        if not headers_sent:
            # call start_response() with a 500 error
            # status, then write out an error message
            start_response('500 Internal Server Error',
                           [('Content-Type', 'text/plain')])
            write('Internal Server Error')
        # re-raise the error
        raise
finally:
    # XXX ensure client connection is closed first
    if hasattr(result, 'close'):
        result.close()

Of course, all of the above should be wrapped in a try-except that logs any errors and continues the server. >How does wsgi.fatal_errors help servers? Wouldn't servers have to make up >specialized exceptions for inclusion in wsgi.fatal_errors, in order to >avoid interfering with catching other exceptions?
Now write() and >start_response() need more logic, to throw only errors in >wsgi.fatal_errors. Hm. Well, the alternative would be that the server has to track state to know its state is hosed. That is, if you try to write() when a client connection is lost, subsequent write() calls should fail. Similarly, start_response() after write() should fail, but then so should subsequent write() calls. It seemed to me that it was simpler to raise a fatal error in that case, which the application would allow to pass through. But, if the server has to consider the possibility that the app might not be able to enforce this (e.g. because of bare 'except:' clauses), then I suppose we might as well just have the complexity of state checking and ignore the fatal errors issue. OTOH, the purpose of fatal_errors is to allow the *app* to know that it's pointless to go on, and that it *should* abort. This still seems somewhat useful to me, although it could also be argued that virtually *any* exception raised by start_response() and write() should be considered fatal. Cascading errors are also a potential problem. Let's say the application doesn't propagate a fatal error, but instead "converts" it to a different kind of error. Now, the server must catch the application's error, while still knowing that it erred internally first. Sigh. This suggests to me that start_response() and write() must have exception handlers that set a flag when they have an uncaught exception, so that they know to ignore the application's later errors if the problem originated within the server. Ugh. I suppose the bright side is that we wouldn't need 'wsgi.fatal_errors' any more, but my "not so bad" code above now needs some additional error handling and an 'internal_errors' state variable. >And servers can't rely on applications adhering to the >rules in the specs. I'm not sure what you mean here, but maybe it's what I just said above? (about apps maybe being broken in their handling of fatal_errors). 
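The 'internal_errors' idea above can be sketched as a small harness. This is a hedged illustration under stated assumptions, not the PEP's reference code: `run_app` is a hypothetical name, the response is written CGI-style (a `Status:` line) to any binary file-like `out`, bodies are Python 3 byte strings, and instead of logging, the harness returns the exception a real server would log. Headers are deferred until the first body write, `start_response()` may be called again until then, and any failure detected inside the server is recorded so that an application error raised afterwards is treated as a symptom rather than the root cause.

```python
def run_app(application, environ, out):
    """Run a WSGI app, writing a CGI-style response to the binary stream out.

    Returns the exception that should be logged, or None on success.
    """
    state = {"pending": None, "sent": False, "internal_error": None}

    def fail(exc):
        # Remember that this failure originated inside the server.
        state["internal_error"] = exc
        raise exc

    def write(data):
        if state["internal_error"] is not None:
            raise state["internal_error"]  # state is hosed: keep failing
        if state["pending"] is None:
            fail(AssertionError("write() before start_response()"))
        try:
            if not state["sent"]:
                # First body write: transmit the stored status and headers.
                status, headers = state["pending"]
                lines = ["Status: %s" % status]
                lines += ["%s: %s" % pair for pair in headers]
                out.write(("\r\n".join(lines) + "\r\n\r\n").encode("latin-1"))
                state["sent"] = True
            out.write(data)
        except Exception as exc:
            fail(exc)

    def start_response(status, headers):
        if state["sent"]:
            fail(AssertionError("start_response() after headers were sent"))
        state["pending"] = (status, headers)  # stored only; may be replaced
        return write

    result = None
    try:
        result = application(environ, start_response)
        for data in result:
            write(data)
        if not state["sent"]:
            write(b"")  # force the headers out for an empty body
    except Exception as exc:
        if state["internal_error"] is not None:
            # The server failed first; the app's error is only a symptom.
            return state["internal_error"]
        if state["sent"]:
            return exc  # too late for an error page: just report it
        start_response("500 Internal Server Error",
                       [("Content-Type", "text/plain")])
        write(b"Internal Server Error")
        return exc
    finally:
        if hasattr(result, "close"):
            result.close()
    return None
```

In this sketch, recording the first server-side failure in one place means later `start_response()` and `write()` calls keep failing with the same exception, whatever the application does with it in between.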
From tony at lownds.com Thu Sep 9 21:45:57 2004 From: tony at lownds.com (tony@lownds.com) Date: Thu Sep 9 22:07:06 2004 Subject: [Web-SIG] Reviewing WSGI open issues, again... In-Reply-To: <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com> References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com><5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com> Message-ID: <61044.204.162.121.54.1094759157.squirrel@*> > At 11:32 AM 9/9/04 -0700, tony@lownds.com wrote: > >>I still like the idea of having an exception that servers will always >>catch and send back to the user. > > Currently, isn't that *every* exception? I'm making the assumption that > the server will want to log and display every non-fatal error. (Except > those occurring after the headers are sent, which can only be logged in > the > general case.) > No, I mean that the server will send back a document that was sent as part of the exception, not a document derived from the exception and/or traceback. It is a mechanism that applications can rely on to get an error notice to the user. > >>If an application doesn't know whether a >>server can display an error page, it will tend to include its own >>error-displaying logic (made simpler by the start_response() above). But, >>if applications take care of displaying those exceptions, then exception >>catching middleware won't really be useful for those applications. > > This seems circular to me: if the application throws an error that's > actually an application-defined error message, then why is middleware > going > to be *useful* here? > > You must have some other use case in mind besides the middleware > presenting > a friendly message, since presumably the application can produce a > friendlier message (at least in the sense of being specific to the app and > looking like the app). Could you elaborate on your use case?
> Middleware can use the exception to provide side-effects, like notifying developers, or displaying diagnostics to certain IPs. Mainly the use case is that raising an exception with an HTML page is less error prone for applications and middleware than invoking write from within an except clause. The server can decide whether it will be able to send out the error page, rather than the application or middleware having to try and figure out if it can successfully start a response from scratch. > OTOH, the purpose of fatal_errors is to allow the *app* to know that it's > pointless to go on, and that it *should* abort. This still seems somewhat > useful to me, although it could also be argued that virtually *any* > exception raised by start_response() and write() should be considered > fatal. > Yes, I would have thought so. >>And servers can't rely on applications adhering to the >>rules in the specs. > > I'm not sure what you mean here, but maybe it's what I just said > above? (about apps maybe being broken in their handling of fatal_errors). > > Yep -Tony From pje at telecommunity.com Thu Sep 9 22:31:56 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 9 22:31:54 2004 Subject: [Web-SIG] Reviewing WSGI open issues, again... In-Reply-To: <61044.204.162.121.54.1094759157.squirrel@*> References: <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com> <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com> At 12:45 PM 9/9/04 -0700, tony@lownds.com wrote: > > At 11:32 AM 9/9/04 -0700, tony@lownds.com wrote: > > > >>I still like the idea of having an exception that servers will always > >>catch and send back to the user. > > > > Currently, isn't that *every* exception? I'm making the assumption that > > the server will want to log and display every non-fatal error. 
(Except > > those occurring after the headers are sent, which can only be logged in > > the > > general case.) > > > >No, I mean that the server will send back a document that was sent as part >of the exception, not a document derived from the exception and/or >traceback. It is a mechanism that applications can rely on to get an error >notice to the user. I'm still not seeing how this is different from the application simply catching the exception at its highest level, and doing: start_response("500 Error occurred", [('Content-type','text/plain')]) return ["error body here"] > > You must have some other use case in mind besides the middleware > > presenting > > a friendly message, since presumably the application can produce a > > friendlier message (at least in the sense of being specific to the app and > > looking like the app). Could you elaborate on your use case? > > > >Middleware can use the exception to provide side-effects, like notifying >developers, or displaying diagnostics to certain IPs. In that case, why not have the application simply not catch the error, and let middleware do it? I'm still confused as to how having a special exception helps. >Mainly the use case is that raising an exception with an HTML page is less >error prone for applications and middleware than invoking write from >within an except clause. The server can decide whether it will be able to >send out the error page, rather than the application or middleware having >to try and figure out if it can successfully start a response from >scratch. Ah. ISTM that use case is effectively handled: use start_response()+return [body] as I described above. If start_response fails, you're in basically the same position you'd have been if you were raising a special error. (I.e., your error wasn't going to get reported anyway.) Of course, it could be argued that the server in that case doesn't have anything of interest to log regarding the error. 
But that could be handled by adding a 'body' argument to 'start_response' as I previously proposed. Let me see if I understand your actual use case... you want to be able to write an application that, although it handles its own errors, also gives users the option of placing error-handling middleware over it to change how its errors are rendered, logged, etc. And, you want that mechanism to be based on Python exception information (type, value, traceback) rather than on HTTP information (status, headers, content). Finally, you want this to be unconditionally available, rather than having to first check whether the exception handling middleware is installed. Is this correct? From gabriel.cooper at mediapulse.com Thu Sep 9 22:41:15 2004 From: gabriel.cooper at mediapulse.com (Gabriel Cooper) Date: Thu Sep 9 22:39:37 2004 Subject: [Web-SIG] [ANNOUNCE] SnakeSkin: Python Application Toolkit In-Reply-To: <4140BCE4.8000101@mediapulse.com> References: <1094756605.12825.25.camel@mike.mediapulse.com> <4140BB34.4000200@mediapulse.com> <4140BCE4.8000101@mediapulse.com> Message-ID: <4140BFEB.8060702@mediapulse.com> We are proud to announce the release of SnakeSkin, a Python application toolkit released under an Open Source BSD-Style license, newly available at http://snakeskin-tools.sourceforge.net/ In SnakeSkin, developers can customize the framework to the application, unlike in traditional frameworks, such as PHP. For example, adding custom tags to the templating system is quick and easy. The goal of the project is to have a framework that scales down as well as up--a "Zope-lite" framework. SnakeSkin can scale down to be useful in a simple form-to-email or just to apply a clean-cut design skin. The toolkit can just as easily scale up to handle complex content management systems, B2B extranets, and full-fledged e-commerce engines. We do it all the time.
SnakeSkin, based upon the existing Albatross project maintained by Object Craft, runs under several webservers, including CGI-based, Apache, FastCGI, and its own included webserver (used mainly for development). SnakeSkin has several built-in capabilities:

* Dynamic Macro Features (think server-side includes on steroids)
* SQL support in both the application and the template
* Support for Apache 2.0 Filters

... and includes Albatross features ...

* Clean separation of logic and design
* A simple-yet-robust templating system that is Web Designer-friendly (Plays nice with Dreamweaver)
* Secure Session Management in hidden fields, server-side data-stores, or through a session server

We are ready to consider the current version, 0.9, as a candidate for the 1.0 release. Anyone that has feedback on the current design and/or finds bugs, please send information through the mailing list ( http://lists.sourceforge.net/lists/listinfo/snakeskin-tools-discuss ) or file a bug report on sourceforge.net. Thank You, The SnakeSkin team. From andrew at andreweland.org Fri Sep 10 13:45:25 2004 From: andrew at andreweland.org (Andrew Eland) Date: Fri Sep 10 13:58:11 2004 Subject: [Web-SIG] Adding status code constants to httplib Message-ID: <414193D5.6010405@andreweland.org> Hi, Over in web-sig, we're discussing PEP 333, the Web Server Gateway Interface. Rather than defining our own set of constants for the HTTP status code integers, we thought it would be a good idea to add them to httplib, allowing other applications to benefit. I've uploaded a patch[1] to httplib.py and the corresponding documentation. Do people think this is a good idea? -- Andrew Eland (http://www.andreweland.org) [1] http://sourceforge.net/tracker/index.php?func=detail&aid=1025790&group_id=5470&atid=305470 From pje at telecommunity.com Fri Sep 10 17:01:08 2004 From: pje at telecommunity.com (Phillip J.
Eby) Date: Fri Sep 10 17:01:37 2004 Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib In-Reply-To: <414193D5.6010405@andreweland.org> Message-ID: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> At 12:45 PM 9/10/04 +0100, Andrew Eland wrote: >Over in web-sig, we're discussing PEP 333, the Web Server Gateway >Interface. Rather than defining our own set of constants for the HTTP >status code integers, we thought it would be a good idea to add them to >httplib, allowing other applications to benefit. I've uploaded a patch[1] >to httplib.py and the corresponding documentation. Do people think this is >a good idea? I would also put the statuses in a dictionary, such that:

    status_code[BAD_GATEWAY] = "Bad Gateway"

This could be accomplished via something like:

    status_code = dict([
        (val, key.replace('_',' ').title())
        for key,val in globals().items()
        if key==key.upper() and not key.startswith('HTTP')
           and not key.startswith('_')
    ])

From andrew at andreweland.org Fri Sep 10 17:12:02 2004 From: andrew at andreweland.org (Andrew Eland) Date: Fri Sep 10 17:24:54 2004 Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib In-Reply-To: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> Message-ID: <4141C442.8050005@andreweland.org> Phillip J. Eby wrote: > I would also put the statuses in a dictionary, such that: > > status_code[BAD_GATEWAY] = "Bad Gateway" There's a table mapping status codes to messages on BaseHTTPRequestHandler at the moment. It could be moved into httplib to make it more publically visible.
-- Andrew From py-web-sig at xhaus.com Fri Sep 10 17:45:35 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Fri Sep 10 17:41:14 2004 Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib In-Reply-To: <4141C442.8050005@andreweland.org> References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> <4141C442.8050005@andreweland.org> Message-ID: <4141CC1F.4000207@xhaus.com> [Phillip J. Eby] >> I would also put the statuses in a dictionary, such that: >> >> status_code[BAD_GATEWAY] = "Bad Gateway" [Andrew Eland] > There's a table mapping status codes to messages on > BaseHTTPRequestHandler at the moment. It could be moved into httplib to > make it more publically visible. And that mapping has 2 levels of human readable messages on it, for example 304: ('Not modified', 'Document has not changed singe given time'), I think that, since the human readable versions are seldom heeded anyway, perhaps a single message is all we need? And I'm -1 on forcing servers, particularly CGI servers, to import the client-side httplib (2.3 httplib.pyc == 42K) just to get this mapping. If the changes are not going to make it in until the next release of cpython anyway, then maybe we should just aim for a new module? Or is some version of 2.4 the target, in which case minimal patches might make it in, whereas new modules won't? Just my 0,02 euro. Alan. 
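Phillip's `status_code` suggestion can be tried outside `httplib` with a small, self-contained Python 3 sketch. The constants below are a hypothetical subset named after the RFC 2616 reason phrases (the names in Andrew's actual patch may differ), and `build_status_table` is an illustrative helper that applies the same name-based filter to any namespace mapping.

```python
# Hypothetical subset of the proposed constants, named after the
# RFC 2616 reason phrases (the names in the actual patch may differ).
OK = 200
NOT_FOUND = 404
REQUEST_URI_TOO_LONG = 414
BAD_GATEWAY = 502
HTTP_VERSION_NOT_SUPPORTED = 505  # dropped by the "HTTP" prefix test
HTTP_PORT = 80     # not a status code; meant to be skipped
HTTPS_PORT = 443   # likewise
_INTERNAL = 1      # private name; skipped


def build_status_table(namespace):
    """Map status integers to phrases derived from constant names."""
    return {
        value: name.replace("_", " ").title()
        for name, value in namespace.items()
        if name == name.upper()
        and not name.startswith("HTTP")
        and not name.startswith("_")
        and isinstance(value, int)
    }


status_code = build_status_table(dict(globals()))
print(status_code[BAD_GATEWAY])
```

Deriving phrases from identifiers has limits: title-casing gives "Ok" and "Request Uri Too Long" rather than RFC 2616's "OK" and "Request-URI Too Long", and the `startswith('HTTP')` test meant to skip `HTTP_PORT` and `HTTPS_PORT` also drops a constant named `HTTP_VERSION_NOT_SUPPORTED`. An explicit table, like `BaseHTTPRequestHandler.responses`, avoids both problems.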
From andrew at andreweland.org Fri Sep 10 17:46:44 2004 From: andrew at andreweland.org (Andrew Eland) Date: Fri Sep 10 17:59:36 2004 Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib In-Reply-To: <4141CC1F.4000207@xhaus.com> References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> <4141C442.8050005@andreweland.org> <4141CC1F.4000207@xhaus.com> Message-ID: <4141CC64.2090205@andreweland.org> Alan Kennedy wrote: > And that mapping has 2 levels of human readable messages on it, for example > 304: ('Not modified', 'Document has not changed singe given time'), > I think that, since the human readable versions are seldom heeded > anyway, perhaps a single message is all we need? A simple move would mean we'd have to keep both, for backwards compatibility. I guess BaseHTTPRequestHandler could mix its long messages in with those in a httplib table, but it sounds ugly. > And I'm -1 on forcing servers, particularly CGI servers, to import the > client-side httplib (2.3 httplib.pyc == 42K) just to get this mapping. I think the number of people who wouldn't import httplib on speed/process size grounds is very small. If they're that worried about efficiency, they could copy and paste the table, and manage the extra development complexity. -- Andrew From pje at telecommunity.com Fri Sep 10 18:08:37 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Sep 10 18:09:10 2004 Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib In-Reply-To: <4141C442.8050005@andreweland.org> References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040910120714.032b5b80@mail.telecommunity.com> At 04:12 PM 9/10/04 +0100, Andrew Eland wrote: >Phillip J.
Eby wrote: > >>I would also put the statuses in a dictionary, such that: >> status_code[BAD_GATEWAY] = "Bad Gateway" > >There's a table mapping status codes to messages on BaseHTTPRequestHandler >at the moment. It could be moved into httplib to make it more publically >visible. It doesn't appear to include HTTP/1.1 status codes. From py-web-sig at xhaus.com Fri Sep 10 18:25:53 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Fri Sep 10 18:20:58 2004 Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib In-Reply-To: <5.1.1.6.0.20040910120714.032b5b80@mail.telecommunity.com> References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> <5.1.1.6.0.20040910120714.032b5b80@mail.telecommunity.com> Message-ID: <4141D591.2090903@xhaus.com> [Andrew Eland] >> There's a table mapping status codes to messages on >> BaseHTTPRequestHandler at the moment. It could be moved into httplib >> to make it more publically visible. [Phillip J. Eby] > It doesn't appear to include HTTP/1.1 status codes. Hmm. The version I'm seeing, python23/Lib, has all the codes from RFC 2616. Are you looking at the python 2.1 version, by any chance? Regards, Alan. From mnot at mnot.net Sat Sep 11 07:24:29 2004 From: mnot at mnot.net (Mark Nottingham) Date: Sat Sep 11 07:24:44 2004 Subject: [Web-SIG] Adding status code constants to httplib In-Reply-To: <414193D5.6010405@andreweland.org> References: <414193D5.6010405@andreweland.org> Message-ID: FYI; status codes as exceptions; http://www.mnot.net/python/http/status.py On Sep 10, 2004, at 9:45 PM, Andrew Eland wrote: > Hi, > > Over in web-sig, we're discussing PEP 333, the Web Server Gateway > Interface. Rather than defining our own set of constants for the HTTP > status code integers, we thought it would be a good idea to add them > to httplib, allowing other applications to benefit. I've uploaded a > patch[1] to httplib.py and the corresponding documentation. 
Do people > think this is a good idea? > > -- Andrew Eland (http://www.andreweland.org) > > [1] > http://sourceforge.net/tracker/index.php? > func=detail&aid=1025790&group_id=5470&atid=305470 > _______________________________________________ > Web-SIG mailing list > Web-SIG@python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: > http://mail.python.org/mailman/options/web-sig/mnot%40mnot.net > -- Mark Nottingham http://www.mnot.net/ From tony at lownds.com Sat Sep 11 18:24:00 2004 From: tony at lownds.com (tony@lownds.com) Date: Sat Sep 11 18:45:41 2004 Subject: [Web-SIG] Reviewing WSGI open issues, again... In-Reply-To: <5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com> References: <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com><5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com><5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com><5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com> <5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com> Message-ID: <55666.68.122.33.37.1094919840.squirrel@*> >>No, I mean that the server will send back a document that was sent as >> part >>of the exception, not a document derived from the exception and/or >>traceback. It is a mechanism that applications can rely on to get an >> error >>notice to the user. > > I'm still not seeing how this is different from the application simply > catching the exception at its highest level, and doing: > > start_response("500 Error occurred", > [('Content-type','text/plain')]) > return ["error body here"] > > Servers need additional logic to try and support calling start_response twice. Calling start_response again could still be an error for the application, masking the error. That code doesn't work from the iterator. > Let me see if I understand your actual use case... 
you want to be able to > write an application that, although it handles its own errors, also gives > users the option of placing error-handling middleware over it to change > how > its errors are rendered, logged, etc. And, you want that mechanism to be > based on Python exception information (type, value, traceback) rather than > on HTTP information (status, headers, content). Finally, you want this to > be unconditionally available, rather than having to first check whether > the > exception handling middleware is installed. Is this correct? Yes, with the addition of a server-provided exception class that holds the error document payload. -Tony From pje at telecommunity.com Sat Sep 11 19:13:22 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Sep 11 19:12:30 2004 Subject: [Web-SIG] Reviewing WSGI open issues, again... In-Reply-To: <55666.68.122.33.37.1094919840.squirrel@*> References: <5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com> <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com> <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com> <5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040911125331.0264abd0@mail.telecommunity.com> At 09:24 AM 9/11/04 -0700, tony@lownds.com wrote: > >>No, I mean that the server will send back a document that was sent as > >> part > >>of the exception, not a document derived from the exception and/or > >>traceback. It is a mechanism that applications can rely on to get an > >> error > >>notice to the user. > > > > I'm still not seeing how this is different from the application simply > > catching the exception at its highest level, and doing: > > > > start_response("500 Error occurred", > > [('Content-type','text/plain')]) > > return ["error body here"] > > > > > > >Servers need additional logic to try and support calling start_response >twice. 
They'll need it in any case. What are the odds that all errors will occur before start_response happens? >Calling start_response again could still be an error for the >application, masking the error. True. This is probably the strongest argument for having a special exception. That is, that an exception-in-progress could be masked by the error of calling start_response again. OTOH, there's always: try: try: t,v,tb = sys.exc_info() start_response("500 Error occurred", headers) except: raise t,v,tb # reraise the original else: return ["error body here"] finally: t = v = tb = None but admittedly, this is "guru-level" coding. OTOH, we could simply have an optional third argument to start_response: start_response(status,headers,sys.exc_info()) the idea being that 'start_response' should reraise the exc_info tuple (or some private exception type) if the response has already been started. It can also optionally log the error information. Note that this also allows middleware to trivially intercept error reports by overriding start_response. If it decides to handle the error itself, the middleware can simply throw an exception that it then catches as the app aborts. >That code doesn't work from the iterator. It only would have worked if it was in the first iteration, anyway. The server is probably in the best position to attempt recovery following the first iteration. However, in most cases where such code would *be* in the iterator, it's likely a generator that can simply yield the error body. Using the third-argument strategy above, it's going to get an error if it wouldn't work. > > Let me see if I understand your actual use case... you want to be able to > > write an application that, although it handles its own errors, also gives > > users the option of placing error-handling middleware over it to change > > how > > its errors are rendered, logged, etc. 
And, you want that mechanism to be > > based on Python exception information (type, value, traceback) rather than > > on HTTP information (status, headers, content). Finally, you want this to > > be unconditionally available, rather than having to first check whether > > the > > exception handling middleware is installed. Is this correct? > >Yes, with the addition of a server-provided exception class that holds the >error document payload. I think that we can meet this use case without a server-provided exception class; the server (or middleware) just needs to know that you're starting an error response, and what the error is. Adding an argument to start_response seems like a good, clean way to do this, and it looks easy to use/implement on all sides. What do you think? From tony at lownds.com Sat Sep 11 19:38:55 2004 From: tony at lownds.com (tony@lownds.com) Date: Sat Sep 11 20:00:35 2004 Subject: [Web-SIG] Reviewing WSGI open issues, again... In-Reply-To: <5.1.1.6.0.20040911125331.0264abd0@mail.telecommunity.com> References: <5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com><5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com><5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com><5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com><5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com><5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com> <5.1.1.6.0.20040911125331.0264abd0@mail.telecommunity.com> Message-ID: <56402.68.122.33.37.1094924335.squirrel@*> >>Servers need additional logic to try and support calling start_response >>twice. > > They'll need it in any case. What are the odds that all errors will occur > before start_response happens? > Not good, hence the requirement that servers support re-starting the response. With exception handling, they just need a little bit of logic to decide whether to send the payload of the exception. They don't HAVE to support re-starting the response. 
Hmm, except then there would be a lot of "200 Ok" responses that actually ended in an error. > >>Calling start_response again could still be an error for the >>application, masking the error. > > True. This is probably the strongest argument for having a special > exception. That is, that an exception-in-progress could be masked by the > error of calling start_response again. OTOH, there's always: > > try: > try: > t,v,tb = sys.exc_info() > start_response("500 Error occurred", headers) > except: > raise t,v,tb # reraise the original > else: > return ["error body here"] > finally: > t = v = tb = None > > but admittedly, this is "guru-level" coding. OTOH, we could simply have > an > optional third argument to start_response: > > start_response(status,headers,sys.exc_info()) > > the idea being that 'start_response' should reraise the exc_info tuple (or > some private exception type) if the response has already been started. It > can also optionally log the error information. > > Note that this also allows middleware to trivially intercept error reports > by overriding start_response. If it decides to handle the error itself, > the middleware can simply throw an exception that it then catches as the > app aborts. > That reasonably handles the exception case. Applications and middleware should never catch exceptions from start_response then, correct? > I think that we can meet this use case without a server-provided exception > class; the server (or middleware) just needs to know that you're starting > an error response, and what the error is. Adding an argument to > start_response seems like a good, clean way to do this, and it looks easy > to use/implement on all sides. What do you think? > I'm beginning to think that re-startability is important. It makes it much less likely that a successful HTTP code is returned when the application actually broke. Given that, I don't see much of an advantage to the exception. 
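The "third argument" idea discussed above (`start_response(status, headers, sys.exc_info())`, re-raising the original error if the response has already started) can be sketched in Python 3 as follows. The `state` dict, `make_start_response` factory and `demo_app` are hypothetical stand-ins for a gateway's internal bookkeeping and an application; a real gateway would also send the headers and return its `write()` callable.

```python
import sys


def make_start_response(state):
    """Build a start_response() that accepts an optional exc_info tuple.

    `state` is a hypothetical dict holding the gateway's bookkeeping:
    'headers_set' (the pending status/headers) and 'headers_sent' (bool).
    """
    def start_response(status, headers, exc_info=None):
        if exc_info:
            try:
                if state["headers_sent"]:
                    # Too late to replace the response: re-raise the
                    # application's original exception instead.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # drop the traceback reference
        elif state["headers_set"]:
            # A second call without exc_info is an application bug.
            raise AssertionError("Headers already set!")
        state["headers_set"] = [status, headers]
        return state.get("write")  # a real gateway returns its write()
    return start_response


def demo_app(environ, start_response):
    """Hypothetical application that reports its own errors."""
    try:
        start_response("200 OK", [("Content-Type", "text/html")])
        raise ValueError("template failed")  # simulated failure
    except Exception:
        start_response("500 Internal Server Error",
                       [("Content-Type", "text/plain")],
                       sys.exc_info())
        return [b"error body here"]
```

Because the second call passes `exc_info`, the gateway does not treat it as an illegal double call; it only refuses (by re-raising the application's own exception) once the headers have already gone out. Clearing the local `exc_info` in the `finally` clause avoids keeping the traceback alive through a reference cycle.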
-Tony From pje at telecommunity.com Sat Sep 11 22:34:15 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Sep 11 22:33:21 2004 Subject: [Web-SIG] Reviewing WSGI open issues, again... In-Reply-To: <56402.68.122.33.37.1094924335.squirrel@*> References: <5.1.1.6.0.20040911125331.0264abd0@mail.telecommunity.com> <5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com> <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com> <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com> <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com> <5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com> <5.1.1.6.0.20040911125331.0264abd0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040911162653.0213fd20@mail.telecommunity.com> At 10:38 AM 9/11/04 -0700, tony@lownds.com wrote: > > OTOH, we could simply have > > an > > optional third argument to start_response: > > > > start_response(status,headers,sys.exc_info()) > > > > the idea being that 'start_response' should reraise the exc_info tuple (or > > some private exception type) if the response has already been started. It > > can also optionally log the error information. > > > > Note that this also allows middleware to trivially intercept error reports > > by overriding start_response. If it decides to handle the error itself, > > the middleware can simply throw an exception that it then catches as the > > app aborts. > > > >That reasonably handles the exception case. Applications and middleware >should never catch exceptions from start_response then, correct? Not if the call to start_response() was made from an error handler, no. But I think it's acceptable to catch errors from normal (2-argument) calls to start_response(). > > I think that we can meet this use case without a server-provided exception > > class; the server (or middleware) just needs to know that you're starting > > an error response, and what the error is. 
Adding an argument to > > start_response seems like a good, clean way to do this, and it looks easy > > to use/implement on all sides. What do you think? > > > >I'm beginning to think that re-startability is important. It makes it much >less likely that a successful HTTP code is returned when the application >actually broke. Given that, I don't >see much of an advantage to the exception. Good. We'll take the "third argument" approach, then. It's going to expand the PEP quite a bit, but every error handling proposal so far was going to do that. But this one handles your use case without adding much overhead for the more common cases. I can't believe we've managed to get away without having *any* special environ keys for error handling or any custom exception classes (except when you want to do something special, of course). I'm going to try and get all the updates into the PEP this weekend, hopefully before the next hurricane goes by. If it comes too close and we lose power here, I don't expect we'll get it back for a week or two, as there are too many crews still out fixing power outages from the *last* hurricane! Once all the pending updates are in, I think we'll be almost ready to finalize the PEP, and we should plan on another posting to python-list and python-dev, giving a finalization deadline and requesting that all remaining change requests be submitted in the form of a patch. Hm. Actually, that might be premature, since I recall we were planning to get the HTTP/1.1 stuff in order first. Mark, how's that coming along? :) From py-web-sig at xhaus.com Sun Sep 12 18:48:53 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Sun Sep 12 18:43:54 2004 Subject: [Web-SIG] Modjy status. Message-ID: <41447DF5.1060001@xhaus.com> Dear Sig, Just a quick message to let y'all know that modjy is 95% ready, but not the 98% or 99% percent I would like. Thus far 1. The code is pretty stable. 
I've had it under version control for several days now, and the changes are fewer and fewer. 2. I have pretty much finished the documentation, (including all those nitty gritty little details!) 3. I've tried to make it as friendly to non-J2EE people as possible. But there are still not enough tests. All that's being tested at the moment is cases where the application causes an exception. There are plenty of cases that need to be checked, including many positive ones, e.g. returning a variety of different iterable types to the server. Only when a reasonably comprehensive test suite is passing will I feel totally comfortable with people trying it out. So I'll be finalizing those tests over the next day or two, (after I've had a little rest: been working non-stop on this), and then I'll be happy for people to download modjy and try it out, safe in the knowledge that it's less likely to fall at the first fence, and thus put people off modjy for good. Per ardua ad astra, Alan. From pje at telecommunity.com Mon Sep 13 21:59:54 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Sep 13 21:59:01 2004 Subject: [Web-SIG] PEP 333 Update Message-ID: <5.1.1.6.0.20040913154519.0228d750@mail.telecommunity.com> I'm about to check in a major update to PEP 333; it should be available on the PEPs page within about an hour, and from SF CVS some time thereafter. Here is a summary of the changes: * Added 'wsgi.url_scheme', and updated sections relating thereto (such as the "URL Reconstruction" algorithm) * Replaced the old "Optional Platform-Specific File Handling" section with a new one based on 'wsgi.file_wrapper', and expunged all references in the rest of the PEP that so much as suggest that returning a file or file-like object from an application is something you should ever do. * Significantly expanded the "Error Handling" section, and other sections that relate to the new 'exc_info' parameter to 'start_response()'. 
* Changed the definition of 'start_response' such that headers are not immediately sent to the client. * Revised the "CGI gateway" example to include error handling and delayed header-sending. * Miscellaneous explanatory clean-ups, such as linking from the specification regarding the use of 'len()' on the returned iterable, to the section of the spec that explains why using 'len()' is sometimes helpful. * Added a (very brief) explanation of why returning an iterable is preferable to using 'write()', if the latter can be avoided, and noted that 'write()' must not be invoked from within the returned iterable. * Removed requirement that status and headers be pure 7-bit ASCII, referring instead to the RFC 2616 definitions. (But left in the no-folding requirement that's specific to the PEP.) * Added notes on using 'environ' to supply an application with limited configuration data. * Removed open issues that are now closed; added an open issue for reviewing the currently-required CGI variables, as it may be that some of them don't really need to be required. * Added more kudos for Tony and Alan in the acknowledgements section. We are now getting very close to finalization, I think. There are just two more open issues to cover, plus some possible re-organization for HTTP/1.1-specific stuff. After that, I think we should post to python-list and python-dev one last time, then finalize the PEP. After that, the semantics would be frozen, and only changes to e.g. the Q&A section, or edits for clarity would be allowed. At that point, framework and server developers can then feel comfortable releasing something and calling it PEP 333-compatible, if in fact it is. :) From pje at telecommunity.com Tue Sep 14 18:59:30 2004 From: pje at telecommunity.com (Phillip J.
Eby) Date: Tue Sep 14 18:59:22 2004 Subject: [Web-SIG] bytes, strings, and Unicode in Jython, IronPython, and CPython 3.0 Message-ID: <5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com> I've reviewed last month's Python-Dev discussion about the future Python 'bytes()' type, and the eventual transition away from Python's current 8-bit strings. Mainly, the impression I get is that significant change in this respect really can't happen until Python 3.0, because too many things have to change at once for it to work. So, here's what I propose to do about the open issue in PEP 333. Servers and gateways that run under Python implementations where all strings are Unicode (e.g. Jython) *may*: * accept Unicode statuses and headers, so long as they properly encode them for transmission (latin-1 + RFC 2047) * accept Unicode for response body segments, so long as each segment may be encoded as latin-1 (i.e. only uses chars 0-255) * produce Unicode input headers and body strings by decoding from latin-1, as long as the produced values are considered type 'str' for that Python implementation. I think that these rules allow conformance with the "letter of the law" for the rest of the WSGI spec, since servers, gateways, and applications are still required to use 'str' instances in all of the above cases. The issue here is that non-CPython implementations may be able to place arbitrary Unicode characters in a 'str' instance, so the encoding rules need to be clear. I think this is probably the right thing to do, leaving the adoption of any "byte array" usage to Python 3.0 and WSGI 2.0 or 3.0 or whatever we're on by then. But I am not a Unicode guru, and I'm definitely not familiar with the details of non-CPython 'str' vs. Unicode issues. So, I hope that there are some folks out there (Alan?) who can comment on this. Thanks. 
From paul.boddie at ementor.no Wed Sep 15 12:33:31 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Wed Sep 15 12:33:35 2004 Subject: [Web-SIG] bytes, strings, and Unicode in Jython, IronPython, and CPython 3.0 Message-ID: <0F4BD34E02639E428B4654DCBAB4502D03E17B@100NOOSLMSG004.common.alpharoot.net> Phillip J. Eby wrote: > > I've reviewed last month's Python-Dev discussion about the future Python > 'bytes()' type, and the eventual transition away from Python's current > 8-bit strings. > > Mainly, the impression I get is that significant change in this respect > really can't happen until Python 3.0, because too many things have to > change at once for it to work. I think there was (and perhaps still is) a runtime option to force Python to treat all strings as Unicode objects. > So, here's what I propose to do about the open issue in PEP 333. Servers > and gateways that run under Python implementations where all strings are > Unicode (e.g. Jython) *may*: > > * accept Unicode statuses and headers, so long as they properly encode > them for transmission (latin-1 + RFC 2047) I think I encode all Unicode objects used in this area as US-ASCII in WebStack. > * accept Unicode for response body segments, so long as each segment may > be encoded as latin-1 (i.e. only uses chars 0-255) It should be possible to be more intelligent about response bodies, but you can argue that it isn't up to something like WSGI to go through the necessary gymnastics to make sure that Unicode objects presented to the response stream become encoded appropriately. > * produce Unicode input headers and body strings by decoding from > latin-1, as long as the produced values are considered type 'str' for that > Python implementation. I think I've left incoming headers as plain strings, but I suppose a similar translation could be performed in WebStack. 
Paul From py-web-sig at xhaus.com Wed Sep 15 16:28:14 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Wed Sep 15 16:22:59 2004 Subject: [Web-SIG] bytes, strings, and Unicode in Jython, IronPython, and CPython 3.0 In-Reply-To: <5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com> References: <5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com> Message-ID: <4148517E.7040701@xhaus.com> [Phillip J. Eby] > I've reviewed last month's Python-Dev discussion about the future > Python 'bytes()' type, and the eventual transition away from Python's > current 8-bit strings. > > Mainly, the impression I get is that significant change in this > respect really can't happen until Python 3.0, because too many > things have to change at once for it to work. > > So, here's what I propose to do about the open issue in PEP 333. > Servers and gateways that run under Python implementations where all > strings are Unicode (e.g. Jython) *may*: Encoding issues? "Oh no", screams Alan, turning tail and sprinting away! ;-) Before starting my response, I just want to point out two things: 1. I'm no bot when it comes to python and character encodings. 2. that the text below may come across a little cold. I've spent a few hours thinking through the issues, checking code, rewriting text, rewriting, rewriting, .... I think the below is the most accurate picture I can present: it won't win any poetry competitions. Before getting into the WSGI parameter encoding issues, just a quick overview of character strings vs. binary strings in jython. Strings in jython: textual vs. binary ===================================== Java stores all textual strings as unicode strings, i.e. sequences of 2-byte characters. These strings can be transcoded to any encoding: when they are so transcoded, that delivers a sequence of bytes. Java keeps the concept of textual unicode strings and byte sequences separate, through the use of (rigidly enforced) method signatures. 
This ensures both static type correctness and memory efficiency.

Jython blends the two concepts, by using java.lang.Strings to store both python text strings and python binary strings, i.e. byte arrays. It stores the latter by the trick of only using the lower byte of each two-byte unicode character to store data, leaving the upper byte unused. You can see this by running this code on jython.

#--------------------------------------------
s = u'\u00E1\u00E9\u00ED\u00F3\u00FA'
u8 = s.encode('utf-8')
u16 = s.encode('utf-16')
for x in [s, u8, u16]:
    print "%d:%s:%s" % (len(x), str(type(x)), `x`)
#--------------------------------------------

which outputs

"""
5:org.python.core.PyString:'\xE1\xE9\xED\xF3\xFA'
10:org.python.core.PyString:'\xC3\xA1\xC3\xA9\xC3\xAD\xC3\xB3\xC3\xBA'
12:org.python.core.PyString:'\xFE\xFF\x00\xE1\x00\xE9\x00\xED\x00\xF3\x00\xFA'
"""

The only way to create binary strings in jython is to create them explicitly, for example, by transcoding text strings as above, or by reading from a byte-oriented stream like a socket, or binary file. These binary strings do not have their encoding metadata associated with them, in common with cpython: the programmer must know the encoding of the byte-array/binary-string they're handling.

When these binary strings are created, and stored as textual unicode strings, they look like latin-1 textual strings, since all of the upper-bytes of the characters are zero. So on jython, a binary encoded latin-1 string and a unicode string containing only latin-1 characters are represented identically.

In jython, any other time a string is created, by assignment to a string literal ('', "", """ """), or by reading from a text file, text stream, etc, the result is always a textual unicode string.

So, on to WSGI

[Phillip J.
Eby]
> * accept Unicode statuses and headers, so long as they properly encode
> them for transmission (latin-1 + RFC 2047)

String parameters in jython are always passed as unicode strings, containing either textual strings or the binary-string/byte-arrays described above. So the strings received by the jython start_response_callable will be either textual or binary unicode strings.

The start_response_callable has to be able to operate on these strings regardless, i.e. transform them using standard python functions, e.g. .split(' '), int(), etc. If these functions fail to operate correctly on a binary string, then there is little the start_response_callable can do without knowing the encoding of the binary string, so that it can decode it to a textual string. If the operations fail on a textual string, it is because the string contains invalid data for the operation. Note that this is in common with cpython, under which code must also assume that .split() and int() will simply work on the string passed, without knowing its encoding.

Status
======

So, in the case of the http status value, as long as int(status_str.split(' ')[0]) returns an integer, that's fine. This should be the case all of the time, as long as what was passed really was a string containing an ascii integer followed by a space.

Headers
=======

In the case of the header list, both header names and header values could also be passed as either textual or binary strings. There are three scenarios for the content of those strings:

1. They are binary strings, i.e. have zero upper-bytes, and are presumably suitable (application knows best) for use as http headers without transformation.
2. They are latin-1 strings, i.e. have zero upper-bytes, and are thus suitable for use as http headers without transformation.
3. They are non latin-1 strings, i.e. have non-zero upper-bytes, and so will have to be encoded before transmission, according to RFC 2047.
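In practice the three scenarios above come down to one test. A value either fits in latin-1 (cases 1 and 2, sent unchanged) or it does not (case 3). The sketch below is a minimal Python 3 version of that test. It assumes the standard library's `email.header.Header` is an acceptable RFC 2047 encoder, and it picks UTF-8 as the charset for case 3, because the post leaves the charset question open. `header_to_wire` is a hypothetical name.

```python
from email.header import Header, decode_header


def header_to_wire(value: str) -> str:
    """Hypothetical helper: return a header value safe to send as latin-1."""
    try:
        value.encode("latin-1")
    except UnicodeEncodeError:
        # Case 3: characters above U+00FF. Emit an RFC 2047 encoded-word
        # (plain ASCII on the wire). Charset choice (UTF-8) is an assumption.
        # maxlinelen=0 disables wrapping, since the PEP forbids folded headers.
        return Header(value, charset="utf-8").encode(maxlinelen=0)
    # Cases 1 and 2: every code point is <= U+00FF, so send it unchanged.
    return value


if __name__ == "__main__":
    encoded = header_to_wire("\u20ac100")
    print(encoded)
    raw, charset = decode_header(encoded)[0]
    print(raw.decode(charset))
```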
What jython should do
=====================

So any jython middleware, gateway or server that receives a Unicode string for a header value must

A: Send it without transformation if all upper-bytes are zero.
B: Encode it according to RFC 2047 if there are non-zero upper-bytes, then send it.

In the case of B, how should the jython code know which iso-8859-X charset to use for RFC 2047? Is there library code? Is mimify the right module to use?

A few notes about J2EE
======================

1. Under J2EE, the HttpServletResponse method signatures specify that a java.lang.String, i.e. 2-byte unicode, value must be given for header names and values (although see next point).

2. The most recent 2.4 version of the servlet specification now permits header strings to be an "octet string ... encoded according to RFC 2047". (This was not specified in previous versions of the spec, i.e. 2.3 or 2.2.)

http://java.sun.com/j2ee/1.4/docs/api/javax/servlet/http/HttpServletResponse.html#addHeader(java.lang.String,%20java.lang.String)

3. This indicates to me that J2EE expects that you have completely taken care of encoding yourself, i.e. that you will have RFC-2047 encoded your header, if required, before passing it to J2EE.

4. So if a jython start_response_callable receives a binary string, it should simply transmit it directly. If it receives a unicode string with non-zero upper-bytes, it should attempt to encode it in RFC-2047 before transmission. This could be done like so:

unicode_header = "my value"
try:
    wire_string = unicode_header.encode('latin-1')
except UnicodeError:
    # encode_in_rfc2047() stands for whatever RFC 2047 encoder is used
    wire_string = encode_in_rfc2047(unicode_header)

Standalone pure jython server
=============================

When running a standalone pure jython WSGI server, jython code will be writing header values directly to the client socket. In this case, the jython start_response_callable/server needs latin-1/RFC2047 strings to transmit down the socket.
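For the standalone case, the last step is serializing the status line and headers into the bytes written to the socket. This is a minimal Python 3 sketch. It assumes every value is already latin-1 safe (or RFC 2047 encoded, as described above) and that the server answers with HTTP/1.0. `build_response_head` is a hypothetical helper, not any server's real API.

```python
def build_response_head(status: str, headers: list[tuple[str, str]]) -> bytes:
    """Hypothetical helper: serialize a WSGI status and header list to bytes."""
    lines = ["HTTP/1.0 " + status]  # assumption: HTTP/1.0 status line
    for name, value in headers:
        # PEP 333 forbids folded header values, so reject embedded line breaks.
        if "\r" in value or "\n" in value:
            raise ValueError(f"header {name!r} contains a line break")
        lines.append(f"{name}: {value}")
    # A blank line terminates the header block.
    text = "\r\n".join(lines) + "\r\n\r\n"
    # Strict latin-1: raises UnicodeEncodeError for anything above U+00FF.
    return text.encode("latin-1")
```

Raising instead of silently replacing characters follows the "fatal error" stance taken later in this thread for non-latin-1 data.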
The same rules as J2EE above apply to the treatment of strings in this case.

So, with regard to the WSGI requirement above, the application *must* transmit Unicode statuses and headers to a jython start_response_callable, which will attempt to appropriately RFC-2047 encode the strings if they contain anything other than latin-1 characters. I think that completely agrees with your requirement as stated, just with different wording.

[Phillip J. Eby]
> * accept Unicode for response body segments, so long as each segment
> may be encoded as latin-1 (i.e. only uses chars 0-255)

I would say "jython servers can *only* accept unicode strings for response body segments", since this is the jython mechanism for passing binary strings. As you (kind-of) specify, the response body segment is not really a latin-1 encoded textual string; it is really a binary string of varying encoding, depending on the application. But treating it as a latin-1 string has the effect of preserving its content as a binary string. So again, I think that this meets your requirement, just stated differently.

If WSGI response bodies "crossed over" somehow from a cpython application to a jython application, through either swig-style linkage or through some form of http relay protocol such as FastCGI, the jython receiving end of that would have to produce a response body encoded as a jython binary string. Which is exactly what jython socket operations, etc, produce. So pure python middleware code that distributes WSGI requests over, say, a network socket, should run identically on jython and cpython. Which is nice to know. And which would probably be true for IronPython too: That Jim Hugunin is a clever lad. Jython really does all this stuff pretty seamlessly in relation to cpython.

[Phillip J. Eby]
> * produce Unicode input headers and body strings by decoding from
> latin-1, as long as the produced values are considered type 'str' for
> that Python implementation.
On jython, there is no point in decoding latin-1 strings to unicode strings, because their representations are identical: both are types.StringType, both take 2 bytes per character/byte, with the upper byte as zero. If the recipient is another jython component, all string types will be received correctly. If the recipient is a cpython component, then it will still receive the correct string, because whatever interface lies between the cpython and the jython will have correctly converted the data (if it was latin-1 data). So perhaps this requirement could be stated as "jython components/applications must produce unicode input headers and body strings, which must only contain latin-1 characters"? Whew! That turned out to be not so bad after all! (Alan crosses his fingers behind his back :-) Regards, Alan. From pje at telecommunity.com Wed Sep 15 18:01:25 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 15 18:02:11 2004 Subject: [Web-SIG] bytes, strings, and Unicode in Jython, IronPython, and CPython 3.0 In-Reply-To: <0F4BD34E02639E428B4654DCBAB4502D03E17B@100NOOSLMSG004.comm on.alpharoot.net> Message-ID: <5.1.1.6.0.20040915115842.033643b0@mail.telecommunity.com> At 12:33 PM 9/15/04 +0200, Paul Boddie wrote: >Phillip J. Eby wrote: > > So, here's what I propose to do about the open issue in PEP 333. >Servers > > and gateways that run under Python implementations where all strings >are > > Unicode (e.g. Jython) *may*: > > > > * accept Unicode statuses and headers, so long as they properly >encode > > them for transmission (latin-1 + RFC 2047) > >I think I encode all Unicode objects used in this area as US-ASCII in >WebStack. > > > * accept Unicode for response body segments, so long as each segment >may > > be encoded as latin-1 (i.e. 
only uses chars 0-255) > >It should be possible to be more intelligent about response bodies, but >you >can argue that it isn't up to something like WSGI to go through the >necessary gymnastics to make sure that Unicode objects presented to the >response stream become encoded appropriately. > > > * produce Unicode input headers and body strings by decoding from > > latin-1, as long as the produced values are considered type 'str' for >that > > Python implementation. > >I think I've left incoming headers as plain strings, but I suppose a >similar >translation could be performed in WebStack. You only need to worry about these things in WebStack if it's running under conditions where 'str' objects may contain any Unicode character. Currently that's only Jython, and maybe IronPython. As far as I know, CPython's -U option is broken; that is, not all of the Python stdlib works correctly with Unicode 'str' objects, so for the time being it's unlikely you'll need to worry about any of this under CPython. From pje at telecommunity.com Wed Sep 15 18:23:41 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 15 18:24:29 2004 Subject: [Web-SIG] bytes, strings, and Unicode in Jython, IronPython, and CPython 3.0 In-Reply-To: <4148517E.7040701@xhaus.com> References: <5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com> <5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040915120132.03364c30@mail.telecommunity.com> At 03:28 PM 9/15/04 +0100, Alan Kennedy wrote: >String parameters in jython are always passed as unicode strings, >containing either textual strings or the binary-string/byte-arrays >described above. So the strings received by the jython >start_response_callable will be either textual or binary unicode strings. > >The start_response_callable has to be able to operate on these strings >regardless, i.e. transform them using standard python functions, e.g. >.split(' '), int(), etc. 
If these functions fail to operate correctly on a >binary string, then there is little the start_response_callable can do, >without knowing the encoding of the binary string so that it can decode to >a textual string. If the operations fail on a textual string, it is >because the string contains invalid data for the operation. The point here is that a Jython WSGI server should either invoke '.encode("latin1")' on all strings supplied to it (whether in 'start_response()', 'write()', or yielded by the iterable), or otherwise verify that there are either no non-latin1 characters, or (optionally) transcode any non-latin1 characters as prescribed by RFC 2047 (status/headers only). It should be a fatal error to send a non-latin1 string to 'write()' or yield one from the iterable, however. >What jython should do >===================== > >So any jython middleware, gateway or server that receives a Unicode string >for a header value must > >A: Send it without transformation if all upper-bytes are zero. >B: Encode it according to RFC 2047 if there are non-zero upper-bytes, then >send it. > >In the case of B, how should the jython code know which iso-8859-X charset >to use for RFC 2047? Is there library code? Is mimify the right module to use? Actually, 'B' is optional. (Note that my proposal said a server *may* accept Unicode, not that it was required to do so.) It is also perfectly valid for a server or gateway to reject Unicode that can't be rendered as latin1. In other words, only 'A' is required. That's because applications are already required to do their own latin1/RFC 2047 encoding. But after looking at all of your comments and thinking this over a bit, I'm thinking that there's a simpler way to specify the intent of my proposal; something like: """On Python platforms where the 'str' or 'StringType' type is Unicode-based (e.g. 
Jython, IronPython, Python 3000, etc.), all strings must contain only characters representable in ISO-8859-1 encoding (\u0000 through \u00FF, inclusive). It should be considered a fatal error for an application to supply strings containing any other Unicode character, whether the string is in the 'headers', the 'status', supplied to 'write()', or is produced by the application's returned iterable.""" Adding this to the current "Unicode" section would suffice, I think, to deal with the current and future platforms in a cleanly compatible way. It also makes it clear that there is no additional burden on either the server/gateway or application sides: it's just a clarification of what it means to be a 'str' for WSGI's purposes. From pje at telecommunity.com Wed Sep 15 19:00:36 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 15 19:01:29 2004 Subject: [Web-SIG] Loosening the CGI variable requirements in PEP 333 Message-ID: <5.1.1.6.0.20040915125541.0293c6e0@mail.telecommunity.com> Currently, the requirement for CGI variables reads like this: """``environ`` Variables --------------------- The ``environ`` dictionary is required to contain these CGI environment variables, as defined by the Common Gateway Interface specification [2]_. The following variables **must** be present, but **may** be an empty string, if there is no more appropriate value for them:""" I'd like to change that last sentence to: """The following variables **must** be present (unless their value would be an empty string, in which case they may be omitted):""" This means that other parts of the spec would need to use e.g. 'environ.get("PATH_INFO","")'. But, I think this change will make it a little bit easier on servers or gateways that already have some sort of CGI basis or support, without substantially affecting anything else. Comments, anyone? 
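If empty variables may simply be omitted, code that consumes them reads them with a default. The sketch below is modelled on the PEP's "URL Reconstruction" algorithm and written for Python 3 (`urllib.parse.quote`). It assumes the proposed rule, so `SCRIPT_NAME`, `PATH_INFO` and `QUERY_STRING` are all read with `.get()`. The function name `reconstruct_url` is illustrative.

```python
from urllib.parse import quote


def reconstruct_url(environ: dict) -> str:
    """Rebuild the request URL, tolerating absent (empty) CGI variables."""
    scheme = environ["wsgi.url_scheme"]
    url = scheme + "://"
    if environ.get("HTTP_HOST"):
        url += environ["HTTP_HOST"]
    else:
        url += environ["SERVER_NAME"]
        default_port = "443" if scheme == "https" else "80"
        if environ["SERVER_PORT"] != default_port:
            url += ":" + environ["SERVER_PORT"]
    # Under the proposal a missing SCRIPT_NAME or PATH_INFO means "".
    url += quote(environ.get("SCRIPT_NAME", ""))
    url += quote(environ.get("PATH_INFO", ""))
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]
    return url
```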
(By the way, as far as I can tell, this is the very last open issue for PEP 333, so once this one's decided, I think it's time to begin the finalization process.) From py-web-sig at xhaus.com Wed Sep 15 20:56:25 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Wed Sep 15 20:52:11 2004 Subject: [Web-SIG] bytes, strings, and Unicode in Jython, IronPython, and CPython 3.0 In-Reply-To: <5.1.1.6.0.20040915120132.03364c30@mail.telecommunity.com> References: <5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com> <5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com> <5.1.1.6.0.20040915120132.03364c30@mail.telecommunity.com> Message-ID: <41489059.4090904@xhaus.com> [Phillip J. Eby] > But after looking at all of your comments and thinking this over a > bit, I'm thinking that there's a simpler way to specify the intent > of my proposal; something like: > > """On Python platforms where the 'str' or 'StringType' type is > Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all > strings must contain only characters representable in ISO-8859-1 > encoding (\u0000 through \u00FF, inclusive). It should be considered > a fatal error for an application to supply strings containing any > other Unicode character, whether the string is in the 'headers', the > 'status', supplied to 'write()', or is produced by the application's > returned iterable.""" Great: Says it all, in a neat and concise way. Nice job! +1 Regards, Alan. From floydophone at gmail.com Thu Sep 16 00:48:11 2004 From: floydophone at gmail.com (Peter Hunt) Date: Thu Sep 16 00:48:21 2004 Subject: [Web-SIG] WSGI woes Message-ID: <6654eac40409151548295fd2d9@mail.gmail.com> It looks like WSGI is not well received over at twisted.web. http://twistedmatrix.com/pipermail/twisted-web/2004-September/000644.html I thought the blocking call was handled by the iterator, but maybe I'm wrong. From pje at telecommunity.com Thu Sep 16 01:12:49 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu Sep 16 01:14:26 2004 Subject: [Web-SIG] WSGI woes In-Reply-To: <6654eac40409151548295fd2d9@mail.gmail.com> Message-ID: <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> At 06:48 PM 9/15/04 -0400, Peter Hunt wrote: >It looks like WSGI is not well received over at twisted.web. > >http://twistedmatrix.com/pipermail/twisted-web/2004-September/000644.html Excerpting from that post: """The WSGI spec is unsuitable for use with asynchronous servers and applications. Basically, once the application callable returns, the server (or "gateway" as wsgi calls it) must consider the page finished rendering.""" This is incorrect. Here is a simple WSGI application that demonstrates yielding 50 data blocks for transmission *after* the "application callable returns". def an_application(environ, start_response): start_response("200 OK", [('Content-Type','text/plain')]) for i in range(1,51): yield "Block %d" % i This has been a valid WSGI application since the August 8th posting of the WSGI pre-PEP. It may be, however, that Mr. Preston means that applications which want to use 'write()' or a similar push-oriented approach to produce data cannot do so after the application returns. If so, we should discuss that use case further, preferably on the Web-SIG. >I thought the blocking call was handled by the iterator, but maybe I'm wrong. I'm not sure what you mean, but if you're asking whether the iterable is allowed to create output blocks after the application callable returns, then yes. From floydophone at gmail.com Thu Sep 16 05:06:04 2004 From: floydophone at gmail.com (Peter Hunt) Date: Thu Sep 16 05:06:11 2004 Subject: [Web-SIG] WSGI - alternate ideas, part II Message-ID: <6654eac4040915200673ed116e@mail.gmail.com> I know we've come a long way fleshing out WSGI, so remember, these are just ideas. I'm not saying we should trash what we have, but I just wanted to throw this out there. I've been programming my own "web development kit", that is, a platform (i.e. 
cgi, fastcgi, mod_python) independent templating and controller system. Basically, it, along with lots of other efforts, simply requires a standard "request" and "response" object. In addition, I think the application should call the gateway, instead of the other way around. I also propose that the API be simple and use as much standard, prewritten code as possible. Finally, it should be extensible, such that we don't load, say, sessions if they aren't needed for a certain application.

Thus, here is my "WSGI-X" proposal. The application will call the gateway, the opposite of WSGI. For example, a CGI WSGI-X application may begin with:

#!/usr/bin/env python
if __name__ == "__main__":
    from wsgix import cgi
    req = cgi.get_request()

The req object is the core of the interface. In essence, it's extremely simple. The Request class has four attributes:

fs - an object which mimics cgi.FieldStorage
environ - a dictionary corresponding to the CGI environment
stdout - the raw, unbuffered direct output stream to the client
finish_hooks - list or iterable of functions that are called when finish() is called

It also declares one method, which may or may not need to be overridden by subclasses specific to the gateway:

finish() - finish the request

Now we have a basic interface to interact with HTTP. If one wants to write an extension to provide services like simplified cookie handling, sessions, or buffered headers and content, they write an extension function. A simple one for cookies would look like:

import Cookie

def cookie_extension(req):
    if not hasattr(req, "cookie"):
        req.cookie = Cookie.SimpleCookie(req.environ.get("HTTP_COOKIE",""))

It modifies the request object if it hasn't already been modified. This saves us a bit of overhead, so we won't need to parse the cookie again in case it is called twice (as it will be if other extensions depend on it). Finish hooks have the same signature and execute when the finish() method is called. For example, a buffering extension would flush the buffer.
Extensions can also add methods to the request object, for items such as add_header(). There's my proposal. Tear it apart :) I'm going to post some example code tomorrow or the day after, most likely. From pje at telecommunity.com Thu Sep 16 05:27:18 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 16 05:26:28 2004 Subject: [Web-SIG] WSGI - alternate ideas, part II In-Reply-To: <6654eac4040915200673ed116e@mail.gmail.com> Message-ID: <5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com> At 11:06 PM 9/15/04 -0400, Peter Hunt wrote: >Thus, here is my "WSGI-X" proposal. The application will call the >gateway, opposite of WSGI. For example, a CGI WSGI-X application may >begin with: > >#!/usr/bin/env python >if __name__ == "__main__": > from wsgix import cgi > req = cgi.get_request() That's pretty hard to implement correctly in any number of servers. Really, pretty much every server wants to call the application, rather than the other way around, because servers want to use their own event loop. >stdout - the raw, unbuffered direct output stream to the client So, header parsing is required? Or are only 'nph-' CGI scripts allowed? >Now we have a basic interface to interact with HTTP. If one wants to >write an extension to provide services like simplified cookie >handling, sessions, or buffered headers and content, they write an >extension function. A simple one for cookies would look like: > >def cookie_extension(req): > if not hasattr(req, "cookie"): > req.cookie = > Cookie.SimpleCookie(req.environ.get("HTTP_COOKIE","")) Note that this can easily be accomplished in WSGI, by changing 'hasattr(req,"cookie")' to '"my_extension.cookie" in environ' and 'req.cookie' to 'environ["my_extension.cookie"]'. >It modifies the request object if it hasn't already been modified. >This saves us a bit of overhead so we won't need to parse the cookie >again in case it is called twice (as it will if other extensions >depend on it). Also achievable within 'environ'. 
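Below is a Python 3 sketch of the suggested translation of the cookie extension into WSGI terms. The parsed cookie is cached in `environ` under a namespaced key, so a second call (say, from another extension) reuses it instead of parsing again. `get_cookies` is a hypothetical name, and `"my_extension.cookie"` is the placeholder key from the post. `http.cookies.SimpleCookie` is the Python 3 home of the `Cookie.SimpleCookie` class used above.

```python
from http.cookies import SimpleCookie

COOKIE_KEY = "my_extension.cookie"  # placeholder key taken from the post


def get_cookies(environ: dict) -> SimpleCookie:
    """Hypothetical helper: parse HTTP_COOKIE once per request, cache in environ."""
    if COOKIE_KEY not in environ:
        environ[COOKIE_KEY] = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    return environ[COOKIE_KEY]
```

Since `environ` is a per-request dict, the cache lives exactly as long as the request. That is the same lifetime Peter's `req.cookie` attribute would have.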
>Finish hooks have the same signature and execute when >the finish() method is called. For example, a buffering extension >would flush the buffer. Extensions can also add methods to the request >object, for items such as add_header(). Under WSGI, such "finish" hooks can be rendered as a 'close()' method on an iterable by a piece of middleware. From dp at ulaluma.com Thu Sep 16 07:13:52 2004 From: dp at ulaluma.com (Donovan Preston) Date: Thu Sep 16 07:14:18 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> References: <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> Message-ID: <335B4CE1-079F-11D9-A6FD-000A95864FC4@ulaluma.com> On Sep 15, 2004, at 7:12 PM, Phillip J. Eby wrote: > At 06:48 PM 9/15/04 -0400, Peter Hunt wrote: >> It looks like WSGI is not well received over at twisted.web. >> >> http://twistedmatrix.com/pipermail/twisted-web/2004-September/ >> 000644.html > > Excerpting from that post: > > """The WSGI spec is unsuitable for use with asynchronous servers and > applications. Basically, once the application callable returns, the > server (or "gateway" as wsgi calls it) must consider the page finished > rendering.""" > > This is incorrect. As I said in my original post, I hadn't mentioned anything about this yet because I didn't have a solution or proposal to fix the problem, which I maintain remains. I will attempt to suggest solutions, but I am unsure whether they will work or make sense in all environments. Allow me to explain: > Here is a simple WSGI application that demonstrates yielding 50 data > blocks for transmission *after* the "application callable returns". > > def an_application(environ, start_response): > start_response("200 OK", [('Content-Type','text/plain')]) > for i in range(1,51): > yield "Block %d" % i > > This has been a valid WSGI application since the August 8th posting of > the WSGI pre-PEP. 
According to the spec, """The application object must return an iterable yielding strings.""" Whether the application callable calls write before returning or yields strings to generate content, the effect is the same -- there is no way for the application callable to say "Wait, hang on a second, I'm not ready to generate more content yet. I'll tell you when I am."

This means the only way the application can pause for network activity is by blocking. For example, a page which performed an XML-RPC call and transformed the output into HTML would be required to perform the XML-RPC call synchronously. Or a page which initiated a telnet session and streamed the results into a web page would be required to perform reads on the socket synchronously.

The server or gateway, by calling next(), is assuming that the call will yield a string value, and only a string value.

Of course, Twisted has a canonical way of indicating that a result is not yet ready, the Deferred. An asynchronous application could yield a Deferred and an asynchronous server would attach a callback to this Deferred which invoked the next() method upon resolution. This is how Nevow handles Deferreds (in Nevow SVN head at nevow.flat.twist.deferflatten).

However, the WSGI spec says nothing about Deferred and indeed, Deferred would be useless in the case of another asynchronous server such as Medusa. I would suggest that WSGI include a simple Deferred implementation, but WSGI is simply a spec which is not intended to have any actual code.

Thus, one solution would be for the WSGI spec to be amended to state:

"""The application object must return an iterable yielding strings or objects implementing the following interface:

    def addCallback(callable):
        '''Add 'callable' to the list of callables to be invoked when a string is available. Callable should take a single argument, which will be a string.'''

The application object must invoke the callable passed to addCallback, passing a string which will be written to the request.
"""

This places additional burdens upon implementors of WSGI servers or gateways. In the case of a threaded HTTP server which uses blocking writes, implementing support for these promises would have to look something like this:

import Queue

def handle_request(inSocket, outSocket):
    ... read inSocket, parse the request and dispatch ...
    iterable = application(environ, start_response)
    try:
        while True:
            val = iterable.next()
            if isinstance(val, str):
                outSocket.write(val)
            else:
                # Block this thread until the promise delivers its string.
                result = Queue.Queue()
                val.addCallback(result.put)
                outSocket.write(result.get())
    except StopIteration:
        outSocket.close()

> It may be, however, that Mr. Preston means that applications which
> want to use 'write()' or a similar push-oriented approach to produce
> data cannot do so after the application returns. If so, we should
> discuss that use case further, preferably on the Web-SIG.

And now we come to my other half-baked proposal. Instead of merely returning a write callable, start_response could return a tuple of (write, finish) callables. The application would be free to call write at any time until it calls finish, at which point calling either callable becomes illegal. Again, the synchronous server support for this would have to block on a lock in a fashion such as this:

import threading

def handle_request(inSocket, outSocket):
    ... read request, dispatch ...
    # Start at 0 so that acquire() blocks until the application calls finish.
    finished = threading.Semaphore(0)
    def start_response(...):
        ... write headers ...
        return outSocket.write, finished.release
    iterable = application(environ, start_response)
    if iterable is None:
        finished.acquire()
    # Once we get here, the application is done with the request.

Finally, we come to the task of implementing a server or gateway which can asynchronously support either asynchronous or blocking applications. Since there is no way for the server or gateway to know whether the application object it is about to invoke will block, starving the main loop and preventing network activity from being serviced, it must invoke all applications in a new thread or process. A solution to this would be to require application callables to provide additional metadata, perhaps via function or object attributes, which indicate whether they are capable of running in asynchronous, threaded, or multiprocess environments. Since it's getting late and this message is getting long, I will leave this discussion for another day.
Since there is no way for the server or gateway to know whether the application object it is about to invoke will block, starving the main loop and preventing network activity from being serviced, it must invoke all applications in a new thread or process. A solution to this would be to require application callables to provide additional metadata, perhaps via function or object attributes, which indicate whether they are capable of running in asynchronous, threaded, or multiprocess environments. Since it's getting late and this message is getting long I will leave this discussion for another day. dp From pje at telecommunity.com Thu Sep 16 08:37:06 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 16 08:36:05 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <335B4CE1-079F-11D9-A6FD-000A95864FC4@ulaluma.com> References: <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> At 01:13 AM 9/16/04 -0400, Donovan Preston wrote: >On Sep 15, 2004, at 7:12 PM, Phillip J. Eby wrote: > >>At 06:48 PM 9/15/04 -0400, Peter Hunt wrote: >>>It looks like WSGI is not well received over at twisted.web. >>> >>>http://twistedmatrix.com/pipermail/twisted-web/2004-September/ 000644.html >> >>Excerpting from that post: >> >>"""The WSGI spec is unsuitable for use with asynchronous servers and >>applications. Basically, once the application callable returns, the >>server (or "gateway" as wsgi calls it) must consider the page finished >>rendering.""" >> >>This is incorrect. > >As I said in my original post, I hadn't mentioned anything about this >yet because I didn't have a solution or proposal to fix the problem, >which I maintain remains. Reading the rest of your post, I see that you are actually addressing the issue of asynchronous *applications*, and I have only been addressing asynchronous *servers* in the spec to date. 
(Technically "half-async" servers, since to be properly portable, a WSGI server *must* support synchronous applications, and therefore an async WSGI server must have a thread pool for running applications, even if it contains only one thread.) However, I'm not certain that it's actually possible to support *portable* asynchronous applications under WSGI, since such asynchrony requires additional support such as an event loop service. As a practical matter, asynchronous applications today require a toolset such as Twisted or peak.events in addition to the web server, and I don't really know of a way to make such applications portable across web servers, since the web server might use a different toolset that insists on having its own event loop. Or it might be like mod_python or CGI, and not really have any meaningful way to create an event loop: it could be utterly synchronous in nature and impossible to make otherwise. Thus, as a practical matter, applications that make use of asynchronous I/O *may* be effectively outside WSGI's scope, if they have no real chance of portability. As I once said on the Web-SIG, the idea of WSGI is more aimed at allowing non-Twisted apps to run under a Twisted web server, than at allowing Twisted applications to run under other web servers! The latter, obviously, is much more ambitious than the former. But I'm happy to nonetheless explore whether there is any way to support such applications without unduly complicating middleware. I don't expect it would complicate servers much, but middleware can be quite difficult, because middleware currently isn't even required to return when the application does! It's not recommended, but a middleware component can sit there and iterate over the return value and call its parent's write() method all it wants. In the presence of this kind of behavior, there isn't any real way to guarantee that a thread isn't going to be tied up with processing. 
But realistically, that's what an async server's thread pool is *for*. Anyway, as you'll see below, WSGI can actually run async apps with minimal blocking even without any modifications to the spec, and with server-specific extensions you can eliminate *all* the blocking, as long as middleware doesn't do anything pathological. In practice, of course, I think the spec *should* be updated so that middleware is prohibited from interfering with the control flow, and I'll give some thought as to how that should be phrased. >According to the spec, """The application object must return an >iterable yielding strings.""" Whether the application callable calls >write before returning or yields strings to generate content, the >effect is the same -- there is no way for the application callable to >say "Wait, hang on a second, I'm not ready to generate more content >yet. I'll tell you when I am." This means the only way the application >can pause for network activity is by blocking. That is correct. The application must block for such activities. However, as a practical matter, this isn't a problem for e.g. database access, since using Twisted's adbapi would still tie up *some* thread with the exact same blocking I/O, so there's actually no loss in simply doing unadorned DBAPI access from within the application. > For example, a page >which performed an XML-RPC call and transformed the output into HTML >would be required to perform the XML-RPC call synchronously. Or a page >which initiated a telnet session and streamed the results into a web >page would be required to perform reads on the socket synchronously. Technically, it could perform these tasks asynchronously, as long as the data were queued such that the application's return iterable simply retrieved results from the queue. However, this would naturally block whenever the client was ready for I/O, but no results were available yet. However, an asynchronous server isn't going to sit there in a loop calling next()! 
Presumably, it's going to wait until the previous string gets sent to the client, before calling next() again. And, it's presumably going to round-robin the active iterables through the threadpool, so that it doesn't keep blocking on iterables that aren't likely to have any data to produce as yet. Yes, this arrangement can still block threads sometimes, if there are only a few iterables active and they are waiting for some very slow async I/O. But the frequency of such blockages can be further reduced with a couple of extensions. Suppose there was an 'environ["async.sleep"]' and 'environ["async.wake"]'. A call to 'sleep' would mean, "don't bother iterating over me again until you get a 'wake' call". This *still* wouldn't prevent some item of middleware from hogging a thread in the threadpool, but I suppose you could actually make the 'sleep' function sit in a loop and run active iterables' next() methods until one of the suspended iterables in the current thread "wakes", at which point it would return control to whatever iterable it was called from. Or, if you want to use Greenlets, you can always return control directly to the iterable that needs to "wake up". Anyway, my point here is that it's possible to get a pretty decent setup for async applications, without any need to actually modify the base WSGI spec. And, if you add some optional extensions, you can get an even smoother setup for async I/O. Finally, I'm open to trying to define the 'sleep/wake' facilities as "standard options" in WSGI, as well as clarifying the middleware control flow to support this better. >The server or gateway, by calling next(), is assuming that the call >will yield a string value, and only a string value. The spec doesn't rule out empty strings, however, which would be the natural way to indicate that no data is available. 
So, the protocol in an async app's iterator would be something like:

    while queue.empty():
        if 'async.wake' in environ:
            someDeferred.addCallback(environ['async.wake'])
            environ['async.sleep']()
            yield ""
            # We should only get to this line once environ['async.wake'] has been called
        else:
            yield ""
            # delay an exponentially increasing period if queue is still empty

If middleware is required to match the control flow of the application it wraps (e.g. write()=>write(), yield=>yield), then this would result in complete non-blockingness when the server supports the 'async' extensions. Of course, a blocking delay *is* required when running in a server that doesn't support the async extensions, but that's unavoidable in that case. (Technically, you might be better off just doing synchronous I/O if you're being run in a synchronous server, but that's of course optional.)

>"""The application object must return an iterable yielding strings or
>objects implementing the following interface:
>
>def addCallback(callable):
>    '''Add 'callable' to the list of callables to be invoked when a string
>    is available. Callable should take a single argument, which will be a
>    string.'''
>
>The application object must invoke the callable passed to addCallback,
>passing a string which will be written to the request.
>"""
>
>This places additional burdens upon implementors of WSGI servers or
>gateways.

And a near-intolerable burden on middleware, which would have to have a way to "pass through" this facility. It would be much better to limit the pass-through requirements to covering write and yield, rather than requiring middleware to implement addCallback facilities as well.

>Finally, we come to the task of implementing a server or gateway which
>can asynchronously support either asynchronous or blocking
>applications.
Since there is no way for the server or gateway to know >whether the application object it is about to invoke will block, >starving the main loop and preventing network activity from being >serviced, it must invoke all applications in a new thread or process. But *some* thread is going to be working on it, and this is true whether you use a thread pool or the server is purely synchronous. And, because a WSGI server *must* support synchronous applications, it *must* have some thread available that is amenable to blocking. Of course "new" threads are not required. I assume that in the case of Twisted, something like reactor.deferToThread() will be used to wrap a WSGI application's initial invocation, and each individual 'next()' call. From wilk-ml at flibuste.net Thu Sep 16 10:58:34 2004 From: wilk-ml at flibuste.net (William Dode) Date: Thu Sep 16 10:58:36 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> (Phillip J. Eby's message of "Thu, 16 Sep 2004 02:37:06 -0400") References: <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> Message-ID: <874qly3805.fsf@blakie.riol> "Phillip J. Eby" writes: > At 01:13 AM 9/16/04 -0400, Donovan Preston wrote: > >>On Sep 15, 2004, at 7:12 PM, Phillip J. Eby wrote: >> >>>At 06:48 PM 9/15/04 -0400, Peter Hunt wrote: >>>>It looks like WSGI is not well received over at twisted.web. >>>> >>>>http://twistedmatrix.com/pipermail/twisted-web/2004-September/ 000644.html >>> >>>Excerpting from that post: >>> >>>"""The WSGI spec is unsuitable for use with asynchronous servers and >>>applications. Basically, once the application callable returns, the >>>server (or "gateway" as wsgi calls it) must consider the page finished >>>rendering.""" >>> >>>This is incorrect. 
>>
>>As I said in my original post, I hadn't mentioned anything about this
>>yet because I didn't have a solution or proposal to fix the problem,
>>which I maintain remains.
>
> Reading the rest of your post, I see that you are actually addressing
> the issue of asynchronous *applications*, and I have only been
> addressing asynchronous *servers* in the spec to date. (Technically
> "half-async" servers, since to be properly portable, a WSGI server
> *must* support synchronous applications, and therefore an async WSGI
> server must have a thread pool for running applications, even if it
> contains only one thread.)
>
> However, I'm not certain that it's actually possible to support
> *portable* asynchronous applications under WSGI, since such
> asynchrony requires additional support such as an event loop service.

Like others, i did my litle framework who can work on top of twisted,
cgi or BaseHTTPServer. So it's possible ;-)

But it doesn't mean that i whant to run my application on any
server. Generaly i use twisted server when i have specials need, like
telnet, irc... So this application will not run under cgi. But i like
to can reuse quickly somes litle cgi application under twisted.
I need the same framework for all the servers to can share 90% of my
api, to map the url to a resource, for session, cookies...

So, i hope we can find a solution to run simple application anywhere,
and to be open for very specific uses.

Sorry, because of my poor english, i cannot help a lot in the
discussion...

-- 
William Dodé - http://flibuste.net

From floydophone at gmail.com Thu Sep 16 14:02:38 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Thu Sep 16 14:02:44 2004
Subject: [Web-SIG] WSGI - alternate ideas, part II
In-Reply-To: <5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com>
References: <6654eac4040915200673ed116e@mail.gmail.com> <5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com>
Message-ID: <6654eac4040916050254f7297f@mail.gmail.com>

I think that the application should be passed a finish() method as a
parameter or start_response return value. If the WSGI application is
not a generator and returns wsgi.NOT_DONE_YET (similar to
Twisted.web's NOT_DONE_YET), it is required to call finish().
Otherwise, the gateway will call finish() after the generator is
finished or a string value is returned.

That way, one could do all of the deferred calls they want, and simply
return NOT_DONE_YET and call finish().

How does that sound?

On Wed, 15 Sep 2004 23:27:18 -0400, Phillip J. Eby wrote:
> At 11:06 PM 9/15/04 -0400, Peter Hunt wrote:
>
> >Thus, here is my "WSGI-X" proposal. The application will call the
> >gateway, opposite of WSGI. For example, a CGI WSGI-X application may
> >begin with:
> >
> >#!/usr/bin/env python
> >if __name__ == "__main__":
> >    from wsgix import cgi
> >    req = cgi.get_request()
>
> That's pretty hard to implement correctly in any number of
> servers. Really, pretty much every server wants to call the application,
> rather than the other way around, because servers want to use their own
> event loop.
>
> >stdout - the raw, unbuffered direct output stream to the client
>
> So, header parsing is required? Or are only 'nph-' CGI scripts allowed?
>
> >Now we have a basic interface to interact with HTTP. If one wants to
> >write an extension to provide services like simplified cookie
> >handling, sessions, or buffered headers and content, they write an
> >extension function. A simple one for cookies would look like:
> >
> >def cookie_extension(req):
> >    if not hasattr(req, "cookie"):
> >        req.cookie = Cookie.SimpleCookie(req.environ.get("HTTP_COOKIE",""))
>
> Note that this can easily be accomplished in WSGI, by changing
> 'hasattr(req,"cookie")' to '"my_extension.cookie" in environ' and
> 'req.cookie' to 'environ["my_extension.cookie"]'.
>
> >It modifies the request object if it hasn't already been modified.
> >This saves us a bit of overhead so we won't need to parse the cookie
> >again in case it is called twice (as it will if other extensions
> >depend on it).
>
> Also achievable within 'environ'.
>
> >Finish hooks have the same signature and execute when
> >the finish() method is called. For example, a buffering extension
> >would flush the buffer. Extensions can also add methods to the request
> >object, for items such as add_header().
>
> Under WSGI, such "finish" hooks can be rendered as a 'close()' method on an
> iterable by a piece of middleware.

From neel at mediapulse.com Thu Sep 16 15:41:07 2004
From: neel at mediapulse.com (Michael C. Neel)
Date: Thu Sep 16 15:40:48 2004
Subject: [Web-SIG] WSGI - alternate ideas, part II
In-Reply-To: <5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com>
References: <5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com>
Message-ID: <1095342067.30862.9.camel@mike.mediapulse.com>

On Wed, 2004-09-15 at 23:27, Phillip J. Eby wrote:
> At 11:06 PM 9/15/04 -0400, Peter Hunt wrote:
>
> >Thus, here is my "WSGI-X" proposal. The application will call the
> >gateway, opposite of WSGI. For example, a CGI WSGI-X application may
> >begin with:
> >
> >#!/usr/bin/env python
> >if __name__ == "__main__":
> >    from wsgix import cgi
> >    req = cgi.get_request()
>
> That's pretty hard to implement correctly in any number of
> servers. Really, pretty much every server wants to call the application,
> rather than the other way around, because servers want to use their own
> event loop.
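The environ-based alternative to cookie_extension, suggested in the reply quoted above, can be sketched as a helper that parses the cookie header once and caches the result under a namespaced environ key. This is a minimal illustration assuming Python 3's http.cookies module (the successor of the Cookie module used in the thread); `get_cookies` and the `session` cookie name are hypothetical.

```python
from http.cookies import SimpleCookie

# Key name follows the reply's example; any namespaced key would do.
COOKIE_KEY = "my_extension.cookie"


def get_cookies(environ):
    """Parse HTTP_COOKIE at most once per request, caching the result in environ."""
    if COOKIE_KEY not in environ:
        environ[COOKIE_KEY] = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    return environ[COOKIE_KEY]


def app(environ, start_response):
    cookies = get_cookies(environ)
    session = cookies["session"].value if "session" in cookies else "none"
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("session=%s" % session).encode("ascii")]


environ = {"HTTP_COOKIE": "session=abc123; theme=dark"}
body = app(environ, lambda status, headers: None)
# A second caller (e.g. another extension) gets the same cached object back.
print(body, get_cookies(environ) is environ[COOKIE_KEY])  # [b'session=abc123'] True
```

Because the cache lives in environ rather than on a request object, any middleware or extension that shares the same environ dictionary sees the already-parsed cookies.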
To insert my highly unqualified 2 cents; this is similar to the way
SnakeSkin/Albatross work:

    import snakeskin
    from snakeskin.cgiapp import Request

    app = snakeskin.SimpleApp(...)
    app.run(Request())
    ....

which allows me a chance to do something like:

    app = snakeskin.SimpleApp(...)
    myReq = Request()
    myReq.custom_data = {...}
    app.run(myReq)

changing to mod_python, change line two to

    from snakeskin.apacheapp import Request

the rest is the same. I don't think there is an issue with the current
wsgi where app is callable; calling the object would just imply a:

    from snakeskin.wsgiapp import Request

Mike

From py-web-sig at xhaus.com Thu Sep 16 16:59:15 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Thu Sep 16 16:54:40 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
References: <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
Message-ID: <4149AA43.6000803@xhaus.com>

[Phillip J. Eby]
> However, an asynchronous server isn't going to sit there in a loop
> calling next()! Presumably, it's going to wait until the previous
> string gets sent to the client, before calling next() again. And,
> it's presumably going to round-robin the active iterables through the
> threadpool, so that it doesn't keep blocking on iterables that aren't
> likely to have any data to produce as yet.
>
> Yes, this arrangement can still block threads sometimes, if there are
> only a few iterables active and they are waiting for some very slow
> async I/O. But the frequency of such blockages can be further reduced
> with a couple of extensions. Suppose there was an
> 'environ["async.sleep"]' and 'environ["async.wake"]'. A call to
> 'sleep' would mean, "don't bother iterating over me again until you
> get a 'wake' call".
and

> Anyway, my point here is that it's possible to get a pretty decent
> setup for async applications, without any need to actually modify the
> base WSGI spec. And, if you add some optional extensions, you can get
> an even smoother setup for async I/O.
>
> Finally, I'm open to trying to define the 'sleep/wake' facilities as
> "standard options" in WSGI, as well as clarifying the middleware
> control flow to support this better.

What would be really nice would be if there were some way for the application to return, to event-based servers or gateways, an object that could be included in the server's event loop, e.g. its select/poll loop.

For example, if an application were waiting on return data from a database, through a network socket, it could return that database-connection-socket descriptor to the server. The server would then check for activity on the database socket in its event loop, i.e. select.POLLIN. When this event, i.e. database data, appears, the server can have *reasonable* confidence that a call to the application's iterator will then yield data.

Of course, it is not guaranteed that the application will have data available (e.g. the database socket contains half the data required by the app, or the database connection is shared between multiple apps). But it's better than the application blocking.

But I can't think of any unified way to generalise this solution to non-descriptor based event loops or applications. For example, what if the application is waiting for data on a Queue.Queue? Or a threading.Event? How could the application enable the server to check for the Queue.Queue or threading.Event it awaits?

Perhaps the server could maintain an extra event loop for checking such threaded event notification mechanisms? Or it could associate an "app ready" flag with each client connection? It could go something like this:-

1. The application returns to the server an instance of a class that indicates it will only generate content when a thread notification primitive is set. Or perhaps the thread notification primitive is an optional attribute of the returned iterable, e.g.

       if hasattr(iterable, 'ready_to_go'):
           etc

2. The server adds this thread notification primitive to its lists/"event loop", or associates the notification primitive with the descriptor for the incoming/outgoing client socket.

3. When the client socket becomes ready for output, the server checks the ready_to_go flag on the application. If the flag is not set, it simply passes over that individual socket to the next.

4. When the client socket is ready to consume output *and* the application is ready to produce output, i.e. its ready flag is set, the server gets the data from the app's iterator and transmits it down the client socket.

The server could conceivably loop until either the client socket is full or the application iterator is empty, and then just suspend that client/application pair. Or it could spin that app->client transfer into a separate dedicated thread.

I don't like the idea of adding callbacks to WSGI: that's too Twisted-specific. I can picture, for example, a very simple coroutine based async server that would not need to have callbacks. Instead, applications would simply yield a NO-OP state to the server/scheduler/dispatcher, indicating they have no data ready right now.

And, of course, that's what we're really discussing here: server scheduling, and how servers ensure that application output gets transmitted to clients with maximum efficiency and timeliness. IMHO, asynchronous server scheduling algorithms and concerns have no place in core WSGI, although a well-designed optional extension to support efficiency might have a nice unification effect on python asynchronous server architectures.

Just my €0,02

Regards,

Alan.
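Steps 1 to 4 above can be sketched as a toy round-robin loop. This is a minimal illustration under several assumptions: `QueuedBody`, `feed()`, `serve()` and `producer()` are hypothetical names; the flag is a threading.Event exposed as the optional `ready_to_go` attribute Alan mentions; and a short sleep stands in for the select()/poll() wait a real server would perform alongside its client sockets.

```python
import queue
import threading
import time


class QueuedBody:
    """Hypothetical app return value exposing the optional 'ready_to_go' flag.

    A producer (e.g. a thread waiting on a database) calls feed(); None marks
    the end of the body.
    """

    def __init__(self):
        self.ready_to_go = threading.Event()
        self._chunks = queue.Queue()
        self._lock = threading.Lock()  # keeps "queue non-empty" and the flag in step

    def feed(self, chunk):
        with self._lock:
            self._chunks.put(chunk)
            self.ready_to_go.set()

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:
            try:
                chunk = self._chunks.get_nowait()
            except queue.Empty:
                return b""  # no output yet
            if self._chunks.empty():
                self.ready_to_go.clear()
        if chunk is None:
            raise StopIteration
        return chunk


def serve(bodies, idle_sleep=0.001):
    """Round-robin over response bodies, calling next() only when ready."""
    out = {name: [] for name in bodies}
    active = dict(bodies)
    while active:
        progressed = False
        for name, body in list(active.items()):
            flag = getattr(body, "ready_to_go", None)
            if flag is not None and not flag.is_set():
                continue  # step 3: pass over this connection for now
            try:
                chunk = next(body)
            except StopIteration:
                del active[name]
                continue
            if chunk:
                out[name].append(chunk)  # step 4: "transmit" to the client
            progressed = True
        if not progressed:
            time.sleep(idle_sleep)  # stand-in for blocking in select()/poll()
    return out


def producer(body, label, n):
    for i in range(n):
        time.sleep(0.002)  # simulated slow back-end I/O
        body.feed(("%s%d" % (label, i)).encode("ascii"))
    body.feed(None)  # end of body


a, b = QueuedBody(), QueuedBody()
threads = [threading.Thread(target=producer, args=(a, "a", 3)),
           threading.Thread(target=producer, args=(b, "b", 2))]
for t in threads:
    t.start()
result = serve({"a": a, "b": b})
for t in threads:
    t.join()
print(result)  # {'a': [b'a0', b'a1', b'a2'], 'b': [b'b0', b'b1']}
```

The lock in QueuedBody is the delicate part: without it, a producer could put a chunk between the consumer's empty() check and its clear(), leaving data queued behind a cleared flag that the server would never revisit.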
From pje at telecommunity.com Thu Sep 16 17:14:47 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 16 17:13:44 2004 Subject: [Web-SIG] WSGI - alternate ideas, part II In-Reply-To: <6654eac4040916050254f7297f@mail.gmail.com> References: <5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com> <6654eac4040915200673ed116e@mail.gmail.com> <5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916111212.021585b0@mail.telecommunity.com> At 08:02 AM 9/16/04 -0400, Peter Hunt wrote: >I think that the application should be passed a finish() method as a >parameter or start_response return value. If the WSGI application is >not a generator and returns wsgi.NOT_DONE_YET (similar to >Twisted.web's NOT_DONE_YET), it is required to call finish(). >Otherwise, the gateway will call finish() after the generator is >finished or a string value is returned. > >That way, one could do all of the deferred calls they want, and simply >return NOT_DONE_YET and call finish(). > >How does that sound? Way too complicated in the general case. I'd prefer a solution that doesn't excessively complicate middleware or synchronous servers, just to support asynchronous applications that are unlikely to be portable anyway. From pje at telecommunity.com Thu Sep 16 17:22:36 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu Sep 16 17:21:33 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <4149AA43.6000803@xhaus.com> References: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> At 03:59 PM 9/16/04 +0100, Alan Kennedy wrote: >And, of course, that's what we're really discussing here: server >scheduling, and how servers ensure that application output gets >transmitted to clients with maximum efficiency and timeliness. IMHO, >asynchronous server scheduling algorithms and concerns have no place in >core WSGI, although a well-designed optional extension to support effiency >might have a nice unification effect on python asynchronous server >architectures. Right. I'd encourage people to experiment with async extensions like my sleep/wake idea, and if there's sufficient consensus we could add a "standard extension" to the spec. But I don't want to disturb the write()+iterable model, since that allows middleware to be mostly oblivious to the sync/async issue, and only apps or servers that care have to deal with it. While asynchronous servers are fairly common, most existing asynchronous applications are going to be tied to a particular async server architecture no matter what we do in WSGI. From pje at telecommunity.com Thu Sep 16 17:27:44 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu Sep 16 17:26:41 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <874qly3805.fsf@blakie.riol> References: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916112253.026bdd30@mail.telecommunity.com> At 10:58 AM 9/16/04 +0200, William Dode wrote: >But it doesn't mean that i whant to run my application on any >server. Generaly i use twisted server when i have specials need, like >telnet, irc... So this application will not run under cgi. But i like >to can reuse quickly somes litle cgi application under twisted. >I need the same framework for all the servers to can share 90% of my >api, to map the url to a resource, for session, cookies... > >So, i hope we can find a solution to run simple application anywhere, >and to be open for very specific uses. As I said, WSGI should let any WSGI application run under more sophisticated architectures like Twisted; it's just that an application that uses Twisted-specific features isn't going to be able to move to a server that's not Twisted-compatible. And, if you're using Twisted-specific features in a WSGI app (as opposed to just writing a pure Twisted app), you'll have some additional work needed to deal with the asynchrony. However, the only reason I can think of why you'd want to make such an application use the WSGI interface is if you wanted to be able to use WSGI-based middleware features. At some point, that may be attractive, but I really doubt that in the short term anybody using Twisted-specific features in an application would want to bother with making it WSGI-compatible. From pje at telecommunity.com Thu Sep 16 18:18:26 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu Sep 16 18:17:24 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <15672B46-07F3-11D9-AC9C-000A95A50FB2@fuhm.net> References: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916113124.025e97d0@mail.telecommunity.com> At 11:14 AM 9/16/04 -0400, James Y Knight wrote: >On Sep 16, 2004, at 2:37 AM, Phillip J. Eby wrote: >>Reading the rest of your post, I see that you are actually addressing the >>issue of asynchronous *applications*, and I have only been addressing >>asynchronous *servers* in the spec to date. (Technically "half-async" >>servers, since to be properly portable, a WSGI server *must* support >>synchronous applications, and therefore an async WSGI server must have a >>thread pool for running applications, even if it contains only one thread.) > > From the point of view of Twisted as the server, running a WSGI > application, the big question is: >Can you (as a host server) assume WSGI applications will run non-blocking? > >The answer is clearly No and I don't imagine that would change. Right, because the ability to wrap existing applications is a must, and most existing applications are synchronous. > (well, right now it's currently not even possible to write a > non-blocking WSGI application, but even if it were..) That depends on what you define as "non-blocking". :) >The only sensible thing is to assume a WSGI app will block for some >arbitrarily long amount of time. Therefore, the only solution is to spawn >threads for simultaneous WSGI applications. Right; this has been in the discussions of WSGI since day one, last December. The assumption is that async servers would have to use a thread pool (e.g. via reactor.deferToThread) to run WSGI applications. 
Since the point was to allow non-Twisted applications and frameworks (e.g. Zope) to run under Twisted or any other web server, this was the only possible approach. >So, basically, I concur: WSGI is implementable for async servers, but only >to implement blocking applications. If by "blocking" you mean, you can't absolutely guarantee that no operation will tie up the current thread, then yes. If you mean "tie up the current thread for the entire request", then no, since it's possible to pause the output with a few minor changes to the spec. >>However, I'm not certain that it's actually possible to support >>*portable* asynchronous applications under WSGI, since such asynchrony >>requires additional support such as an event loop service. >>As a practical matter, asynchronous applications today require a toolset >>such as Twisted or peak.events in addition to the web server, and I don't >>really know of a way to make such applications portable across web >>servers, since the web server might use a different toolset that insists >>on having its own event loop. Or it might be like mod_python or CGI, and >>not really have any meaningful way to create an event loop: it could be >>utterly synchronous in nature and impossible to make otherwise. >> >>Thus, as a practical matter, applications that make use of asynchronous >>I/O *may* be effectively outside WSGI's scope, if they have no real >>chance of portability. As I once said on the Web-SIG, the idea of WSGI >>is more aimed at allowing non-Twisted apps to run under a Twisted web >>server, than at allowing Twisted applications to run under other web >>servers! The latter, obviously, is much more ambitious than the former. > >Yes, there is no way that I can see to make WSGI suitable for writing >async applications without significant work. There are two obvious >issues: the input stream only provides blocking read(), not a selectable >fd, and there is no way to pause output. 
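James's second issue, that there is no way to pause output, is what the sleep/wake extension proposed earlier in the thread is meant to address. Below is a minimal sketch of a toy server offering it, assuming the hypothetical environ keys 'async.sleep' and 'async.wake' (the spec defines neither), a single-threaded round-robin loop standing in for a real event loop, and Python 3 bytes bodies. `Request`, `serve` and `slow_app` are illustrative names only.

```python
import threading
import time


class Request:
    """Per-request state for a toy server offering the proposed sleep/wake
    extension under hypothetical environ keys 'async.sleep' / 'async.wake'."""

    def __init__(self, app):
        self.awake = threading.Event()
        self.awake.set()  # a request starts out eligible for iteration
        environ = {
            "async.sleep": self.awake.clear,  # "don't iterate me until woken"
            "async.wake": self.awake.set,     # may be called from any thread
        }
        self.body = iter(app(environ, lambda status, headers: None))
        self.output = []


def serve(apps, idle_sleep=0.001):
    requests = [Request(app) for app in apps]
    active = list(requests)
    while active:
        progressed = False
        for req in list(active):
            if not req.awake.is_set():
                continue  # asleep: skip it without tying up a thread
            try:
                chunk = next(req.body)
            except StopIteration:
                active.remove(req)
                continue
            if chunk:
                req.output.append(chunk)
            progressed = True
        if not progressed:
            time.sleep(idle_sleep)  # stand-in for the event loop's wait
    return [req.output for req in requests]


def slow_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    for i in range(3):
        results = []
        # Go to sleep *before* starting the back-end work, so an early
        # wake() cannot be undone by a later sleep().
        environ["async.sleep"]()

        def backend(n=i, out=results):
            time.sleep(0.002)  # simulated slow asynchronous I/O
            out.append(b"chunk %d" % n)
            environ["async.wake"]()

        threading.Thread(target=backend).start()
        yield b""          # no output yet; the server now passes over us
        yield results[0]   # resumed only after wake()


outputs = serve([slow_app, slow_app])
print(outputs)  # [[b'chunk 0', b'chunk 1', b'chunk 2'], [b'chunk 0', b'chunk 1', b'chunk 2']]
```

Ordering is the main design point here: the app calls sleep() before starting the work that will eventually call wake(). Registering the wake callback first, as in the iterator sketched earlier in the thread, is only safe if the server guarantees the callback cannot fire before sleep() runs.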
The sleep/wake extensions I proposed would allow pausing output. I hadn't thought about the input stream issue. >If the write callback was extended into a write/finish callback, it >wouldn't completely fix the second problem. Twisted would have to call the >write() callback from its reactor loop (having no access to the original >request thread). Especially if there is any middleware, the *write* might >block! There's also the question of whether the write() and finish() >methods are threadsafe or not -- would it even be safe to call from a >separate thread from that in which the request was started? That's one reason why sleep/wake over iterables is a better solution than write/finish for the "pausing output" issue. >Writing an async application *is* an interesting question, because then, >possibly, you could take the framework half of twisted web and run it as a >WSGI application. However, if this question is punted by WSGI (as I think >is likely a good idea..), twisted web framework can continue to work with >other servers by using HTTP proxying -- which is a _perfectly good_ >solution, and something major webservers already support. HTTP is a pretty >good protocol for talking between webservers and webapps. > >Also, if WSGI becomes really popular on servers that cannot do HTTP >proxying natively, twisted could provide a WSGI "application" that simply >proxies the requests over a socket to a separate twisted web server >process. This would provide essentially no advantage to HTTP proxying >where that works, however. No *technical* advantage, true, but if WSGI becomes a popular buzzword, the mere existence of such a solution allows you to boast that Twisted Web can be used with any WSGI-compliant server, as well as any server that supports HTTP proxying, which makes it sound like you have twice as many deployment options from a "marketecture" perspective. 
:) From py-web-sig at xhaus.com Thu Sep 16 18:45:36 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Thu Sep 16 18:40:24 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> References: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> Message-ID: <4149C330.8060009@xhaus.com> [Phillip J. Eby] > I'd encourage people to experiment with async extensions like my > sleep/wake idea, Actually, the more I think about it, the more I like your idea. My solution of using a thread-safe condition variable as an optional attribute of application return objects is too heavyweight, whereas your solution can be implemented with complexity appropriate to the server. For example, on a single-process server, wsgi.sleep could be defined like this:

    def sleep():
        # return a closure wrapping a method which sets a simple binary flag
        ...

whereas a threaded server might use:

    import threading

    def sleep():
        # return a closure wrapping a threading.Event().set()
        ...

Also, having the wrapper in the environment means that its meaning can be changed by middleware. The only thing I disagree on are the names "sleep" and "wake", which IMHO come with too many semantic hangovers from the threading world. When an application calls wsgi.sleep(), it's not really sleeping, it's just declaring that it currently has no output: a call to its iterator will succeed, but the returned value will be an empty string. So basically, WSGI is providing an on/off indicator for every instance of a middleware stack, which indicates to the server if there is currently output available.

Thinking afresh.
================

The server is just acting as a mediator between the client and application.
When the application has data, and the client is ready to receive data, the server transfers data between the two. But that client-to-application conversation is full-duplex, i.e. the client may be sending input to the application. In an asynchronous situation, the application cannot simply do a blocking read on the input: that will tie up the server thread. So we need a way for the application to be notified/called when input becomes available from the client. Perhaps we need to add an environment entry, e.g. "wsgi.input_handler", which the app uses to pass a callable to the server. This callable would be called whenever data became available on the input stream. So how would that work in the middleware stack? Would the first application in the stack set the callback for the input_stream, and perhaps not even invoke the next component up in the stack until some input has arrived? Does this mean that input handling will have to be separated out into a new state in the server->application state model? Or would each component in the stack set its own callback? I'm beginning to think that we may have to treat output and input identically in WSGI: i.e. from the server's point of view, there is no difference between the application->client stream and the client->application stream: there is symmetry between the server's connection to the client and the server's "connection" to the application. Hmm: Must think some more about this. Regards, Alan. From pje at telecommunity.com Thu Sep 16 19:41:54 2004 From: pje at telecommunity.com (Phillip J.
Eby) Date: Thu Sep 16 19:40:52 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <4149C330.8060009@xhaus.com> References: <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> At 05:45 PM 9/16/04 +0100, Alan Kennedy wrote: >The only thing I disagree on are the names "sleep" and "wake", which IMHO >come with too many semantic hangovers from the threading world. When an >application calls wsgi.sleep(), it's not really sleeping, it's just >declaring that it currently has no output: a call to its iterator will >succeed, but the returned value will be an empty string. > >So basically, WSGI is providing an on/off indicator for every instance of >a middleware stack, which indicates to the server if there is currently >output available. Well, I'm proposing it as an optional extension, not a required feature. And, I think I'd like to streamline it to a single 'wsgi.pause_output' function, e.g.: resume = environ['wsgi.pause_output']() Where 'resume' is then a callback function that can be invoked to resume iteration. This keeps it to a single extension key, helps ensure the correct sequence of actions, and makes it easier to implement in some cases, while not making other cases any harder. >In an asynchronous situation, the application cannot simply do a blocking >read on the input: that will tie up the server thread. What do you mean by "server thread"? A truly asynchronous server (one using "no threads") cannot serve multiple WSGI requests simultaneously. In the general case, a WSGI server can only serve as many requests simultaneously as it has available threads for. 
However, WSGI applications that use iteration in place of 'write()' can sometimes be run with fewer than one thread per simultaneous request -- that's why iteration is recommended for applications that can be implemented that way. > So we need a way for the application to be notified/called when input > becomes available from the client. > >Perhaps we need to add an environment entry, e.g. "wsgi.input_handler", >which the app uses to pass a callable to the server. This callable would >be called whenever data became available on the input stream. > >So how would that work in the middleware stack? You would have to pass either 'environ' or 'wsgi.input' *into* this input handler request function, so that the server can verify it hasn't been replaced by any middleware. This is the standard way in WSGI of providing enhanced communication facilities that could "bypass" middleware. See: http://www.python.org/peps/pep-0333.html#server-extension-apis So, in principle, if the spec is modified to require middleware to honor child applications' block boundaries, then you could use an extension API to pause iteration until input is available, in much the same way that you would pause iteration for any other reason. Neither of these "pause iteration" solutions are especially elegant, at least from the POV of an async application author. But my objective here is only to make it *possible*, not necessarily pretty. I imagine that if there's actual demand for async apps to run under WSGI, it should be possible to create wrappers to let an application written in Twisted's continuation-passing style be run as a WSGI app. Such a wrapper would basically be just a function returning an iterator, with a bunch of pausing logic and a queue to communicate with the actual asynchronous app. 
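A rough sketch of such a wrapper follows. It assumes that the *proposed* 'wsgi.pause_output' extension from this thread, which is not part of PEP 333, returns a zero-argument resume callable. The class and its send()/finish() methods are hypothetical: the asynchronous side calls them from its own thread, and the WSGI server simply iterates. When nothing is queued, the iterator pauses output and yields an empty string. If the extension is absent, it falls back to blocking on the queue, just like an ordinary synchronous application. The lock ensures that a chunk arriving between the empty-queue check and the pause cannot be missed.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the response body


class AsyncBridge:
    """Hypothetical bridge from an asynchronous producer to a WSGI iterable.

    Assumes environ['wsgi.pause_output'], if present, is the *proposed*
    extension: calling it pauses iteration and returns a resume() callable.
    """

    def __init__(self, environ):
        self.pause = environ.get('wsgi.pause_output')
        self.chunks = queue.Queue()
        self.lock = threading.Lock()
        self.resume = None  # set while the server is holding iteration

    # --- called by the asynchronous application (any thread) ---
    def send(self, data):
        self.chunks.put(data)
        self._wake()

    def finish(self):
        self.chunks.put(_DONE)
        self._wake()

    def _wake(self):
        with self.lock:
            resume, self.resume = self.resume, None
        if resume is not None:
            resume()  # tell the server it may call next() again

    # --- called by the WSGI server ---
    def __iter__(self):
        while True:
            if self.pause is None:
                chunk = self.chunks.get()  # no extension: block this thread
            else:
                with self.lock:
                    try:
                        chunk = self.chunks.get_nowait()
                    except queue.Empty:
                        # Nothing ready: pause output and yield an empty string.
                        self.resume = self.pause()
                        chunk = b''
            if chunk is _DONE:
                return
            yield chunk
```

An application would create the bridge, hand bridge.send and bridge.finish to its asynchronous code, call start_response(), and return the bridge as its iterable.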
And, such wrappers should only need to be written once for each asynchronous API, which as a practical matter probably means only Twisted, anyway, as (IMO) it has no real competitors in the Python async framework space. From foom at fuhm.net Thu Sep 16 19:57:16 2004 From: foom at fuhm.net (James Y Knight) Date: Thu Sep 16 19:57:20 2004 Subject: [Web-SIG] WSGI & transfer-encodings Message-ID: It is unclear to me from the WSGI spec what parts of HTTP a WSGI application is responsible for handling, and what the host server or middleware has to expect from the app. Sorry if this has been discussed previously, but it doesn't appear in the PEP. 1) Does the server need to decode incoming chunked encoding? The CGI spec essentially forbids incoming requests with chunked (and thus all others as well) transfer-encoding, as the CONTENT_LENGTH header is required to be present when there is incoming content. Does WSGI do the same thing? I would suggest the answer should be that WSGI does *not* require CONTENT_LENGTH to be present when there is incoming data. This requires at least the modification of: > The server is not required to read past the client's specified > Content-Length, and is allowed to simulate an end-of-file condition if > the application attempts to read past that point. The application > should not attempt to read more data than is specified by the > CONTENT_LENGTH variable. This would have to state something like: "The server must simulate an end-of-file condition if the application attempts to read more data than is specified by the Content-Length or the incoming Transfer-Encoding." The only way to tell if there's incoming data is therefore to attempt to read() the input stream. read() will either immediately return an EOF condition (returning '') or else read the data. Also, it seems that read() with no args isn't allowed? Perhaps it should be. 
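The EOF requirement suggested above can be met on the server side by a 'wsgi.input' wrapper that decodes a chunked request body and returns an empty string after the terminating zero-length chunk. The following is a minimal sketch. The class name ChunkedInput is hypothetical. The sketch ignores chunk extensions, discards trailers, and treats a truncated body as EOF instead of reporting an error. It also accepts read() with no argument, as suggested above.

```python
class ChunkedInput:
    """Hypothetical 'wsgi.input' wrapper for a 'Transfer-Encoding: chunked'
    request body; read() returns b'' once the last chunk has been consumed."""

    def __init__(self, raw):
        self.raw = raw        # binary file-like object positioned at the body
        self.remaining = 0    # bytes left in the current chunk
        self.eof = False

    def _start_chunk(self):
        line = self.raw.readline()
        if not line:                      # connection closed early: treat as EOF
            self.eof = True
            return
        size = int(line.split(b';', 1)[0].strip(), 16)  # drop chunk extensions
        if size == 0:
            # Last chunk: skip any trailer lines up to the terminating blank line.
            while self.raw.readline() not in (b'\r\n', b'\n', b''):
                pass
            self.eof = True
        self.remaining = size

    def read(self, size=None):
        parts = []
        while not self.eof and (size is None or size > 0):
            if self.remaining == 0:
                self._start_chunk()
                continue
            n = self.remaining if size is None else min(size, self.remaining)
            data = self.raw.read(n)
            if not data:                  # truncated body: treat as EOF
                self.eof = True
                break
            parts.append(data)
            self.remaining -= len(data)
            if size is not None:
                size -= len(data)
            if self.remaining == 0:
                self.raw.readline()       # consume the CRLF ending this chunk
        return b''.join(parts)
```

With this wrapper, an application never needs CONTENT_LENGTH. It keeps calling read() until it gets an empty string.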
2) The server is responsible for connection-oriented headers, and the spec states it may override the client's headers in this case. I would take this to mean I should just ignore the client provided Connection and Transfer-Encoding headers and supply those myself according to HTTP spec. But what about transfer-encoding? The spec says the server is allowed to add a chunked encoding. But:
- Is an application allowed to yield data that has already been encoded into chunked form?
- What if it does so and you're talking to a HTTP 1.0 client? Should the server decode the chunking? Or should it just let the application produce bogus output?
- May the application provide data with a gzip transfer-encoding?
- What if the server already handles all connection-oriented behavior transparently and doesn't even pass on the Connection, Keep-Alive, TE, Trailers, Transfer-Encoding, Upgrade headers to the client? Is that okay?
- Wouldn't providing pre-encoded data screw up middleware that is expecting to do something useful with the data going through it?
I would suggest that the correct answer is: the application should have nothing to do with any connection-oriented behavior. It should not send a Connection or Transfer-Encoding header and should not expect to receive the Connection, Keep-Alive, TE, Trailers, Transfer-Encoding, or Upgrade headers, although it is optional for the server to strip them. The application should not apply a transfer-encoding to its output and the server should not give it a transfer-encoded input. James From pje at telecommunity.com Thu Sep 16 20:03:27 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 16 20:02:36 2004 Subject: [Web-SIG] Draft language for WSGI to forbid blocking by middleware Message-ID: <5.1.1.6.0.20040916135706.021445a0@mail.telecommunity.com> I'm proposing the following language to be added to PEP 333, as a subsection under "Buffering and Streaming", just before the subsection entitled, "The write() callable".
It doesn't address pausing or resuming iteration (which I don't have a PEP-able proposal for yet), but it should ensure that middleware doesn't introduce any additional blocking issues:

===excerpt start===

Middleware Handling of Block Boundaries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to better support asynchronous applications and servers, middleware components **must not** block iteration waiting for multiple values from an application iterable. If the middleware needs to accumulate more data from the application before it can produce any output, it **must** yield an empty string.

To put this requirement another way, a middleware component **must yield at least one value** each time its underlying application yields a value. If the middleware cannot yield any other value, it must yield an empty string.

This requirement ensures that asynchronous applications and servers can conspire to reduce the number of threads that are required to run a given number of application instances simultaneously.

Note also that this requirement means that middleware **must** return an iterable as soon as its underlying application returns an iterable. It is also forbidden for middleware to use the ``write()`` callable to transmit data that is yielded by an underlying application. Middleware may only use their parent server's ``write()`` callable to transmit data that the underlying application sent using a middleware-provided ``write()`` callable.

===excerpt end===

In addition to this insertion, I would modify the 'start_response()' specification to note that HTTP headers should not be sent until the first *non-empty* string is yielded from the iterable.

Comments?
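As an illustration of the proposed rule, here is a sketch of a hypothetical middleware component that passes only complete lines through to the server. It yields exactly one value for each value its child yields, and it substitutes an empty string while it is still buffering. At the end it flushes any trailing partial line. It also returns its iterable immediately, because calling a generator function does no work until iteration begins.

```python
def line_buffering_middleware(app):
    """Hypothetical middleware that emits only complete lines, while still
    yielding once per value its child yields (b'' when nothing is ready)."""

    def wrapper(environ, start_response):
        # Returning the generator immediately satisfies "return an iterable
        # as soon as the underlying application returns an iterable".
        return _relay(app(environ, start_response))

    return wrapper


def _relay(body):
    buffer = b''
    try:
        for chunk in body:
            buffer += chunk
            head, sep, tail = buffer.rpartition(b'\n')
            if sep:
                buffer = tail
                yield head + sep    # one or more complete lines are ready
            else:
                yield b''           # still buffering: yield an empty string
        if buffer:
            yield buffer            # flush a trailing partial line
    finally:
        if hasattr(body, 'close'):
            body.close()
```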
From foom at fuhm.net Thu Sep 16 20:04:59 2004 From: foom at fuhm.net (James Y Knight) Date: Thu Sep 16 20:05:03 2004 Subject: [Web-SIG] Re: WSGI & transfer-encodings In-Reply-To: References: Message-ID: On Sep 16, 2004, at 1:57 PM, James Y Knight wrote: > 2) The server is responsible for connection-oriented override the > client's headers in this case. I would take this to mean I should just > ignore the client provided Connection and Transfer-Encoding headers > and supply those myself according to HTTP spec. I said "client" here a few times, but I meant WSGI "application" instead for all of them but the phrase "HTTP 1.0 client". Sorry for any confusion. James From pje at telecommunity.com Thu Sep 16 20:30:28 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 16 20:29:32 2004 Subject: [Web-SIG] WSGI & transfer-encodings In-Reply-To: Message-ID: <5.1.1.6.0.20040916140415.025f6bd0@mail.telecommunity.com> At 01:57 PM 9/16/04 -0400, James Y Knight wrote: >It is unclear to me from the WSGI spec what parts of HTTP a WSGI >application is responsible for handling, and what the host server or >middleware has to expect from the app. The general section for such issues is: http://www.python.org/peps/pep-0333.html#other-http-features The advice is that in general, a WSGI server should consider itself an HTTP proxy server, and should consider the application an HTTP origin server. However, this doesn't fully cover the two issues you've brought up, so thanks for bringing them to my attention! >1) Does the server need to decode incoming chunked encoding? The CGI spec >essentially forbids incoming requests with chunked (and thus all others as >well) transfer-encoding, as the CONTENT_LENGTH header is required to be >present when there is incoming content. Does WSGI do the same thing? > >I would suggest the answer should be that WSGI does *not* require >CONTENT_LENGTH to be present when there is incoming data. Hm. An interesting conundrum. 
Do any Python servers or applications exist today that *work* when there's no content-length? Personally, I'm thinking that WSGI should follow CGI here, and decode incoming transfer encodings. If this means HTTP/1.1 servers have to dump the incoming data to a file first, so be it. >The only way to tell if there's incoming data is therefore to attempt to >read() the input stream. read() will either immediately return an EOF >condition (returning '') or else read the data. Also, it seems that read() >with no args isn't allowed? Perhaps it should be. A no-argument read would be problematic in some environments -- CGI for example. >2) The server is responsible for connection-oriented headers, and the spec >states it may override the client's headers in this case. I would take >this to mean I should just ignore the client provided Connection and >Transfer-Encoding headers and supply those myself according to HTTP spec. > >But what about transfer-encoding? The spec says the server is allowed to >add a chunked encoding. But, >- Is an application allowed to yield data that has already been encoded >into chunked form? >- What if it does so and you're talking to a HTTP 1.0 client? Should the >server decode the chunking? Or should it just let the application produce >bogus output? >- May the application provide data with a gzip transfer-encoding? >- What if the server already handles all connection-oriented behavior >transparently and doesn't even pass on the Connection, Keep-Alive, TE, >Trailers, Transfer-Encoding, Upgrade headers to the client? Is that okay? The answer to all these questions, according to the current spec, is yes, absolutely. (Per the "server=proxy server, application=origin server" model). >- Wouldn't providing pre-encoded data screw up middleware that is >expecting to do something useful with the data going through it? Yes, it would. There are at least two ways to handle it, though: 1. 
Don't use middleware that's not smart enough to handle your app's output 2. Have the server or middleware munge HTTP_ACCEPT_ENCODING or other parameters on the way in to the application, so that the application (if written correctly) won't send data the server or middleware can't handle. >I would suggest that that the correct answer is: the application should >have nothing to do with any connection oriented behavior. It should not >send a Connection or Transfer-Encoding header and should not expect to >receive the Connection, Keep-Alive, TE, Trailers, Transfer-Encoding, or >Upgrade headers, although it is optional for the server to strip them. The >application should not apply a transfer-encodng to its output and the >server should not give it a transfer-encoded input. I like most of this, *except* that I'd like to leave open the option of an application providing transfer-encoding on its output. I'd rather have servers and middleware set HTTP_ACCEPT_ENCODING to "identity;q=1.0, *;q=0" (or an empty string, or delete the entry), if they interpret content, and have applications be required to respect this. Specifically, an application can only apply a content-encoding if it matches a non-zero quality in HTTP_ACCEPT_ENCODING. From foom at fuhm.net Thu Sep 16 21:03:53 2004 From: foom at fuhm.net (James Y Knight) Date: Thu Sep 16 21:03:56 2004 Subject: [Web-SIG] WSGI & transfer-encodings In-Reply-To: <5.1.1.6.0.20040916140415.025f6bd0@mail.telecommunity.com> References: <5.1.1.6.0.20040916140415.025f6bd0@mail.telecommunity.com> Message-ID: <26AD023A-0813-11D9-AC9C-000A95A50FB2@fuhm.net> On Sep 16, 2004, at 2:30 PM, Phillip J. Eby wrote: > Hm. An interesting conundrum. Do any Python servers or applications > exist today that *work* when there's no content-length? Unknown. > Personally, I'm thinking that WSGI should follow CGI here, and decode > incoming transfer encodings. If this means HTTP/1.1 servers have to > dump the incoming data to a file first, so be it. 
Following CGI means: do not allow requests without a Content-Length. No servers I know of will dump the data to a file to determine the length first before sending to a CGI. I would not ask them to either: that's like saying "Pleeease denial of service me!". And, really, the only place I've seen incoming chunked requests used is for streaming data -- and that will "never" finish. >> The only way to tell if there's incoming data is therefore to attempt >> to read() the input stream. read() will either immediately return an >> EOF condition (returning '') or else read the data. Also, it seems >> that read() with no args isn't allowed? Perhaps it should be. > > A no-argument read would be problematic in some environments -- CGI > for example. No -- CGI requires CONTENT_LENGTH, so in the CGI environment it is perfectly possible to simulate EOF at the end of the data. read could look something like this:

    import os
    import sys

    class CGIReq:
        def __init__(self):
            # CONTENT_LENGTH may be missing or empty when there is no body
            self.maxlength = int(os.environ.get('CONTENT_LENGTH') or 0)

        def read(self, length=None):
            # Never read past CONTENT_LENGTH, so EOF is simulated there
            if length is None:
                length = self.maxlength
            else:
                length = min(self.maxlength, length)
            data = sys.stdin.read(length)
            self.maxlength -= len(data)
            return data

>> - Wouldn't providing pre-encoded data screw up middleware that is >> expecting to do something useful with the data going through it? > > Yes, it would. There are at least two ways to handle it, though: > > 1. Don't use middleware that's not smart enough to handle your app's > output > > 2. Have the server or middleware munge HTTP_ACCEPT_ENCODING or other > parameters on the way in to the application, so that the application > (if written correctly) won't send data the server or middleware can't > handle. You've confused Content-Encoding with Transfer-Encoding. TE is the request header that goes with the Transfer-Encoding response header. And according to HTTP 1.1, chunked is always acceptable, so no amount of header munging can change that.
So under the "WSGI application is a HTTP origin server" interpretation, all pieces of middleware must be prepared to deal with chunked output. I think that's silly -- there is no reason for a WSGI application to produce chunked-encoded strings, as it already has a way to produce chunks via the iterator. >> I would suggest that that the correct answer is: the application >> should have nothing to do with any connection oriented behavior. It >> should not send a Connection or Transfer-Encoding header and should >> not expect to receive the Connection, Keep-Alive, TE, Trailers, >> Transfer-Encoding, or Upgrade headers, although it is optional for >> the server to strip them. The application should not apply a >> transfer-encodng to its output and the server should not give it a >> transfer-encoded input. > > I like most of this, *except* that I'd like to leave open the option > of an application providing transfer-encoding on its output. I'd > rather have servers and middleware set HTTP_ACCEPT_ENCODING to > "identity;q=1.0, *;q=0" (or an empty string, or delete the entry), if > they interpret content, and have applications be required to respect > this. Specifically, an application can only apply a content-encoding > if it matches a non-zero quality in HTTP_ACCEPT_ENCODING. Again: I'm talking only about Transfer-Encoding, not Content-Encoding. Content-Encoding is an end-to-end function and thus properly belongs to the application. Transfer-Encoding is a hop-by-hop header, and properly belongs to the server. If you want a transfer-encoded output, you can always request it via a server-specific extension or configuration mechanism. Both Transfer-Encoding and Content-Encoding have a gzip argument, but these mean significantly different things. The first is connection compression, the second is transferring a compressed file over an uncompressed connection. 
James From py-web-sig at xhaus.com Thu Sep 16 21:29:31 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Thu Sep 16 21:24:30 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> References: <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> Message-ID: <4149E99B.3060003@xhaus.com> [Alan Kennedy] >> In an asynchronous situation, the application cannot simply do a >> blocking read on the input: that will tie up the server thread. [Phillip J. Eby] > What do you mean by "server thread"? A truly asynchronous server (one > using "no threads") cannot serve multiple WSGI requests > simultaneously. In the general case, a WSGI server can only serve as > many requests simultaneously as it has available threads for. Sorry, I should have paid more attention to phrasing in this context. By "server thread" I mean the thread of execution that is running the select/poll operation in the server (which needs at least *one* thread). If the application did a blocking read of the input running in a simple, single-threaded asyncore-style server, that single thread would block, holding up event processing. [Phillip J. Eby] > > [About asynchronous input handlers] > > Such a wrapper would basically be just a function returning an > iterator, with a bunch of pausing logic and a queue to communicate > with the actual asynchronous app. And, such wrappers should only > need to be written once for each asynchronous API, which as a > practical matter probably means only Twisted, anyway, as (IMO) it has > no real competitors in the Python async framework space. 
I see the need for returning an iterator: the application processing the input has to produce a response as well, e.g. a form-processing app returning a "thank you for your submission" page. But I don't see the need for pausing logic or queues? Why can't the server simply call directly into the application, e.g. using a "process_input" method, in effect saying "you have some input ready". And I'm not sure I see the need for the application to check that the wsgi.input hasn't been replaced: if there were middleware further down the stack that was intercepting and transforming the input stream, then *it* should be the one receiving the asynchronous notification from the server. This lower-level component would then read some input, process it, and then call a "process_input" method on the next component up in the stack, etc, etc. I suppose I'm talking about the server "pushing" the input through the middleware stack, whereas you're talking about the application at the top of the stack "pulling" the data up through the stack. Is that right? And I'd be interested to see how your approach would handle a situation where there is both streaming input and output. For example, a server that takes strings of any length, say 10**9 bytes, and .encode('rot13')'s each byte in turn, before sending it back to the client. I'll be thinking about this some more. Regards, Alan. From pje at telecommunity.com Thu Sep 16 22:08:48 2004 From: pje at telecommunity.com (Phillip J.
Eby) Date: Thu Sep 16 22:07:47 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <4149E99B.3060003@xhaus.com> References: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com> At 08:29 PM 9/16/04 +0100, Alan Kennedy wrote: >[Alan Kennedy] > >> In an asynchronous situation, the application cannot simply do a > >> blocking read on the input: that will tie up the server thread. > >[Phillip J. Eby] > > What do you mean by "server thread"? A truly asynchronous server (one > > using "no threads") cannot serve multiple WSGI requests > > simultaneously. In the general case, a WSGI server can only serve as > > many requests simultaneously as it has available threads for. > >Sorry, I should have paid more attention to phrasing in this context. > >By "server thread" I mean the thread of execution that is running the >select/poll operation in the server (which needs at least *one* thread). >If the application did a blocking read of the input running in a simple, >single-threaded asyncore-style server, that single thread would block, >holding up event processing. Right, which is (one reason) why a WSGI server can in the general case only serve as many WSGI requests simultaneously as it has available threads for, although it's possible to improve on that worst-case condition by appropriate use of iterators. >But I don't see the need for pausing logic or queues? Why can't the server >simply call directly into the application, e.g. 
using a "process_input" >method, in effect saying "you have some input ready". > >And I'm not sure I see the need for the application to check that the >wsgi.input hasn't been replaced: if there were middleware further down >that stack that was intercepting and transforming the input stream, then >*it* should be the one receiving the asynchronous notification from the >server. This lower level component would then read some input, process it, >and then call a "process_input" method on the next component up in the >stack, etc, etc. > >I suppose I'm talking about the server "pushing" the input through the >middleware stack, whereas you're talking about the application at the stop >of the stack "pulling" the data up through the stack. Is that right? That's correct, and that's what I'm trying to avoid if at all possible, because it enormously complicates middleware, to the sole benefit of asynchronous apps -- that mostly aren't going to be portable anyway. So, going by STASCTAP theory (Simple Things Are Simple, Complex Things Are Possible), the pause/resume approach makes asynchronous applications *possible*, while keeping the nominal synchronous cases and middleware *simple*. >And I'd be interested to see how your approach would handle a situation >where there is both streaming input and output. For example, a server that >takes strings of any length, say 10**9 bytes, and .encode('rot13')'s each >byte in turn, before sending it back to the client. Presumably, the function to pause for input needs to take a minimum length, or have some way to communicate available length to the application. I don't pretend to fully understand the needed use cases here, because I have little experience writing web applications that need to wait on other network services (other than databases) while a client is waiting. 
And if I were writing an asynchronous server, I'd probably at least consider using Greenlets to context-switch blocking operations so that they wouldn't tie up an active thread. Such an approach is conceptually easier to deal with, IMO, than writing everything in continuation-passing style. But I *do* want WSGI to make it *possible* to meet async apps' use cases, which is why I'm seeking input from those that do have the relevant experience. The trade-off is that it shouldn't excessively complicate nominal compliance with WSGI. In particular, I'd prefer that the current "example CGI gateway" in PEP 333 not require any major changes or significant expansion. From pje at telecommunity.com Thu Sep 16 22:22:04 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 16 22:21:03 2004 Subject: [Web-SIG] WSGI & transfer-encodings In-Reply-To: <26AD023A-0813-11D9-AC9C-000A95A50FB2@fuhm.net> References: <5.1.1.6.0.20040916140415.025f6bd0@mail.telecommunity.com> <5.1.1.6.0.20040916140415.025f6bd0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916160858.0277e1a0@mail.telecommunity.com> At 03:03 PM 9/16/04 -0400, James Y Knight wrote: >On Sep 16, 2004, at 2:30 PM, Phillip J. Eby wrote: > >>Hm. An interesting conundrum. Do any Python servers or applications >>exist today that *work* when there's no content-length? > >Unknown. > >>Personally, I'm thinking that WSGI should follow CGI here, and decode >>incoming transfer encodings. If this means HTTP/1.1 servers have to dump >>the incoming data to a file first, so be it. > >Following CGI means: do not allow requests without a Content-Length. No >servers I know of will dump the data to a file to determine the length >first before sending to a CGI. I would not ask them to either: that's like >saying "Pleeease denial of service me!". And, really, the only place I've >seen incoming chunked requests used is for streaming data -- and that will >"never" finish. Hm. 
I suppose it's in theory possible that one could write some kind of streaming-over-HTTP application with WSGI. So I guess we should consider allowing it. >>>The only way to tell if there's incoming data is therefore to attempt to >>>read() the input stream. read() will either immediately return an EOF >>>condition (returning '') or else read the data. Also, it seems that >>>read() with no args isn't allowed? Perhaps it should be. >> >>A no-argument read would be problematic in some environments -- CGI for >>example. > >No -- CGI requires CONTENT_LENGTH, so in the CGI environment it is >perfectly possible to simulate EOF at the end of the data. I mainly meant that environments like CGI already have a suitable file-like object for use as 'wsgi.input', and that supporting 'read()' with no arguments requires implementing a replacement 'wsgi.input'. >>>- Wouldn't providing pre-encoded data screw up middleware that is >>>expecting to do something useful with the data going through it? >> >>Yes, it would. There are at least two ways to handle it, though: >> >>1. Don't use middleware that's not smart enough to handle your app's output >> >>2. Have the server or middleware munge HTTP_ACCEPT_ENCODING or other >>parameters on the way in to the application, so that the application (if >>written correctly) won't send data the server or middleware can't handle. > >You've confused Content-Encoding with Transfer-Encoding. TE is the request >header that goes with Transfer-Encoding response header. And according to >HTTP 1.1, chunked is always acceptable, so no amount of header munging can >change that. So under the "WSGI application is a HTTP origin server" >interpretation, all pieces of middleware must be prepared to deal with >chunked output. I think that's silly -- there is no reason for a WSGI >application to produce chunked-encoded strings, as it already has a way to >produce chunks via the iterator. 
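James's point that CGI requires CONTENT_LENGTH, so EOF can be simulated at the end of the body, combines with Phillip's note that supporting read() with no arguments means supplying a replacement wsgi.input. Together they amount to roughly the following. This is a minimal Python 3 sketch; BoundedInput and bounded_input are hypothetical names.

```python
import io


class BoundedInput:
    """File-like wsgi.input that reports EOF once CONTENT_LENGTH bytes are consumed.

    Sketch: wraps a raw stream such as a CGI gateway's stdin, so read() with no
    argument returns just the request body instead of waiting for more data.
    """

    def __init__(self, stream, content_length):
        self.stream = stream
        self.remaining = content_length

    def read(self, size=-1):
        if self.remaining <= 0:
            return b''  # simulated EOF at the end of the body
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        data = self.stream.read(size)
        self.remaining -= len(data)
        return data


def bounded_input(environ):
    """Wrap environ['wsgi.input']; a missing or empty CONTENT_LENGTH means no body."""
    length = int(environ.get('CONTENT_LENGTH') or 0)
    return BoundedInput(environ['wsgi.input'], length)


if __name__ == '__main__':
    raw = io.BytesIO(b'hello worldEXTRA')  # body followed by bytes that are not ours
    body = bounded_input({'CONTENT_LENGTH': '11', 'wsgi.input': raw})
    print(body.read(5), body.read(), body.read())
```

Without a Content-Length, for example a chunked request body, this wrapper has nothing to count against. That is exactly the case the thread is wrestling with.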
Fair enough; the only part that has any business reading or writing chunked encoding is the "real" server; I'll update the PEP 333 "Other HTTP Features" section accordingly. >>I like most of this, *except* that I'd like to leave open the option of >>an application providing transfer-encoding on its output. I'd rather >>have servers and middleware set HTTP_ACCEPT_ENCODING to "identity;q=1.0, >>*;q=0" (or an empty string, or delete the entry), if they interpret >>content, and have applications be required to respect >>this. Specifically, an application can only apply a content-encoding if >>it matches a non-zero quality in HTTP_ACCEPT_ENCODING. > >Again: I'm talking only about Transfer-Encoding, not Content-Encoding. >Content-Encoding is an end-to-end function and thus properly belongs to >the application. Transfer-Encoding is a hop-by-hop header, and properly >belongs to the server. If you want a transfer-encoded output, you can >always request it via a server-specific extension or configuration mechanism. > >Both Transfer-Encoding and Content-Encoding have a gzip argument, but >these mean significantly different things. The first is connection >compression, the second is transferring a compressed file over an >uncompressed connection. Thanks for clearing up my confusion; between your explanation and RFC 2616 I think I can now see how to clarify this. In effect, WSGI applications *must not* send hop-by-hop headers or interpret them, and servers *should not* provide them to applications. And WSGI middleware *must* follow RFC 2616, section 13.5, regarding which headers may be changed in transit, and when. One way of looking at it is that WSGI servers and middleware are like HTTP proxy servers, but using a private inter-server transport mechanism that effectively replaces any normal HTTP hop-by-hop control mechanisms.
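If hop-by-hop concerns belong exclusively to the server, the server ends up with two jobs that applications must not do. It must refuse hop-by-hop headers in the application's response, and it must apply chunked transfer-coding itself when it needs to. The sketch below is in Python 3 and assumes an HTTP/1.1 server that chunk-encodes responses lacking a Content-Length. The function names are illustrative, not taken from PEP 333.

```python
# Hop-by-hop headers from RFC 2616, section 13.5.1.  That list spells one of
# them "Trailers"; the header field itself (section 14.40) is "Trailer".
HOP_BY_HOP = frozenset([
    'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
    'te', 'trailer', 'transfer-encoding', 'upgrade',
])


def check_app_headers(headers):
    """Reject hop-by-hop headers in an application's start_response() list."""
    for name, _value in headers:
        if name.lower() in HOP_BY_HOP:
            raise ValueError("hop-by-hop header %r must be left to the server" % name)


def chunk_body(iterable):
    """Frame each block of an app's output as one HTTP/1.1 chunk (RFC 2616, 3.6.1)."""
    for block in iterable:
        if block:  # a zero-size chunk would end the body, so skip empty blocks
            yield b'%x\r\n' % len(block) + block + b'\r\n'
    yield b'0\r\n\r\n'  # last-chunk, with no trailer
```

This is also why an application never needs to produce chunked strings: the blocks of its iterable already are the chunks, and only the server knows whether the connection wants them framed.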
From py-web-sig at xhaus.com Thu Sep 16 23:41:31 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Thu Sep 16 23:36:17 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com> References: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com> Message-ID: <414A088B.7040601@xhaus.com> [Alan Kennedy] >> I suppose I'm talking about the server "pushing" the input through >> the middleware stack, whereas you're talking about the application at >> the stop of the stack "pulling" the data up through the stack. Is >> that right? [Phillip J. Eby] > That's correct, and that's what I'm trying to avoid if at all > possible, because it enormously complicates middleware, to the sole > benefit of asynchronous apps -- that mostly aren't going to be > portable anyway. Hmmm. Perhaps I'll resort to explaining my idea through code rather than text. Here is my take on a putative blocking *and* asynchronous rot-13 stream encoder. But before showing you the blocking and async one, I want to show what I think the blocking one would look like:

class blocking_rot13_streamer:

    def __init__(self, environ, start_response):
        self.in_stream = environ['wsgi.input']
        start_response("200 OK", [('content-type', 'text/plain-rot13')])

    def __iter__(self):
        return self

    def next(self):
        try:
            return self.in_stream.read().encode('rot-13')
        except EndOfStream:  # assumed: read() raises this at end of input (not defined by WSGI)
            raise StopIteration

This looks nice and simple to me.
The one that works in both async mode and blocking mode looks like this:

class rot13_streamer:

    def __init__(self, environ, start_response):
        self.in_stream = environ['wsgi.input']
        self.buffer = []
        self.end_of_stream = False
        self.resume = None
        if environ.has_key('wsgi.async_input_handler'):
            self.async = True
            # Only an async server can be asked to pause output.
            self.pause_output = environ['wsgi.pause_output']
            environ['wsgi.async_input_handler'](self.input_handler)
        else:
            self.async = False
        start_response("200 OK", [('content-type', 'text/plain-rot13')])

    def input_handler(self):
        # Called by the server when there is input available on wsgi.input.
        try:
            self.buffer.append(self.in_stream.read())
        except EndOfStream:  # assumed end-of-input exception, as above
            self.end_of_stream = True
        if self.resume:
            # Are resumes one-hit or "re-entrant"?
            resume, self.resume = self.resume, None
            resume()

    def __iter__(self):
        return self

    def next(self):
        if self.async:
            if self.buffer:
                # pop(0) keeps the blocks in arrival order
                return self.buffer.pop(0).encode('rot-13')
            elif self.end_of_stream:
                raise StopIteration
            else:
                self.resume = self.pause_output()
                return ""
        else:
            try:
                return self.in_stream.read().encode('rot-13')
            except EndOfStream:
                raise StopIteration

In this way, there could be a middleware component below the rot13_streamer in the stack that, say, does chunked_transfer encoding and decoding. It would be the same in form as the above, except that it would

1. Change the environ entry for 'wsgi.async_input_handler' to be its own callable that records the callback for the next layer up in the stack, the rot13_streamer.input_handler.
2. Create its own buffer, into which it will store chunks decoded from the input stream. This buffer, e.g. a StringIO, then replaces 'wsgi.input' in the environ passed to the next middleware component up.
3. When chunks arrive from the client, the server calls the dechunker's input_handler. This reads the (possibly partial) chunk from the stream, decodes it and stores it in its StringIO buffer.
4. When it has a complete chunk it calls the input_handler of the next component in the stack, which will then read the decoded chunk from its wsgi.input stream, i.e. the dechunker's StringIO.
I think that this proposed approach is clean, and not overly complex for async or blocking programmers to handle. But I think we do have to cleanly separate the two. I think there are problems associated with trying to run *all* components seamlessly across async or blocking servers. I think that middleware components that are always going to behave correctly in an async situation will have to be designed like that from the ground up. It's dangerous to take components written in a blocking environment and run them in an async environment. And lastly, if it is desired to spin jobs into a different thread, e.g. the rot-13 job above, then that should be a middleware concern, not the WSGI server's. So if a Twisted component wants to pass a job to a service thread, some other Twisted component lower down the stack, possibly the framework itself, must have already created the threads/queues to enable this. The Twisted rot-13 component would then have very thin methods (run from the server's main thread) which interact with the Twisted space, i.e. transferring data and receiving data back through queues, and layer WSGI semantics on those interactions, i.e. pause_output, yield result, yield empty_string, etc. When I described your approach as "pulling data up the stack", I saw a bigger difference between the two approaches. I'm thinking now that there is little difference between our proposals, except that in mine it's the bottom component that gets notified of the input by the server, and in yours it's the top component. Though I suppose having the top component pulling input from an iterator chain mirrors nicely the situation where the server pulls output from an iterator chain. And my approach basically entails a bunch of nested calls, which might be less efficient or elegant than if, say, generators were used in an input processing chain. You're right again, Phillip :-) Regards, Alan.
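Both of Alan's streamers rely on a hypothetical EndOfStream exception. Suppose wsgi.input instead keeps ordinary file semantics, where read() returns an empty string at the end of the body. Then the blocking version reduces to a generator. This is a Python 3 sketch, where rot13 over bytes is done with bytes.translate. The block_size parameter and the Content-Type value are illustrative choices, and the server is assumed not to let reads run past the end of the body.

```python
import io

# Translation table that shifts each ASCII letter 13 places, keeping case.
_UPPER = b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
_LOWER = b'abcdefghijklmnopqrstuvwxyz'
ROT13 = bytes.maketrans(_UPPER + _LOWER,
                        _UPPER[13:] + _UPPER[:13] + _LOWER[13:] + _LOWER[:13])


def blocking_rot13_app(environ, start_response, block_size=8192):
    """Synchronous WSGI app that echoes the request body back, rot13-encoded."""
    stream = environ['wsgi.input']
    start_response('200 OK', [('Content-Type', 'text/plain')])
    while True:
        data = stream.read(block_size)
        if not data:  # an empty read means end of input
            return
        yield data.translate(ROT13)


if __name__ == '__main__':
    environ = {'wsgi.input': io.BytesIO(b'Hello, World')}
    print(b''.join(blocking_rot13_app(environ, lambda status, headers: None)))
```

Reading in fixed-size blocks, rather than calling read() with no argument, keeps memory bounded for the very long streams (10**9 bytes) mentioned earlier in the thread.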
From py-web-sig at xhaus.com Fri Sep 17 00:12:28 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Fri Sep 17 00:07:04 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <414A088B.7040601@xhaus.com> References: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com> <414A088B.7040601@xhaus.com> Message-ID: <414A0FCC.50502@xhaus.com> [Alan Kennedy] > Though I suppose having the top > component pulling input from an iterator chain mirrors nicely the > situation where the server pulls output from an iterator chain. Which also means that the top component must be prepared to receive "" from the component below it in the input chain. Say for example that the headers for a new chunk body arrive on the client socket, but not a chunk-encoded body, yet. The top iterator, e.g. the uploaded-file processor, pulls data from the component below it, which is, say, the dechunker. The dechunker will read the headers and get the relevant metadata for the chunk. But since there is no actual data available now, it must yield "" to the next component up. I was wondering if we might need to mirror the pause/resume facility on the input stream. But it's not required, because the application is getting a callback directly from the server when there is data available. It's just that the data on the socket that gave rise to the notification may not translate to actual data for the called application. Regards, Alan.
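Alan describes a case where a chunk-size line has arrived but none of the chunk's data has. That case falls out naturally from an incremental decoder of the chunked transfer-coding (RFC 2616, section 3.6.1): feeding it bytes can legitimately produce an empty string. This is a minimal Python 3 sketch. Dechunker is a hypothetical name, chunk extensions are skipped, and any trailer after the last chunk is ignored.

```python
class Dechunker:
    """Incremental decoder for HTTP/1.1 chunked transfer-coding (sketch).

    feed() takes whatever bytes have arrived and returns whatever body data
    they complete; that may be b'' when only part of a chunk-size line is in.
    """

    def __init__(self):
        self.buf = b''
        self.remaining = 0      # bytes still owed for the current chunk's data
        self.need_crlf = False  # CRLF that terminates the chunk data not yet seen
        self.done = False       # last-chunk ("0") has been read

    def feed(self, data):
        self.buf += data
        out = []
        while not self.done:
            if self.remaining:
                piece = self.buf[:self.remaining]
                if not piece:
                    break  # waiting for more chunk data
                out.append(piece)
                self.buf = self.buf[len(piece):]
                self.remaining -= len(piece)
                if not self.remaining:
                    self.need_crlf = True
            elif self.need_crlf:
                if len(self.buf) < 2:
                    break
                if self.buf[:2] != b'\r\n':
                    raise ValueError('missing CRLF after chunk data')
                self.buf = self.buf[2:]
                self.need_crlf = False
            else:
                line_end = self.buf.find(b'\r\n')
                if line_end < 0:
                    break  # chunk-size line not complete yet
                size_field = self.buf[:line_end].split(b';', 1)[0]  # drop extensions
                size = int(size_field.strip(), 16)
                self.buf = self.buf[line_end + 2:]
                if size == 0:
                    self.done = True  # trailers (if any) are not handled here
                else:
                    self.remaining = size
        return b''.join(out)
```

A caller driven by socket notifications would simply pass on whatever feed() returns, empty string included, which is the "yield an empty string" behaviour described above.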
From pje at telecommunity.com Fri Sep 17 00:37:39 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Sep 17 00:36:43 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <414A088B.7040601@xhaus.com> References: <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916180735.02fa5e80@mail.telecommunity.com> At 10:41 PM 9/16/04 +0100, Alan Kennedy wrote: >In this way, there could be a middleware component below the >rot13_streamer in the stack that, say, does chunked_transfer encoding and >decoding. It would be the same in form as the above, except that it would FYI, middleware and apps are now banned from dealing in any kind of transfer-encodings, per James' very valuable input on that subject. Like connection properties, these should be the exclusive province of the actual web server. >1. Change the environ entry for 'wsgi.async_input_handler' to be its own >callable that records the callback for the next layer up in the stack, the >rot13_streamer.input_handler. This would lead to the unacceptable situation of every middleware component having to know in principle about extensions. The "Server Extension APIs" section of the PEP demands that any "bypass" API check whether the environ entries it relies on have been replaced by middleware, for this very reason. >I think that this proposed approach is clean, and not overly complex for >async or blocking programmers to handle.
Unless of course they're writing middleware that does something with the input. >But I think we do have to cleanly separate the two. I think there are >problems associated with trying to run *all* components seamlessly across >async or blocking servers. I think that middleware components that are >always going to behave correctly in an async situation will have to be >designed like that from the ground up. It's dangerous to take components >written in a blocking environment and run them in an async environment. It is a non-goal for WSGI to support running multiple requests simultaneously in a single-threaded asynchronous server, so the issue doesn't really come up. A WSGI server *must* allow for the fact that WSGI apps use up a thread while they're running or producing a value: that's the price of being able to run "traditional" web applications under WSGI. >And lastly, if it is desired to spin jobs into a different thread, e.g. >the rot-13 job above, then that should be a middleware concern, not the >WSGI server's. I agree with you -- for *asynchronous* applications. Synchronous web applications are the default case in WSGI and the world in general, so servers *must* use a thread pool to start applications and to run 'next()' calls, if they are asynchronous. But, asynchronous applications wish to yield control, to avoid hogging resources in that thread pool, so they need to delegate the work to their I/O thread, and then yield an empty string to pause output, freeing up that thread for another iterable next(), or application start. Notice, however, that if the server is *synchronous* (e.g. CGI, single-threaded FastCGI containers, mod_python under Apache 1.x, etc.), then this is a complete waste of time, because you'll only be running one simultaneous request in this process anyway, so you're spinning off a second thread to keep from tying up the first thread, but all the first thread is doing is waiting for the second thread to finish!
This is wasteful, to say the least. The only case where pausing output (whether for unrelated network I/O, or because of a need to read from the input stream) is actually useful is when the server is *also* asynchronous -- hence the value of making such pausing an optional extension API. The application can then detect when it's *useful* to pause, and synchronous applications needn't worry about it. Of course, even if the server and application are *both* asynchronous, that's no guarantee that they're using compatible event loops! If you try to run a Twisted app under asyncore or vice versa, you're going to be spinning off an extra thread to run a second event loop, so there's a bit of a trade-off to determining whether your asynchrony is going to actually *gain* anything. But that's a separate question. WSGI will allow you to be asynchronous if you really want to, no matter how bad an idea it might be in some cases. :) >The twisted rot-13 component would then have very thin methods (run from >the server's main thread) which interact with the twisted space i.e. >transferring data and receiving data back through queues, and layer WSGI >semantics on those interactions, i.e. pause_output, yield result, yield >empty_string, etc. You're pretty much describing what I suggested earlier: that async app frameworks like Twisted may want to have a model whereby a generic "thin wrapper" WSGI application object is used to communicate with an application that's written using the underlying framework's async idioms. So, for example, one might perhaps design a Twisted "Transport" that was implemented as a WSGI application. (I don't know if "Transport" is really the correct abstraction to use, I'm just giving an example here.) Anyway, for such a thing to really work, I think you might need server-specific reactor plugins, to integrate Twisted's event loop with that of the server. 
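Phillip says an asynchronous server must start applications and run every next() call on a thread pool, so that blocking application code never stalls its I/O thread. That arrangement can be sketched in Python 3 with concurrent.futures. HalfAsyncRequest is a hypothetical name. A real server would route the callbacks back to its event loop and write the blocks to the socket, and this sketch does no error handling.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class HalfAsyncRequest:
    """Toy asynchronous server side that keeps application code off the I/O thread.

    The app call and every next() call are submitted to a thread pool; results
    come back through Future callbacks.  A real server would hand them to its
    event loop and write them to the socket instead of keeping a list.
    """

    def __init__(self, pool, app, environ):
        self.pool = pool
        self.blocks = []
        self.status = self.headers = None
        self.finished = threading.Event()
        # The caller (standing in for the I/O thread) only submits work here.
        pool.submit(app, environ, self._start_response).add_done_callback(self._started)

    def _start_response(self, status, headers, exc_info=None):
        self.status, self.headers = status, headers

    def _started(self, future):
        self.result = future.result()  # app errors are not handled in this sketch
        self.iterator = iter(self.result)
        self._schedule_next()

    def _schedule_next(self):
        self.pool.submit(self._step).add_done_callback(self._stepped)

    def _step(self):
        # Runs on a pool thread, so a slow next() ties up that thread only.
        try:
            return next(self.iterator)
        except StopIteration:
            return None  # sentinel: the iterable is exhausted

    def _stepped(self, future):
        block = future.result()
        if block is None:
            close = getattr(self.result, 'close', None)
            if close is not None:
                close()
            self.finished.set()
            return
        if block:  # empty blocks carry no output
            self.blocks.append(block)
        self._schedule_next()
```

As Phillip notes, this extra machinery buys nothing under a synchronous server, where the single request could simply be iterated in place.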
>When I described your approach as "pulling data up the stack", I saw a >bigger difference between the two approaches. I'm thinking now that there >is little difference between our proposals, except that in mine it's the >bottom component that gets notified of the input by the server, and in >yours it's the top component. Though I suppose having the top component >pulling input from an iterator chain mirrors nicely the situation where >the server pulls output from an iterator chain. Actually, I'm saying you pull data *down* the stack. The bottom-most application iterator calls 'read()' on an input stream provided by a parent middleware component, which then calls read on a higher-level component, and so on. >And my approach basically entails a bunch of nested calls, which might be >less efficient elegant than if, say, generators were used in an input >processing chain. > >You're right again Phillip :-) Not entirely, actually. For my approach to really work, the middleware would have to be guaranteed to return something from read(), as long as the parent's read() returns something. Otherwise, the resumption would block, unless the middleware were much smarter. I've got to think about it some more, because right now I'm still not happy with the specifics of any of the proposals for pausing and resuming output. From pje at telecommunity.com Fri Sep 17 00:59:28 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri Sep 17 00:58:31 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <5.1.1.6.0.20040916180735.02fa5e80@mail.telecommunity.com> References: <414A088B.7040601@xhaus.com> <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916184337.02ec9680@mail.telecommunity.com> At 06:37 PM 9/16/04 -0400, Phillip J. Eby wrote: >Not entirely, actually. For my approach to really work, the middleware >would have to be guaranteed to return something from read(), as long as >the parent's read() returns something. Otherwise, the resumption would >block, unless the middleware were much smarter. I've got to think about >it some more, because right now I'm still not happy with the specifics of >any of the proposals for pausing and resuming output. Aha! There's the problem. The 'read()' protocol is what's wrong. If 'wsgi.input' were an *iterator* instead of a file-like object, it would be fairly straightforward for async servers to implement "would block" reads as yielding empty strings. And, servers could actually support streaming input via chunked encoding, because they could just yield blocks once they've arrived. The downside to making 'wsgi.input' an iterator is that you lose control over how much data to read at a time: the upstream server or middleware determines how much data you get. 
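Under Phillip's proposal, wsgi.input becomes an iterator of byte blocks. Synchronous code that wants read(size) would then wrap it in a buffering, file-like object. Below is a minimal Python 3 sketch of such a wrapper, with IterInput as a hypothetical name. It treats an empty block as carrying no data. It is not the interface that was finally adopted, since PEP 333 as published kept a file-like wsgi.input.

```python
class IterInput:
    """Buffering, file-like read() over an iterator of byte blocks (sketch).

    Intended for synchronous code: an empty block from the iterator simply
    adds nothing to the buffer, and reading continues with the next block.
    """

    def __init__(self, blocks):
        self.blocks = iter(blocks)
        self.buf = b''

    def read(self, size=-1):
        # Pull blocks until the request can be satisfied or input runs out.
        while size < 0 or len(self.buf) < size:
            try:
                block = next(self.blocks)
            except StopIteration:
                break
            self.buf += block
        if size < 0:
            data, self.buf = self.buf, b''
        else:
            data, self.buf = self.buf[:size], self.buf[size:]
        return data
```

The reader no longer controls how much data arrives per block, but it still controls how much it consumes per read(). This is the trade-off described above.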
But, it's quite possible to make a buffering, file-like wrapper over such an iterator, if that's what you really need, and your code is synchronous. (This will slightly increase the coding burden for interfacing applications and frameworks that expect to have a readable stream for CGI input.) For asynchronous code, you're just going to invoke some sort of callback with each block, and it's the callback's job to deal with it. What does everybody think? If combined with a "pause iterating me until there's input data available" extension API, this would let the input stream be non-blocking, and solve the chunked-encoding input issue all in one change to the protocol. Or am I missing something here? From py-web-sig at xhaus.com Fri Sep 17 01:04:36 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Fri Sep 17 00:59:19 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <5.1.1.6.0.20040916180735.02fa5e80@mail.telecommunity.com> References: <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com> <5.1.1.6.0.20040916180735.02fa5e80@mail.telecommunity.com> Message-ID: <414A1C04.9010306@xhaus.com> [Alan Kennedy] >> When I described your approach as "pulling data up the stack", I saw a >> bigger difference between the two approaches. 
I'm thinking now that >> there is little difference between our proposals, except that in mine >> it's the bottom component that gets notified of the input by the >> server, and in yours it's the top component. Though I suppose having >> the top component pulling input from an iterator chain mirrors nicely >> the situation where the server pulls output from an iterator chain. [Phillip J. Eby] > Actually, I'm saying you pull data *down* the stack. The bottom-most > application iterator calls 'read()' on an input stream provided by a > parent middleware component, which then calls read on a higher-level > component, and so on. Hmm. That only makes sense to me if your stacks grow downwards :-) In my mental picture, stacks grow upwards. The server is level ground, and each middleware component is placed on top of the other, with the "most wrapped" component at the top. So to me what your description above says is that the component closest to the server is the one that gets to see the input last, after all the more wrapped components, with the most wrapped component getting first dibs on the input. Which doesn't make sense to me. Perhaps your stacks grow downwards? Anyway, I *think* we're talking about the same thing. Which leads onto the next question: Why not insist on an iterable for the input stream as well as the output stream? It appears to me that there should be symmetry between the output write()/iterable split and the input read()/iterable split. Regards, Alan. From pje at telecommunity.com Fri Sep 17 01:08:42 2004 From: pje at telecommunity.com (Phillip J.
Eby) Date: Fri Sep 17 01:07:41 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <414A1C04.9010306@xhaus.com> References: <5.1.1.6.0.20040916180735.02fa5e80@mail.telecommunity.com> <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com> <5.1.1.6.0.20040916180735.02fa5e80@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916190734.02f8aec0@mail.telecommunity.com> At 12:04 AM 9/17/04 +0100, Alan Kennedy wrote: >Which leads onto the next question: Why not insist on an iterable for the >input stream as well as the output stream. It appears to me that there >should be symmetry between the output write()/iterable split and the input >read()/iterable split. Looks like you had the same "aha" as I just did a few minutes ago, so I'll take your comment as a +1 on that approach. :) From floydophone at gmail.com Fri Sep 17 01:16:26 2004 From: floydophone at gmail.com (Peter Hunt) Date: Fri Sep 17 01:16:32 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes Message-ID: <6654eac4040916161612849362@mail.gmail.com> Alan, that design looks okay. A bit complex, but it works well once you sit down to look at it. It would be nice if applications that didn't need a separate thread didn't use one up, so performance-oriented programmers (like the Twisted/Nevow guys) won't be able to have that excuse. 
Perhaps start_response() could have a "threaded" boolean optional argument that defaults to true which decides whether or not the iterable will be called in a separate thread. This, of course, requires that the application callable itself doesn't have any blocking code. Does this requirement overcomplicate things? From pje at telecommunity.com Fri Sep 17 01:57:59 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Sep 17 01:56:59 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <6654eac4040916161612849362@mail.gmail.com> Message-ID: <5.1.1.6.0.20040916192001.025e1570@mail.telecommunity.com> At 07:16 PM 9/16/04 -0400, Peter Hunt wrote: >Alan, that design looks okay. A bit complex, but it works well once >you sit down to look at it. > >It would be nice if applications that didn't need a separate thread >didn't use one up, so performance-oriented programmers (like the >Twisted/Nevow guys) won't be able to have that excuse. Perhaps >start_response() could have a "threaded" boolean optional argument >that defaults to true which decides whether or not the iterable will >be called in a separate thread. This, of course, requires that the >application callable itself doesn't have any blocking code. > >Does this requirement overcomplicate things? Yes. The vast majority of existing web applications are synchronous, and so are a significant number of Python web server environments that would run WSGI applications. Therefore the WSGI "common case" is to have synchronous behavior, and WSGI is most efficient with either a synchronous server/gateway, or a "half-async" server/gateway (i.e., one that runs application code in a thread pool, separate from the main I/O thread.) The few applications that can behave in a non-blocking fashion, can and should use the iterable interface to provide their output, producing empty strings when they are not yet ready to produce output. 
(Plus, when such applications are run in a synchronous server or gateway, they might as well behave synchronously, since they will actually incur more overhead by trying to be asynchronous!) The only scenario that isn't served by this approach is a single-threaded, asynchronous server with no threading capability. However, such a server *cannot* be WSGI-compatible and still serve multiple requests, and there is no way around that without forcing *every* application to be asynchronous, which just isn't an acceptable tradeoff. The idea of having a flag (whether passed to start_response, or introspected on the application object, etc.) doesn't help the fact that the server still has to be able to *have* multiple threads in such a case. Note, by the way, that the need for a second thread is caused by having a possible difference between the synchrony model of a server and an application. That is, if both are synchronous or both are asynchronous, no threading is required. However, a server is not limited to running just *one* application, so in the general case, a given server has to be able to handle both. However, since the common case is for apps to be synchronous, then the common case for an asynchronous server is that it must be threaded, and the common case for a synchronous server is that it need not be threaded. Thus, logically, the case of an asynchronous application is the "odd one out", in the sense that it is the only one that ever forces additional threading, beyond what was inherently required for that server model. In other words, an async server has to have threading in the common case, and a synchronous application doesn't. So, an async app in an async server doesn't *add* any threading requirement: the async server already has to have an I/O thread and at least one application thread. And a synchronous app doesn't add any additional threading requirements to either kind of server, for the same reason. 
Only an asynchronous application in a synchronous server forces any extra overhead beyond the effective default required threading configuration. Thus, it makes sense (to me, anyway) in that case to put the burden on the asynchronous application to manage communication with its extra thread, if any, or to have it adapt to local circumstances and behave synchronously (since that's more efficient in that case). But in the end, all of this comes down to a basically simple idea: I think that in WSGI, synchronous applications should be simple, and asynchronous applications possible, because that will best support the goals of the PEP. From floydophone at gmail.com Fri Sep 17 02:37:22 2004 From: floydophone at gmail.com (Peter Hunt) Date: Fri Sep 17 02:37:29 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <5.1.1.6.0.20040916192001.025e1570@mail.telecommunity.com> References: <6654eac4040916161612849362@mail.gmail.com> <5.1.1.6.0.20040916192001.025e1570@mail.telecommunity.com> Message-ID: <6654eac4040916173766fa4cf1@mail.gmail.com> Yes, but an async app running in an async server in a thread is overkill, don't you think? We don't need to spawn an extra thread to run it. I'm not talking about "possible", I'm talking about "optimal". On Thu, 16 Sep 2004 19:57:59 -0400, Phillip J. Eby wrote: > > > At 07:16 PM 9/16/04 -0400, Peter Hunt wrote: > >Alan, that design looks okay. A bit complex, but it works well once > >you sit down to look at it. > > > >It would be nice if applications that didn't need a separate thread > >didn't use one up, so performance-oriented programmers (like the > >Twisted/Nevow guys) won't be able to have that excuse. Perhaps > >start_response() could have a "threaded" boolean optional argument > >that defaults to true which decides whether or not the iterable will > >be called in a separate thread. This, of course, requires that the > >application callable itself doesn't have any blocking code.
> > > >Does this requirement overcomplicate things? > > Yes. The vast majority of existing web applications are synchronous, and > so are a significant number of Python web server environments that would > run WSGI applications. Therefore the WSGI "common case" is to have > synchronous behavior, and WSGI is most efficient with either a synchronous > server/gateway, or a "half-async" server/gateway (i.e., one that runs > application code in a thread pool, separate from the main I/O thread.) > > The few applications that can behave in a non-blocking fashion, can and > should use the iterable interface to provide their output, producing empty > strings when they are not yet ready to produce output. (Plus, when such > applications are run in a synchronous server or gateway, they might as well > behave synchronously, since they will actually incur more overhead by > trying to be asynchronous!) > > The only scenario that isn't served by this approach is a single-threaded, > asynchronous server with no threading capability. However, such a server > *cannot* be WSGI-compatible and still serve multiple requests, and there is > no way around that without forcing *every* application to be asynchronous, > which just isn't an acceptable tradeoff. The idea of having a flag > (whether passed to start_response, or introspected on the application > object, etc.) doesn't help the fact that the server still has to be able to > *have* multiple threads in such a case. > > Note, by the way, that the need for a second thread is caused by having a > possible difference between the synchrony model of a server and an > application. That is, if both are synchronous or both are asynchronous, no > threading is required. However, a server is not limited to running just > *one* application, so in the general case, a given server has to be able to > handle both. 
> > However, since the common case is for apps to be synchronous, then the > common case for an asynchronous server is that it must be threaded, and the > common case for a synchronous server is that it need not be > threaded. Thus, logically, the case of an asynchronous application is the > "odd one out", in the sense that it is the only one that ever forces > additional threading, beyond what was inherently required for that server > model. > > In other words, an async server has to have threading in the common case, > and a synchronous application doesn't. So, an async app in an async server > doesn't *add* any threading requirement: the async server already has to > have an I/O thread and at least one application thread. And a synchronous > app doesn't add any additional threading requirements to either kind of > server, for the same reason. Only an asynchronous application in a > synchronous server forces any extra overhead beyond the effective default > required threading configuration. Thus, it makes sense (to me, anyway) to > in that case put the burden on the asynchronous application to manage > communication with its extra thread, if any, or to have it adapt to local > circumstances and behave synchronously (since that's more efficient in that > case). > > But in the end, all of this comes down to a basically simple idea: I think > that in WSGI, synchronous applications should be simple, and asynchronous > applications possible, because that will best support the goals of the PEP. 
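Phillip's point that a non-blocking application can use the ordinary iterable interface, yielding empty strings until it has output, can be sketched as below. This is a minimal illustration, not code from the thread. It assumes Python 3, so body chunks are bytes as in the later PEP 3333. The names `make_polling_app` and `drive_synchronously` are hypothetical, and the queue stands in for whatever event source the application is actually waiting on.

```python
import queue

def make_polling_app(source):
    """Build a WSGI app that yields b'' while `source` has nothing ready.

    `source` is a queue.Queue filled elsewhere (hypothetical); None marks
    the end of the response body.
    """
    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        while True:
            try:
                chunk = source.get_nowait()
            except queue.Empty:
                yield b''          # not ready yet: hand control back to the server
                continue
            if chunk is None:      # end-of-body sentinel
                return
            yield chunk
    return app

def drive_synchronously(app, environ):
    """A naive synchronous driver: keep calling next(), drop empty strings."""
    out = []

    def start_response(status, headers, exc_info=None):
        return out.append          # write() callable

    result = app(environ, start_response)
    try:
        for data in result:
            if data:               # empty strings carry no output
                out.append(data)
    finally:
        if hasattr(result, 'close'):
            result.close()
    return b''.join(out)
```

The synchronous driver also shows the overhead he mentions: with an empty queue it would simply spin on next(), which is why such an application is better off behaving synchronously when it finds itself in a synchronous server.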
> > From dp at ulaluma.com Fri Sep 17 02:39:36 2004 From: dp at ulaluma.com (Donovan Preston) Date: Fri Sep 17 02:40:12 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> References: <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> Message-ID: <0CD0CE45-0842-11D9-AF84-000A95864FC4@ulaluma.com> On Sep 16, 2004, at 1:41 PM, Phillip J. Eby wrote: > resume = environ['wsgi.pause_output']() > > Where 'resume' is then a callback function that can be invoked to > resume iteration. This keeps it to a single extension key, helps > ensure the correct sequence of actions, and makes it easier to > implement in some cases, while not making other cases any harder. Well, I guess I sparked some discussion here. Great! I am +1 on the above construct, calling pause_output and yielding an empty string. I'm glad this technique came up because I hadn't paid enough attention to the environ dict and how it could be used to do something like this. I think with servers providing a pause_output callable like this, asynchronous applications will be possible and the isolation between the layers can be preserved. I am going to try writing some code using this construct and provide further feedback after I do. dp From pje at telecommunity.com Fri Sep 17 02:58:37 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri Sep 17 02:58:00 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <6654eac4040916173766fa4cf1@mail.gmail.com> References: <5.1.1.6.0.20040916192001.025e1570@mail.telecommunity.com> <6654eac4040916161612849362@mail.gmail.com> <5.1.1.6.0.20040916192001.025e1570@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916204439.026362f0@mail.telecommunity.com> At 08:37 PM 9/16/04 -0400, Peter Hunt wrote: >Yes, but an async app running in an async server in a thread is >overkill, don't you think? We don't need to spawn an extra thread to >run it. I'm not talking about "possible", I'm talking about "optimal". Nothing in the spec stops an async server from providing a configuration option to say, "this app+middleware combination is completely non-blocking, so don't bother running it in a separate thread". I've just been speaking about the general case, and what the server is required to do to support the general case of "an arbitrary WSGI application", with no additional information. In the same way, nothing in the spec stops servers from providing per-application configuration options for any number of extended behaviors; WSGI is a starting point for server capabilities, not an ending point. Still, I will admit that I tend to speak of things almost as if WSGI were an ending point, because I just assume we're talking about what the spec should or should not *require* or *forbid*. When a use case doesn't need any "musts" or "must nots" added (like your use case above), I tend not to focus on it directly, because it seems obvious to me that anybody can add it on if they like, as a server-specific extension. So, this may lead sometimes to people getting the impression WSGI doesn't allow a use case that in fact it does; it's just that the use case should be implemented using an optional extension, rather than being considered a common case and made into a requirement. 
If I tried to enumerate every possible optional extension to WSGI, I'd go mad sooner than you can say "Content-Transfer-Encoding". :) From pje at telecommunity.com Fri Sep 17 03:02:31 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Sep 17 03:01:31 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <0CD0CE45-0842-11D9-AF84-000A95864FC4@ulaluma.com> References: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916205856.02637d40@mail.telecommunity.com> At 08:39 PM 9/16/04 -0400, Donovan Preston wrote: >On Sep 16, 2004, at 1:41 PM, Phillip J. Eby wrote: > >> resume = environ['wsgi.pause_output']() >> >>Where 'resume' is then a callback function that can be invoked to resume >>iteration. This keeps it to a single extension key, helps ensure the >>correct sequence of actions, and makes it easier to implement in some >>cases, while not making other cases any harder. > >Well, I guess I sparked some discussion here. Great! I am +1 on the above >construct, calling pause_output and yielding an empty string. I'm glad >this technique came up because I hadn't paid enough attention to the >environ dict and how it could be used to do something like this. > >I think with servers providing a pause_output callable like this, >asynchronous applications will be possible and the isolation between the >layers can be preserved. I am going to try writing some code using this >construct and provide further feedback after I do. 
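A rough sketch of how the proposed `resume = environ['wsgi.pause_output']()` construct might be used from both sides. Everything here is hypothetical: 'wsgi.pause_output' was only a proposal in this thread, `wait_for_data` stands in for whatever event source the application waits on, and the toy `serve` driver blocks on a `threading.Event` where a real asynchronous server would return to its event loop. When the key is absent, the application falls back to blocking in its own thread.

```python
import queue
import threading

def make_app(wait_for_data):
    """Build a WSGI app around `wait_for_data(callback)`, a hypothetical hook
    that later calls callback(data) from another thread or an event loop."""
    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        box = queue.Queue(maxsize=1)
        pause = environ.get('wsgi.pause_output')   # proposed key, never standardized
        if pause is None:
            # No pause support: "go synchronous" and block this thread instead.
            wait_for_data(box.put)
            yield box.get()
            return
        resume = pause()

        def on_data(data):
            box.put(data)   # store the data first...
            resume()        # ...then ask the server to start iterating again

        wait_for_data(on_data)
        yield b''           # paused: the server should not call next() until resume()
        yield box.get_nowait()
    return app

def serve(app, offer_pause=True):
    """Toy driver that honours the proposed pause/resume protocol."""
    state = {'paused': False}
    resumed = threading.Event()

    def pause_output():
        state['paused'] = True
        resumed.clear()

        def resume():
            state['paused'] = False
            resumed.set()
        return resume

    environ = {'wsgi.pause_output': pause_output} if offer_pause else {}
    body = []
    result = app(environ, lambda status, headers, exc_info=None: body.append)
    try:
        for data in result:
            if state['paused']:
                # A real asynchronous server would go back to its event loop
                # here; blocking on an Event keeps the sketch short.
                resumed.wait()
            if data:
                body.append(data)
    finally:
        if hasattr(result, 'close'):
            result.close()
    return b''.join(body)
```

Storing the data before calling `resume()` means the server's next `next()` call always finds something to yield, even if the callback fires before the application has yielded its empty string.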
Keep in mind that this is proposed as an optional construct, so if the server doesn't provide it, the application iterable will either need to be okay being next()-ed repeatedly, or else "go synchronous" and either do the work in-thread or block on a queue from the I/O thread. And, until I get some feedback on the other part of this (making 'wsgi.input' an iterator too, and having a way to "pause until input"), I'm not ready to add this to the PEP as 'wsgi.pause_output'. But again, nothing stops a server from providing e.g. a 'twisted.pause_output' extension API, with whatever semantics you'd like it to have. From floydophone at gmail.com Fri Sep 17 03:41:14 2004 From: floydophone at gmail.com (Peter Hunt) Date: Fri Sep 17 03:41:20 2004 Subject: [Web-SIG] Updated WSGIHTTPServer.py Message-ID: <6654eac404091618414494b1bb@mail.gmail.com> I've updated WSGIHTTPServer.py and wsgicgi.py to reflect the latest PEP posted on python.org. http://st0rm.hopto.org/wsgi/ From pje at telecommunity.com Thu Sep 23 03:01:38 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 23 03:00:32 2004 Subject: [Twisted-web] Re: [Web-SIG] WSGI woes In-Reply-To: <0CD0CE45-0842-11D9-AF84-000A95864FC4@ulaluma.com> References: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com> <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com> <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com> <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040922205847.024ffde0@mail.telecommunity.com> At 08:39 PM 9/16/04 -0400, Donovan Preston wrote: >On Sep 16, 2004, at 1:41 PM, Phillip J. 
Eby wrote: > >> resume = environ['wsgi.pause_output']() >> >>Where 'resume' is then a callback function that can be invoked to resume >>iteration. This keeps it to a single extension key, helps ensure the >>correct sequence of actions, and makes it easier to implement in some >>cases, while not making other cases any harder. > >Well, I guess I sparked some discussion here. Great! I am +1 on the above >construct, calling pause_output and yielding an empty string. I'm glad >this technique came up because I hadn't paid enough attention to the >environ dict and how it could be used to do something like this. > >I think with servers providing a pause_output callable like this, >asynchronous applications will be possible and the isolation between the >layers can be preserved. I am going to try writing some code using this >construct and provide further feedback after I do. So... how'd it work out? :) From pje at telecommunity.com Thu Sep 23 03:56:36 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 23 03:55:31 2004 Subject: [Web-SIG] A more Twisted approach to async apps in WSGI Message-ID: <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> Hi all. I've been away for a few days due to loss of e-mail service when my dedicated server lost a hard drive. Unfortunately my ISP didn't support the OS version any more, so I had to rebuild everything for the new OS version. Anyway, on to the topic of my post. Should 'wsgi.input' become an iterator? Or should we develop a different API for asynchronous applications? On the positive side of the iterator approach, it could make it easier for asynchronous applications to pause waiting for input, and it could in principle support "chunked" transfer encoding of the input stream. However, since we last discussed this, I did some Googling on CGI and chunked encoding. 
By far and away, the most popular links regarding chunked encoding and CGI, are all about bugs in IIS and Apache leading to various vulnerabilities when chunked encoding is used. :( Once you get past those items (e.g. by adding "-IIS -vulnerability" to your search), you then find *our* discussion here on the Web-SIG! Finally, digging further, I found some 1998 discussion from the IPP (Internet Printing Protocol!) mailing list about what HTTP/1.1 servers support chunked encoding for CGI and which don't. Anyway, the long and short of it is that CGI and chunked encoding are quite simply incompatible, which means that relying on its availability would be nonportable in a WSGI application anyway. That leaves the asynchronous use case, but the benefit is rather strained at that point. Many frameworks reuse the 'cgi' module's 'FieldStorage' class in order to parse browser input, and the 'cgi' module's implementation requires an object with a 'readline()' method. That means that if we switch from an input stream to an iterator, a lot of people are going to be trying to make sensible wrappers to convert the iterator back to an input stream, and that's just getting ridiculous, especially since in many cases the server or gateway has a file-like object to start with. So, I'm thinking we should shift the burden to an async-specific API. But, in this case, "burden" means that we get to give asynchronous apps an API much more suited to their use cases. Suppose that we did something similar to 'wsgi.file_wrapper'? That is, suppose we had an optional extension that a server could provide, to wrap specialized application object(s) in a fashion that then provides backward compatibility to the spec? 
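For readers unfamiliar with the trick being referred to: PEP 333 lets a server offer `environ['wsgi.file_wrapper'](filelike, block_size)`. This returns an iterable that the server may recognise and serve with a platform-specific mechanism, while it still behaves as a plain iterable anywhere else. A minimal sketch of an application using it, with a hand-written fallback, follows. `SimpleFileWrapper` and `make_file_app` are hypothetical names; the modern standard library's `wsgiref.util.FileWrapper` plays the same role as the fallback.

```python
BLOCK_SIZE = 8192

class SimpleFileWrapper:
    """Fallback used when the server offers no 'wsgi.file_wrapper'."""

    def __init__(self, filelike, block_size=BLOCK_SIZE):
        self.filelike = filelike
        self.block_size = block_size

    def __iter__(self):
        # Read fixed-size blocks until read() returns an empty bytes object.
        return iter(lambda: self.filelike.read(self.block_size), b'')

    def close(self):
        # The server calls close() when it is done, which releases the file.
        if hasattr(self.filelike, 'close'):
            self.filelike.close()

def make_file_app(open_file):
    """`open_file` is any zero-argument callable returning a binary file-like object."""
    def app(environ, start_response):
        f = open_file()
        start_response('200 OK', [('Content-Type', 'application/octet-stream')])
        wrapper = environ.get('wsgi.file_wrapper', SimpleFileWrapper)
        return wrapper(f, BLOCK_SIZE)
    return app
```

Because the application only ever sees "an iterable with close()", a server that does not recognise the wrapper can still serve it correctly, which is the backward-compatibility property the proposal wants to reuse.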
That is, suppose we had a 'wsgi.async_wrapper', used like this:

    if 'wsgi.async_wrapper' in environ:
        controller = environ['wsgi.async_wrapper'](environ)
        # do stuff with controller, like register its
        # methods as callbacks
        return controller

The idea is that this would create an iterator that the server/gateway could recognize as "special", similar to the file-wrapper trick. But, the object returned would provide an extra API for use by the asynchronous application, maybe something like:

    put(data) -- queue data for retrieval when the controller is iterated over
    finish() -- mark the iterator finished, so it raises StopIteration
    on_get(length, callback) -- call 'callback(data)' when 'length' bytes are
        available on 'wsgi.input' (but return immediately from the 'on_get()' call)

While this API is an optional extension, it seems it would be closer to what some async fans wanted, and less of a kludge. It won't do away with the possibility that middleware might block waiting for input, of course, but when no middleware is present or the middleware isn't transforming the input stream, it should work out quite well. In any case, the implementation of the methods and the iterator interface are pretty straightforward, either for synchronous or asynchronous servers. What do y'all think? I'd especially like feedback from Twisted folk, as to whether this looks anything like the right kind of API for async apps. (I expect it will need some tweaking and tuning.) But if this is the overall right approach, I'd like to drop the current proposals to make 'wsgi.input' an iterator and add optional 'pause'/'resume' APIs, since they were rather kludgy compared to giving async apps their own mini-API for nonblocking I/O. Comments? Questions? From pje at telecommunity.com Thu Sep 23 04:41:49 2004 From: pje at telecommunity.com (Phillip J.
Eby) Date: Thu Sep 23 04:40:44 2004 Subject: [Web-SIG] Updated WSGIHTTPServer.py Message-ID: <5.1.1.6.0.20040922222550.02105820@mail.telecommunity.com> >I've updated WSGIHTTPServer.py and wsgicgi.py to reflect the latest >PEP posted on python.org. > >http://st0rm.hopto.org/wsgi/ FYI, there's an error in your WSGIHTTPServer implementation: it sends a 'Status: XXX etc' header to the client, but the correct format for HTTP is just the "XXX etc" part. Looks like you might've copied that part from the PEP's CGI example. This error is probably being masked by the fact that you're also sending the status to the client when start_response is initially called, rather than delaying until the first write operation or non-empty yielded string. Also, 'start_response' doesn't actually re-raise 'exc_info' as it should; it only prints the exception to stderr. You should also not use 'map()' to wrap the application result iterator. It's not illegal, but it's ill-advised since an application is allowed to produce an unlimited number of empty strings in its output, resulting in unbounded growth of the list that could use up arbitrarily large amounts of memory. Finally, while this is not a violation of the spec in any way, I notice that your approach to loading application scripts will recompile and reload them on every hit. I don't know if this was intentional or not. Oh, and one last thing... you're checking for 'HTTPS=on' in the environment, but that's not where it would be found, because your code is the only code that could set it. I don't know if the stdlib HTTP server supports HTTPS, but if it does, you should check the appropriate attribute or method instead. Otherwise, it suffices to always set 'wsgi.url_scheme' to "http". 
From wilk-ml at flibuste.net Thu Sep 23 10:33:33 2004 From: wilk-ml at flibuste.net (William Dode) Date: Thu Sep 23 10:33:34 2004 Subject: [Web-SIG] Re: [Twisted-web] A more Twisted approach to async apps in WSGI In-Reply-To: <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> (Phillip J. Eby's message of "Wed, 22 Sep 2004 21:56:36 -0400") References: <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> Message-ID: <87wtyliduq.fsf@blakie.riol> "Phillip J. Eby" writes: > Hi all. I've been away for a few days due to loss of e-mail service > when my dedicated server lost a hard drive. Unfortunately my ISP > didn't support the OS version any more, so I had to rebuild everything > for the new OS version. > > Anyway, on to the topic of my post. Should 'wsgi.input' become an > iterator? Or should we develop a different API for asynchronous > applications? > > On the positive side of the iterator approach, it could make it easier > for asynchronous applications to pause waiting for input, and it could > in principle support "chunked" transfer encoding of the input stream. > > However, since we last discussed this, I did some Googling on CGI and > chunked encoding. By far and away, the most popular links regarding > chunked encoding and CGI, are all about bugs in IIS and Apache leading > to various vulnerabilities when chunked encoding is used. :( > > Once you get past those items (e.g. by adding "-IIS -vulnerability" to > your search), you then find *our* discussion here on the Web-SIG! > Finally, digging further, I found some 1998 discussion from the IPP > (Internet Printing Protocol!) mailing list about what HTTP/1.1 servers > support chunked encoding for CGI and which don't. > > Anyway, the long and short of it is that CGI and chunked encoding are > quite simply incompatible, which means that relying on its > availability would be nonportable in a WSGI application anyway. I don't understand the problem with an iterator on CGI. 
A CGI script is by definition multi-process. If one script blocks, a new script will be run for the next request, and the first client will have to wait anyway... If none blocks, whether it uses an iterator or not will not change anything. It will be up to the server to decide whether it can use chunked encoding. If the script blocks and doesn't use chunked encoding, it won't be possible to run the script under CGI anyway... I know people who use chunked encoding in CGI; they know what they are doing and it works fine, and I'm sure they will use the iterator. I don't see the difference between

    [sleep...]
    [sleep...]
    [sleep...]
    return data

and

    [sleep...]
    yield
    [sleep...]
    yield
    [sleep...]
    yield

for a CGI script if it's not possible to avoid sleeping. -- William Dodé - http://flibuste.net From pje at telecommunity.com Thu Sep 23 15:04:28 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 23 15:03:29 2004 Subject: [Web-SIG] Re: [Twisted-web] A more Twisted approach to async apps in WSGI In-Reply-To: <87wtyliduq.fsf@blakie.riol> References: <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040923090205.02f25bd0@mail.telecommunity.com> At 10:33 AM 9/23/04 +0200, William Dode wrote: >I don't see the difference between > >[sleep...] >[sleep...] >[sleep...] >return data > >and > >[sleep...] >yield >[sleep...] >yield >[sleep...] >yield > >for a CGI script if it's not possible to avoid sleeping. As previously discussed, the existence of an asynchronous API only matters for asynchronous servers and gateways. From floydophone at gmail.com Thu Sep 23 21:36:04 2004 From: floydophone at gmail.com (Peter Hunt) Date: Thu Sep 23 21:36:19 2004 Subject: [Web-SIG] Updated WSGIHTTPServer.py In-Reply-To: <5.1.1.6.0.20040922222550.02105820@mail.telecommunity.com> References: <5.1.1.6.0.20040922222550.02105820@mail.telecommunity.com> Message-ID: <6654eac404092312365ff04728@mail.gmail.com> Thanks for taking a look.
I very very quickly upgraded it by ripping out a lot of the spec's code, and my example app ran OK, so I put it up. I'll make those fixes soon. Also, I'm pretty sure that execfile _will_ reload application scripts, but I may be wrong. On Wed, 22 Sep 2004 22:41:49 -0400, Phillip J. Eby wrote: > >I've updated WSGIHTTPServer.py and wsgicgi.py to reflect the latest > >PEP posted on python.org. > > > >http://st0rm.hopto.org/wsgi/ > > FYI, there's an error in your WSGIHTTPServer implementation: it sends a > 'Status: XXX etc' header to the client, but the correct format for HTTP is > just the "XXX etc" part. Looks like you might've copied that part from the > PEP's CGI example. This error is probably being masked by the fact that > you're also sending the status to the client when start_response is > initially called, rather than delaying until the first write operation or > non-empty yielded string. Also, 'start_response' doesn't actually re-raise > 'exc_info' as it should; it only prints the exception to stderr. > > You should also not use 'map()' to wrap the application result > iterator. It's not illegal, but it's ill-advised since an application is > allowed to produce an unlimited number of empty strings in its output, > resulting in unbounded growth of the list that could use up arbitrarily > large amounts of memory. > > Finally, while this is not a violation of the spec in any way, I notice > that your approach to loading application scripts will recompile and reload > them on every hit. I don't know if this was intentional or not. > > Oh, and one last thing... you're checking for 'HTTPS=on' in the > environment, but that's not where it would be found, because your code is > the only code that could set it. I don't know if the stdlib HTTP server > supports HTTPS, but if it does, you should check the appropriate attribute > or method instead. Otherwise, it suffices to always set 'wsgi.url_scheme' > to "http". 
> > From floydophone at gmail.com Sun Sep 26 16:29:37 2004 From: floydophone at gmail.com (Peter Hunt) Date: Sun Sep 26 16:29:40 2004 Subject: [Web-SIG] Updated WSGIHTTPServer.py In-Reply-To: <6654eac404092312365ff04728@mail.gmail.com> References: <5.1.1.6.0.20040922222550.02105820@mail.telecommunity.com> <6654eac404092312365ff04728@mail.gmail.com> Message-ID: <6654eac404092607293c0b3e1e@mail.gmail.com> I uploaded the fixed WSGIHTTPServer.py. I'm going to rework it pretty substantially pretty soon (probably implemented using Medusa or Twisted) and streamline it. It's pretty rough as it is right now, but it works. On Thu, 23 Sep 2004 15:36:04 -0400, Peter Hunt wrote: > Thanks for taking a look. I very very quickly upgraded it by ripping > out a lot of the spec's code, and my example app ran OK, so I put it > up. > > I'll make those fixes soon. > > Also, I'm pretty sure that execfile _will_ reload application scripts, > but I may be wrong. > > > > On Wed, 22 Sep 2004 22:41:49 -0400, Phillip J. Eby > wrote: > > >I've updated WSGIHTTPServer.py and wsgicgi.py to reflect the latest > > >PEP posted on python.org. > > > > > >http://st0rm.hopto.org/wsgi/ > > > > FYI, there's an error in your WSGIHTTPServer implementation: it sends a > > 'Status: XXX etc' header to the client, but the correct format for HTTP is > > just the "XXX etc" part. Looks like you might've copied that part from the > > PEP's CGI example. This error is probably being masked by the fact that > > you're also sending the status to the client when start_response is > > initially called, rather than delaying until the first write operation or > > non-empty yielded string. Also, 'start_response' doesn't actually re-raise > > 'exc_info' as it should; it only prints the exception to stderr. > > > > You should also not use 'map()' to wrap the application result > > iterator. 
It's not illegal, but it's ill-advised since an application is > > allowed to produce an unlimited number of empty strings in its output, > > resulting in unbounded growth of the list that could use up arbitrarily > > large amounts of memory. > > > > Finally, while this is not a violation of the spec in any way, I notice > > that your approach to loading application scripts will recompile and reload > > them on every hit. I don't know if this was intentional or not. > > > > Oh, and one last thing... you're checking for 'HTTPS=on' in the > > environment, but that's not where it would be found, because your code is > > the only code that could set it. I don't know if the stdlib HTTP server > > supports HTTPS, but if it does, you should check the appropriate attribute > > or method instead. Otherwise, it suffices to always set 'wsgi.url_scheme' > > to "http". > > > > > From floydophone at gmail.com Sun Sep 26 17:10:49 2004 From: floydophone at gmail.com (Peter Hunt) Date: Sun Sep 26 17:10:51 2004 Subject: [Web-SIG] Updated WSGIHTTPServer.py In-Reply-To: <6654eac404092607293c0b3e1e@mail.gmail.com> References: <5.1.1.6.0.20040922222550.02105820@mail.telecommunity.com> <6654eac404092312365ff04728@mail.gmail.com> <6654eac404092607293c0b3e1e@mail.gmail.com> Message-ID: <6654eac4040926081030a7ada6@mail.gmail.com> In addition, I fixed an embarrassing bug in which it deleted querystrings. I'm going to improve on it a lot as time goes on: moving away from using execfile and dealing with headers in a cleaner fashion. I also uploaded my testhttpserver.py script, which contains three simple test scripts for it. It depends on my new middleware.py module, something which may turn into a sort of WSGI middleware library. Maybe we should collaborate on a "standard extensions" type of library? By the way, to avoid embarrassing bugs such as mine, and since the spec is finally nearing completion, we should write some unit tests to ensure compatibility across WSGI implementations. 
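Along the lines of this suggestion, here is a minimal sketch of a conformance check. It leans on a tool that did not exist when this thread was written: `wsgiref.validate`, which entered the standard library with Python 2.5. It wraps an application and raises AssertionError when either side of the interface breaks the spec. The helper `run_checked` and the two sample applications are hypothetical.

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator

def run_checked(app):
    """Call `app` through wsgiref's validator with a minimal test environ."""
    environ = {'QUERY_STRING': ''}   # avoids a validator warning
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, wsgi.input, etc.
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers
        return lambda data: None     # write() callable, unused here

    result = validator(app)(environ, start_response)
    try:
        body = b''.join(result)
    finally:
        result.close()               # the validator insists on close()
    return captured['status'], captured['headers'], body

def good_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello, world!\n']

def bad_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['not bytes']             # body strings must be bytes in Python 3
```

A shared suite of small applications like these, run against each server implementation, would catch mistakes such as non-bytes body strings or a missing Content-Type header.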
From floydophone at gmail.com Sun Sep 26 17:31:15 2004 From: floydophone at gmail.com (Peter Hunt) Date: Sun Sep 26 17:31:17 2004 Subject: [Web-SIG] PEP suggestions Message-ID: <6654eac40409260831b7e05ec@mail.gmail.com> I've been reading through the PEP, and I've been having trouble following the code in some parts. Eventually, I got it, but I really think that run_with_cgi() could use some heavy commenting. Perhaps the other code samples could, too. From pje at telecommunity.com Mon Sep 27 04:24:48 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Sep 27 04:23:34 2004 Subject: [Web-SIG] PEP suggestions In-Reply-To: <6654eac40409260831b7e05ec@mail.gmail.com> Message-ID: <5.1.1.6.0.20040926222434.03880b90@mail.telecommunity.com> At 11:31 AM 9/26/04 -0400, Peter Hunt wrote: >I've been reading through the PEP, and I've been having trouble >following the code in some parts. Eventually, I got it, but I really >think that run_with_cgi() could use some heavy commenting. Perhaps the >other code samples could, too. Feel free to send diffs.
;) From paul.boddie at ementor.no (Paul Boddie) Date: Mon Sep 27 13:13:12 2004 From: paul.boddie at ementor.no (Paul Boddie) Date: Mon Sep 27 13:13:28 2004 Subject: [Web-SIG] WebStack 0.7 Message-ID: <0F4BD34E02639E428B4654DCBAB4502D0B1BAB@100NOOSLMSG004.common.alpharoot.net> Hello, Just a quick note to say that WebStack 0.7 has been released. More information here: http://www.python.org/pypi?%3Aaction=search&name=WebStack Compared to previous releases, this one is a lot more strict and specific about various things such as character encodings, request parameters, authentication, cookies and so on, but additional functionality has also been introduced: for example, Zope 2.x products can now be written using the WebStack API. Have fun, Paul From mnot at mnot.net Tue Sep 28 20:02:13 2004 From: mnot at mnot.net (Mark Nottingham) Date: Tue Sep 28 20:02:18 2004 Subject: [Web-SIG] HTTP 1.1 trailers Message-ID: <863F06BA-1178-11D9-88DC-000A95BD86C0@mnot.net> I just realised that WSGI doesn't allow applications to send headers as trailers (RFC2616, 3.6.1 Chunked Transfer Coding). I think that's OK, as pretty much nobody uses them, and it would require a pretty radical change in WSGI's design to support them, but I think the PEP should mention it. Cheers, -- Mark Nottingham http://www.mnot.net/ From foom at fuhm.net Tue Sep 28 23:01:02 2004 From: foom at fuhm.net (James Y Knight) Date: Tue Sep 28 23:01:06 2004 Subject: [Web-SIG] HTTP 1.1 trailers In-Reply-To: <863F06BA-1178-11D9-88DC-000A95BD86C0@mnot.net> References: <863F06BA-1178-11D9-88DC-000A95BD86C0@mnot.net> Message-ID: <81AAD541-1191-11D9-9E53-000A95A50FB2@fuhm.net> On Sep 28, 2004, at 2:02 PM, Mark Nottingham wrote: > I just realised that WSGI doesn't allow applications to send headers > as trailers (RFC2616, 3.6.1 Chunked Transfer Coding). I think that's > OK, as pretty much nobody uses them, and it would require a pretty > radical change in WSGI's design to support them, but I think the PEP > should mention it.
Nah, it's pretty easy for a webserver to add this feature as a WSGI extension, and for a client to do:

    if 'mycoolwebserver.set_trailers' in environ:
        environ['mycoolwebserver.set_trailers']([('Content-MD5', 'blahblah')])

Since it's easy to add as an implementation specific enhancement, and since trailers are very close to completely useless, I don't think it really needs to be in the core standard. James From mnot at mnot.net Tue Sep 28 23:06:09 2004 From: mnot at mnot.net (Mark Nottingham) Date: Tue Sep 28 23:06:13 2004 Subject: [Web-SIG] HTTP 1.1 trailers In-Reply-To: <81AAD541-1191-11D9-9E53-000A95A50FB2@fuhm.net> References: <863F06BA-1178-11D9-88DC-000A95BD86C0@mnot.net> <81AAD541-1191-11D9-9E53-000A95A50FB2@fuhm.net> Message-ID: <387227E4-1192-11D9-88DC-000A95BD86C0@mnot.net> /me hits head; good point. Cheers, On Sep 28, 2004, at 2:01 PM, James Y Knight wrote: > > On Sep 28, 2004, at 2:02 PM, Mark Nottingham wrote: > >> I just realised that WSGI doesn't allow applications to send headers >> as trailers (RFC2616, 3.6.1 Chunked Transfer Coding). I think that's >> OK, as pretty much nobody uses them, and it would require a pretty >> radical change in WSGI's design to support them, but I think the PEP >> should mention it. > > Nah, it's pretty easy for a webserver to add this feature as a WSGI > extension, and for a client to do: > if 'mycoolwebserver.set_trailers' in environ: > environ['mycoolwebserver.set_trailers']([('Content-MD5', > 'blahblah')]) > > Since it's easy to add as an implementation specific enhancement, and > since trailers are very close to completely useless, I don't think it > really needs to be in the core standard.
> > James > > _______________________________________________ > Web-SIG mailing list > Web-SIG@python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: > http://mail.python.org/mailman/options/web-sig/mnot%40mnot.net > -- Mark Nottingham http://www.mnot.net/ From pje at telecommunity.com Wed Sep 29 00:51:33 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 29 00:51:46 2004 Subject: [Web-SIG] HTTP 1.1 trailers In-Reply-To: <81AAD541-1191-11D9-9E53-000A95A50FB2@fuhm.net> References: <863F06BA-1178-11D9-88DC-000A95BD86C0@mnot.net> <863F06BA-1178-11D9-88DC-000A95BD86C0@mnot.net> Message-ID: <5.1.1.6.0.20040928184536.02a884d0@mail.telecommunity.com> At 05:01 PM 9/28/04 -0400, James Y Knight wrote: >On Sep 28, 2004, at 2:02 PM, Mark Nottingham wrote: > >>I just realised that WGSI doesn't allow applications to send headers as >>trailers (RFC2616, 3.6.1 Chunked Transfer Coding). I think that's OK, as >>pretty much nobody uses them, and it would require a pretty radical >>change in WGSI's design to support them, but I think the PEP should mention it. > >Nah, it's pretty easy for a webserver to add this feature as a WSGI >extension, and for a client to do: > if 'mycoolwebserver.set_trailers' in environ: > environ['mycoolwebserver.set_trailers']([('Content-MD5', 'blahblah')]) It's actually a bit more complex than that, since it needs to follow the procedures for "safe exts", from paragraph 4 of: http://www.python.org/peps/pep-0333.html#server-extension-apis Keep in mind that an intervening piece of middleware might want to munge some headers, and if it doesn't support the trailer extension, stuff can break. Essentially, the set_trailers extension would need to take start_response as a parameter so it can ensure that middleware hasn't replaced it. Anyway, this definitely falls into the "diminishing returns" bucket. (By the way, James, did you see my proposal for "A more Twisted approach to async apps in WSGI"? 
Do you think it's better than the previous "pause iteration" proposal, or worse? I'd really like to get a WSGI async API nailed down soon so we can look into finalizing the PEP.) From mnot at mnot.net Wed Sep 29 02:24:59 2004 From: mnot at mnot.net (Mark Nottingham) Date: Wed Sep 29 02:25:19 2004 Subject: [Web-SIG] PEP 333 (19-Sep-04) Feedback Message-ID: Overall, this PEP looks really good; these comments are mostly nits and editorial points to make it more precise, clear, etc. * In "Specification Details," the start_response callable has illustrative arguments of "status" and "headers." It would be *very* helpful if the latter were called "response_headers," for clarity. * The same section later states "The application object must return an iterable yielding strings." Return when? We're cautioned that the write() callable should not be used; how is the iterable returned, then? * Later, "The server or gateway must not modify supplied strings in any way..." This effectively rules out the server/gateway implementing transfer-encodings, range requests, delta encoding, automatic content encoding, etc. Suggest dropping this paragraph; it doesn't really add any value, as servers that are malicious or incorrect in this respect won't really be stopped by it anyway. * In "environ Variables," it is specified that "In general, a server or gateway should attempt to provide as many other CGI variables as are applicable, including e.g. the nonstandard SSL variables such as HTTPS=on , if an SSL connection is in effect." This sentence hedges in four different ways; "In general," "should," "attempt," "as many... as are applicable." Besides the redundancy, I'm concerned about the inclusion of nonstandard variables; how will people know which ones to include? I'd suggest listing those that aren't in the CGI standard, so there's an even playing field. * Later in the same section, a construct called a 'stream' is defined. 
It would be good to directly relate this to a 'file-like object,' for the benefit of readers familiar with the terms used in the documentation of Python's standard library. * The same section defines a number of environment variables with Boolean values (e.g., wsgi.multithread). When these definitions say "This value should be true if..." does it mean that they should be a Python types.BooleanType, or that it should evaluate to true (e.g., if wgsi.multithread: ...)? * In 'Input and Error Streams', item 4 in the numbered list of notes to the table says 'Since the errors stream may not be rewound, a container..." This is the first instance of the term 'container'; could an existing term be used? * In "The start_response() Callable", it says "The status argument is an HTTP "status" string like "200 OK" or "404 Not Found." This should reference the definition of status strings in the specification; suggest "The status argument is a string consisting of a Status-Code and a Reason-Phrase, in that order and separated by a single space, with no surrounding whitespace or other characters. See RFC2616, Section 6.1.1 for more information." * In the next paragraph, "Each header_name must be a valid HTTP header name." For the same reasons as above, suggest "Each header_name must be a HTTP header field-name, as defined in RFC2616 Section 4.2." * In the next paragraph, "If the application omits a needed header, the server or gateway should add it." Who determines whether it's needed? Suggest "If the application omits a header required by HTTP or other relevant specifications in effect, the server or gateway must add it." (note must, not should) * The next paragraph is confusingly worded; I'd suggest "The server or gateway must not actually transmit the HTTP headers until the first write call, or until after the first iteration of the application return value that yeilds a non-empty string...." 
* "Buffering and Streaming," is, again, confusing about when an iterator is supposed to be returned. * Finally, "Other HTTP Features" states that "In a sense, a server should consider itself to be like an HTTP 'proxy server'..." This isn't a good analogy; the function it performs is much closer to an HTTP gateway; See the terminology section of RFC2616. Cheers, -- Mark Nottingham http://www.mnot.net/ From ianb at colorstudy.com Wed Sep 29 06:47:55 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Sep 29 06:47:58 2004 Subject: [Web-SIG] WSGI tests Message-ID: <415A3E7B.4020706@colorstudy.com> I've written some code for testing WSGI applications and servers. As before, it's at svn://colorstudy.com/trunk/WSGI , or http://colorstudy.com/cgi-bin/viewcvs.cgi/trunk/WSGI/ The test so far has three parts. There's a simple "echo" application; it actually does several things depending on what variables you give it. There's a "lint" middleware. It checks for both server and application compliance with WSGI. Then there's a test that fetches pages (via urllib) and interprets the response (tests/echotest.py). The idea is that these can be recombined in some ways. The echo application will probably be expanded to do more things, and to better exercise WSGI; e.g., calling start_response twice, using write and iterators at the same time, etc. It could also be expanded to perform illegal operations, e.g., call write inside the iterator, to see what happens in these cases. Another option would be some middleware that takes the output of any application, and plays around with it to exercise all of WSGI. Either way, the echo application could be implemented under different frameworks, and once it's implemented you could run these other tests against your framework. Then there's the lint middleware. This doesn't modify the request in any way (though it does wrap start_response and other objects). 
It just checks various things; right now it mostly checks that required environmental variables are there and that everything is of the right type. It doesn't test any of the more subtle aspects of WSGI, or test any failure cases. It doesn't test the exc_info stuff either; I haven't kept up, and I only partly understand the motivation there. Then there's the system/functional test (echotest). Right now it's just a bunch of asserts, but I'll refactor it for unittest soon. The idea is that in addition to doing some tests directly against echo, this also exercises portions that lint or other middleware is implicitly testing. Anyway, that's what I got now. Not a ton of code (despite this long email). Suggestions welcome. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Wed Sep 29 06:57:46 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 29 06:58:05 2004 Subject: [Web-SIG] PEP 333 (19-Sep-04) Feedback In-Reply-To: Message-ID: <5.1.1.6.0.20040929004017.02813a60@mail.telecommunity.com> At 05:24 PM 9/28/04 -0700, Mark Nottingham wrote: >Overall, this PEP looks really good; these comments are mostly nits and >editorial points to make it more precise, clear, etc. > >* In "Specification Details," the start_response callable has illustrative >arguments of "status" and "headers." It would be *very* helpful if the >latter were called "response_headers," for clarity. Will do. >* The same section later states "The application object must return an >iterable yielding strings." Return when? When it's called, of course. I'll change that to, "When called, the application object must..." > We're cautioned that the write() callable should not be used; how is the > iterable returned, then? Huh? >* Later, "The server or gateway must not modify supplied strings in any >way..." This effectively rules out the server/gateway implementing >transfer-encodings, range requests, delta encoding, automatic content >encoding, etc. 
Suggest dropping this paragraph; it doesn't really add any >value, as servers that are malicious or incorrect in this respect won't >really be stopped by it anyway. I'll take out the modify supplied strings in any way part, but I think it's important to point out that the strings are binary byte sequences. I'll consider some alternatives here. >* In "environ Variables," it is specified that "In general, a server or >gateway should attempt to provide as many other CGI variables as are >applicable, including e.g. the nonstandard SSL variables such as HTTPS=on >, if an SSL connection is in effect." This sentence hedges in four >different ways; "In general," "should," "attempt," "as many... as are >applicable." Besides the redundancy, I'm concerned about the inclusion of >nonstandard variables; how will people know which ones to include? I'd >suggest listing those that aren't in the CGI standard, so there's an even >playing field. Is there a standard for SSL extensions to CGI? These are really the only "non-standard" variables I actually care about. I'll tweak the rest of this more or less as you suggest. >* Later in the same section, a construct called a 'stream' is defined. It >would be good to directly relate this to a 'file-like object,' for the >benefit of readers familiar with the terms used in the documentation of >Python's standard library. Will do. >* The same section defines a number of environment variables with Boolean >values (e.g., wsgi.multithread). When these definitions say "This value >should be true if..." does it mean that they should be a Python >types.BooleanType, or that it should evaluate to true (e.g., if >wgsi.multithread: ...)? The latter; I thought this was obvious by virtue of the fact that it doesn't say ``True`` in typewriter font. 
Good Python style (and performance) demands that one never perform truth tests by comparing directly to ``True`` or ``False``, so in theory it shouldn't matter unless you want to be tricky and use the value as an index. Were you actually confused by this bit, or are you just looking for ambiguities? I'd like to avoid cluttering these definitions further, if possible. >* In 'Input and Error Streams', item 4 in the numbered list of notes to >the table says 'Since the errors stream may not be rewound, a >container..." This is the first instance of the term 'container'; could an >existing term be used? Argh. Pollution carried through from the original December 2003 draft... will fix. >* In "The start_response() Callable", it says "The status argument is an >HTTP "status" string like "200 OK" or "404 Not Found." This should >reference the definition of status strings in the specification; suggest >"The status argument is a string consisting of a Status-Code and a >Reason-Phrase, in that order and separated by a single space, with no >surrounding whitespace or other characters. See RFC2616, Section 6.1.1 for >more information." Okay. >* In the next paragraph, "Each header_name must be a valid HTTP header >name." For the same reasons as above, suggest "Each header_name must be a >HTTP header field-name, as defined in RFC2616 Section 4.2." Okay. >* In the next paragraph, "If the application omits a needed header, the >server or gateway should add it." Who determines whether it's needed? >Suggest "If the application omits a header required by HTTP or other relevant >specifications in effect, the server or gateway must add it." (note must, >not should) Sure. >* The next paragraph is confusingly worded; I'd suggest "The server or >gateway must not actually transmit the HTTP headers until the first write >call, or until after the first iteration of the application return value >that yeilds a non-empty string...." 
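The status-line and field-name wording proposed above can be checked mechanically. What follows is a minimal sketch, not part of PEP 333 or of any library: the helper names (is_token, check_status, check_header_names) are hypothetical, and it assumes Python 3 native strings plus the RFC 2616 grammar (a Status-Code is three digits; a field-name is a token, meaning visible US-ASCII characters other than the Section 2.2 separators).

```python
import re

# RFC 2616, section 2.2: separator characters that may not appear in a token.
_SEPARATORS = frozenset('()<>@,;:\\"/[]?={} \t')

# Status-Code SP Reason-Phrase: three digits, exactly one space, then a
# non-empty phrase with no CR/LF and no leading or trailing whitespace.
_STATUS_RE = re.compile(r'[0-9]{3} \S(?:[^\r\n]*\S)?')


def is_token(name):
    """Return True if name is an RFC 2616 token (a valid header field-name)."""
    # Tokens are visible US-ASCII characters (33..126) minus the separators.
    return bool(name) and all(
        32 < ord(ch) < 127 and ch not in _SEPARATORS for ch in name
    )


def check_status(status):
    """Return the numeric code, or raise ValueError for a malformed status."""
    # fullmatch (rather than match with $) so a trailing newline is rejected.
    if not isinstance(status, str) or _STATUS_RE.fullmatch(status) is None:
        raise ValueError('malformed status: %r' % (status,))
    return int(status[:3])


def check_header_names(response_headers):
    """Raise ValueError if any name in a list of (name, value) pairs is not a token."""
    # Repeated names (e.g. several Set-Cookie headers) are not rejected here.
    for name, _value in response_headers:
        if not is_token(name):
            raise ValueError('malformed header name: %r' % (name,))
```

For example, check_status('404 Not Found') returns 404, while '200  OK' (two spaces) or '200 OK\r\n' raise ValueError. A lint-style middleware could call these from its start_response wrapper.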
Your phrasing doesn't work either, because 'start_response()' can't wait around until those things happen; it has to return immediately. I'll try another phrasing. >* "Buffering and Streaming," is, again, confusing about when an iterator >is supposed to be returned. An application always returns an iterable when called, as per "The application object must return an iterable yielding strings." in "Specification Details". I'll put another note about this under "The Application/Framework Side". Keep in mind that applications *always always always* MUST return an iterable, with absolutely no exceptions ever. Use of 'write()' does not absolve an application from returning an iterable. (I'll add a note to this effect in the section on 'write()'. >* Finally, "Other HTTP Features" states that "In a sense, a server should >consider itself to be like an HTTP 'proxy server'..." This isn't a good >analogy; the function it performs is much closer to an HTTP gateway; See >the terminology section of RFC2616. Will do. From pje at telecommunity.com Wed Sep 29 07:05:28 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 29 07:05:47 2004 Subject: [Web-SIG] WSGI tests In-Reply-To: <415A3E7B.4020706@colorstudy.com> Message-ID: <5.1.1.6.0.20040929010139.020e8af0@mail.telecommunity.com> At 11:47 PM 9/28/04 -0500, Ian Bicking wrote: >I've written some code for testing WSGI applications and servers. As >before, it's at svn://colorstudy.com/trunk/WSGI , or >http://colorstudy.com/cgi-bin/viewcvs.cgi/trunk/WSGI/ > >The test so far has three parts. There's a simple "echo" application; it >actually does several things depending on what variables you give it. FYI, 'echo.application' does not return an iterable, and is therefore not a valid application object. The 'lint' application also has a path that returns None. 
The part of the spec that allowed applications to return None instead of an iterable has been gone from the spec for weeks; I mentioned its removal in one of my regular "recent changes to the spec" posts here. Applications *must* always return an iterable. From pje at telecommunity.com Wed Sep 29 07:21:11 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 29 07:21:29 2004 Subject: [Web-SIG] WSGI tests In-Reply-To: <415A3E7B.4020706@colorstudy.com> Message-ID: <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com> At 11:47 PM 9/28/04 -0500, Ian Bicking wrote: >Then there's the lint middleware. This doesn't modify the request in any >way (though it does wrap start_response and other objects). The wrapper is broken: 'exc_info = args[3]' should be 'exc_info = args[2]'. > It just checks various things; right now it mostly checks that required > environmental variables are there and that everything is of the right type. Some of the variables you're checking for are not actually required any more; see http://www.python.org/peps/pep-0333.html#environ-variables for details. Also, your header checks are requiring non-duplicated headers, but duplicate header names are in fact allowed, per discussion on the list. But, this isn't explicitly stated in the spec, so I should fix that. I'm also not positive that a Content-Type header is absolutely required, e.g. for redirects. I guess I should dig up the HTTP spec on this point. > It doesn't test any of the more subtle aspects of WSGI, or test any > failure cases. Apart from the fact that it doesn't always return an iterable, the lint app is WSGI compliant, but "overprotective", in that it requires things not required by the spec. Other than those nits, it's a pretty nice piece of middleware and I'll probably use it to help in writing a WSGI "reference library". > It doesn't test the exc_info stuff either; I haven't kept up, and I > only partly understand the motivation there. 
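Because the exc_info rules are easy to misread, here is a minimal sketch of a server-side start_response that enforces them as described in the start_response section of PEP 333: a second call is only legal when exc_info is supplied; if nothing has been sent yet, the error call replaces the stored status and headers; once headers have gone out, the original exception is re-raised. The ResponseState class and its attributes are hypothetical, RuntimeError is an arbitrary choice for the "fatal error" case, and the re-raise uses Python 3 syntax.

```python
import sys


class ResponseState:
    """Hypothetical per-request bookkeeping inside a WSGI server."""

    def __init__(self):
        self.status = None
        self.response_headers = None
        self.headers_sent = False

    def start_response(self, status, response_headers, exc_info=None):
        if exc_info is not None:
            try:
                if self.headers_sent:
                    # Too late to change status or headers: re-raise the
                    # application's original exception with its traceback.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # drop the traceback reference (avoids a cycle)
        elif self.status is not None:
            # A second call is only legal when it reports an error.
            raise RuntimeError('start_response() called twice without exc_info')
        # First call, or an error reported before any output was sent:
        # the new status and headers replace whatever was stored.
        self.status = status
        self.response_headers = list(response_headers)
        return self.write

    def write(self, data):
        # A real server would transmit the status line and headers here,
        # before the first body bytes; this sketch only records the fact.
        self.headers_sent = True


def demo():
    state = ResponseState()
    state.start_response('200 OK', [('Content-Type', 'text/plain')])
    try:
        raise ValueError('failure before any output')
    except ValueError:
        # Nothing sent yet, so the error response simply replaces the 200.
        state.start_response('500 Internal Server Error',
                             [('Content-Type', 'text/plain')], sys.exc_info())
    return state.status
```

Here demo() ends with the 500 status stored. Had write() already been called, the same start_response call would instead re-raise the ValueError, which is the case a lint middleware could check for.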
exc_info should be a three-element tuple containing a type, an instance of the type, and a traceback object. If start_response() is called more than once, it's a fatal error not to include exc_info (because the only time it's valid to call start_response() a second time is if an error occurred while you were writing or yielding output). If exc_info is supplied and headers have already been sent to the server, the server *must* raise an error, and *should* raise the supplied exc_info triplet. So, some of these things can be tested by your 'lint' program. See also: http://www.python.org/peps/pep-0333.html#the-start-response-callable from paragraph 7 on. From ianb at colorstudy.com Wed Sep 29 09:19:28 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Sep 29 09:19:33 2004 Subject: [Web-SIG] WSGI tests In-Reply-To: <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com> References: <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com> Message-ID: <415A6200.4000802@colorstudy.com> Phillip J. Eby wrote: > At 11:47 PM 9/28/04 -0500, Ian Bicking wrote: > >> Then there's the lint middleware. This doesn't modify the request in >> any way (though it does wrap start_response and other objects). > > > The wrapper is broken: 'exc_info = args[3]' should be 'exc_info = args[2]'. Fixed. > >> It just checks various things; right now it mostly checks that >> required environmental variables are there and that everything is of >> the right type. > > > Some of the variables you're checking for are not actually required any > more; see > > http://www.python.org/peps/pep-0333.html#environ-variables > > for details. The only one I was mistakenly requiring seems to be QUERY_STRING; from my reading, all these are required: 'REQUEST_METHOD', 'SCRIPT_NAME', 'PATH_INFO', 'SERVER_NAME', 'SERVER_PORT' Well, maybe SCRIPT_NAME isn't required. > Also, your header checks are requiring non-duplicated headers, but > duplicate header names are in fact allowed, per discussion on the list. 
> But, this isn't explicitly stated in the spec, so I should fix that. > > I'm also not positive that a Content-Type header is absolutely required, > e.g. for redirects. I guess I should dig up the HTTP spec on this point. I believe it is required for any response that has a body, but it's true that's not all responses. There are some 2xx responses that have no body. I've taken out the requirement, but noted that it should be in there somewhere. I'm okay if this embodies some requirements of HTTP in addition to specifically WSGI requirements. >> It doesn't test any of the more subtle aspects of WSGI, or test any >> failure cases. > > > Apart from the fact that it doesn't always return an iterable, the lint > app is WSGI compliant, but "overprotective", in that it requires things > not required by the spec. > > Other than those nits, it's a pretty nice piece of middleware and I'll > probably use it to help in writing a WSGI "reference library". > > >> It doesn't test the exc_info stuff either; I haven't kept up, and I >> only partly understand the motivation there. > > > exc_info should be a three-element tuple containing a type, an instance > of the type, and a traceback object. If start_response() is called more > than once, it's a fatal error not to include exc_info (because the only > time it's valid to call start_response() a second time is if an error > occurred while you were writing or yielding output). If exc_info is > supplied and headers have already been sent to the server, the server > *must* raise an error, and *should* raise the supplied exc_info > triplet. So, some of these things can be tested by your 'lint' program. I suppose I could trigger these conditions in echo, and then test that they are handled properly in lint. I'll have to think about what exactly "properly" is first. > See also: > > http://www.python.org/peps/pep-0333.html#the-start-response-callable > > from paragraph 7 on. I read that, and didn't feel entirely clear on the intention.
An example in that section would probably be helpful. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From pje at telecommunity.com Wed Sep 29 17:07:10 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 29 17:07:36 2004 Subject: [Web-SIG] WSGI tests In-Reply-To: <415A6200.4000802@colorstudy.com> References: <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com> <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com> At 02:19 AM 9/29/04 -0500, Ian Bicking wrote: >The only one I was mistakenly requiring seems to be QUERY_STRING; from my >reading, all these are required: > >'REQUEST_METHOD', 'SCRIPT_NAME', 'PATH_INFO', 'SERVER_NAME', 'SERVER_PORT' > >Well, maybe SCRIPT_NAME isn't required. Or PATH_INFO - if the request is addressed directly to the application, and there's no trailing '/', it can be empty, and is therefore allowed to be missing, as in CGI. >I suppose I could trigger these conditions in echo, and then test that >they are handled properly in lint. I'll have to think about what exactly >"properly" is first. > >>See also: >> http://www.python.org/peps/pep-0333.html#the-start-response-callable >>from paragraph 7 on. > >I read that, and didn't feel entirely clear on the intention. An example >in that section would probably be helpful. I'll see what I can do. By the way, I found another issue with lint: IteratorWrapper doesn't close the original iterable if it had a close() method. From ianb at colorstudy.com Wed Sep 29 18:21:52 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Sep 29 18:22:48 2004 Subject: [Web-SIG] WSGI tests In-Reply-To: <5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com> References: <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com> <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com> <5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com> Message-ID: <415AE120.10609@colorstudy.com> Phillip J. 
Eby wrote: > At 02:19 AM 9/29/04 -0500, Ian Bicking wrote: > >> The only one I was mistakenly requiring seems to be QUERY_STRING; from >> my reading, all these are required: >> >> 'REQUEST_METHOD', 'SCRIPT_NAME', 'PATH_INFO', 'SERVER_NAME', >> 'SERVER_PORT' >> >> Well, maybe SCRIPT_NAME isn't required. > > > Or PATH_INFO - if the request is addressed directly to the application, > and there's no trailing '/', it can be empty, and is therefore allowed > to be missing, as in CGI. OK, fixed. > By the way, I found another issue with lint: IteratorWrapper doesn't > close the original iterable if it had a close() method. Fixed as well. Also, I added back in the content-type check, unless there's a response code of 204 No Content; I think that's the only response code where there shouldn't be a content-type. I'd rather be a little overly restrictive. It's a useful check, because most frameworks have default content-types, and WSGI does not. And some browsers (specifically IE) try to fix broken content-types. And some servers add default content-types, e.g., Apache's DefaultType. So it's a bug that might be missed. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org From mnot at mnot.net Wed Sep 29 18:34:42 2004 From: mnot at mnot.net (Mark Nottingham) Date: Wed Sep 29 18:34:45 2004 Subject: [Web-SIG] PEP 333 (19-Sep-04) Feedback In-Reply-To: <5.1.1.6.0.20040929004017.02813a60@mail.telecommunity.com> References: <5.1.1.6.0.20040929004017.02813a60@mail.telecommunity.com> Message-ID: <7748360D-1235-11D9-88DC-000A95BD86C0@mnot.net> Thanks for the quick response. Answers inline below. On Sep 28, 2004, at 9:57 PM, Phillip J. Eby wrote: >> * The same section later states "The application object must return >> an iterable yielding strings." Return when? > > When it's called, of course. I'll change that to, "When called, the > application object must..." > > >> We're cautioned that the write() callable should not be used; how is >> the iterable returned, then? 
> > Huh? I found the flow of calls confusing in this section; I'll think on how to improve it and make a concrete suggestion if I come up with something. >> * In "environ Variables," it is specified that "In general, a server >> or gateway should attempt to provide as many other CGI variables as >> are applicable, including e.g. the nonstandard SSL variables such as >> HTTPS=on , if an SSL connection is in effect." This sentence hedges >> in four different ways; "In general," "should," "attempt," "as >> many... as are applicable." Besides the redundancy, I'm concerned >> about the inclusion of nonstandard variables; how will people know >> which ones to include? I'd suggest listing those that aren't in the >> CGI standard, so there's an even playing field. > > Is there a standard for SSL extensions to CGI? These are really the > only "non-standard" variables I actually care about. I'll tweak the > rest of this more or less as you suggest. Not to my knowledge; maybe just document that one and don't mention others. >> * The same section defines a number of environment variables with >> Boolean values (e.g., wsgi.multithread). When these definitions say >> "This value should be true if..." does it mean that they should be a >> Python types.BooleanType, or that it should evaluate to true (e.g., >> if wgsi.multithread: ...)? > > The latter; I thought this was obvious by virtue of the fact that it > doesn't say ``True`` in typewriter font. Good Python style (and > performance) demands that one never perform truth tests by comparing > directly to ``True`` or ``False``, so in theory it shouldn't matter > unless you want to be tricky and use the value as an index. > > Were you actually confused by this bit, or are you just looking for > ambiguities? I'd like to avoid cluttering these definitions further, > if possible. Looking for ambiguities. Couldn't you fix this by saying "The value should evaluate to true if..."? 
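To illustrate the distinction being discussed: since wsgi.multithread is only promised to evaluate as true or false, a consumer should use an ordinary truth test. A minimal sketch; make_cache_lock is a hypothetical helper, not part of any API.

```python
import threading


def make_cache_lock(environ):
    """Hypothetical helper: return a lock only if requests may run in threads."""
    # PEP 333 only promises a value that evaluates as true or false, so use
    # a plain truth test; `is True` would wrongly reject a server passing 1.
    if environ['wsgi.multithread']:
        return threading.Lock()
    return None
```

A server that supplies 1 or 0 instead of True or False works unchanged with this style of check.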
Cheers, -- Mark Nottingham http://www.mnot.net/ From pje at telecommunity.com Wed Sep 29 18:36:16 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 29 18:36:44 2004 Subject: [Web-SIG] WSGI tests In-Reply-To: <415AE120.10609@colorstudy.com> References: <5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com> <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com> <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com> <5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040929122710.0210e5f0@mail.telecommunity.com> At 11:21 AM 9/29/04 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>At 02:19 AM 9/29/04 -0500, Ian Bicking wrote: >> >>>The only one I was mistakenly requiring seems to be QUERY_STRING; from >>>my reading, all these are required: >>> >>>'REQUEST_METHOD', 'SCRIPT_NAME', 'PATH_INFO', 'SERVER_NAME', 'SERVER_PORT' >>> >>>Well, maybe SCRIPT_NAME isn't required. >> >>Or PATH_INFO - if the request is addressed directly to the application, >>and there's no trailing '/', it can be empty, and is therefore allowed to >>be missing, as in CGI. > >OK, fixed. Actually, it just occurred to me that there *is* a legitimate test you can do for SCRIPT_NAME and PATH_INFO: at least *one* of them must be present and non-blank, because if you're at the site root, SCRIPT_NAME is empty and PATH_INFO has to be '/'. (Or the other way around, the CGI spec isn't clear on this, but Apache CGI puts the '/' in PATH_INFO.) 
Anyway, it's never valid to have both empty or missing, so you can: assert environ.get('SCRIPT_NAME') or environ.get('PATH_INFO') Also, if present and non-empty, both of these variables must *begin* with a '/', so it's more like: script_name = environ.get('SCRIPT_NAME','') path_info = environ.get('PATH_INFO','') assert not script_name or script_name.startswith('/') assert not path_info or path_info.startswith('/') assert script_name or path_info >>By the way, I found another issue with lint: IteratorWrapper doesn't >>close the original iterable if it had a close() method. > >Fixed as well. Actually, no. Lint's iterator close() is still broken. You have to use close() on the *iterable*, not on iter(iterable). The two may be different objects, since an iterable may return a separate iterator object. Also, pycgiwrapper returns None from __call__, when it should return an iterator. A simple way to fix that would be to just 'return [body]' after calling start_response. I'm pretty much coming to the conclusion that WSGI is no longer "simple", alas. For it to actually be usable, there's going to have to be a reference library, as well as tests. I'm going to keep pecking away at your lint program, and eventually your other test facilities as well, so that I'll have something to test the reference library with. :) From ianb at colorstudy.com Wed Sep 29 19:23:54 2004 From: ianb at colorstudy.com (Ian Bicking) Date: Wed Sep 29 19:24:44 2004 Subject: [Web-SIG] WSGI tests In-Reply-To: <5.1.1.6.0.20040929122710.0210e5f0@mail.telecommunity.com> References: <5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com> <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com> <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com> <5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com> <5.1.1.6.0.20040929122710.0210e5f0@mail.telecommunity.com> Message-ID: <415AEFAA.8070405@colorstudy.com> Phillip J.
Eby wrote: > Actually, it just occurred to me that there *is* a legitimate test you > can do for SCRIPT_NAME and PATH_INFO: at least *one* of them must be > present and non-blank, because if you're at the site root, SCRIPT_NAME > is empty and PATH_INFO has to be '/'. (Or the other way around, the CGI > spec isn't clear on this, but Apache CGI puts the '/' in PATH_INFO.) OK... I guess the root of a domain is an odd case, because I can't imagine what the difference between SCRIPT_NAME="/", PATH_INFO="" or SCRIPT_NAME="", PATH_INFO="/" would mean. On further thought, I think it doesn't make sense for SCRIPT_NAME to be "/". Because PATH_INFO must always start with a "/", SCRIPT_NAME must be "" if there's any path (unless we get double /'s when reconstructing the URL, which wouldn't be good). So I think I'm going to make the test include SCRIPT_NAME != "/". The general case would say that SCRIPT_NAME should not end with a /, but I don't feel 100% confident that that's correct. > Anyway, it's never valid to have both empty or missing, so you can: > > assert environ.get('SCRIPT_NAME') or environ.get('PATH_INFO') > > Also, if present and non-empty, both of these variables must *begin* > with a '/', so it's more like: > > script_name = environ.get('SCRIPT_NAME','') > path_info = environ.get('PATH_INFO','') > assert not script_name or script_name.startswith('/') > assert not path_info or path_info.startswith('/') > assert script_name or path_info Yes, the '/' tests were already in there. >>> By the way, I found another issue with lint: IteratorWrapper doesn't >>> close the original iterable if it had a close() method. >> >> >> Fixed as well. > > > Actually, no. Lint's iterator close() is still broken. You have to use > close() on the *iterable*, not on iter(iterable). The two may be > different objects, since an iterable may return a separate iterator object. This was something I felt a little ambiguous about. 
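The distinction between the iterable and iter(iterable) can be sketched as a lint-style wrapper. This is a hypothetical illustration, not the code in the repository mentioned earlier: it calls iter() once, keeps a reference to the original return value, and forwards close() to that original object. The bytes check assumes Python 3, where PEP 3333 requires bytestrings.

```python
class IteratorWrapper:
    """Hypothetical wrapper around an application's return value."""

    def __init__(self, app_iterable):
        self.original = app_iterable        # what the application returned
        self.iterator = iter(app_iterable)  # may be a different object

    def __iter__(self):
        # The wrapper is its own iterator, so a server's for loop
        # does not trigger another iter() on the application's object.
        return self

    def __next__(self):
        data = next(self.iterator)
        if not isinstance(data, bytes):
            raise TypeError('application yielded %r, not bytes' % (data,))
        return data

    def close(self):
        # close() belongs to the iterable the application returned,
        # not to the iterator obtained from it.
        close = getattr(self.original, 'close', None)
        if close is not None:
            close()
```

A body object whose __iter__ returns a fresh list iterator shows why this matters: a list iterator has no close() method, so closing iter(body) instead of body would silently skip the application's cleanup.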
I assume the server always must iterate over iter(app_iter), it can't
iterate over app_iter directly. When using a "for" loop there's not much
distinction, but if you access the .next() methods directly there would
be. Anyway, I'm a little fuzzy when __iter__ gets called implicitly. I
was surprised that it seemed to get called twice when iterating with a
simple for loop, and I had to add IteratorWrapper.__iter__.

> Also, pycgiwrapper returns None from __call__, when it should return an
> iterator. A simple way to fix that would be to just 'return [body]'
> after calling start_response.

I've added a check in lint specifically for None or False for the
iterator; it would still fail implicitly before, but this way the error
should be better. I haven't tested pycgiwrapper yet, or some of the
other code I wrote before, so there might be other bugs in there (e.g.,
unnecessary use of write(), or returning None).

> I'm pretty much coming to the conclusion that WSGI is no longer
> "simple", alas. For it to actually be usable, there's going to have to
> be a reference library, as well as tests. I'm going to keep pecking
> away at your lint program, and eventually your other test facilities as
> well, so that I'll have something to test the reference library with. :)

The basic mechanics are still reasonably simple, but there's a lot of
smaller things to consider. So I don't think WSGI has become that much
more complicated, we've just come to appreciate complexities that were
there all along.

Also, should we be putting all of this code in a single repository?

--
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org

From pje at telecommunity.com Wed Sep 29 19:38:28 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 29 19:38:27 2004
Subject: [Web-SIG] WSGI tests
In-Reply-To: <415AEFAA.8070405@colorstudy.com>
References: <5.1.1.6.0.20040929122710.0210e5f0@mail.telecommunity.com>
	<5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com>
	<5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
	<5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
	<5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com>
	<5.1.1.6.0.20040929122710.0210e5f0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040929133005.020e6d40@mail.telecommunity.com>

At 12:23 PM 9/29/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>Actually, it just occurred to me that there *is* a legitimate test you
>>can do for SCRIPT_NAME and PATH_INFO: at least *one* of them must be
>>present and non-blank, because if you're at the site root, SCRIPT_NAME is
>>empty and PATH_INFO has to be '/'. (Or the other way around, the CGI
>>spec isn't clear on this, but Apache CGI puts the '/' in PATH_INFO.)
>
>OK... I guess the root of a domain is an odd case, because I can't imagine
>what the difference between SCRIPT_NAME="/", PATH_INFO="" or
>SCRIPT_NAME="", PATH_INFO="/" would mean.
>
>On further thought, I think it doesn't make sense for SCRIPT_NAME to be
>"/". Because PATH_INFO must always start with a "/", SCRIPT_NAME must be
>"" if there's any path (unless we get double /'s when reconstructing the
>URL, which wouldn't be good). So I think I'm going to make the test
>include SCRIPT_NAME != "/". The general case would say that SCRIPT_NAME
>should not end with a /, but I don't feel 100% confident that that's correct.

Actually, you're right: SCRIPT_NAME should not end with a '/', because it
would have to be part of PATH_INFO in that case.

>>>>By the way, I found another issue with lint: IteratorWrapper doesn't
>>>>close the original iterable if it had a close() method.
>>>
>>>
>>>Fixed as well.
>>
>>Actually, no. Lint's iterator close() is still broken.
>>You have to use close() on the *iterable*, not on iter(iterable). The two
>>may be different objects, since an iterable may return a separate
>>iterator object.
>
>This was something I felt a little ambiguous about. I assume the server
>always must iterate over iter(app_iter), it can't iterate over app_iter
>directly.

Not precisely true; see below.

> When using a "for" loop there's not much distinction, but if you access
> the .next() methods directly there would be. Anyway, I'm a little fuzzy
> when __iter__ gets called implicitly. I was surprised that it seemed to
> get called twice when iterating with a simple for loop, and I had to add
> IteratorWrapper.__iter__.

PEP 234 describes the iterator protocol, but here's a short summary:

* An "iterable" has an __iter__ method (tp_iter slot at the C level)

* An "iterator" has an __iter__ method *and* a next method (tp_iternext slot)

'for' loops work on "iterables", so they call __iter__. Typically, an
iterator's __iter__ returns self, so this is idempotent if you're
iterating over an iterator.

WSGI apps must return an *iterable*. An iterator is of course also an
iterable.

>>I'm pretty much coming to the conclusion that WSGI is no longer "simple",
>>alas. For it to actually be usable, there's going to have to be a
>>reference library, as well as tests. I'm going to keep pecking away at
>>your lint program, and eventually your other test facilities as well, so
>>that I'll have something to test the reference library with. :)
>
>The basic mechanics are still reasonably simple, but there's a lot of
>smaller things to consider. So I don't think WSGI has become that much
>more complicated, we've just come to appreciate complexities that were
>there all along.
>
>Also, should we be putting all of this code in a single repository?

Eventually, we should probably use the Python CVS sandbox. For now, we
don't really have any duplication taking place AFAICT.
Once I have something resembling a coherent reference library, I'll put
it there, anyway.

From ianb at colorstudy.com Wed Sep 29 23:48:49 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Sep 29 23:49:37 2004
Subject: [Web-SIG] WSGI Webware/WebKit
Message-ID: <415B2DC1.70007@colorstudy.com>

I just committed some code to the repository
(svn://colorstudy.com/trunk/WSGI/) that implements WebKit on top of
WSGI. It's not complete, but much of the core is there.

There's no configuration, no AppServer or Application object, no
session, and the path (URL introspection) methods are missing. The path
methods in Webware are a mess, which is why I left them out.

AppServer and Application objects don't really apply in this context. I
hope to create dummy objects for those few places where they are exposed.

Configuration probably will be implemented in a different layer, and all
the configuration will change, since it's a different environment you
are configuring.

Session will probably be in a different layer as well, maybe with a
wrapper to implement the WebKit interface around a session that may not
have that interface.

The path methods will just wait.

Also, there's no URL resolution. I'm just using dispatch.py for now,
which is a naive way of dispatching. But I plan to keep dispatching in
a separate layer -- this way different frameworks can live side-by-side.

Instances of WSGI.WSGIWebKit.wkservlet.Page are WSGI applications;
basically Page.__call__ does the work. Most of the rest is copied from
WebKit with some cleanup; there are some small portions that were
changed, like HTTPResponse.__init__, write, commit, and deliver, and
HTTPRequest.__init__. The changes were fairly easy to do.

I also changed the tests to be unittests.

--
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
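Ian's plan to keep dispatching in its own layer, with each page object acting as a WSGI application through its `__call__`, can be illustrated with a small Python 3 sketch. `Page`, `PrefixDispatcher` and the `call` harness below are hypothetical stand-ins, not the code in WSGI.WSGIWebKit.wkservlet or dispatch.py. The dispatcher assumes routing on the first PATH_INFO segment and moves that segment onto SCRIPT_NAME, so the SCRIPT_NAME/PATH_INFO rules worked out earlier in this thread still hold for the application it calls. The page returns `[body]` after calling `start_response`, never None.

```python
class Page:
    """Hypothetical servlet-style page; each instance is a WSGI application."""

    def __init__(self, title):
        self.title = title

    def __call__(self, environ, start_response):
        text = '%s (SCRIPT_NAME=%r, PATH_INFO=%r)' % (
            self.title,
            environ.get('SCRIPT_NAME', ''),
            environ.get('PATH_INFO', ''))
        body = text.encode('utf-8')
        start_response('200 OK', [('Content-Type', 'text/plain'),
                                  ('Content-Length', str(len(body)))])
        # Return an iterable (a one-item list), never None.
        return [body]


class PrefixDispatcher:
    """Hypothetical dispatch layer: picks an application by the first
    PATH_INFO segment and moves that segment onto SCRIPT_NAME."""

    def __init__(self, apps):
        self.apps = apps  # maps a segment name to a WSGI application

    def __call__(self, environ, start_response):
        path_info = environ.get('PATH_INFO', '')
        # '/wiki/FrontPage' -> ('wiki', '/', 'FrontPage')
        segment, slash, rest = path_info[1:].partition('/')
        app = None
        if segment and path_info.startswith('/'):
            app = self.apps.get(segment)
        if app is None:
            body = b'Not Found'
            start_response('404 Not Found', [('Content-Type', 'text/plain'),
                                             ('Content-Length', str(len(body)))])
            return [body]
        # Copy so the caller's environ is left untouched.
        environ = dict(environ)
        # SCRIPT_NAME gains '/segment' (segment is non-empty, so it never
        # ends with '/'); PATH_INFO keeps its leading '/' or becomes empty.
        environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + '/' + segment
        environ['PATH_INFO'] = slash + rest
        return app(environ, start_response)


def call(app, path):
    """Tiny hypothetical harness: run `app` for one path at the site root."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        return lambda data: None  # the write() callable, unused here

    chunks = app({'SCRIPT_NAME': '', 'PATH_INFO': path}, start_response)
    return captured['status'], b''.join(chunks)


site = PrefixDispatcher({'wiki': Page('wiki page')})
status, body = call(site, '/wiki/FrontPage')
```

Because each dispatcher shifts only one segment, dispatchers nest: wrapping `site` as `PrefixDispatcher({'app': site})` and requesting `/app/wiki/FrontPage` hands the page SCRIPT_NAME `'/app/wiki'` and PATH_INFO `'/FrontPage'`, so different frameworks can sit under different prefixes.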
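The SCRIPT_NAME/PATH_INFO rules that came out of the "WSGI tests" exchange (at least one of the two is non-empty, each begins with '/' when present, and SCRIPT_NAME never ends with '/', which also rules out SCRIPT_NAME == '/' at the site root) can be collected into one lint-style check. This is a minimal Python 3 sketch; `path_problems` is a hypothetical name, not a function in Ian's lint module, and it reports every problem as a list instead of stopping at the first failed assert.

```python
def path_problems(environ):
    """Hypothetical lint-style check of SCRIPT_NAME and PATH_INFO.

    Returns a list of human-readable problems; an empty list means the
    pair satisfies the rules discussed in the thread.
    """
    script_name = environ.get('SCRIPT_NAME', '')
    path_info = environ.get('PATH_INFO', '')
    problems = []
    # At least one of the two must be present and non-empty.
    if not (script_name or path_info):
        problems.append('SCRIPT_NAME and PATH_INFO are both empty or missing')
    # Each one, when non-empty, must begin with '/'.
    if script_name and not script_name.startswith('/'):
        problems.append('SCRIPT_NAME must begin with "/"')
    if path_info and not path_info.startswith('/'):
        problems.append('PATH_INFO must begin with "/"')
    # A trailing '/' belongs to PATH_INFO, so SCRIPT_NAME (including the
    # site-root value '/') must not end with one.
    if script_name.endswith('/'):
        problems.append('SCRIPT_NAME must not end with "/"')
    return problems
```

At the site root, `{'SCRIPT_NAME': '', 'PATH_INFO': '/'}` (the layout Phillip reports Apache CGI using) yields an empty list, while `{'SCRIPT_NAME': '/', 'PATH_INFO': ''}` is reported as ending with '/'.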
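Phillip's correction about close() in lint's IteratorWrapper, together with Ian's added check for None or False, can be seen in one small wrapper. This is a Python 3 sketch (Python 3 names the iterator method `__next__`, where the 2004 code used `next`); `IterableWrapper` and the `Body` test class are hypothetical, not lint's actual code. The point it shows is that `close()` is looked up on the object the application returned, because `iter()` of that object can be a different object with no `close()` at all.

```python
class IterableWrapper:
    """Hypothetical lint-style wrapper around an application's return value."""

    def __init__(self, app_iter):
        # Reject the common mistake of returning None (or False) from the app.
        if app_iter is None or app_iter is False:
            raise TypeError('application returned %r instead of an iterable'
                            % (app_iter,))
        self.original = app_iter        # what the app returned; close() lives here
        self.iterator = iter(app_iter)  # may be a different object entirely

    def __iter__(self):
        # A 'for' loop calls __iter__; returning self makes the wrapper
        # its own iterator, so iteration goes through __next__ below.
        return self

    def __next__(self):
        return next(self.iterator)

    def close(self):
        # Close the *iterable*, not iter(iterable).
        close = getattr(self.original, 'close', None)
        if close is not None:
            close()


class Body:
    """Hypothetical app result: an iterable (not an iterator) with close()."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.closed = False

    def __iter__(self):
        return iter(self.chunks)  # a separate list iterator, with no close()

    def close(self):
        self.closed = True


body = Body([b'Hello, ', b'world'])
wrapped = IterableWrapper(body)
out = []
try:                      # how a server would consume the result
    for chunk in wrapped:
        out.append(chunk)
finally:
    wrapped.close()       # always called, even if iteration raises

try:
    IterableWrapper(None)
    rejected_none = False
except TypeError:
    rejected_none = True
```

A generator returned by an application is its own iterator (`iter(gen) is gen`), so the bug would not show up there; it appears with iterable classes like `Body`, whose `__iter__` hands out a separate iterator. The `try`/`finally` keeps `close()` running even if iteration raises.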