From janssen at parc.com  Wed Sep  1 03:16:19 2004
From: janssen at parc.com (Bill Janssen)
Date: Wed Sep  1 03:16:45 2004
Subject: [Web-SIG] SIG charter 
In-Reply-To: Your message of "Fri, 27 Aug 2004 10:51:08 PDT."
	<20040827175108.GA29376@rogue.amk.ca> 
Message-ID: <04Aug31.181620pdt."58612"@synergy1.parc.xerox.com>

> I think the charter was written by Bill Janssen, who doesn't seem to
> be actively participating on the list any more.  The charter doesn't
> necessarily bear any relevance to what the individuals in the SIG are
> actually doing.

Oh, I'm here, but I've been on vacation the last couple of weeks.

I'd say, keep the current charter, and let's keep up the great
conversation that's been going on.

Bill
From pje at telecommunity.com  Wed Sep  1 04:39:26 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep  1 04:38:59 2004
Subject: [Web-SIG] Pending modifications to PEP 333
In-Reply-To: <5.1.1.6.0.20040831173043.023a93e0@mail.telecommunity.com>
References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040831223709.02368cf0@mail.telecommunity.com>

At 05:56 PM 8/31/04 -0400, Phillip J. Eby wrote:
>I'm just about to check in a major update to the PEP, per the details 
>below.  It will be a while before it shows up in the HTML version of the 
>PEP or the sourceforge ViewCVS, though.

FYI: these changes have now propagated to the HTML version at:

http://www.python.org/peps/pep-0333.html

and the CVS history at:

http://cvs.sourceforge.net/viewcvs.py/python/python/nondist/peps/pep-0333.txt


From andrew at andreweland.org  Wed Sep  1 11:17:05 2004
From: andrew at andreweland.org (Andrew Eland)
Date: Wed Sep  1 11:27:38 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4134CB04.2010803@xhaus.com>
References: <4134CB04.2010803@xhaus.com>
Message-ID: <41359391.5000108@andreweland.org>

Alan Kennedy wrote:

> Problem is that jython doesn't support file descriptors, or the fileno() 
> method. If you invoke fileno() on an org.python.core.PyFile, you get an 
> Py.IOError("fileno() is not supported in jpython") exception.

I guess the fileno() method could be renamed something like os_file() or 
os_stream(). CPython could return a file descriptor, Jython could return 
something like a java.nio.Channel, IronPython could return a 
System.IO.Stream, or something like that.

   -- Andrew (http://www.andreweland.org)
From py-web-sig at xhaus.com  Wed Sep  1 12:52:02 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep  1 12:47:28 2004
Subject: [Web-SIG] Pending modifications to PEP 333
In-Reply-To: <5.1.1.6.0.20040831173043.023a93e0@mail.telecommunity.com>
References: <5.1.1.6.0.20040831173043.023a93e0@mail.telecommunity.com>
Message-ID: <4135A9D2.9060800@xhaus.com>

[Phillip J. Eby]
 > I'm just about to check in a major update to the PEP, per the details
 > below.

Phillip,

Thanks for all your hard work: I think you're doing a great job, and I 
think that the WSGI initiative is the best thing ever to happen to 
python web APIs.

But I do have one problem :-(

[Phillip J. Eby]
 > I've also clarified that 'fileno()', if present, *must* be an OS file
 > descriptor, and is only relevant to servers on platforms where file
 > descriptors exist.

This will break portability across jython and IronPython, and any other 
platforms that don't have the concept of file descriptor tables: thus it 
prevents WSGI applications from returning file-like objects on these 
platforms.

The requirement, as is, can only work on platforms that use file 
descriptor tables, i.e. where every process has an array of open 
files/file-likes, where the "fileno()" is an integer index into that 
table. Granted, the *nixes, Windows, MacOS, etc. all have
per-process file descriptor tables, thus betraying their C/unix heritage.

Neither jython nor ironpython have file descriptor tables. Since the 
concept of file descriptor tables is platform specific, both the JVM and 
the .Net CLR eliminated them, and modelled all file-like objects as 
specific object classes, e.g. java.io.OutputStream, 
java.nio.SelectableChannel, System.IO.*, etc. If you want to create a
file-like object, you must use one of the platform-supplied classes: there
is no global table of such file-like objects. You can no longer pass
around "file descriptors", i.e. indexes into a table of file objects,
because what you can do with a file-like object varies from one class
to another.

Some pythonistas don't like this object specialization for file-like 
objects, and prefer the *nix file descriptor approach, since it is 
comparable to python's late-binding approach to datatypes. However, lack 
of file descriptor tables is an unavoidable reality on the JVM and CLR: 
the two most widespread virtual-machines in the world.

Insisting on the "fileno()" method returning a file descriptor makes it
impossible to return a file-like object to a WSGI container implemented
in jython or ironpython.

IMHO, the correct approach is for the application to return an actual
file-like object, e.g. one with a read() method, and for the 
server/framework to then map that file-like object to whatever 
high-performance byte-stream-type object is appropriate on the platform. 
On java, for example, this could be a java.nio.FileChannel. Once one of 
these had been obtained from the returned file-like object, the high 
performance FileChannel.transferTo() could then be used to transfer the 
file contents to the socket return stream.

http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/FileChannel.html#transferTo(long,%20long,%20java.nio.channels.WritableByteChannel)

So, please can we have WSGI require the return of a file-like object, 
which the WSGI server/framework is then free to map to a 
high-performance channel in whatever way is appropriate?

The "must return a file descriptor" approach is broken.

Kind regards,

Alan.
From py-web-sig at xhaus.com  Wed Sep  1 12:59:33 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep  1 12:54:58 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <41359391.5000108@andreweland.org>
References: <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org>
Message-ID: <4135AB95.8040108@xhaus.com>

[Alan Kennedy]
>> Problem is that jython doesn't support file descriptors, or the 
>> fileno() method. If you invoke fileno() on an org.python.core.PyFile, 
>> you get an Py.IOError("fileno() is not supported in jpython") exception.

[Andrew Eland]
> I guess the fileno() method could be renamed something like os_file() or 
> os_stream(). CPython could return a file descriptor, Jython could return 
> something like a java.nio.Channel, IronPython could return a 
> System.IO.Stream, or something like that.

Hmm, I'm not sure I understand what you are saying here Andrew.

The use-case we're trying to cover is where the application wants to
return a file-like object to the WSGI server/framework. The application's
intention is that the contents of the file-like object, from the
current file-pointer onwards, should be transferred to the return socket
for the HTTP request.

On jython, and I'm guessing on ironpython, file-like objects don't have 
a fileno() method, or an os_file() method or an os_stream() method. They 
just have file-like methods, e.g. read(), readline(), write(), etc.

What we need is a way for the application to return a file-like object, 
in a platform-independent way, so that whatever platform/framework the 
application is running in can:

1. Simply read the file contents and transfer that back to the user
2. Possibly do so using a high-performance channel or stream.
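
A minimal sketch of what point 1 could look like on the server side,
assuming this proposal were adopted (the spec as it stands does not let
the server call read() on the returned object). The helper name
transfer_filelike and its parameters are hypothetical, and io.BytesIO
merely stands in for whatever file-like object the application returns:

```python
import io

def transfer_filelike(filelike, write, bufsize=4096):
    """Copy everything from the current file pointer onwards to write().

    Hypothetical server-side helper; returns the number of bytes sent.
    """
    total = 0
    while True:
        chunk = filelike.read(bufsize)
        if not chunk:          # empty read means end of stream
            break
        write(chunk)
        total += len(chunk)
    return total

# The application has already consumed the first three bytes, so only
# the remainder should reach the client.
source = io.BytesIO(b"0123456789")
source.read(3)
sent = []
count = transfer_filelike(source, sent.append, bufsize=4)
```

A platform that has a faster channel or stream could swap out the loop
body without the application noticing.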

Regards,

Alan.
From andrew at andreweland.org  Wed Sep  1 13:30:47 2004
From: andrew at andreweland.org (Andrew Eland)
Date: Wed Sep  1 13:41:15 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4135AB95.8040108@xhaus.com>
References: <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org>
	<4135AB95.8040108@xhaus.com>
Message-ID: <4135B2E7.5060708@andreweland.org>

Alan Kennedy wrote:

> Hmm, I'm not sure I understand what you are saying here Andrew. 
> The use-case we're trying to cover is where the application wants to 
> return a file-like object to the WSGI server/framework. The application's
> intention is that the contents of the file-like object, from the
> current file-pointer onwards, should be transferred to the return socket 
> for the HTTP request.

The intent, I think, is to special-case the sending of static files, 
allowing a server to use the most efficient method of transferring data 
from a file to a socket that the platform provides.

Under CPython, the server could use something like sendfile() or epoll() 
  to transfer data, if it has access to the underlying file descriptor.
Under Jython, with a server written in Java, it would be nice to allow
the use of the most efficient Java mechanism to transfer data from the file
to the client, which I guess is the functionality under java.nio. To do 
this, the server would need to access the underlying Java object 
representing the file, a java.nio.Channel or similar.

   -- Andrew (http://www.andreweland.org)
From py-web-sig at xhaus.com  Wed Sep  1 14:39:46 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep  1 14:35:11 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4135B2E7.5060708@andreweland.org>
References: <4134CB04.2010803@xhaus.com> <41359391.5000108@andreweland.org>
	<4135AB95.8040108@xhaus.com> <4135B2E7.5060708@andreweland.org>
Message-ID: <4135C312.2060009@xhaus.com>

[Alan Kennedy]
>> Hmm, I'm not sure I understand what you are saying here Andrew. The 
>> use-case we're trying to cover is where the application wants to 
>> return a file-like object to the WSGI server/framework. The
>> application's intention is that the contents of the file-like
>> object, from the current file-pointer onwards, should be transferred 
>> to the return socket for the HTTP request.

[Andrew Eland]
> The intent, I think, is to special-case the sending of static files, 
> allowing a server to use the most efficient method of transferring data 
> from a file to a socket that the platform provides.

Agreed that special-casing static files for performance reasons is a 
good thing.

But we also need to consider what happens when the application returns, 
for example, a StringIO.StringIO, or a gzip.GzipFile.

I'm trying to come up with a scheme whereby applications can do those 
things transparently across cpython, jython and ironpython. So when I 
said "I'm not sure I understand", I should have said "I don't understand 
how your proposed os_file() or os_stream() approach would work, without 
forcing application authors to detect the platform they are running on 
and alter their application's behaviour accordingly".

> Under CPython, the server could use something like sendfile() or epoll() 
>  to transfer data, if it has access to the underlying file descriptor.
> Under Jython, with a server written in Java, it would be nice to allow
> the use of the most efficient Java mechanism to transfer data from the file
> to the client, which I guess is the functionality under java.nio. To do 
> this, the server would need to access the underlying Java object 
> representing the file, a java.nio.Channel or similar.

Precisely: maximizing efficiency is high on my priority list.

As a datapoint, using java.nio.Channel would currently not be possible 
under most existing J2EE containers, since they tend to use the old 
java.net APIs for socket creation. Such java.net-created sockets don't 
have java.nio.Channels: you have to use the java.nio APIs to get
java.nio.Channels.

Which will be a breeze for pythonistas once I've finished my jynio modules,
i.e. non-blocking support for jython (select, asyncore, etc.), which
is completely based on java.nio. Hopefully we will then see the cpython
asynch frameworks, e.g. Medusa, Twisted, etc, running on java as well. I
would then expect to see some serious performance competition between
cpython and jython, especially since jython is not restricted by a GIL.

Regards,

Alan.
From pje at telecommunity.com  Wed Sep  1 15:17:13 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep  1 15:16:56 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4135C312.2060009@xhaus.com>
References: <4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com>
	<41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com>
	<4135B2E7.5060708@andreweland.org>
Message-ID: <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>

At 01:39 PM 9/1/04 +0100, Alan Kennedy wrote:
>[Alan Kennedy]
>>>Hmm, I'm not sure I understand what you are saying here Andrew. The 
>>>use-case we're trying to cover is where the application wants to return 
>>>a file-like object to the WSGI server/framework. The application's
>>>intention is that the contents of the file-like object, from the
>>>current file-pointer onwards, should be transferred to the return socket 
>>>for the HTTP request.
>
>[Andrew Eland]
>>The intent, I think, is to special-case the sending of static files, 
>>allowing a server to use the most efficient method of transferring data 
>>from a file to a socket that the platform provides.
>
>Agreed that special-casing static files for performance reasons is a good 
>thing.
>
>But we also need to consider what happens when the application returns, 
>for example, a StringIO.StringIO, or a gzip.GzipFile.

No, we don't.  WSGI does not support that.  You must return an 
*iterable*.  As Andrew says, 'fileno()' was added to allow special-casing 
operating system file descriptors on platforms that have them, and have 
APIs like 'sendfile()' that can copy data directly from one descriptor to 
another.

If you would like to support special Java stuff, or CLR stuff, you can 
always have your server look for some other attribute name and support that 
as a platform-specific, optional extension for higher performance.

But that's *all* the 'fileno()' support is: a *platform-specific* *optional 
extension* to boost performance in certain cases.  The server isn't even 
required to *check* for a fileno attribute, and the application certainly 
isn't required to provide it.

The application is required to return an iterable.  That's the 
protocol.  You want to return a "file-like" object, you *must* wrap it in 
an iterable of some kind.  For example:

     return [some_io.getvalue()]

is a perfectly reasonable way to return a StringIO.
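
A minimal sketch of that pattern, assuming a current Python 3
interpreter, so io.BytesIO stands in for StringIO.StringIO and the body
is bytes. The names app_object and fake_start_response are made up for
illustration and are not part of any server:

```python
import io

def app_object(environ, start_response):
    # Build the response body in an in-memory buffer.
    some_io = io.BytesIO()
    some_io.write(b"Hello, ")
    some_io.write(b"world\n")
    start_response("200 OK", [("Content-Type", "text/plain")])
    # Return a one-element list: the list is the iterable, and the
    # buffer object itself never reaches the server.
    return [some_io.getvalue()]

def fake_start_response(status, headers):
    # Hypothetical stand-in for the callable a real server would pass in.
    fake_start_response.called_with = (status, headers)

body = b"".join(app_object({}, fake_start_response))
```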

From pje at telecommunity.com  Wed Sep  1 15:19:50 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep  1 15:19:30 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <41359391.5000108@andreweland.org>
References: <4134CB04.2010803@xhaus.com>
 <4134CB04.2010803@xhaus.com>
Message-ID: <5.1.1.6.0.20040901091738.03116190@mail.telecommunity.com>

At 10:17 AM 9/1/04 +0100, Andrew Eland wrote:
>Alan Kennedy wrote:
>
>>Problem is that jython doesn't support file descriptors, or the fileno() 
>>method. If you invoke fileno() on an org.python.core.PyFile, you get an 
>>Py.IOError("fileno() is not supported in jpython") exception.
>
>I guess the fileno() method could be renamed something like os_file() or 
>os_stream(). CPython could return a file descriptor, Jython could return 
>something like a java.nio.Channel, IronPython could return a 
>System.IO.Stream, or something like that.

No; if developers on those platforms want to support optional 
platform-specific performance boosting, they should define 
platform-specific names for the attribute.  This improves the ease of 
portability for applications: they just provide what they know how to 
provide, and the server only invokes the attribute appropriate to the 
platform, if it invokes any attribute at all.
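
A minimal sketch of a server that treats 'fileno()' as an optional fast
path and otherwise just iterates. The helper name send_result and the
write callback are hypothetical, and the os.read() loop is only a
stand-in for a real sendfile()-style call:

```python
import os
import tempfile

def send_result(result, write, bufsize=8192):
    """Send an application's return value; report which path was taken."""
    fd = None
    fileno = getattr(result, "fileno", None)
    if callable(fileno):
        try:
            fd = fileno()
        except OSError:
            fd = None          # object has fileno() but no real descriptor
    if fd is not None:
        # Optional fast path: copy straight from the OS descriptor.
        while True:
            chunk = os.read(fd, bufsize)
            if not chunk:
                break
            write(chunk)
        return "fileno"
    # Portable path: the return value only has to be an iterable.
    for chunk in result:
        write(chunk)
    return "iterable"

list_out = []
list_path = send_result([b"chunk one, ", b"chunk two"], list_out.append)

file_out = []
with tempfile.TemporaryFile() as f:
    f.write(b"static file bytes")
    f.seek(0)
    file_path = send_result(f, file_out.append)
```

A server on another platform could check for its own attribute name in
the same place and fall through to the loop when it is absent.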

From py-web-sig at xhaus.com  Wed Sep  1 15:40:00 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep  1 15:35:24 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
References: <4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com>
	<41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com>
	<4135B2E7.5060708@andreweland.org>
	<5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
Message-ID: <4135D130.4090108@xhaus.com>

[Alan Kennedy]
>> But we also need to consider what happens when the application 
>> returns, for example, a StringIO.StringIO, or a gzip.GzipFile.

[Phillip J. Eby]
> No, we don't.  WSGI does not support that.  You must return an 
> *iterable*.  As Andrew says, 'fileno()' was added to allow 
> special-casing operating system file descriptors on platforms that have 
> them, and have APIs like 'sendfile()' that can copy data directly from 
> one descriptor to another.
> 
> If you would like to support special Java stuff, or CLR stuff, you can 
> always have your server look for some other attribute name and support 
> that as a platform-specific, optional extension for higher performance.

But that is explicitly forbidden: "Finally, servers must not directly 
use any other attributes of the iterable returned by the application. 
For example, it[sic] the iterable is a file object, it may have a read() 
method, but the server must not utilize it. Only attributes specified 
here, or accessed via e.g. the PEP 234 iteration APIs are acceptable."

> But that's *all* the 'fileno()' support is: a *platform-specific* 
> *optional extension* to boost performance in certain cases.  The server 
> isn't even required to *check* for a fileno attribute, and the 
> application certainly isn't required to provide it.

Fair enough, it is good to support recognition of file-like objects on 
platforms that have file descriptor tables.

But I don't see any WSGI compliant way in jython that I can take a 
static file object returned by a WSGI application and do anything with 
it at all.

For example, if the application works like this, which I'd imagine is a 
common expected usage pattern, then I can do nothing

def app_object(environ, start_response):
   start_response("200 OK", [ ('content-type', 'image/jpg') ])
   return open("%s.jpg" % environ['PATH_INFO'], 'rb')

This will work on cpython, of course, because of the implicit fileno()
method on the (cpython) file object. But it will fail on jython, which will
confuse the hell out of application authors.

Regards,

Alan.
From pje at telecommunity.com  Wed Sep  1 15:54:06 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep  1 15:53:46 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4135D130.4090108@xhaus.com>
References: <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
	<4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com>
	<41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com>
	<4135B2E7.5060708@andreweland.org>
	<5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>

At 02:40 PM 9/1/04 +0100, Alan Kennedy wrote:
>[Alan Kennedy]
>>>But we also need to consider what happens when the application returns, 
>>>for example, a StringIO.StringIO, or a gzip.GzipFile.
>
>[Phillip J. Eby]
>>No, we don't.  WSGI does not support that.  You must return an 
>>*iterable*.  As Andrew says, 'fileno()' was added to allow special-casing 
>>operating system file descriptors on platforms that have them, and have 
>>APIs like 'sendfile()' that can copy data directly from one descriptor to 
>>another.
>>If you would like to support special Java stuff, or CLR stuff, you can 
>>always have your server look for some other attribute name and support 
>>that as a platform-specific, optional extension for higher performance.
>
>But that is explicitly forbidden: "Finally, servers must not directly use 
>any other attributes of the iterable returned by the application. For 
>example, it[sic] the iterable is a file object, it may have a read() 
>method, but the server must not utilize it. Only attributes specified 
>here, or accessed via e.g. the PEP 234 iteration APIs are acceptable."

I've changed the spec now to allow authors to define a platform-specific 
special method name for this purpose.


>But I don't see any WSGI compliant way in jython that I can take a static 
>file object returned by a WSGI application and do anything with it at all.
>
>For example, if the application works like this, which I'd imagine is a 
>common expected usage pattern, then I can do nothing
>
>def app_object(environ, start_response):
>   start_response("200 OK", [ ('content-type', 'image/jpg') ])
>   return open("%s.jpg" % environ['PATH_INFO'], 'rb')
>
>This will work on cpython, of course, because of the implicit fileno()
>method on the (cpython) file object. But it will fail on jython, which will
>confuse the hell out of application authors.

If they want to support Python versions prior to 2.2, they can't return a 
file object.  The above code simply isn't portable to Python 2.1.

But, since your use case is, "try to allow 2.2 code to run anyway", it's 
also reasonable for you to hack in support for objects of type 'file' (and 
whatever type Jython uses for pipes) and pretend they're iterables.  You're 
specifically trying to support some 2.2 idioms rather than deal with 2.1 
limitations, so this is just another one for you.

Don't let the spec stop you from supporting your use case.  The problem is 
simply that your use case is outside the spec's scope, and I don't want to 
expand the spec's scope to make *everybody else* have to implement the 
extras you're implementing.  I don't want to force everybody else to try to 
support 2.2 features in a 2.1 Python.

Does that make sense?

From py-web-sig at xhaus.com  Wed Sep  1 16:50:52 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep  1 16:46:16 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>
References: <5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
	<4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com>
	<41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com>
	<4135B2E7.5060708@andreweland.org>
	<5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
	<5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>
Message-ID: <4135E1CC.1060605@xhaus.com>

[Phillip J. Eby]
 > Does that make sense?

Phillip, sorry to be such a PITA, but no, it doesn't.

[Phillip J. Eby]
 >>> If you would like to support special Java stuff, or CLR stuff, you
 >>> can always have your server look for some other attribute name and
 >>> support that as a platform-specific, optional extension for higher
 >>> performance.

[Alan Kennedy]
 >> But that is explicitly forbidden

[Phillip J. Eby]
 > I've changed the spec now to allow authors to define a platform-specific
 > special method name for this purpose.

But there is no special method name or attribute on file-like objects 
that I can look for: file methods such as read() are the only options. 
Jython file objects have an identical interface to cpython file objects, 
except that they don't have fileno() methods.

Though I suppose I could check the class of the returned object, e.g.

if isinstance(app_return, types.FileType):
   # Attempt high-performance stuff

[Alan Kennedy]
 >> For example, if the application works like this, which I'd imagine is
 >> a common expected usage pattern, then I can do nothing
 >>
 >> def app_object(environ, start_response):
 >>   start_response("200 OK", [ ('content-type', 'image/jpg') ])
 >>   return open("%s.jpg" % environ['PATH_INFO'], 'rb')
 >>
 >> This will work on cpython, of course, because of the implicit fileno()
 >> method on the (cpython) file object. But it will fail on jython, which
 >> will confuse the hell out of application authors.

[Phillip J. Eby]
 > If they want to support Python versions prior to 2.2, they can't return
 > a file object.  The above code simply isn't portable to Python 2.1.

A couple of points to make here

1. I see nothing 2.2 specific in my above code sample: it works on all 
pythons. I don't see what differs between 2.1 vs. 2.2 in this case.

2. The spec, as is, explicitly permits authors of cpython applications 
to return file-like objects, due to the cpython-specific special case 
"your application object may have a fileno()". Of course, most 
application authors won't know that the reason why their file return is 
succeeding is because the file object has a fileno() method, and then 
wonder why their app doesn't work on jython.

 > But, since your use case is, "try to allow 2.2 code to run anyway", it's
 > also reasonable for you to hack in support for objects of type 'file'
 > (and whatever type Jython uses for pipes) and pretend they're
 > iterables.  You're specifically trying to support some 2.2 idioms rather
 > than deal with 2.1 limitations, so this is just another one for you.

Sorry, I'm confused: what 2.2 idioms do you mean?

 > Don't let the spec stop you from supporting your use case.  The problem
 > is simply that your use case is outside the spec's scope, and I don't
 > want to expand the spec's scope to make *everybody else* have to
 > implement the extras you're implementing.  I don't want to force
 > everybody else to try to support 2.2 features in a 2.1 Python.

Sorry, Phillip, I'm confused. I don't see that this has anything to do 
with 2.1 vs. 2.2: it's got to do with how to recognise the case where 
the application returns a file-like object, which can then be treated 
specially, e.g. for high-performance reasons.

I think we should explicitly allow return of a file-like object, and 
thus freedom to use the read() method, etc. That's the 
platform-independent way to solve this problem. Then each 
server/framework author can map that to a high-performance 
descriptor/stream/channel in whatever way is appropriate for their 
platform. Or not bother with high-performance, and just read() all the 
file contents and transmit that.

Is there a specific reason, perhaps relating to python 2.2, that you 
want to prevent application authors from returning files?

Regards,

Alan.
From pje at telecommunity.com  Wed Sep  1 17:59:30 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep  1 17:59:15 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4135E1CC.1060605@xhaus.com>
References: <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>
	<5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
	<4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com>
	<41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com>
	<4135B2E7.5060708@andreweland.org>
	<5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
	<5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com>

At 03:50 PM 9/1/04 +0100, Alan Kennedy wrote:

>But there is no special method name or attribute on file-like objects that 
>I can look for: file methods such as read() are the only options. Jython 
>file objects have an identical interface to cpython file objects, except 
>that they don't have fileno() methods.

"File-like" is a complete red herring: the spec has never supported them 
(and IMO never will).

What the spec calls for is an *iterable*: an object that can be used in a 
"for" loop.  In Python 2.2 and up, file objects are iterable.  In older 
versions of Python, they are not.

Thus, an application that returns a file object implicitly requires Python 
2.2 or up.  (This issue is mentioned in the spec, where it warns that if 
you are using an older version of Python, you may *not* return a file object.)

WSGI does not support returning files or file-like objects: it is simply an 
artifact of Python 2.2 and up that returning a file works at all!
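
A minimal sketch of that artifact, assuming a current Python 3
interpreter: the server below uses nothing but a "for" loop, and the
returned file works only because file objects are iterable. The environ
key "demo.path" and the helper run() are invented for this example:

```python
import os
import tempfile

def app_object(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    # No wrapping: the open file is returned as-is.
    return open(environ["demo.path"], "rb")

def run(app, environ):
    """Hypothetical server loop: plain iteration, no read() or fileno()."""
    chunks = []
    result = app(environ, lambda status, headers: None)
    try:
        for chunk in result:
            chunks.append(chunk)
    finally:
        if hasattr(result, "close"):
            result.close()
    return chunks

# Create a small file to serve, run the app, then clean up.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"line one\nline two\n")
chunks = run(app_object, {"demo.path": path})
os.remove(path)
```

Iterating a file yields it line by line, so the server sees one chunk
per line here.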

Appealing to the 'fileno()' case as supporting file-like objects is also a 
red herring: the object must *still* be an iterable, because not every 
server or gateway will support 'fileno()'.  Thus, code that relies on an 
object that has a 'fileno()' but isn't iterable, is in violation of the 
spec and is inherently non-portable.

But, because file objects are iterable in 2.2 or up, if the application 
doesn't care about older versions, it is free to return file objects.

The fact that you would like such code to run in a Jython 2.1 server 
doesn't mean that the spec should expand its scope to cover even *file 
objects*, let alone "file-like" objects.  It simply means that you'll have 
to deal with the special cases that entails, until Jython 2.2 is ready for 
prime-time.


>1. I see nothing 2.2 specific in my above code sample: it works on all 
>pythons. I don't see what differs between 2.1 vs. 2.2 in this case.

2.1 doesn't allow iteration over file objects.


>2. The spec, as is, explicitly permits authors of cpython applications to 
>return file-like objects,

Only if they are *iterable*, which is only true of the 'file' object in 2.2 
and up.


>due to the cpython-specific special case "your application object may have 
>a fileno()".

You misunderstand: in CPython 2.1 returning a file is *not* acceptable 
under the spec.  It is purely coincidental that it will happen to work if 
the server checks for 'fileno()' and supports doing something with it.  But 
it's not portable behavior for 2.1, and the spec has said so ever since
the "Supporting Older Versions" section was added.


>Of course, most application authors won't know that the reason why their 
>file return is succeeding is because the file object has a fileno() 
>method, and then wonder why their app doesn't work on jython.

You're effectively arguing for removing the 'fileno()' special case 
altogether, or else adding language to require the server to *first* check 
for iterability and raise an error if the return isn't iterable, so that 
running a 2.2 app in a 2.1 server won't "accidentally" succeed when the 2.1 
server supports 'fileno()'.

However, this is such an obscure use case as to be ludicrous to worry 
about.  So far, yours is the only server that has suggested that supporting 
2.2 apps under Python 2.1 is anything even approaching a good idea.  I find 
it hard to imagine any reason to do that, other than the lack of 
availability of a Python 2.2 implementation.  Other than Jython, I'm not 
aware of any other platforms where this is the case.

I applaud your bravery in trying to make it work for Jython, but changing 
the spec to allow other kinds of objects isn't going to decrease the amount 
of work you have to do, only increase it for other people who *aren't* 
trying to support 2.2 apps in a server running under Python 2.1.


> > Don't let the spec stop you from supporting your use case.  The problem
> > is simply that your use case is outside the spec's scope, and I don't
> > want to expand the spec's scope to make *everybody else* have to
> > implement the extras you're implementing.  I don't want to force
> > everybody else to try to support 2.2 features in a 2.1 Python.
>
>Sorry, Phillip, I'm confused. I don't see that this has anything to do 
>with 2.1 vs. 2.2: it's got to do with how to recognise the case where the 
>application returns a file-like object, which can then be treated 
>specially, e.g. for high-performance reasons.

It's not about "file-like" objects, only *actual* file objects.  Returning 
a "file-like" object offers no meaningful performance boost, and it is 
*not* supported -- and never was.


>I think we should explicitly allow return of a file-like object, and thus 
>freedom to use the read() method, etc.

I disagree.  In 2.2, you can return a file-like object thus:

     return iter(lambda: filelike.read(bufsize), "")

In 2.1 and prior, you can do this:

     class Reader:
         def __init__(self,filelike,bufsize=4096):
             self.stream = filelike
             self.bufsize = bufsize
             if hasattr(filelike,'fileno'):
                 self.fileno = filelike.fileno

         def __getitem__(self,ind):
             data = self.stream.read(self.bufsize)
             if data:
                 return data
             raise IndexError

     return Reader(filelike)

or even:

    return xreadlines.xreadlines(filelike)

Any of these approaches results in a spec-compliant iterable for the 
applicable or higher version of Python.
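
For readers on a current Python, the first idiom above can be sketched as
follows. The helper name iter_blocks and the bytes sentinel b"" are
assumptions for illustration; the message itself uses "" because 2.x
read() returns str.

```python
import io

def iter_blocks(filelike, bufsize=4096):
    """Return an iterator over successive read() results until EOF."""
    # Two-argument iter(): call the lambda repeatedly until it returns
    # the sentinel. The sentinel is b"" because the stream is binary.
    return iter(lambda: filelike.read(bufsize), b"")

# Hypothetical in-memory stream standing in for a real file-like object.
blocks = list(iter_blocks(io.BytesIO(b"abcdefghij"), bufsize=4))
print(blocks)
```

The wrapper never inspects the object's type; it only needs a read()
method, which is why no change to the spec is required for this case.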

You are trying to let 2.2 code run in a 2.1 Python.  But you don't need to 
support "file-like" objects to do that.  You need only special case for an 
*actual* file object, because such an object *would* be iterable under 
2.2.  The issue there isn't high-performance, it's merely that file objects 
are unacceptable return values in Python 2.1, but code written for 2.2 
will expect that returning a file object is valid.

There are other objects that technically would need to be special-cased for 
this.  For example, a dictionary object is iterable in 2.2 but not in 
2.1.  In practice, it would be silly to bother since nobody in their right 
mind is going to use a dictionary as a WSGI return value...  unless of 
course it had only one key.

The point is that trying to run 2.2 code in a 2.1 Python is necessarily a 
collection of special case hacks.  The spec calls for *iterability*, and 
2.2 code may return objects of built-in types that are iterable in 2.2, but 
not in 2.1.

That is why this is a Python versioning issue, and specific to your attempt 
to run 2.2 code in a 2.1 Python.  It has absolutely nothing to do with 
accepting "file-like" objects in the spec, which never accepted them, nor 
is it intended to ever do so.

Is this getting any clearer?

From py-web-sig at xhaus.com  Wed Sep  1 19:08:46 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep  1 19:04:11 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com>
References: <5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>
	<5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
	<4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com>
	<41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com>
	<4135B2E7.5060708@andreweland.org>
	<5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
	<5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>
	<5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com>
Message-ID: <4136021E.6070907@xhaus.com>

Phillip,

I'm fairly sure I understand your position now. But I think I don't 
agree with it ;-)

[Phillip J. Eby]
 > "File-like" is a complete red herring: the spec has never supported them
 > (and IMO never will).
 >
 > What the spec calls for is an *iterable*: an object that can be used in
 > a "for" loop.  In Python 2.2 and up, file objects are iterable.  In
 > older versions of Python, they are not.
 >
 > Thus, an application that returns a file object implicitly requires
 > Python 2.2 or up.  (This issue is mentioned in the spec, where it warns
 > that if you are using an older version of Python, you may *not* return a
 > file object.)
 >
 > WSGI does not support returning files or file-like objects: it is simply
 > an artifact of Python 2.2 and up that returning a file works at all!

My position here is that the iterator-ness of a returned file object is 
secondary when the returned object has a fileno() method: most cpython 
framework code is going to do this

if hasattr(app_object, 'fileno') and callable(app_object.fileno):
   send_file(app_object)
else:
   treat_app_object_as_iterable(app_object)

I would summarise the position of the current spec as "you must return 
an iterable, except when you want to return a file object, which will 
work fine under cpython 2.2+, because files are iterable under cpython 
2.2+, even though they don't need to be iterable when they have fileno()".

 > The fact that you would like such code to run in a Jython 2.1 server
 > doesn't mean that the spec should expand its scope to cover even *file
 > objects*, let alone "file-like" objects.  It simply means that you'll
 > have to deal with the special cases that entails, until Jython 2.2 is
 > ready for prime-time.

Call me old-fashioned, but I'm a great believer in "practicality beats 
purity". I think we should be seeking to be as inclusive as possible, 
which means supporting as wide a software base as possible.

I'm just afraid that people will steam ahead writing WSGI middleware 
applications which return file-objects, which will fail on jython simply 
because putting the following lines in my code is a violation of the spec

if type(app_return) is types.FileType:
   do_file_stuff(app_return)

[Alan Kennedy]
 >> 2. The spec, as is, explicitly permits authors of cpython applications
 >> to return file-like objects,

[Phillip J. Eby]
 > Only if they are *iterable*, which is only true of the 'file' object in
 > 2.2 and up.

Which seems to me an arbitrary criterion, especially in the light that 
the iterator nature of the file object will possibly (likely) not be 
actually used, as described in the snippet above.

 > You're effectively arguing for removing the 'fileno()' special case
 > altogether, or else adding language to require the server to *first*
 > check for iterability and raise an error if the return isn't iterable,
 > so that running a 2.2 app in a 2.1 server won't "accidentally" succeed
 > when the 2.1 server supports 'fileno()'.

Not at all.

I'm arguing for us to be practical about applications returning file 
objects.

1. It's a very common use case
2. It's trivial to deal with
3. There are no python version dependency issues

In cpython frameworks, the code would look like this

if hasattr(app_object, 'fileno'):
   do_file_stuff(app_object.fileno())
else:
   do_iterator_stuff(app_object)

On jython

if type(app_object) is types.FileType:
   do_file_stuff(app_object)
else:
   do_iterator_stuff(app_object)

Is that so difficult to accept?

[Phillip J. Eby]
 > I applaud your bravery in trying to make it work for Jython, but
 > changing the spec to allow other kinds of objects isn't going to
 > decrease the amount of work you have to do, only increase it for other
 > people who *aren't* trying to support 2.2 apps in a server running under
 > Python 2.1.

It's not really about bravery, it's about wanting to maximize 
portability between available python platforms. I hope to achieve that 
through the application of a little pythonic simplicity.

After all, we're just trying to move byte streams from one place to 
another: do we have to be this complex about it?

 > It's not about "file-like" objects, only *actual* file objects.
 > Returning a "file-like" object offers no meaningful performance boost,
 > and it is *not* supported -- and never was.

Except when it is supported, for whatever complicated reasons, e.g. 
iterable objects with fileno()s.

[Alan Kennedy]
 >> I think we should explicitly allow return of a file-like object, and
 >> thus freedom to use the read() method, etc.

[Phillip J. Eby]
 > You are trying to let 2.2 code run in a 2.1 Python.

Well, I see it as WSGI forcing me to jump through hoops in order to 
support the notion of iterability, even when that notion is NOT 
universally applicable, as the fileno() exception proves.

 > That is why this is a Python versioning issue, and specific to your
 > attempt to run 2.2 code in a 2.1 Python.  It has absolutely nothing to
 > do with accepting "file-like" objects in the spec, which never accepted
 > them, nor is it intended to ever do so.
 >
 > Is this getting any clearer?

Crystal.

However, I think the absolute insistence on return objects being 
iterable is slightly arbitrary and unnecessarily constraining.

I understand your desire to keep the spec clean and simple, and also 
your desire to use modern python facilities to do it. But those modern 
python facilities are not universally available, and, strictly speaking, 
not absolutely required. I suppose I'm just pleading for a little 
pythonic practicality.

Maybe I'm just wasting my time? Maybe I'm the only one who is interested 
in seeing a jython WSGI server into which users can drop universal WSGI 
components and have them just work? Is anyone else interested in such a 
jython WSGI container? Or should I just toddle off back to J2EE servlets?

Lastly, since the spec is still potentially a moving target, I've 
translated as much of my java as possible into jython, which will 
greatly speed up the prototyping process. Once the spec is finalized, I 
may translate it back to java, if there is a sufficient 
performance/other requirement for that. (I should have prototyped it in 
jython from the start, and saved myself a load of time).

Kind regards,

Alan.
From pje at telecommunity.com  Wed Sep  1 19:42:04 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep  1 19:41:53 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <4136021E.6070907@xhaus.com>
References: <5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com>
	<5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>
	<5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
	<4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com>
	<41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com>
	<4135B2E7.5060708@andreweland.org>
	<5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
	<5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>
	<5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040901131431.0329baf0@mail.telecommunity.com>

At 06:08 PM 9/1/04 +0100, Alan Kennedy wrote:
>I would summarise the position of the current spec as "you must return an 
>iterable, except when you want to return a file object, which will work 
>fine under cpython 2.2+, because files are iterable under cpython 2.2+, 
>even though they don't need to be iterable when they have fileno()".

No, it's just that you must return an iterable, *period*.  The fact that 
2.2 allows this to be a file object is irrelevant, as is the fact that 2.1 
doesn't allow this to be a file object.  There are thousands of classes out 
there for both 2.1 and 2.2 that either are, or aren't, iterable, and that's 
equally irrelevant.


>I'm just afraid that people will steam ahead writing WSGI middleware 
>applications which return file-objects,

The part you keep leaving out is that such middleware is thereby targeted 
at Python 2.2, not 2.1.  The spec explicitly mentions this.


>which will fail on jython simply because putting the following lines in my 
>code is a violation of the spec
>
>if type(app_return) is types.FileType:
>   do_file_stuff(app_return)

What you're doing here isn't a "violation", IMO, merely "out of 
scope".  It's not up to the spec to explain how to make Python 2.1 support 
2.2 features; IMO, that's all you're doing here, and it doesn't hurt anybody.


>[Alan Kennedy]
> >> 2. The spec, as is, explicitly permits authors of cpython applications
> >> to return file-like objects,
>
>[Phillip J. Eby]
> > Only if they are *iterable*, which is only true of the 'file' object in
> > 2.2 and up.
>
>Which seems to me an arbitrary criterion, especially in the light that the 
>iterator nature of the file object will possibly (likely) not be actually 
>used, as described in the snippet above.

The CGI runner won't use fileno(), and neither will many other servers.  I 
don't see how the "iterable" criterion is arbitrary because some objects 
are iterable and others aren't.  Any criterion we choose will by definition 
include some objects and not others.


>In cpython frameworks, the code would look like this
>
>if hasattr(app_object, 'fileno'):
>   do_file_stuff(app_object.fileno())
>else:
>   do_iterator_stuff(app_object)
>
>On jython
>
>if type(app_object) is types.FileType:
>   do_file_stuff(app_object)
>else:
>   do_iterator_stuff(app_object)
>
>Is that so difficult to accept?

But that's exactly what the spec says to do *now*, except that it doesn't 
explicitly bless the type check.  If you really want to have that blessing 
written into the spec, so be it.  I just don't see it as a matter that's in 
scope of the spec, because it's not only Jython-specific, but specific to 
your server as well.  Document your extension as you would any such 
extension.  There's no law against being more *permissive* than the spec 
requires.

I do not see any reason to burden *other* server authors by requiring them 
to support your extension, because no use cases have been presented for 
this for any situation *other* than a Jython 2.1 server trying to run a 
Python 2.2 application.


>[Alan Kennedy]
> >> I think we should explicitly allow return of a file-like object, and
> >> thus freedom to use the read() method, etc.
>
>[Phillip J. Eby]
> > You are trying to let 2.2 code run in a 2.1 Python.
>
>Well, I see it as WSGI forcing me to jump through hoops in order to 
>support the notion of iterability, even when that notion is NOT 
>universally applicable, as the fileno() exception proves.

It's stretching a 2.2 spec to work with older versions of Python, largely 
intended for your benefit, as you were the first person who presented a 
strong use case for supporting *any* pre-2.2 version of Python.


>However, I think the absolute insistence on return objects being iterable 
>is slightly arbitrary and unnecessarily constraining.

For whom?  I've given numerous examples of how trivial it is for code 
targeted to 2.1 or earlier to support making files and even file-like 
objects into iterables.  This is a small burden for those who want their 
code to be portable to such versions.

Similarly, it's not an unreasonable burden for your server to support 
extensions to 2.1 behavior in order to accommodate code not written for 
Python 2.1 compatibility.

It *is* unreasonable to expand the spec to place those burdens on people 
who don't care about supporting 2.1, or who don't care about supporting 2.2 
code under 2.1.


>Maybe I'm just wasting my time? Maybe I'm the only one who is interested 
>in seeing a jython WSGI server into which users can drop universal WSGI 
>components and have them just work? Is anyone else interested in such a 
>jython WSGI container? Or should I just toddle off back to J2EE servlets?

I agree with your intentions; I just don't agree that *other* server 
authors should be forced to duplicate your efforts if they don't have that 
use case.  Iterability is the single simplest protocol that is universally 
accessible in any Python used in the last several years.  It doesn't 
require any introspection.  Currently, the common case code for a server 
looks like this:

         result = application(environ, start_response)
         try:
             for data in result:
                 write(data)
         finally:
             if hasattr(result,'close'):
                 result.close()

This is a perfectly valid implementation under the spec.  Changing the spec 
to allow the application to return anything *but* iterables means 
complicating *every* server, for the sole benefit of applications that want 
to use 2.2 idioms under Python 2.1.
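
The loop above can be exercised end to end with a small sketch. The names
run_once and demo_app, and the stub start_response, are hypothetical and
exist only for this illustration; a real server would also record the
status and headers.

```python
def run_once(application, environ, write):
    """Drive one request using the common-case loop quoted above."""
    def start_response(status, headers):
        # Hypothetical stub: a real server would store status and headers.
        return write

    result = application(environ, start_response)
    try:
        for data in result:
            write(data)
    finally:
        # close() is optional on the returned iterable.
        if hasattr(result, 'close'):
            result.close()

def demo_app(environ, start_response):
    """Trivial application returning a list, which is already iterable."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello, ', b'world']

sent = []
run_once(demo_app, {}, sent.append)
print(sent)
```

Note that the server side performs no introspection beyond the optional
close() check, which is the simplicity being argued for here.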

If I were a server author targeting 2.2 and up (and I will be), I would 
rightly object to adding extra introspection to the above, when it will not 
benefit me or any user in my target audience.  If my server requires 2.2, 
then obviously applications running under it can safely use 2.2 
idioms.  And if they're written for 2.1 they also work.

So here's the resolution: I will slightly expand the section on supporting 
older versions of Python, to explicitly allow a 2.1 server to 
"forward-compatibly" check for 2.2 idioms such as returning a file object.

I'd prefer not to do that, but not because I dislike the approach.  We're 
in "violent agreement" on what *your* server should do about this, and I 
encourage you to implement it.  Our disagreement (as I understand it) is that:

1. I think this is a "server-specific extension" that's outside the spec's 
scope to rule on the validity of, and

2. I don't think that requiring others to do what your code will be doing 
is a good idea, because they don't *need* to, unless they're trying to run 
2.2 code on a 2.1 Python, which should *definitely* not be a requirement of 
the spec.
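
A rough sketch of what such a server-specific extension might look like.
Everything here is an assumption for illustration: ReadOnlyStream stands in
for a pre-2.2 file object (current Python files are already iterable), and
the check on read() stands in for the types.FileType test discussed in this
thread.

```python
class ReadOnlyStream:
    """Hypothetical stand-in for an object with read() but no iteration."""
    def __init__(self, data):
        self._data = data

    def read(self, size):
        chunk, self._data = self._data[:size], self._data[size:]
        return chunk

def as_iterable(result, bufsize=4096):
    """Return an iterator over result, wrapping read() as a fallback."""
    try:
        # Spec-compliant return values take this path untouched.
        return iter(result)
    except TypeError:
        # Server-specific extension: tolerate a non-iterable stream.
        if hasattr(result, 'read'):
            return iter(lambda: result.read(bufsize), b"")
        raise

wrapped = list(as_iterable(ReadOnlyStream(b"abcdef"), bufsize=4))
plain = list(as_iterable([b"x", b"y"]))
```

Only the server that wants the extension carries the extra branch; a
server that targets iterables alone needs none of it.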

From py-web-sig at xhaus.com  Wed Sep  1 20:19:27 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep  1 20:14:51 2004
Subject: [Web-SIG] Returned application object and fileno.
In-Reply-To: <5.1.1.6.0.20040901131431.0329baf0@mail.telecommunity.com>
References: <5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com>
	<5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>
	<5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
	<4135B2E7.5060708@andreweland.org> <4134CB04.2010803@xhaus.com>
	<41359391.5000108@andreweland.org> <4135AB95.8040108@xhaus.com>
	<4135B2E7.5060708@andreweland.org>
	<5.1.1.6.0.20040901090854.03107020@mail.telecommunity.com>
	<5.1.1.6.0.20040901094525.02137130@mail.telecommunity.com>
	<5.1.1.6.0.20040901112145.02f84bf0@mail.telecommunity.com>
	<5.1.1.6.0.20040901131431.0329baf0@mail.telecommunity.com>
Message-ID: <413612AF.8040605@xhaus.com>

Phillip,

[Phillip J. Eby]
 > So here's the resolution: I will slightly expand the section on
 > supporting older versions of Python, to explicitly allow a 2.1 server to
 > "forward-compatibly" check for 2.2 idioms such as returning a file
 > object.
 >
 > I'd prefer not to do that, but not because I dislike the approach.
 > We're in "violent agreement" on what *your* server should do about this,
 > and I encourage you to implement it.  Our disagreement (as I understand
 > it) is that:
 >
 > 1. I think this is a "server-specific extension" that's outside the
 > spec's scope to rule on the validity of, and
 >
 > 2. I don't think that requiring others to do what your code will be
 > doing is a good idea, because they don't *need* to, unless they're
 > trying to run 2.2 code on a 2.1 Python, which should *definitely* not be
 > a requirement of the spec.

That solution works for me.

Although it may seem that we're in disagreement, I like to see that as a 
necessary part of moving forward :-) Possibly the reason why our points 
are slipping by each other somewhat is because you're making technical 
arguments and I'm making a primarily community/social argument: 
supporting the most up-to-date jython available (which is sadly 
out-of-date wrt cpython).

And I've got to say you've got a much better and cleaner handle on the 
technics than I: I'm just a simple implementer who wants to make his 
framework as useful as possible to as wide an audience as possible.

Just a last few points

1. It was never my intention to force complication of other people's 
frameworks, but I see now that that would be unavoidable if returning a 
file object was a part of the spec, and that would be a bad thing.

2. This whole problem *will* finally go away when jython 2.(2|3|4) 
appears (which I believe it will, though to do this properly will 
require Sun to open their chequebook). If I had the time or resources, 
I'd be putting all my efforts into getting jython 2.2 out the door. But 
I don't have that time or resource, so I'm falling back to doing the 
best that I can with what's available. And jython 2.1 is *rock-solid*, 
and in use all over the place: people trust it.

3. Your solution allows me to address the most common case that I 
believe would cause problems: that of framework authors returning a 
file-object (without realising that cpython 2.2+ was creating an 
iterator for them behind the scenes). I think this is going to be a very 
common design paradigm for WSGI middleware.

4. I'll be doing my level best to get all python code to run under 
modjy, regardless of the version it was written for. There might be a 
lot of frantic paddling going on underneath the surface, but above the 
waterline hopefully everything will be calm and serene .....

Thanks again for this initiative: I believe that WSGI is *definitely* 
the future for python web servers. Great job!

Kind regards,

Alan.
From janssen at parc.com  Thu Sep  2 03:07:19 2004
From: janssen at parc.com (Bill Janssen)
Date: Thu Sep  2 03:07:40 2004
Subject: [Web-SIG] Bill's comments on WSGI draft 1.4
Message-ID: <04Sep1.180724pdt."58612"@synergy1.parc.xerox.com>

Well, thanks to Andrew's comment about my non-participation, I've
finally read PEP 333, version 1.4, and have a few comments.

Phillip, great job, nice reasoning.  I like the general design.  I
think the project as a whole is quite useful.

I've been using a custom framework together with Medusa, and as I read
I tried to imagine how my framework could be implemented under WSGI.
There seem to be no show-stoppers, though I have yet to try it.

A meta comment on commenting on PEP drafts: Without numbered sections,
paragraphs, and lines, there's no effective way to point back to
specific wording in the draft without quoting it.

A few nits about WSGI:

1.  The "environ" parameter must be a Python dict: I think subclasses
should be allowed.  A true subclass supports all methods of its
ancestors, so the rationale presented in the back of the PEP for
excluding them doesn't hold water.  I think the appropriate check
would be to see if the returned class is a subclass of the "dict"
class.  That is, "isinstance(e, dict)" should return True.

2.  The "fileno" attribute on the returned iterable.  I'm a bit
concerned about using operating system file descriptors, due to
resource constraints; I think a better check would be to see if the
returned iterable is a subclass of the "file" class.  That is,
"isinstance(f, file)" should return true.

3.  Comments about "The [status-line] string must be 7-bit
ASCII...containing no control characters."  That's overly restrictive;
I think it would be better to simply refer to RFC 2616 and say that it
should follow the rules defined there for "Reason-Phrase".

4.  Similarly, the rules about header values are more restrictive than
HTTP; they therefore prevent perfectly valid HTTP header values from
being returned.  That's bad.  Again, I think the PEP should simply
refer to RFC 2616 and say, "Use those rules".

5.  The phrase about "if a server or gateway discards or overrides any
application header for any reason, it must record this in a log"; that
should be "should" instead of "must".  Otherwise you'll have your log
cluttered with innocuous header re-write messages, and no way to turn
that off.

6.  The "write()" callable is important; it should not be deprecated
or in some other way made a poor stepchild of the iterable.

7.  If an application returns an iterable after calling write(), are
the strings produced by iteration written after those written by calls
to write?

8.  The note on Unicode: Unfortunately, Web standards like HTTP rely
on using proper character sets.  By *not* using Unicode strings, and
by *not* specifying the character set encoding of the "raw" byte
strings, we open the door for disastrous misunderstandings.  The
safest thing to do would be to require the framework to traffic in
Unicode strings for things like header values, which the WSGI
middleware would translate to or from the various required encodings
used by the server and external protocols.  At least with Unicode
strings you know what encoding is being used.

A riskier, more error-prone option would be to require the byte
strings to be in particular encodings.

The content strings, those written to the "write()" calls, or returned
by the iterable, should in fact be byte vectors, exactly as they are
currently specified.

9.  There should be a non-optional way of indicating the URL scheme,
whether it is "http", "https", or "ftp".  I'd suggest "wsgi.scheme" in
the environ.

Bill
From fumanchu at amor.org  Thu Sep  2 04:12:12 2004
From: fumanchu at amor.org (Robert Brewer)
Date: Thu Sep  2 04:17:50 2004
Subject: [Web-SIG] Bill's comments on WSGI draft 1.4
Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net>

Bill Janssen wrote:
> ...
> 6.  The "write()" callable is important; it should not be deprecated
> or in some other way made a poor stepchild of the iterable.

That's been my only question so far. I'd like to at least hear the
rationale behind favoring iterables so heavily over write().


Robert Brewer
MIS
Amor Ministries
fumanchu@amor.org
From pje at telecommunity.com  Thu Sep  2 05:25:56 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep  2 05:25:53 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4
In-Reply-To: <04Sep1.180724pdt."58612"@synergy1.parc.xerox.com>
Message-ID: <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>

At 06:07 PM 9/1/04 -0700, Bill Janssen wrote:

>1.  The "environ" parameter must be a Python dict: I think subclasses
>should be allowed.  A true subclass supports all methods of its
>ancestors, so the rationale presented in the back of the PEP for
>excluding them doesn't hold water.  I think the appropriate check
>would be to see if the returned class is a subclass of the "dict"
>class.  That is, "isinstance(e, dict)" should return True.

Paradoxically, allowing subclasses eliminates the usefulness of allowing 
subclasses.  Presumably, the purpose of using a subclass is to provide some 
extended behavior, e.g. as an attribute/method, or as a byproduct of 
requesting particular keys or values.  In both cases, these extended 
behaviors would be destroyed the minute that a piece of middleware decides 
to use its *own* dictionary subclass.

This also ignores the issue that creating a dictionary subclass that 
*consistently* enforces some extended behavior (e.g. lazy evaluation of a 
key) is intrinsically difficult and fragile, because new versions of Python 
often introduce new dictionary methods that are not implemented in terms of 
other existing methods, thus breaking a previously "perfect" subclass when 
a new Python version is released.

These are "practicality beats purity" arguments, so I need to see some 
*practical* applications of dictionary subclasses that would be useful 
enough to outweigh both of the above issues.
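
Both issues can be illustrated with a short sketch. LazyEnviron and the
key 'lazy.value' are hypothetical names, and the behaviour shown for
dict.get() is that of CPython, where get() does not go through an
overridden __getitem__.

```python
class LazyEnviron(dict):
    """Hypothetical subclass that computes one key on first lookup."""
    def __getitem__(self, key):
        if key == 'lazy.value' and not dict.__contains__(self, key):
            dict.__setitem__(self, key, 'computed')
        return dict.__getitem__(self, key)

env = LazyEnviron(REQUEST_METHOD='GET')

# Fragility: get() is not implemented in terms of __getitem__, so the
# lazy behaviour is silently skipped.
via_get = env.get('lazy.value')

# The override only runs for subscript access.
via_index = env['lazy.value']

# Middleware that builds its own plain dict discards the subclass and
# its extended behaviour along with it.
copied = dict(LazyEnviron(REQUEST_METHOD='GET'))
```

An application that relied on the lazy key would therefore work or fail
depending on which access style it used and which middleware sat in front
of it.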


>2.  The "fileno" attribute on the returned iterable.  I'm a bit
>concerned about using operating system file descriptors, due to
>resource constraints; I think a better check would be to see if the
>returned iterable is a subclass of the "file" class.  That is,
>"isinstance(f, file)" should return true.

The purpose of 'fileno' is specifically to allow the use of operating 
system APIs that copy data from one file descriptor to another.  Many 
Python objects have valid 'fileno' attributes besides files, including 
sockets and pipes.  Many non-stdlib objects in common use have 'fileno' 
attributes that serve this purpose.  'select.select' takes objects with 
'fileno', and so on.

Because 'file' has a 'fileno' attribute, 'isinstance(f,file)' implies 
'hasattr(f,"fileno")'.  Therefore, the latter is the preferred behavior 
here, because it doesn't unnecessarily exclude other valid wrappers of file 
descriptors.
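
A small sketch of the difference between the two checks. Current Python
has no built-in 'file' class, so io.IOBase is used here as an assumed
stand-in for it; has_descriptor is a hypothetical helper name.

```python
import io
import socket
import tempfile

def has_descriptor(obj):
    """Mirror the hasattr() check: true for anything exposing fileno."""
    return hasattr(obj, 'fileno')

a, b = socket.socketpair()
try:
    with tempfile.TemporaryFile() as f:
        file_ok = has_descriptor(f)
        sock_ok = has_descriptor(a)
        # A socket wraps a descriptor but is not a file object.
        sock_is_file = isinstance(a, io.IOBase)
finally:
    a.close()
    b.close()
```

An isinstance() test would reject the socket even though it wraps a
perfectly usable descriptor, which is the exclusion described above.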


>3.  Comments about "The [status-line] string must be 7-bit
>ASCII...containing no control characters."  That's overly restrictive;
>I think it would be better to simply refer to RFC 2616 and say that it
>should follow the rules defined there for "Reason-Phrase".
>
>4.  Similarly, the rules about header values are more restrictive than
>HTTP; they therefore prevent perfectly valid HTTP header values from
>being returned.  That's bad.  Again, I think the PEP should simply
>refer to RFC 2616 and say, "Use those rules".

These restrictions are intended to simplify servers and middleware; nobody 
has yet presented an example of a scenario where this imposed any practical 
limitation.

The fallback position would be that the status string and headers must not 
be CR or CRLF terminated.  But, I'd prefer to stick with a "no embedded 
control characters" approach, mainly to avoid situations where people embed 
'\n' and think that will be correct.

Here's what RFC 2616 has to say about TEXT, which is the format of the 
status message and of header values:

    The TEXT rule is only used for descriptive field contents and values
    that are not intended to be interpreted by the message parser. Words
    of *TEXT MAY contain characters from character sets other than ISO-
    8859-1 [22] only when encoded according to the rules of RFC 2047
    [14].

        TEXT           = <any OCTET except CTLs,
                         but including LWS>

    A CRLF is allowed in the definition of TEXT only as part of a header
    field continuation. It is expected that the folding LWS will be
    replaced with a single SP before interpretation of the TEXT value.

In other words, no control characters except for folding, and 7-bit ASCII 
with optional ISO-8859-1.  In practice, however, RFC 2047 allows for 
encoding ISO-8859-1 *in* 7-bit ASCII as well.  So, the only actual 
limitation being imposed by the PEP is on folding, and on the necessary 
encoding of non-ASCII characters.

Again, this is a practicality v. purity issue.  Are you aware of any 
applications that currently fold their headers, or transmit ISO-8859-1 
characters without using the encoding prescribed by RFC 2047?  Is there a 
practical use case for either one?

I'm willing to listen on this point, but as of the moment I find it hard to 
imagine what the use case for either of these features is.  By contrast, I 
do have very specific use cases in mind where supporting those features 
causes problems:

* Applications creating broken headers (e.g. with '\n' instead of '\r\n') 
or broken folds

* Applications mistakenly transmitting Unicode without considering encoding 
issues

* Middleware and servers forgetting to factor out folds when parsing data 
for interpretation

* In order to ensure safe interpretation, smart middleware and server 
developers will have to write routines to *unfold* potentially-folded 
headers; why not just disallow folding to begin with?
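
The restriction under discussion is cheap to enforce. A minimal sketch,
assuming the rule is "7-bit ASCII, no control characters"; the function
name check_header_value is hypothetical and is not taken from the PEP.

```python
def check_header_value(value):
    """Raise ValueError unless value is 7-bit ASCII with no control chars."""
    for ch in value:
        code = ord(ch)
        # 0-31 and 127 are control characters; above 127 is not 7-bit ASCII.
        if code < 32 or code > 126:
            raise ValueError('illegal character %r in header value' % ch)
    return value

ok = check_header_value('text/plain; charset=utf-8')
```

Because CR and LF are control characters, a value that passes this check
cannot contain a fold, so no unfolding routine is needed downstream.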


>5.  The phrase about "if a server or gateway discards or overrides any
>application header for any reason, it must record this in a log"; that
>should be "should" instead of "must".  Otherwise you'll have your log
>cluttered with innocuous header re-write messages, and no way to turn
>that off.

How about "must provide the *option*" and "must be enabled by default"? Or, 
leave it as is, but add something like, "may provide the user with the 
option of suppressing this output, so that users who cannot fix a broken 
application are not forced to bear the pain of its error."


>6.  The "write()" callable is important; it should not be deprecated
>or in some other way made a poor stepchild of the iterable.

But it *is* one.  The presence of the 'write()' facility significantly 
increases the implementation complexity for middleware and server 
authors.  If it weren't necessary to support existing streaming APIs, it 
wouldn't exist.

Earlier drafts treated it as a peer, which led to people making bad 
assumptions about its proper use.  Making it a "poor stepchild" encourages 
people to investigate it only if they really need it, and only a very few 
applications actually need it.


>7.  If an application returns an iterable after calling write(), are
>the strings produced by iteration written after those written by calls
>to write?

Yes.  This is implicit in the way 'write()' and the iterable are defined, 
because the server must transmit a block yielded or passed to write() 
before returning control to the application.  The only way to meet this 
constraint is for them to occur in sequence.

However, the language should perhaps be clarified to be explicit about this 
point, and to address what happens if code *within* the iterator calls 
'write()'.  (I don't think it should be allowed to, but I'm open to 
arguments either way.)
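
The ordering can be sketched with a toy gateway. This is only an illustration under simplifying assumptions: `run_app` is a hypothetical stand-in for a server, and "transmitting" a block just means appending it to a list.

```python
def run_app(app, environ):
    """Toy gateway: records blocks in the order a server would send them."""
    sent = []

    def start_response(status, headers):
        # The returned callable is write(); each call "transmits" at once.
        return sent.append

    result = app(environ, start_response)
    # The iterable is consumed only after the application has returned,
    # so anything passed to write() has already gone out.
    for block in result:
        sent.append(block)
    return sent

def app(environ, start_response):
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    write(b"written first\n")
    return [b"yielded second\n"]

print(run_app(app, {}))
```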


>8.  The note on Unicode: Unfortunately, Web standards like HTTP rely
>on using proper character sets.  By *not* using Unicode strings, and
>by *not* specifying the character set encoding of the "raw" byte
>strings, we open the door for disastrous misunderstandings.  The
>safest thing to do would be to require the framework to traffic in
>Unicode strings for things like header values, which the WSGI
>middleware would translate to or from the various required encodings
>used by the server and external protocols.  At least with Unicode
>strings you know what encoding is being used.

This seems at odds with your previous desire to use RFC 2616, which is 
pretty clear that it's ISO-8859-1 or RFC 2047.  PEP 333 goes further and 
says, it's ASCII, dammit, and use MIME header encodings (RFC 2047) if you 
need to do something special, because God help you if you're trying to mess 
with non-ASCII in HTTP headers and you don't know how to deal with that stuff.

Granted, that part could be more explicit in the PEP, so I'll work on that.  :)

(Maybe not this week; I expect to spend tomorrow putting hurricane panels 
on my house, just ahead of Frances' arrival...)


>A riskier, more error-prone option would be to require the byte
>strings to be in particular encodings.

That's actually what's required, it's merely implied by the PEP rather than 
explicitly stated.  But it's a fully RFC-compliant way to do it.
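
As a rough illustration of the RFC 2047 route, the sketch below uses the standard library's `email.header` module to turn a non-ASCII value into an ASCII-only encoded word and back. The helper name `encode_header_value` and the choice of UTF-8 are assumptions made for the example, not something the PEP prescribes.

```python
from email.header import Header, decode_header

def encode_header_value(text, charset="utf-8"):
    """Return an ASCII-only RFC 2047 encoded-word form of text."""
    return Header(text, charset).encode()

encoded = encode_header_value("Caf\u00e9")

# decode_header() returns (bytes, charset) pairs for encoded words.
raw, charset = decode_header(encoded)[0]
print(encoded)
print(raw.decode(charset))
```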


>The content strings, those written to the "write()" calls, or returned
>by the iterable, should in fact be byte vectors, exactly as they are
>currently specified.

Glad there was something you liked.  ;)  (j/k)


>9.  There should be a non-optional way of indicating the URL scheme,
>whether it is "http", "https", or "ftp".  I'd suggest "wsgi.scheme" in
>the environ.

I rather like this, although I don't at all see how FTP gets into 
this.  What the heck would CGI variables for FTP look like, I 
wonder?  Anyway, it's handy for "http" and "https" at the very least.  I'd 
prefer "wsgi.url_scheme" for the name, though, as it's otherwise a somewhat 
ambiguous name.
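
For a sense of how the key would be used, here is a minimal sketch that rebuilds an application's base URL from the environ, assuming the proposed "wsgi.url_scheme" key is present. The function name `base_url` is hypothetical, and the sketch ignores HTTP_HOST for brevity.

```python
# Default ports, as strings, because CGI variables are strings.
DEFAULT_PORTS = {"http": "80", "https": "443"}

def base_url(environ):
    """Reconstruct scheme://host[:port]/script_name from the environ."""
    scheme = environ["wsgi.url_scheme"]
    host = environ["SERVER_NAME"]
    port = environ["SERVER_PORT"]
    # Only spell out the port when it is not the scheme's default.
    if port != DEFAULT_PORTS.get(scheme):
        host = "%s:%s" % (host, port)
    return "%s://%s%s" % (scheme, host, environ.get("SCRIPT_NAME", ""))

print(base_url({
    "wsgi.url_scheme": "https",
    "SERVER_NAME": "example.com",
    "SERVER_PORT": "443",
    "SCRIPT_NAME": "/app",
}))
```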

From pje at telecommunity.com  Thu Sep  2 05:32:12 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep  2 05:32:10 2004
Subject: [Web-SIG] Bill's comments on WSGI draft 1.4
In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amo
	rhq.net>
Message-ID: <5.1.1.6.0.20040901232754.02323050@mail.telecommunity.com>

At 07:12 PM 9/1/04 -0700, Robert Brewer wrote:
>Bill Janssen wrote:
> > ...
> > 6.  The "write()" callable is important; it should not be deprecated
> > or in some other way made a poor stepchild of the iterable.
>
>That's been my only question so far. I'd like to at least hear the
>rationale behind favoring iterables so heavily over write().

One important reason: the server can suspend an iterable's execution 
without tying up a thread.  It can therefore potentially use a much smaller 
thread pool to handle a given number of connections, because the threads 
are only tied up while they're executing an iterator 'next()' call.

By contrast, 'write()' occurs *within* the application execution, so the 
only way to suspend execution is to suspend the thread (e.g. waiting for a 
lock).
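
A small sketch of the idea, under heavy simplification: `interleave` is a hypothetical single-threaded "server" that takes one block at a time from several response iterables in turn. Between calls to next(), each application is suspended without holding a thread.

```python
def interleave(responses):
    """Round-robin over response iterables using a single thread."""
    iterators = [iter(r) for r in responses]
    sent = []
    while iterators:
        # Iterate over a copy so finished iterators can be removed.
        for it in list(iterators):
            try:
                sent.append(next(it))
            except StopIteration:
                iterators.remove(it)
    return sent

def body(name, count):
    """Stand-in for an application's response iterable."""
    for i in range(count):
        yield ("%s-%d" % (name, i)).encode("ascii")

print(interleave([body("a", 2), body("b", 3)]))
```

An application that produces the same output through write() could not be paused this way; its calls happen inside its own stack frame.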

From ianb at colorstudy.com  Thu Sep  2 07:48:56 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu Sep  2 07:49:04 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4
In-Reply-To: <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>
References: <5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>
Message-ID: <4136B448.8070707@colorstudy.com>

Phillip J. Eby wrote:
> At 06:07 PM 9/1/04 -0700, Bill Janssen wrote:
> 
>> 1.  The "environ" parameter must be a Python dict: I think subclasses
>> should be allowed.  A true subclass supports all methods of its
>> ancestors, so the rationale presented in the back of the PEP for
>> excluding them doesn't hold water.  I think the appropriate check
>> would be to see if the returned class is a subclass of the "dict"
>> class.  That is, "isinstance(e, dict)" should return True.
> 
> 
> Paradoxically, allowing subclasses eliminates the usefulness of allowing 
> subclasses.  Presumably, the purpose of using a subclass is to provide 
> some extended behavior, e.g. as an attribute/method, or as a byproduct 
> of requesting particular keys or values.  In both cases, these extended 
> behaviors would be destroyed the minute that a piece of middleware 
> decides to use its *own* dictionary subclass.

I agree strongly with you on this.  Subclassing built-in types is useful 
almost exclusively for showing off clever tricks and distracting people who 
want to change the language.  Code constantly contains assumptions that 
you can recreate built-in types from their components, and then you lose 
the subclass.  I also don't see any advantage beyond the theoretical.  Any 
attempt to leverage a subclass is just as likely to cause problems as to be 
a help.
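
Here is a small sketch of how the subclass gets lost. `TracingEnviron` and `middleware` are hypothetical names; the middleware does nothing exotic, it just copies the environ with dict() before changing a key.

```python
class TracingEnviron(dict):
    """Hypothetical subclass that records which keys were read."""
    def __init__(self, *args, **kw):
        dict.__init__(self, *args, **kw)
        self.reads = []

    def __getitem__(self, key):
        self.reads.append(key)
        return dict.__getitem__(self, key)

def middleware(app):
    def wrapped(environ, start_response):
        # An ordinary copy: the result is a plain dict, so the
        # subclass's extra behavior does not reach the application.
        new_environ = dict(environ)
        new_environ["SCRIPT_NAME"] = "/mounted"
        return app(new_environ, start_response)
    return wrapped

seen = []

def app(environ, start_response):
    seen.append(environ)
    return []

middleware(app)(TracingEnviron(SCRIPT_NAME=""), None)
print(type(seen[0]).__name__)
```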

>> 9.  There should be a non-optional way of indicating the URL scheme,
>> whether it is "http", "https", or "ftp".  I'd suggest "wsgi.scheme" in
>> the environ.
> 
> 
> I rather like this, although I don't at all see how FTP gets into this.  
> What the heck would CGI variables for FTP look like, I wonder?  Anyway, 
> it's handy for "http" and "https" at the very least.  I'd prefer 
> "wsgi.url_scheme" for the name, though, as it's otherwise a somewhat 
> ambiguous name.

This sounds good to me too.  I wanted HTTPS=on to be required, but 
wsgi.url_scheme would be more general anyway.

It's pretty easy to imagine translating FTP to CGI variables, really. 
The requested URL (SCRIPT_NAME+PATH_INFO) is the file you are getting or 
putting, the REQUEST_METHOD is maybe GET or PUT (or maybe STOR and RETR, 
but GET and PUT would be more natural).  Most of the other commands map 
to WebDAV methods.  Obviously the server has to keep track of some state, 
but typically that state is boring to the application anyway.  But 
that's all an aside.  I can imagine mailto as well, when you pipe email 
to your application.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From ianb at colorstudy.com  Thu Sep  2 08:24:55 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu Sep  2 08:25:02 2004
Subject: [Web-SIG] Status code, status header
In-Reply-To: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com>
References: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com>
Message-ID: <4136BCB7.8090309@colorstudy.com>

Phillip J. Eby wrote:
> At 10:01 PM 8/30/04 -0500, Ian Bicking wrote:
> 
>> After a little thought, I'm -1 on a status header, even with 
>> email.Message.
> 
> 
> I think email.Message is also dead, due to its absence in Python 
> versions prior to 2.2.
> 
> 
>> I'm also +1 on turning status into an integer.  I think it makes 
>> things a little simpler, and those message strings are just a 
>> distraction.  The final server can put that string in ("200 OK", etc) 
>> if it wants to, but if it doesn't it doesn't matter.
> 
> 
> I'm still -1 on this, for the reasons stated previously.  I might be 
> convinced if you can show me that a significant number of popular 
> servers already have the necessary table(s) to do this with; e.g. 
> Twisted, ZServer, Apache (CGI/FastCGI), mod_python, etc.

* Twisted does, in twisted.protocols.http
* mod_python must somewhere; I don't think it allows you to provide a 
reason, you can only provide an integer code.
* Zope does in ZPublisher.HTTPResponse
* Apache does not add the reason string to CGI scripts that provide an 
explicit Status header but no reason.  But it provides reasons for any 
status that it generates.  I don't know about FastCGI.

Part of why I think it's not useful is that in many cases the reason 
string is hard coded.  In that case the reason string is synonymous with 
the code, and cannot be changed.  Nor is anyone paying attention if you 
do change it, and there's nothing constructive that can be done with 
that string.

> In theory, the "reason-phrase" can be null.  In practice, I wonder.  
> Also, I don't think the message strings are "just a distraction": they 
> clarify the intent of the code that contains them.

No one would ever pay attention to the string when there's that pleasant 
integer code to parse out.  Plus the spec says not to.

The names are fine, but the code and the reason string are redundant. 
The names are better represented with Python names, not a string that 
gets tacked on.
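
To illustrate what "Python names" plus a finite table might look like, the sketch below uses `http.HTTPStatus` from the modern standard library. That module is an assumption of this example (it did not exist when this thread was written), and `status_line` is a made-up helper.

```python
from http import HTTPStatus

def status_line(code):
    """Build a 'CODE Reason' status string from an integer code."""
    status = HTTPStatus(code)
    return "%d %s" % (status.value, status.phrase)

# The application names the status; the table supplies the phrase.
print(status_line(HTTPStatus.METHOD_NOT_ALLOWED))
```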

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From ianb at colorstudy.com  Thu Sep  2 08:28:11 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu Sep  2 08:33:02 2004
Subject: [Web-SIG] wsgi.fatal_errors
In-Reply-To: <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com>
References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>	<5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com>
Message-ID: <4136BD7B.90308@colorstudy.com>

Phillip J. Eby wrote:
> At 11:15 PM 8/30/04 -0700, tony@lownds.com wrote:
> 
>> > Here are some changes I've proposed in the last few days to resolve 
>> issues
>> > people brought up, but which I haven't gotten much feedback on:
>> >
>> > * 'wsgi.fatal_errors' key for exceptions that apps and middleware
>> > shouldn't
>> > trap
>> >
>>
>> What about defining an exception class that applications can raise 
>> with an
>> HTML payload, which servers are supposed to send to the client?
>> Middleware should be free to alter the payload as much as they like. The
>> server should not send the payload when content-type is not html.
>>
>> By using exceptions as a backchannel, the application and middleware do
>> not have to keep track of the state to sanely handle an error.
> 
> 
> Interesting.  But I think you've just given me an idea for a possibly 
> simpler way to do this, with some other advantages.
> 
> Suppose that instead of 'start_response(status,headers)' we had 
> 'set_response(status,headers,body=None)'.  And the difference would be 
> that our 'set_response' does nothing until/unless you call write() or 
> yield a result from the return iterable.  Therefore, you could call 
> 'set_response' multiple times, with only the last such call taking 
> effect.  (If you supply a non-None 'body', then calling write() or 
> returning an iterable is an error.)

This seems pretty reasonable.  How necessary is that optional body 
argument?  Couldn't you just use the write argument or return an iterable?

> Now consider error handling middleware: it simply calls 
> 'set_response(error_status,error_headers,error_body)', and returns None.
> 
> At this point, we've isolated the complexity to exist only for streaming 
> responses once the first body chunk has been generated.  We can handle 
> this by making a call to 'set_response()' a fatal error if a body chunk 
> has been generated.  Thus, no special handling is needed by an exception 
> handler: it just tries to do 'set_response()', and allows the fatal 
> error (if any) to propagate.  Now, the server can catch the fatal error 
> and deal with it.
> 
> I think this will let us keep all of the complications in the server, 
> where they always have to exist, no matter what else we do.  
> Exception-handling middleware is then delightfully simple.
> 
> On the other hand, output-transforming middleware becomes somewhat more 
> complex, as it would now have three output sources to transform (body 
> param to set_response(), write(), and output iterable).
> 
> This is a fairly significant change to the spec, that introduces lots of 
> new angles to cover.  But, I think it could be an "exceptionally" clean 
> solution to the problem.  ;)

It sounded good until then; now I don't know.  I think I'm -1 on that pun.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From neel at mediapulse.com  Thu Sep  2 15:19:27 2004
From: neel at mediapulse.com (Michael C. Neel)
Date: Thu Sep  2 15:19:04 2004
Subject: 2 cents on file objects... WAS: RE: [Web-SIG] Bill's comments on
	WSGI draft 1.4
In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net>
References: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net>
Message-ID: <1094131167.4727.27.camel@mike.mediapulse.com>

Well, I've seen a lot of back and forth on file objects, write(), etc.  I
think it's of little issue myself; it's not that hard to return an interface
that will support both methods.  Let the programmer working on the
middleware/application weigh the tradeoffs of one method against
another.

In the framework I use, I've actually altered it to allow its context
object (which is connected to the output stream, among other things) to
be used as a file object.  The first need for this was to allow me to
pass the object off to a csv.writer object, which I then called with the
result of a DB-API 2.0 fetchall(); that made a "Download as CSV" button
work in no more than 4 lines of code.  I could also see doing this with
XML classes for a WSDL/SOAP system.  Really off the wall, you could do
this with the logging module, and send your logging statements to another
server.

I suppose with any of these I could grab the StringIO module and add a
few extra lines to my code.  Then again, a WSGI system could also do
that in its implementation and even offer me the option of buffered or
non-buffered output.

As it's been said here before, adoption of the frameworks and server is
going to be critical to WSGI.  So I'd opt for more choice and
flexibility; we're all smart guys here and I don't think we would turn
down a good idea because of complexity.

Mike

From pje at telecommunity.com  Thu Sep  2 15:31:45 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep  2 15:31:45 2004
Subject: 2 cents on file objects... WAS: RE: [Web-SIG] Bill's
	comments on WSGI draft 1.4
In-Reply-To: <1094131167.4727.27.camel@mike.mediapulse.com>
References: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net>
	<3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net>
Message-ID: <5.1.1.6.0.20040902092500.033742b0@mail.telecommunity.com>

At 09:19 AM 9/2/04 -0400, Michael C. Neel wrote:
>Well, I've seen a lot of back and forth on file objects, write(), etc.  I
>think it's of little issue myself; it's not that hard to return an interface
>that will support both methods.  Let the programmer working on the
>middleware/application weigh the tradeoffs of one method against
>another.
>
>In the framework I use, I've actually altered it to allow its context
>object (which is connected to the output stream, among other things) to
>be used as a file object.  The first need for this was to allow me to
>pass the object off to a csv.writer object, which I then called with the
>result of a DB-API 2.0 fetchall(); that made a "Download as CSV" button
>work in no more than 4 lines of code.  I could also see doing this with
>XML classes for a WSDL/SOAP system.  Really off the wall, you could do
>this with the logging module, and send your logging statements to another
>server.
>
>I suppose with any of these I could grab the StringIO module and add a
>few extra lines to my code.  Then again, a WSGI system could also do
>that in its implementation and even offer me the option of buffered or
>non-buffered output.

Sorry, I've read through the above a few times and I haven't been able to 
figure out exactly what it is that you're proposing, or if you're proposing 
something at all.  :(


>As it's been said here before, adoption of the frameworks and server is
>going to be critical to WSGI.  So I'd opt for more choice and
>flexibility; we're all smart guys here and I don't think we would turn
>down a good idea because of complexity.

These sentences seem diametrically opposed to me; choice and flexibility is 
precisely what we *don't* want in WSGI, as it dramatically increases the 
opportunity for breaking interoperability.  Right now, it's still possible 
to write "dirt simple" implementations, because the requirements are 
minimal even though there are some options for improved performance.

There's a *big* difference between an option and a choice.  Choices double 
the work for everybody, while options only affect people who want to use 
them.  To the greatest extent possible, we should eliminate choices, and 
keep the number of options reasonable.  (For example, a few revisions ago, 
we dropped the "choice" of not returning an iterable.)

From pje at telecommunity.com  Thu Sep  2 15:39:12 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep  2 15:39:10 2004
Subject: [Web-SIG] Status code, status header
In-Reply-To: <4136BCB7.8090309@colorstudy.com>
References: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com>
	<5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040902093214.033888f0@mail.telecommunity.com>

At 01:24 AM 9/2/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>At 10:01 PM 8/30/04 -0500, Ian Bicking wrote:
>>>I'm also +1 on turning status into an integer.  I think it makes things 
>>>a little simpler, and those message strings are just a distraction.  The 
>>>final server can put that string in ("200 OK", etc) if it wants to, but 
>>>if it doesn't it doesn't matter.
>>
>>I'm still -1 on this, for the reasons stated previously.  I might be 
>>convinced if you can show me that a significant number of popular servers 
>>already have the necessary table(s) to do this with; e.g. Twisted, 
>>ZServer, Apache (CGI/FastCGI), mod_python, etc.
>
>* Twisted does, in twisted.protocols.http
>* mod_python must somewhere; I don't think it allows you to provide a 
>reason, you can only provide an integer code.
>* Zope does in ZPublisher.HTTPResponse

Technically, ZPublisher is part of the *application* side, not the server 
side, which is a point in favor of the application side setting the reason.


>* Apache does not add the reason string to CGI scripts that provide an 
>explicit Status header but no reason.

So, a CGI gateway would have to have a table, or else generate messages 
like "502 Dude, this is whack!".  :)


>>In theory, the "reason-phrase" can be null.  In practice, I wonder.
>>Also, I don't think the message strings are "just a distraction": they 
>>clarify the intent of the code that contains them.
>
>No one would ever pay attention to the string when there's that pleasant 
>integer code to parse out.  Plus the spec says not to.

Huh?  Are you saying that:

      start_response(405,headers)

is more readable than:

      start_response("405 Method Not Allowed",headers)

????

From pje at telecommunity.com  Thu Sep  2 15:42:51 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep  2 15:42:48 2004
Subject: [Web-SIG] wsgi.fatal_errors
In-Reply-To: <4136BD7B.90308@colorstudy.com>
References: <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com>
	<5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040902093923.03386ec0@mail.telecommunity.com>

At 01:28 AM 9/2/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>At 11:15 PM 8/30/04 -0700, tony@lownds.com wrote:
>>>What about defining an exception class that applications can raise with an
>>>HTML payload, which servers are supposed to send to the client?
>>>Middleware should be free to alter the payload as much as they like. The
>>>server should not send the payload when content-type is not html.
>>>
>>>By using exceptions as a backchannel, the application and middleware do
>>>not have to keep track of the state to sanely handle an error.
>>
>>Interesting.  But I think you've just given me an idea for a possibly 
>>simpler way to do this, with some other advantages.
>>Suppose that instead of 'start_response(status,headers)' we had 
>>'set_response(status,headers,body=None)'.  And the difference would be 
>>that our 'set_response' does nothing until/unless you call write() or 
>>yield a result from the return iterable.  Therefore, you could call 
>>'set_response' multiple times, with only the last such call taking 
>>effect.  (If you supply a non-None 'body', then calling write() or 
>>returning an iterable is an error.)
>
>This seems pretty reasonable.  How necessary is that optional body 
>argument?  Couldn't you just use the write argument or return an iterable?

The idea was to use it as a way to bypass non-exception middleware, without 
raising a fatal error.  OTOH, maybe Tony's approach is actually better.
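
For readers following along, the deferred semantics of the proposed 'set_response' can be sketched roughly as below. This is only an illustration of the idea as described in this thread, not spec text: `ResponseState` is a made-up name, the optional 'body' argument is left out, and RuntimeError stands in for whatever "fatal error" a server would actually raise.

```python
class ResponseState:
    """Hypothetical server-side bookkeeping for a deferred set_response()."""

    def __init__(self):
        self.pending = None      # last (status, headers) set, not yet sent
        self.committed = False   # True once a body chunk has gone out
        self.sent = []           # what has been "transmitted" so far

    def set_response(self, status, headers):
        # May be called repeatedly; only the last call before the
        # first body chunk takes effect.
        if self.committed:
            raise RuntimeError("body already started; cannot reset response")
        self.pending = (status, headers)

    def write(self, chunk):
        if not self.committed:
            if self.pending is None:
                raise RuntimeError("write() called before set_response()")
            self.sent.append(self.pending)
            self.committed = True
        self.sent.append(chunk)

state = ResponseState()
state.set_response("200 OK", [])
state.set_response("500 Internal Server Error", [])  # error handler overrides
state.write(b"oops")
print(state.sent)
```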


>>This is a fairly significant change to the spec, that introduces lots of 
>>new angles to cover.  But, I think it could be an "exceptionally" clean 
>>solution to the problem.  ;)
>
>It sounded good until then; now I don't know.  I think I'm -1 on that pun.

I get the humor of the second sentence; is the first sentence also humor, 
or is it serious?

From neel at mediapulse.com  Thu Sep  2 15:55:48 2004
From: neel at mediapulse.com (Michael C. Neel)
Date: Thu Sep  2 15:55:27 2004
Subject: 2 cents on file objects... WAS: RE: [Web-SIG] Bill's comments
	on WSGI draft 1.4
In-Reply-To: <5.1.1.6.0.20040902092500.033742b0@mail.telecommunity.com>
References: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net>
	<3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net>
	<5.1.1.6.0.20040902092500.033742b0@mail.telecommunity.com>
Message-ID: <1094133348.4727.45.camel@mike.mediapulse.com>

On Thu, 2004-09-02 at 09:31, Phillip J. Eby wrote:

> Sorry, I've read through the above a few times and I haven't been able to 
> figure out exactly what it is that you're proposing, or if you're proposing 
> something at all.  :(

Sorry, I guess I wasn't clear; I was making a case for file objects
based upon my past use of them.

> These sentences seem diametrically opposed to me; choice and flexibility is 
> precisely what we *don't* want in WSGI, as it dramatically increases the 
> opportunity for breaking interoperability.  Right now, it's still possible 
> to write "dirt simple" implementations, because the requirements are 
> minimal even though there are some options for improved performance.

At the risk of angering a mob: what's on the table isn't a Perl level of
'there is more than one way to do it'; it's an object that supports two
interfaces.  Python's standard lib is full of objects that are
file-like, so I don't even see this as a stretch from
the norm.

> There's a *big* difference between an option and a choice.  Choices double 
> the work for everybody, while options only affect people who want to use 
> them.  To the greatest extent possible, we should eliminate choices, and 
> keep the number of options reasonable.  (For example, a few revisions ago, 
> we dropped the "choice" of not returning an iterable.)

Again, I don't see how this is a lot of work, or enough work that it
prevents anyone from using it.  WSGI can simply state that the
return value can be used both as a file object and as an iterable (which is
a bit redundant, isn't it?  I'll have to check, but file objects are iterable,
correct?)

I think this is the only issue over the PEP, at least the only major one
judging by the number of posts.  Allowing both interfaces would be acceptable
here, I think, and solves the problem.  Also, this is just a pre-PEP on a
SIG atm; from PEPs I've followed in the past, things are going to get
worse when it's before the Python community, and you'll really want the
support of your SIG to help keep your sanity through the process, lol.

Mike

From py-web-sig at xhaus.com  Thu Sep  2 16:33:26 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Thu Sep  2 16:28:46 2004
Subject: 2 cents on file objects... WAS: RE: [Web-SIG] Bill's comments
	on WSGI draft 1.4
In-Reply-To: <1094133348.4727.45.camel@mike.mediapulse.com>
References: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net>	<3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net>	<5.1.1.6.0.20040902092500.033742b0@mail.telecommunity.com>
	<1094133348.4727.45.camel@mike.mediapulse.com>
Message-ID: <41372F36.2020806@xhaus.com>

[Michael C. Neel]
 > WSGI can simply state that the
 > return value can be used both as a file object and as an iterable (which is
 > a bit redundant, isn't it?  I'll have to check, but file objects are iterable,
 > correct?)

I spent yesterday discussing this with Phillip, and now that I 
understand his design decision, I think it's the right one.

Having frameworks and *all* middleware components deal with both files 
and iterables is an extra and unnecessary complication.

And under python 2.2+, it's irrelevant anyway, because files *are* 
iterables. A problem only arises on <= 2.1 interpreters, which don't 
support iterators nearly as well as 2.2. And that's only a problem 
because of jython being 2.1 only: a problem I seem determined to make my 
own ;-)

The strength of returning an iterable is that the framework can then 
control *when* the output is generated and sent. This fits perfectly 
with python's greatest strength in the web arena: its simple and 
powerful mechanisms for event-driven processing.

Robert Oschler asked earlier about the write callable vs. returning an 
iterator. I was going to reply, but Phillip got there before me. I would 
only add the following to his excellent explanation.

1. The write callable is only there to support "push" applications, 
where the application generates output and then pushes it through a 
channel set up by the server/framework, thus relegating the framework to 
a kind of dumb switchboard. This sort of design is usually used in 
threaded servers, which can present scalability problems.

2. The main focus on iterators is the right one because it not only 
supports "push", as described above, but it also supports "pull", i.e. 
where the framework "pulls" output from the application when the time is 
right. The reason why this is a good thing is because the framework is 
in the best position to know when the client is ready to actually 
receive the output, through the use of events/readiness-notification on 
the client socket. The output is only transiently created when required 
and transmitted immediately to the user (potentially with no copying or 
buffering at all!): you don't have large lumps of output hanging around, 
consuming memory.

If you want to create an architecture that works for both "push" and 
"pull", iterators are the way to go.

I do find it interesting that we've had no comments from the Zope or 
Twisted people. Glad to see Medusa people here though :-)

Kind regards,

Alan.

P.S. Phillip, I hope you're not affected by that hurricane! I have 
friends in Tampa who counted themselves lucky to have escaped Charley: 
now here comes another one! It appears on the surface that the frequency 
of hurricanes in the gulf is increasing.
From pje at telecommunity.com  Thu Sep  2 16:36:28 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep  2 16:36:29 2004
Subject: 2 cents on file objects... WAS: RE: [Web-SIG] Bill's
	comments on WSGI draft 1.4
In-Reply-To: <41372F36.2020806@xhaus.com>
References: <1094133348.4727.45.camel@mike.mediapulse.com>
	<3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net>
	<3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net>
	<5.1.1.6.0.20040902092500.033742b0@mail.telecommunity.com>
	<1094133348.4727.45.camel@mike.mediapulse.com>
Message-ID: <5.1.1.6.0.20040902103439.0244d5f0@mail.telecommunity.com>

At 03:33 PM 9/2/04 +0100, Alan Kennedy wrote:

>The strength of returning an iterable is that the framework can then 
>control *when* the output is generated and sent. This fits perfectly with 
>python's greatest strength in the web arena: its simple and powerful 
>mechanisms for event-driven processing.

For clarity's sake, please don't call gateways and servers "frameworks"; 
we're reserving that term for the application side.


>P.S. Phillip, I hope you're not affected by that hurricane!

I'm directly in its path, and have still not yet obtained anything to cover 
my windows with, which is why I'm now signing off discussion indefinitely.  :(

From ianb at colorstudy.com  Thu Sep  2 17:47:10 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu Sep  2 17:47:38 2004
Subject: [Web-SIG] wsgi.fatal_errors
In-Reply-To: <5.1.1.6.0.20040902093923.03386ec0@mail.telecommunity.com>
References: <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com>
	<5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com>
	<5.1.1.6.0.20040902093923.03386ec0@mail.telecommunity.com>
Message-ID: <4137407E.7060308@colorstudy.com>

Phillip J. Eby wrote:
>>> This is a fairly significant change to the spec, that introduces lots 
>>> of new angles to cover.  But, I think it could be an "exceptionally" 
>>> clean solution to the problem.  ;)
>>
>> It sounded good until then; now I don't know.  I think I'm -1 on that 
>> pun.
> 
> I get the humor of the second sentence; is the first sentence also 
> humor, or is it serious?

No, I was just commenting on the pun ;)

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From ianb at colorstudy.com  Thu Sep  2 17:58:47 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu Sep  2 17:59:21 2004
Subject: 2 cents on file objects... WAS: RE: [Web-SIG] Bill's comments
	on	WSGI draft 1.4
In-Reply-To: <1094131167.4727.27.camel@mike.mediapulse.com>
References: <3A81C87DC164034AA4E2DDFE11D258E3022EAC@exchange.hqamor.amorhq.net>
	<1094131167.4727.27.camel@mike.mediapulse.com>
Message-ID: <41374337.4090807@colorstudy.com>

Michael C. Neel wrote:
> Well, I've seen a lot of back and forth on file objects, write(), etc.  I
> think it's of little issue myself; it's not that hard to return an interface
> that will support both methods.  Let the programmer working on the
> middleware/application weigh the tradeoffs of one method against
> another.
> 
> In the framework I use, I've actually altered it to allow its context
> object (which is connected to the output stream, among other things) to
> be used as a file object.  The first need for this was to allow me to
> pass the object off to a csv.writer object, which I then called with the
> result of a DB-API 2.0 fetchall(); that made a "Download as CSV" button
> work in no more than 4 lines of code.  I could also see doing this with
> XML classes for a WSDL/SOAP system.  Really off the wall, you could do
> this with the logging module, and send your logging statements to another
> server.

FWIW, using WSGI I've handled it like this:

class FakeFile: pass

write = start_response(status, headers)
f = FakeFile()
f.write = write
# now f is my file-like object...


Or, it was suggested:

start_response(status, headers)
lst = []
f = FakeFile()
f.write = lst.append
# use f...
return lst


This way you are missing a couple methods that files typically have 
(writelines I guess); then again, you could add those to FakeFile easily 
enough.  I find it feels a little hackish, but I think it should be 
reliable enough.
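
Putting the second variant together with the "Download as CSV" use case, a complete application might look roughly like the sketch below. It assumes a modern Python where csv.writer emits text and the body must be bytes; `ListWriter` is a made-up name, and the hard-coded rows stand in for the result of a DB-API fetchall().

```python
import csv

class ListWriter:
    """File-like wrapper: write() encodes text and appends it to a list."""
    def __init__(self):
        self.blocks = []

    def write(self, text):
        self.blocks.append(text.encode("utf-8"))

def application(environ, start_response):
    # Stand-in for cursor.fetchall(); any sequence of rows works.
    rows = [("id", "name"), (1, "spam"), (2, "eggs")]
    start_response("200 OK", [("Content-Type", "text/csv")])
    out = ListWriter()
    csv.writer(out).writerows(rows)
    # The collected blocks are returned as the response iterable.
    return out.blocks

body = application({}, lambda status, headers: None)
print(b"".join(body).decode("utf-8"))
```

Because the list is only returned at the end, this buffers the whole response; the write() variant above is the one to use if the output has to stream.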

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From ianb at colorstudy.com  Thu Sep  2 18:19:11 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu Sep  2 18:19:39 2004
Subject: [Web-SIG] Status code, status header
In-Reply-To: <5.1.1.6.0.20040902093214.033888f0@mail.telecommunity.com>
References: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com>
	<5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com>
	<5.1.1.6.0.20040902093214.033888f0@mail.telecommunity.com>
Message-ID: <413747FF.4030803@colorstudy.com>

Phillip J. Eby wrote:
> At 01:24 AM 9/2/04 -0500, Ian Bicking wrote:
> 
>> Phillip J. Eby wrote:
>>
>>> At 10:01 PM 8/30/04 -0500, Ian Bicking wrote:
>>>
>>>> I'm also +1 on turning status into an integer.  I think it makes 
>>>> things a little simpler, and those message strings are just a 
>>>> distraction.  The final server can put that string in ("200 OK", 
>>>> etc) if it wants to, but if it doesn't it doesn't matter.
>>>
>>>
>>> I'm still -1 on this, for the reasons stated previously.  I might be 
>>> convinced if you can show me that a significant number of popular 
>>> servers already have the necessary table(s) to do this with; e.g. 
>>> Twisted, ZServer, Apache (CGI/FastCGI), mod_python, etc.
>>
>>
>> * Twisted does, in twisted.protocols.http
>> * mod_python must somewhere; I don't think it allows you to provide a 
>> reason, you can only provide an integer code.
>> * Zope does in ZPublisher.HTTPResponse
> 
> Technically, ZPublisher is part of the *application* side, not the 
> server side, which is a point in favor of the application side setting 
> the reason.
> 
> 
>> * Apache does not add the reason string to CGI scripts that provide an 
>> explicit Status header but no reason.
> 
> 
> So, a CGI gateway would have to have a table, or else generate messages 
> like "502 Dude, this is whack!".  :)

It could generate no message, which would work just fine.  Or it could 
include the table, which is finite and known.
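
A minimal sketch of both options in one gateway-side helper, assuming 
integer status codes were adopted. The REASONS table is deliberately 
partial and status_line is a hypothetical name:

```python
# Partial table for illustration; a real gateway would list every code.
REASONS = {
    200: "OK",
    404: "Not Found",
    405: "Method Not Allowed",
    500: "Internal Server Error",
}


def status_line(code, version="HTTP/1.1"):
    """Build a status line from an integer code.

    Unknown codes get an empty reason phrase; the space after the code
    is still emitted because the status-line grammar requires it.
    """
    reason = REASONS.get(code, "")
    return "%s %d %s" % (version, code, reason)
```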

>>> In theory, the "reason-phrase" can be null.  In practice, I wonder.
>>> Also, I don't think the message strings are "just a distraction": 
>>> they clarify the intent of the code that contains them.
>>
>>
>> No one would ever pay attention to the string when there's that 
>> pleasant integer code to parse out.  Plus the spec says not to.
> 
> 
> Huh?  Are you saying that:
> 
>      start_response(405,headers)
> 
> is more readable than:
> 
>      start_response("405 Method Not Allowed",headers)

I would say that start_response(http.METHOD_NOT_ALLOWED, headers) is 
more readable.  Or:
   start_response(405, headers) # method not allowed
is just as readable.  "Method not allowed" is just a comment, it isn't 
program information.  Why propagate a comment through the system? 
Especially a comment that's assumed to be fixed and derivative?  Or if 
it's not derivative, then you are just messing with people's heads.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From py-web-sig at xhaus.com  Thu Sep  2 19:15:39 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Thu Sep  2 19:20:28 2004
Subject: [Web-SIG] Integer status codes.
Message-ID: <4137553B.5000208@xhaus.com>

Dear Web-Sig,

Just a datapoint on status codes about J2EE.

J2EE uses integer status codes, with human readable constants available 
in the javax.servlet.http.HttpServletResponse class, which works well.

http://java.sun.com/j2ee/1.4/docs/api/index.html

But I suppose that since WSGI has no classes to hang such constants on, 
it cannot use that tidy approach.

Perhaps an environ variable called "wsgi.status"? Which could be a 
dictionary mapping status names to integers? E.g. applications would 
write code like this

def handler(environ, start_response):
   start_response(environ['wsgi.status']['NOT_FOUND'], [] )

Or maybe just a simple object containing integer constants?

def handler(environ, start_response):
   start_response(environ['wsgi.status'].NOT_FOUND, [] )

I don't think I'd find the management of such a table/mapping that 
onerous. After all, there's only a few tens of status codes, and they 
don't change very often.

And the code to implement it would be universal, i.e. easily copyable 
and pastable. If I can paste it into email, is it that much of a code 
management hassle?

#----------------------------------------
status_to_int = {
  'CONTINUE'                        : 100,
  'SWITCHING_PROTOCOLS'             : 101,
  'OK'                              : 200,
  'CREATED'                         : 201,
  'ACCEPTED'                        : 202,
  'NON_AUTHORITATIVE_INFORMATION'   : 203,
  'NO_CONTENT'                      : 204,
  'RESET_CONTENT'                   : 205,
  'PARTIAL_CONTENT'                 : 206,
  'MULTIPLE_CHOICES'                : 300,
  'MOVED_PERMANENTLY'               : 301,
  'MOVED_TEMPORARILY'               : 302,
  'SEE_OTHER'                       : 303,
  'NOT_MODIFIED'                    : 304,
  'USE_PROXY'                       : 305,
  'TEMPORARY_REDIRECT'              : 307,
  'BAD_REQUEST'                     : 400,
  'UNAUTHORIZED'                    : 401,
  'PAYMENT_REQUIRED'                : 402,
  'FORBIDDEN'                       : 403,
  'NOT_FOUND'                       : 404,
  'METHOD_NOT_ALLOWED'              : 405,
  'NOT_ACCEPTABLE'                  : 406,
  'PROXY_AUTHENTICATION_REQUIRED'   : 407,
  'REQUEST_TIMEOUT'                 : 408,
  'CONFLICT'                        : 409,
  'GONE'                            : 410,
  'LENGTH_REQUIRED'                 : 411,
  'PRECONDITION_FAILED'             : 412,
  'REQUEST_ENTITY_TOO_LARGE'        : 413,
  'REQUEST_URI_TOO_LONG'            : 414,
  'UNSUPPORTED_MEDIA_TYPE'          : 415,
  'REQUESTED_RANGE_NOT_SATISFIABLE' : 416,
  'EXPECTATION_FAILED'              : 417,
  'INTERNAL_SERVER_ERROR'           : 500,
  'NOT_IMPLEMENTED'                 : 501,
  'BAD_GATEWAY'                     : 502,
  'SERVICE_UNAVAILABLE'             : 503,
  'GATEWAY_TIMEOUT'                 : 504,
  'HTTP_VERSION_NOT_SUPPORTED'      : 505,
}
#----------------------------------------
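
Both access styles could be served from this one table. A minimal 
sketch, assuming a trimmed copy of the table and using the standard 
library's types.SimpleNamespace as the "simple object". The 
'wsgi.status' key is only the name floated in this message, not 
anything in the spec:

```python
from types import SimpleNamespace

# Trimmed copy of the table, just enough for the example.
status_to_int = {"OK": 200, "NOT_FOUND": 404, "METHOD_NOT_ALLOWED": 405}

# Attribute-style view of the same data: status.NOT_FOUND
status = SimpleNamespace(**status_to_int)


def handler(environ, start_response):
    codes = environ["wsgi.status"]  # hypothetical key, supplied by the server
    start_response(codes.NOT_FOUND, [])
    return []
```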

I'm also happy to see things remain as they are. Having a human readable 
version of the code is handy for code self-documentation purposes.

So I suppose it works just as well for authors to write the following 
examples in their own code

start_response("200 Au quay!", [] )
start_response("200 Cool", [] )
start_response("200 That hoopy frood knows where his towel is", [] )

As long as the integer bit actually evaluates to an integer, it wouldn't 
be a problem.
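
A minimal sketch of that check, assuming the status stays a string and 
the server only needs the number. status_code is a hypothetical helper 
name:

```python
def status_code(status):
    """Return the integer code from a status string like '404 Not Found'."""
    parts = status.split(None, 1)  # leading token, then the free-form phrase
    if not parts:
        raise ValueError("empty status string")
    return int(parts[0])  # raises ValueError if the token is not a number
```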

I sort of like the ability to play with these strings: think of the 
(Monty) pythonisms we could use in our middleware!

start_response("404 Defenestrated", [] )
start_response("410 It's pining for the fjords!", [] )
start_response("414 Who are you calling big-nose, big-nose?", [] )
start_response("417 But I thought this was a cheese shop?", [] )

Regards,

Alan.

From py-web-sig at xhaus.com  Thu Sep  2 19:33:02 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Thu Sep  2 19:28:20 2004
Subject: [Web-SIG] Integer status codes.
In-Reply-To: <4137553B.5000208@xhaus.com>
References: <4137553B.5000208@xhaus.com>
Message-ID: <4137594E.3000109@xhaus.com>

[Alan Kennedy]
> J2EE uses integer status codes, with human readable constants available 
> in the javax.servlet.http.HttpServletResponse class, which works well.
> 
> http://java.sun.com/j2ee/1.4/docs/api/index.html

D'oh!

That link should have been

http://java.sun.com/j2ee/1.4/docs/api/javax/servlet/http/HttpServletResponse.html

Regards,

Alan.

From janssen at parc.com  Thu Sep  2 21:47:03 2004
From: janssen at parc.com (Bill Janssen)
Date: Thu Sep  2 21:47:59 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 
In-Reply-To: Your message of "Wed, 01 Sep 2004 20:25:56 PDT."
	<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com> 
Message-ID: <04Sep2.124712pdt."58612"@synergy1.parc.xerox.com>

I think we need some terminology that I don't remember seeing.  There
are two sides to WSGI, the server side, which I'll call the "socket",
and the framework side, which I'll call the "plug".  If there are
other terms already in use, please let me know.

Let me ask first, has anyone written a "socket" layer for Medusa?

> >1.  The "environ" parameter must be a Python dict: I think subclasses
> >should be allowed.
> [...various reasons why this might be a bad idea are introduced...]
> These are "practicality beats purity" arguments, so I need to see some 
> *practical* applications of dictionary subclasses that would be useful 
> enough to outweigh both of the above issues.

Phillip, these are good engineering reasons for socket developers not
to use subclasses, but that restriction doesn't belong in WSGI.  They
may have other reasons for using subclasses that we haven't thought of
(perhaps because they're using these dicts for additional purposes
besides WSGI), and they should be allowed to use them.  You don't want
to try to fix things out of scope of this work.

> Because 'file' has a 'fileno' attribute, 'isinstance(f,file)' implies 
> 'hasattr(f,"fileno")'.  Therefore, the latter is the preferred behavior 
> here, because it doesn't unnecessarily exclude other valid wrappers of file 
> descriptors.

I'm not familiar with all the ins and outs of files on Python and
Jython and IronPython, so I'll just say, reasonable enough.  Though
I'd prefer to say, a file-like object (whatever that means).

> These restrictions are intended to simplify servers and middleware; nobody 
> has yet presented an example of a scenario where this imposed any practical 
> limitation.

Here's a scenario for you: I want to return a valid HTTP header that
your WSGI layer doesn't allow!  For example, accented Latin-1
characters, which are valid in the Reason-Phrase.  Or for another
example, a multi-line header value, which I actually use quite a bit,
and which is perfectly valid in HTTP, and which your prohibition on
control characters in header values breaks.

> The fallback position would be that the status string and headers must not 
> be CR or CRLF terminated.

The fallback position would be fine.

> Are you aware of any 
> applications that currently fold their headers, or transmit ISO-8859-1 
> characters without using the encoding prescribed by RFC 2047?  Is there a 
> practical use case for either one?

Whether or not our limited group currently knows of such a case is
immaterial.  This is an overly restrictive limitation with nothing,
I'm afraid, but religion for its justification.  Aside from clueless
implementors (against which the gods themselves strive in vain), why
would allowing any valid header value be a problem?

> * In order to ensure safe interpretation, smart middleware and server 
> developers will have to write routines to *unfold* potentially-folded 
> headers; why not just disallow folding to begin with?

Because it's allowed in the HTTP spec, and this is a general-purpose
HTTP framework layer.

> How about "must provide the *option*" and "must be enabled by default"? Or, 
> leave it as is, but add something like, "may provide the user with the 
> option of suppressing this output, so that users who cannot fix a broken 
> application are not forced to bear the pain of its error."

That's fine with me.

> >6.  The "write()" callable is important; it should not be deprecated
> >or in some other way made a poor stepchild of the iterable.
> 
> But it *is* one.  The presence of the 'write()' facility significantly 
> increases the implementation complexity for middleware and server 
> authors.  If it weren't necessary to support existing streaming APIs, it 
> wouldn't exist.

But supporting streaming APIs is an important consideration, from the
point of view of authors actually writing code against a framework.
It should be a peer methodology (or completely removed).

Again, WSGI is a very general mechanism, which should provide
mechanism, not enforce policy.  That's the only way to get it widely
accepted in all the server and framework projects.  If you don't like
the streaming model, write editorials about it, but don't try to
cripple other people's software.

> However, the language should perhaps be clarified to be explicit about this 
> point

Yes.

> and to address what happens if code *within* the iterator calls 
> 'write()'.  (I don't think it should be allowed to, but I'm open to 
> arguments either way.)

Good point.  I tend to agree with you here.

> This seems at odds with your previous desire to use RFC 2616, which is 
> pretty clear that it's ISO-8859-1 or RFC 2047.  PEP 333 goes further and 
> says, it's ASCII, dammit, and use MIME header encodings (RFC 2047) if you 
> need to do something special, because God help you if you're trying to mess 
> with non-ASCII in HTTP headers and you don't know how to deal with that stuff.

My problem here is not with PEP 333, but with Python strings in
general.  The only string type which carries an associated charset tag
is Unicode.  The byte strings are *some* string encoded in *some*
character set encoding, but no one knows which encoding, for any given
byte string.  I meant to say that the characters used should be
restricted to those specified in RFC 2616, but those characters should
be passed in Unicode strings, so that we can safely apply the
.encode() method to them.  But simply specifying that the byte strings
conform to RFC 2616 would be OK with me.  As I say, with the current
Python, our options are limited.

> Glad there was something you liked.  ;)  (j/k)

Hey, there was lots I liked!  Most of my suggestions were about
removing restrictions on areas outside of WSGI, I think.

> I rather like this, although I don't at all see how FTP gets into 
> this.  What the heck would CGI variables for FTP look like, I 
> wonder?  Anyway, it's handy for "http" and "https" at the very least.  I'd 
> prefer "wsgi.url_scheme" for the name, though, as it's otherwise a somewhat 
> ambiguous name.

Sure, that's fine with me.  As for "ftp", I was thinking of Medusa,
which supports serving a number of protocols with the same framework.

Bill


From janssen at parc.com  Thu Sep  2 21:52:46 2004
From: janssen at parc.com (Bill Janssen)
Date: Thu Sep  2 21:53:43 2004
Subject: 2 cents on file objects... WAS: RE: [Web-SIG] Bill's comments on
	WSGI draft 1.4 
In-Reply-To: Your message of "Thu, 02 Sep 2004 07:33:26 PDT."
	<41372F36.2020806@xhaus.com> 
Message-ID: <04Sep2.125247pdt."58612"@synergy1.parc.xerox.com>

> 1. The write callable is only there to support "push" applications, 
> where the application generates output and then pushes it through a 
> channel set-up by the server/framework, thus relegating the framework to 
> a kind of dumb switchboard. This sort of design is usually used in 
> threaded servers, which can present scalability problems.

It's also heavily used in CGI scripts.

Bill
From janssen at parc.com  Thu Sep  2 21:55:38 2004
From: janssen at parc.com (Bill Janssen)
Date: Thu Sep  2 21:56:49 2004
Subject: [Web-SIG] Status code, status header 
In-Reply-To: Your message of "Thu, 02 Sep 2004 09:19:11 PDT."
	<413747FF.4030803@colorstudy.com> 
Message-ID: <04Sep2.125542pdt."58612"@synergy1.parc.xerox.com>

> I would say that start_response(http.METHOD_NOT_ALLOWED, headers) is 
> more readable.  Or:
>    start_response(405, headers) # method not allowed
> is just as readable.  "Method not allowed" is just a comment, it isn't 
> program information.  Why propagate a comment through the system? 

While I tend to prefer the integer codes, I could just point out that

      http.METHOD_NOT_ALLOWED

could map to "405 Method Not Allowed" as easily as to 405.

Bill
From ianb at colorstudy.com  Thu Sep  2 22:38:31 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu Sep  2 22:39:08 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4
In-Reply-To: <04Sep2.124712pdt."58612"@synergy1.parc.xerox.com>
References: <04Sep2.124712pdt."58612"@synergy1.parc.xerox.com>
Message-ID: <413784C7.9040708@colorstudy.com>

Bill Janssen wrote:
> I think we need some terminology that I don't remember seeing.  There
> are two sides to WSGI, the server side, which I'll call the "socket",
> and the framework side, which I'll call the "plug".  If there are
> other terms already in use, please let me know.

Generally we're using the terms "server" and "application".  And 
"middleware" is both a server and application.

> Let me ask first, has anyone written a "socket" layer for Medusa?
> 
> 
>>>1.  The "environ" parameter must be a Python dict: I think subclasses
>>>should be allowed.
>>
>>[...various reasons why this might be a bad idea are introduced...]
>>These are "practicality beats purity" arguments, so I need to see some 
>>*practical* applications of dictionary subclasses that would be useful 
>>enough to outweigh both of the above issues.
> 
> 
> Phillip, these are good engineering reasons for socket developers not
> to use subclasses, but that restriction doesn't belong in WSGI.  They
> may have other reasons for using subclasses that we haven't thought of
> (perhaps because they're using these dicts for additional purposes
> besides WSGI), and they should be allowed to use them.  You don't want
> to try to fix things out of scope of this work.

The restriction is kind of there for the benefit of middleware, so that 
middleware can rewrite the environment without having to worry about 
losing anything (except parts it explicitly leaves out).  By requiring 
it to be a dictionary, you can be sure that there are no side effects, 
no unusual requirements, it's consistent, and you can recreate a 
completely equivalent object.  It means the environment is required to 
be a dumb container.

The restriction that isinstance(environ, dict) be true isn't much of a 
requirement at all, because subclasses of dictionaries can override 
pretty much everything they care to.  If isinstance was the only 
requirement, it might as well be required that the environment has a 
dictionary interface.
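
A minimal sketch of the concern, assuming a hypothetical dict subclass 
(LazyEnviron) that computes a value on demand and a middleware step 
(rewrite) that copies the environ the usual way. The key 'example.lazy' 
is made up for the illustration:

```python
class LazyEnviron(dict):
    """Hypothetical dict subclass that computes a missing key on demand."""

    def __missing__(self, key):
        if key == "example.lazy":
            return "computed"
        raise KeyError(key)


def rewrite(environ):
    """Typical middleware step: copy the environ, then adjust one key."""
    new_environ = dict(environ)  # copies stored items only, not behavior
    new_environ["SCRIPT_NAME"] = "/app"
    return new_environ


env = LazyEnviron(PATH_INFO="/x")
copied = rewrite(env)
```

Here env passes the isinstance check and env['example.lazy'] works, but 
the copy is a plain dict and the same lookup on it raises KeyError.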

>>These restrictions are intended to simplify servers and middleware; nobody 
>>has yet presented an example of a scenario where this imposed any practical 
>>limitation.
> 
> 
> Here's a scenario for you: I want to return a valid HTTP header that
> your WSGI layer doesn't allow!  For example, accented Latin-1
> characters, which are valid in the Reason-Phrase.  Or for another
> example, a multi-line header value, which I actually use quite a bit,
> and which is perfectly valid in HTTP, and which your prohibition on
> control characters in header values breaks.

Is an accented Latin-1 character a control character?  I would have 
thought a control character meant a character with a code less than 32.

>>Are you aware of any 
>>applications that currently fold their headers, or transmit ISO-8859-1 
>>characters without using the encoding prescribed by RFC 2047?  Is there a 
>>practical use case for either one?
> 
> 
> Whether or not our limited group currently knows of such a case is
> immaterial.  This is an overly restrictive limitation with nothing,
> I'm afraid, but religion for its justification.  Aside from clueless
> implementors (against which the gods themselves strive in vain), why
> would allowing any valid header value be a problem?

Because it requires more work to parse and manipulate a more permissive 
standard.  You have to worry about corner cases.

>>* In order to ensure safe interpretation, smart middleware and server 
>>developers will have to write routines to *unfold* potentially-folded 
>>headers; why not just disallow folding to begin with?
> 
> 
> Because it's allowed in the HTTP spec, and this is a general-purpose
> HTTP framework layer.

But it doesn't *matter*.  And the HTTP spec very clearly *says* that it 
doesn't matter.  Folded headers are allowed, but they don't *add* any 
functionality.  So why allow it?  In those cases where you are 
interfacing with something that allows folded headers, they would have 
to be normalized; but most Python frameworks don't allow folded headers 
(at least intentionally).

I don't know if it would make a big difference if headers could be 
folded.  But there should be *some* use case for it if it were allowed.
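
For reference, the normalization itself is small. A minimal sketch, 
assuming folding means a line break followed by spaces or tabs and that 
replacing it with a single space is acceptable (RFC 2616 gives folding 
the same semantics as a space). unfold is a hypothetical helper name:

```python
import re

# A line break (CRLF or bare LF) followed by at least one space or tab.
_FOLD = re.compile(r"\r?\n[ \t]+")


def unfold(value):
    """Join a folded header value into a single line."""
    return _FOLD.sub(" ", value)
```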

>>>6.  The "write()" callable is important; it should not be deprecated
>>>or in some other way made a poor stepchild of the iterable.
>>
>>But it *is* one.  The presence of the 'write()' facility significantly 
>>increases the implementation complexity for middleware and server 
>>authors.  If it weren't necessary to support existing streaming APIs, it 
>>wouldn't exist.
> 
> 
> But supporting streaming APIs is an important consideration, from the
> point of view of authors actually writing code against a framework.
> It should be a peer methodology (or completely removed).

It effectively is a peer methodology.  It's part of the standard and it 
will work with any server; it's not optional.  The language Phillip 
wants to use is simply to encourage authors to prefer the iterable if 
that is an option.

> Again, WSGI is a very general mechanism, which should provide
> mechanism, not enforce policy.  That's the only way to get it widely
> accepted in all the server and framework projects.  If you don't like
> the streaming model, write editorials about it, but don't try to
> cripple other people's software.

There's no crippling, it is specifically allowed for.  It's not the 
primary interface that frameworks require, so Phillip wants to encourage 
those frameworks to use the iterable when they can.

For instance, in Webware the response object has a flush method.  When 
that is called, the accumulated response will have to be written out via 
the write method.  But in most cases a response is never flushed, it is 
cached completely until the request is over, and the whole page is sent 
at once.  The language is there to encourage someone to go to the extra 
length to return an iterable in the common case, instead of doing the 
easier thing and always using write.

Note that streaming can be implemented with the iterator interface. 
It's just a different streaming that wouldn't be compatible with all 
current frameworks.  If you aren't streaming then there's no real 
difference between the two, except that the iterator gives the server 
more leeway in implementation.
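
A minimal sketch of that leeway, assuming a toy single-threaded 
"server" (serve_interleaved, a made-up name) that pulls one chunk at a 
time from each response in turn. It ignores status, headers and sockets 
entirely and only records the order in which chunks are produced:

```python
def streaming_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])

    def body():
        for i in range(3):
            yield ("chunk %d\n" % i).encode("ascii")

    return body()


def serve_interleaved(apps):
    """Take one chunk from each open response per pass, in one thread."""
    sent = []  # (response index, chunk) in the order chunks were produced

    def start_response(status, headers):
        return lambda data: None  # write() is unused in this toy

    pending = [(i, iter(app({}, start_response))) for i, app in enumerate(apps)]
    while pending:
        still_open = []
        for index, it in pending:
            try:
                sent.append((index, next(it)))
                still_open.append((index, it))
            except StopIteration:
                pass  # this response is finished
        pending = still_open
    return sent
```

An application that pushes its output through write() cannot be 
interleaved this way, because it only returns control when it is done.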

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From fumanchu at amor.org  Thu Sep  2 22:36:32 2004
From: fumanchu at amor.org (Robert Brewer)
Date: Thu Sep  2 22:42:15 2004
Subject: [Web-SIG] Bill's comments on WSGI draft 1.4
Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022EB0@exchange.hqamor.amorhq.net>

Phillip J. Eby wrote:
> > I'd like to at least hear the rationale behind
> > favoring iterables so heavily over write().
> 
> One important reason: the server can suspend an iterable's execution 
> without tying up a thread.  It can therefore potentially use 
> a much smaller thread pool to handle a given number of connections,
> because the threads are only tied up while they're executing an
> iterator 'next()' call.
> 
> By contrast, 'write()' occurs *within* the application execution,
> so the only way to suspend execution is to suspend the thread (e.g. 
> waiting for a lock).

Hmm. I still don't get it--why would the server not simply "suspend
execution" of the framework within the write() call? In my naive
estimation, it would be the difference between:

for chunk in framework.data:
    output(chunk)
    do_out_of_band_stuff()

...and:

def write(chunk):
    output(chunk)
    do_out_of_band_stuff()

...and in fact, I see most existing servers having to do both when they
grow WSGI interfaces, since both are allowed in the WSGI spec (even if
one is deprecated). Maybe you could add a line or two of pseudocode to
help me understand...? (Assuming you're not fleeing for your life from
hurricanes, that is ;)

Stay safe,


Robert Brewer
MIS
Amor Ministries
fumanchu@amor.org
From janssen at parc.com  Fri Sep  3 01:15:09 2004
From: janssen at parc.com (Bill Janssen)
Date: Fri Sep  3 01:15:34 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 
In-Reply-To: Your message of "Thu, 02 Sep 2004 13:38:31 PDT."
	<413784C7.9040708@colorstudy.com> 
Message-ID: <04Sep2.161513pdt."58612"@synergy1.parc.xerox.com>

> The restriction that isinstance(environ, dict) be true isn't much of a 
> requirement at all, because subclasses of dictionaries can override 
> pretty much everything they care to.  If isinstance was the only 
> requirement, it might as well be required that the environment has a 
> dictionary interface.

Except that "a dictionary interface" is very poorly defined, while the
isinstance check is very well defined.  But this is a small point; I
won't argue it further.

> > Here's a scenario for you: I want to return a valid HTTP header that
> > your WSGI layer doesn't allow!  For example, accented Latin-1
> > characters, which are valid in the Reason-Phrase.  Or for another
> > example, a multi-line header value, which I actually use quite a bit,
> > and which is perfectly valid in HTTP, and which your prohibition on
> > control characters in header values breaks.
> 
> Is an accented Latin-1 character a control character?  I would have 
> thought a control character meant a character with a code less than 32.

You're right.  I was confusing the requirements on headers with the
"status" argument, which is unnecessarily restricted to ASCII.

> Because it requires more work to parse and manipulate a more permissive 
> standard.  You have to worry about corner cases.

How much more work?  Why is this restriction in particular a good one?

> There's no crippling, it [streaming] is specifically allowed for.  It's not the 
> primary interface that frameworks require, so Phillip wants to encourage 
> those frameworks to use the iterable when they can.

Why?  Why is an editorial opinion in the technology spec?  And, which
frameworks are you talking about?  Isn't this on the "server" or
"socket" side of things, rather than the "application" or "plug" or
"framework" side of things?

Bill


From andrew at andreweland.org  Fri Sep  3 11:44:50 2004
From: andrew at andreweland.org (Andrew Eland)
Date: Fri Sep  3 11:55:52 2004
Subject: [Web-SIG] Integer status codes.
In-Reply-To: <4137553B.5000208@xhaus.com>
References: <4137553B.5000208@xhaus.com>
Message-ID: <41383D12.6030601@andreweland.org>

Alan Kennedy wrote:

> But I suppose that since WSGI has no classes to hang such constants on, 
> it cannot use that tidy approach.

Maybe we could try to have the constants added to another module in the 
standard library. httplib would be an obvious choice.

   -- Andrew (http://www.andreweland.org)
From py-web-sig at xhaus.com  Fri Sep  3 14:07:12 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Fri Sep  3 14:02:30 2004
Subject: [Web-SIG] Iterators, generators and threads.
Message-ID: <41385E70.20507@xhaus.com>

Dear Sig,

With the focus on iterables in WSGI, I think we may need to put 
something into the WSGI spec about generators and threading.

As I'm sure you're all aware, generators are an excellent mechanism for 
generating content on demand: a perfect fit for memory efficient WSGI 
"pull" processing and for event driven servers.

However, generator-iterators are different from other iterables, in that 
they cannot be resumed/iterated  simultaneously from multiple threads 
(without external locking anyway).

PEP 255 is specific on the topic: "Restriction:  A generator cannot be 
resumed while it is actively running". Which effectively means that a 
generator cannot be used from multiple threads without some form of 
external synchronization/locking.

Offhand, I can't think of scenarios where a WSGI server or application 
would *need* to iterate over an iterable across multiple threads. But I 
can certainly think of multiple server architectures where the request 
and its related response will pass through multiple threads before 
completion. Whether or not it would make sense for such architectures to 
iterate an iterable from multiple threads: well, I don't know: is it 
possible some server designer might attempt something like this?

Which would probably work as long as the iterable is not a generator. 
But if it is: *boom*, the generator could be resumed simultaneously from 
multiple threads, thus resulting in a ValueError.
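
A minimal sketch of both halves, as observed on CPython. The first part 
triggers the restriction deterministically in one thread by resuming a 
generator from inside itself, much as the example in PEP 255 does. The 
second part is one possible form of external locking; LockedIterator is 
a hypothetical name, not an existing library class:

```python
import threading


def reentrant():
    # Resuming the generator while it is already running hits the
    # PEP 255 restriction without needing a second thread.
    yield next(me)


me = reentrant()
try:
    next(me)
    error = None
except ValueError as exc:
    error = str(exc)  # CPython reports "generator already executing"


class LockedIterator:
    """Serialize next() calls so only one thread resumes the iterator."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:
            return next(self._it)
```

The lock only makes resumption safe; which thread receives which item 
is still up to the scheduler.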

Perhaps we need to describe this problem in the PEP? Or are Python 
programmers supposed to be big and old enough to know these things?

I find myself wondering: is this a CPython-specific thing? Does resuming 
a generator from multiple threads have any meaning?

Obviously, calling a standard function/method from different threads 
works because each thread gets an independent stack frame, i.e. local 
variables, etc. So if there is no (unsynchronized) shared state between 
the threads, everything will work fine.

Since a generator is a single resumable stack frame, resuming it 
multiple times simultaneously from multiple threads won't work, from an 
isolation point-of-view.

Or am I misunderstanding it? Is the restriction somehow related to 
CPython's GIL?

Obviously, resuming general iterators from multiple threads is related. 
PEP 234 makes no statements about threads (well, one unrelated reference 
to modifying dictionaries while they are being iterated). So I take this 
to mean that iterating iterables from multiple threads is acceptable.

Regards,

Alan.

P.S. I hope Phillip is OK. He said yesterday that he was right in 
Frances's path, although obviously that path will have a significant 
margin for error. But Frances is *huge*: see this stunning picture from 
NASA.

http://antwrp.gsfc.nasa.gov/apod/ap040903.html


From janssen at parc.com  Fri Sep  3 21:34:15 2004
From: janssen at parc.com (Bill Janssen)
Date: Fri Sep  3 21:34:41 2004
Subject: [Web-SIG] Integer status codes. 
In-Reply-To: Your message of "Fri, 03 Sep 2004 02:44:50 PDT."
	<41383D12.6030601@andreweland.org> 
Message-ID: <04Sep3.123418pdt."58612"@synergy1.parc.xerox.com>

I think submitting a bug report ("httplib doesn't define constants for
standard HTTP status messages"), plus a patch, would probably get it
done.

Bill

> Alan Kennedy wrote:
> 
> > But I suppose that since WSGI has no classes to hang such constants on, 
> > it cannot use that tidy approach.
> 
> Maybe we could try to have the constants added to another module in the 
> standard library. httplib would be an obvious choice.
> 
>    -- Andrew (http://www.andreweland.org)
From jjl at pobox.com  Sat Sep  4 18:38:45 2004
From: jjl at pobox.com (John J Lee)
Date: Sat Sep  4 18:38:18 2004
Subject: [Web-SIG] Integer status codes. 
In-Reply-To: <04Sep3.123418pdt."58612"@synergy1.parc.xerox.com>
References: <04Sep3.123418pdt."58612"@synergy1.parc.xerox.com>
Message-ID: <Pine.LNX.4.58.0409041737410.2704@alice>

[Andrew]
> > Maybe we could try to have the constants added to another module in the 
> > standard library. httplib would be an obvious choice.

[Bill Janssen]
> I think submitting a bug report ("httplib doesn't define constants for
> standard HTTP status messages"), plus a patch, would probably get it
> done.

+1


John
From py-web-sig at xhaus.com  Sun Sep  5 23:56:10 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Sun Sep  5 23:51:16 2004
Subject: [Web-SIG] Standardised configuration and temporary directories.
Message-ID: <413B8B7A.4090401@xhaus.com>

Dear Sig,

While thinking about writing middleware, two issues occurred to me that 
may need to be addressed in the WSGI spec.

1. Temporary storage/scratch directory.

It is common in servers and frameworks to provide a particular location 
for applications to store temporary files, etc: a temporary directory. 
This prevents applications from picking their own temporary directories, 
which provides platform independence, security and isolation.

I think that this is such a common thing that it may be worth requiring a 
WSGI environment variable for it, e.g. environ['wsgi.temp_dir']

I realise that this could be considered a server specific thing, but 
server-specific variables mean lack of portability. Perhaps some 
containers will not be able to provide the temporary area: in that case 
it is better for the application or middleware to check for 
environ['wsgi.temp_dir'] is None than to check for perhaps a dozen or 
more possible server variables.
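
A minimal sketch of that single check from the application side. The 
'wsgi.temp_dir' key is only the name proposed here, and falling back to 
tempfile.gettempdir() when the server offers nothing is an assumption 
of this example rather than part of the proposal:

```python
import tempfile


def scratch_dir(environ):
    """Prefer the server-provided directory, else the platform default."""
    temp_dir = environ.get("wsgi.temp_dir")  # proposed key, not in the spec
    if temp_dir is None:
        # Assumed fallback: the standard library's default temp location.
        temp_dir = tempfile.gettempdir()
    return temp_dir
```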

2. Standardised parameter configuration and specification.

When I am plugging middleware into a server, it often has need of its 
own configuration. For example, session handling middleware may need to 
retrieve the name of a file system directory to persist session files 
into, or connection details for an RDBMS, etc.

Obviously, such configuration values need to be configured somewhere.

1. It could be done in the middleware source file itself, e.g. in global 
variables. However, I really don't like this, since it would mean 
changing source files, instead of leaving a standard versioned 
distribution untouched and read-only.

2. The session middleware could have its own configuration mechanism. It 
would define a standard way for it, and it alone, to be configured, e.g. 
it names the location of its configuration file. I think that this also 
is problematic, primarily because lots of different middleware authors 
will pick lots of different ways of configuring their stuff, leading to 
platform-specific errors, need for debugging, code rewriting, etc. And I 
think that the purpose of WSGI is to help prevent this kind of wheel 
re-invention.

A more promising place to put it is in the WSGI environment. The next 
two methods are different ways of doing that.

3. It could perhaps be set by another middleware component that is prior 
to the session handler in the middleware stack: some form of general 
configuration component for example. I like this more than the above 
options, because it concentrates configuration into one place.

Or rather two places, because there is also the server specific 
configuration file, whose contents actually configure how the server 
drives the request through the middleware stack. In my case, that is a 
Tomcat server.xml file, where I have several parameters which configure 
my wsgi servlet.

4. It could be configured in the server configuration file, e.g. the 
Tomcat server.xml with modjy, the Apache httpd.conf with mod_python, 
environment variables with CGI, etc, etc. I like this one the most 
because it means that there is only one configuration environment to manage.

So, as an example, let's say my session middleware is looking for the 
following variables

my_fancy_sessions.cookies
my_fancy_sessions.storage_dir

Ideally, it would be nice to be able to have a standardised way of 
specifying these variables in a centralised location. Why? Because when 
the middleware authors are writing documentation for their module, they 
could write something like

"""
Make sure to set values for the following WSGI variables, in whatever 
way is appropriate for your chosen WSGI server.

my_fancy_sessions.cookies = True

my_fancy_sessions.storage_dir = '/var/modjy/session_dir'

"""

So, if I was configuring it to run under modjy, my servlet description 
would look something like this

   <servlet>
     <servlet-name>modjy</servlet-name>
     <servlet-class>com.xhaus.wsgi.Modjy</servlet-class>
     <init-param>
       <param-name>python.home</param-name>
       <param-value>C:/jython21</param-value>
     </init-param>
     <init-param>
       <param-name>my_fancy_sessions.cookies</param-name>
       <param-value>True</param-value>
     </init-param>
     <init-param>
       <param-name>my_fancy_sessions.storage_dir</param-name>
       <param-value>/var/modjy/session_dir</param-value>
     </init-param>
   </servlet>

A CGI implementation could examine the contents of say a WSGI_ENVIRON 
os.environ variable, which might contain

"""
my_fancy_sessions.cookies = True
my_fancy_sessions.storage_dir = '/var/modjy/session_dir'
"""

Etc, etc.
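
One possible reading of the CGI case, as a minimal sketch: it assumes the WSGI_ENVIRON variable holds one "name = value" pair per line, with values written as Python literals. The function names load_wsgi_config and build_environ are invented for the example; nothing here is specified by PEP 333.

```python
import ast
import os


def load_wsgi_config(raw):
    """Parse 'name = value' lines into a dict of Python literals."""
    config = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        name, sep, value = line.partition('=')
        if not sep:
            continue  # not a name/value pair; ignore it
        # literal_eval accepts only literals (True, 'text', 42, ...),
        # so arbitrary code in the variable is never executed.
        config[name.strip()] = ast.literal_eval(value.strip())
    return config


def build_environ(base_environ):
    """Copy the configured pairs into a WSGI environ dict, unmodified."""
    environ = dict(base_environ)
    environ.update(load_wsgi_config(os.environ.get('WSGI_ENVIRON', '')))
    return environ
```

A CGI gateway could call build_environ() once per request, so that middleware sees the same keys it would see under any other server.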

I'm still not sure about having such a standard configuration mechanism, 
or how such a thing would be presented inside the WSGI environment. But 
it does seem to me to be an area that needs addressing.

Perhaps a simple solution would be to add wording like the following to 
the PEP:

"""
WSGI compliant servers must provide a simple mechanism for users to 
place name/value pairs in the WSGI environment, without modification or 
transformation. This is to make it easy for users to gather all 
middleware (i.e. server-independent) configuration under one centralized 
configuration mechanism.
"""

Or maybe I'm off base. Maybe session handling middleware is not the sort 
of thing that is meant to be universally portable?

Regards,

Alan.
From paul.boddie at ementor.no  Mon Sep  6 10:16:29 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Mon Sep  6 10:16:37 2004
Subject: [Web-SIG] Standardised configuration and temporary directories.
Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18C9E5@100nooslmsg005.common.alpharoot.net>

Alan Kennedy wrote:
> 
> While thinking about writing middleware, two issues occurred to me that
> may need to be addressed in the WSGI spec.
> 
> 1. Temporary storage/scratch directory.

I've been thinking about this at the level above frameworks, and I do wonder
how far up in the applications stack this information would remain useful.
If you consider something like Zope, I think the only place where this kind
of thing is exposed to applications is in the machinery around file uploads,
but I don't necessarily think you'd want applications directly interfering
with such directories.

That said, for both applications and frameworks, it is interesting to define
concepts such as shared and private storage, and at a low enough level I can
imagine that things like temporary directories are relevant. (It is almost
shocking to see what cgi.FieldStorage does with temporary files, I might add.)

[...]

> 2. Standardised parameter configuration and specification.

As you've said, various frameworks provide mechanisms for specifying
parameters, yet this means that there isn't a single method of
administration for developers or users who don't care enough about those
frameworks to know how to deal with them all. I'm inclined to think that
better tools could be the answer here - if you have a simple configuration
file reminiscent of Webware's .config files (which are Python modules with
simple dictionaries or attributes) then different tools could produce Apache
.conf files or Java Servlet web.xml files, for example.

[...]

> I'm still not sure about having such a standard configuration mechanism,
> or how such a thing would be presented inside the WSGI environment. But
> it does seem to me to be an area that needs addressing.

I've avoided this issue with WebStack so far, mostly because the
configuration done at the adapter level (the glue code between frameworks
and WebStack applications/frameworks) mainly covers things like the server
port number and other things that aren't particularly interesting at higher
levels. Moreover, applications can often be configured through things like
modules acting as configuration files, and such things are clearly separate
from issues of framework configuration.

Paul
From py-web-sig at xhaus.com  Mon Sep  6 14:02:45 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Mon Sep  6 13:57:51 2004
Subject: [Web-SIG] Standardised configuration and temporary directories.
In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D18C9E5@100nooslmsg005.common.alpharoot.net>
References: <89DE0F3E9781C048A14DC88C06D9F93D18C9E5@100nooslmsg005.common.alpharoot.net>
Message-ID: <413C51E5.2090107@xhaus.com>

[Alan Kennedy]
 >>2. Standardised parameter configuration and specification.

[Paul Boddie]
 > As you've said, various frameworks provide mechanisms for specifying
 > parameters, yet this means that there isn't a single method of
 > administration for developers or users who don't care enough about
 > those frameworks to know how to deal with them all. I'm inclined to
 > think that better tools could be the answer here - if you have a
 > simple configuration file reminiscent of Webware's .config files
 > (which are Python modules with simple dictionaries or attributes)
 > then different tools could produce Apache .conf files or Java
 > Servlet web.xml files, for example.

Paul, thanks for taking the time to reply.

On thinking about the configuration issue further on the way into work, 
I've changed my mind :-)

The original two options I presented for configuration were

A: By a specialised middleware component.

B: In the server configuration file. (I will now call this the "platform 
configuration file").

I originally thought that option B was the best, but now I think 
differently. And from what I read from your post, Paul, I think we're in 
agreement.

Configuring the middleware stack is really the entire purpose of a 
python WSGI server. The platform in which the server and application 
reside, e.g. Apache, CGI, Tomcat, etc, should not be relevant. Instead, 
in an ideal scenario, the entire python application, i.e. server + 
middleware + configuration, should be portable to another platform(+WSGI 
layer).

If this is to be the case, then the middleware and its configuration 
would be best kept under centralised python control, which would 
facilitate maximum portability between platforms.

Conversely, as little as possible should be kept in the platform 
configuration file: ideally platforms should be the thinnest possible 
layer required to deliver WSGI requests to the python WSGI server.

Which leads to the question of how best to configure middleware, in the 
server configuration. Taking the example of the session handling 
middleware:-

1. The server configuration specifies the middleware stack to be 
constructed for responding to requests. Parameters for specific pieces 
of middleware could be specified as parameters to the constructors for 
each component. For example, configuring session handling could go like this

middleware_stack.append(
    my_fancy_session_handler(
        cookies=True, storage_dir='/var/session_dir'
    )
)

2. Or there could be some standardised way for a server to specify 
config values to middleware components, e.g.

middleware_config['my_fancy_sessions.cookies'] = True
middleware_config['my_fancy_sessions.storage_dir'] = '/var/session_dir'

middleware_stack.append(my_fancy_session_handler())

And there are probably a few other ways to do it as well.
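
To make the two styles above concrete, here is a minimal sketch of a session-middleware stand-in that takes its settings as constructor arguments, fed either directly or from a shared mapping. The class name FancySessionHandler, its parameters, the demo application and the my_fancy_sessions.* keys are all illustrative; the body is returned as bytes, as current Python 3 WSGI servers expect.

```python
class FancySessionHandler:
    """Hypothetical session middleware configured through its constructor."""

    def __init__(self, application, cookies=True, storage_dir=None):
        self.application = application
        self.cookies = cookies
        self.storage_dir = storage_dir

    def __call__(self, environ, start_response):
        # Expose the settings to the wrapped application via the environ.
        environ['my_fancy_sessions.cookies'] = self.cookies
        environ['my_fancy_sessions.storage_dir'] = self.storage_dir
        return self.application(environ, start_response)


def demo_app(environ, start_response):
    """Trivial application that reports where sessions would be stored."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    body = 'sessions stored in %s' % environ['my_fancy_sessions.storage_dir']
    return [body.encode('ascii')]


# Style 1: parameters passed straight to the constructor.
app = FancySessionHandler(demo_app, cookies=True, storage_dir='/var/session_dir')

# Style 2: the same values taken from a shared configuration mapping.
middleware_config = {
    'my_fancy_sessions.cookies': True,
    'my_fancy_sessions.storage_dir': '/var/session_dir',
}
app_from_config = FancySessionHandler(
    demo_app,
    cookies=middleware_config['my_fancy_sessions.cookies'],
    storage_dir=middleware_config['my_fancy_sessions.storage_dir'],
)
```

Either way the middleware itself stays free of server-specific code; only the place where the values come from differs.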

Although I know we're firmly in the realm of server-specific 
configuration here, an area where WSGI may need to remain agnostic, it 
would be nice to standardise these configuration issues, in order to 
maximize portability of servers, middleware and configuration.

Regards,

Alan.
From paul.boddie at ementor.no  Mon Sep  6 14:14:19 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Mon Sep  6 14:14:30 2004
Subject: [Web-SIG] Standardised configuration and temporary directories.
Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CAA5@100nooslmsg005.common.alpharoot.net>

Alan Kennedy wrote:
>
> On thinking about the configuration issue further on the way into work,
> I've changed my mind :-)
> 
> The original two options I presented for configuration were
> 
> A: By a specialised middleware component.
> 
> B: In the server configuration file. (I will now call this the "platform
> configuration file").
> 
> I originally thought that option B was the best, but now I think
> differently. And from what I read from your post, Paul, I think we're in
> agreement.

Are we? ;-) Certain things like sessions are most likely to be configured in
the server environment. In Tomcat, for example, that would be in one of the
XML configuration files, but for something like Apache/mod_python it would
be nicest to use httpd.conf or a related file, and Webware and Zope store
sessions in their own particular way - note that Zope uses its own special
mechanisms which might not correspond exactly with the conceptual model you
envisage.

> Configuring the middleware stack is really the entire purpose of a
> python WSGI server. The platform in which the server and application
> reside, e.g. Apache, CGI, Tomcat, etc, should not be relevant. Instead,
> in an ideal scenario, the entire python application, i.e. server +
> middleware + configuration, should be portable to another platform(+WSGI
> layer).

That's what WebStack is about: the same code runs on the seven supported
frameworks without any changes. Currently, the only server configuration
required consists of the following kinds of activities:

  * Add directives to Apache's httpd.conf (for anything using Apache).
  * Add context definitions to Webware's configuration.
  * Prepare a .war file for a Java servlet container.
  * Add a product to a Zope 2 instance.

Some interaction with the server configuration is clearly going to be necessary.

> If this is to be the case, then the middleware and its configuration 
> would be best kept under centralised python control, which would 
> facilitate maximum portability between platforms.

I think what we agree on is that much of an application's configuration can
be done at a fairly high level. An application which stores stuff in the
filesystem or which uses a database system doesn't necessarily need to have
that kind of configuration entered into web.xml or httpd.conf, and it should
be possible to keep that configuration portable, although I can imagine
complications with things like Tomcat which define JDBC connections in the
XML configuration files.

Paul
From py-web-sig at xhaus.com  Mon Sep  6 14:46:13 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Mon Sep  6 14:41:19 2004
Subject: [Web-SIG] Standardising containment.
In-Reply-To: <413B8B7A.4090401@xhaus.com>
References: <413B8B7A.4090401@xhaus.com>
Message-ID: <413C5C15.6030003@xhaus.com>

[Alan Kennedy]
 >>1. Temporary storage/scratch directory.

On thinking further about the temp directory issue, I see now that it is 
but one example of a class of problems relating to accessing physical 
resources on the local machine.

The other main one that springs to mind is how WSGI applications 
discover the file-system path name that corresponds to an URI.

CGI defines a "PATH_TRANSLATED" variable for this purpose, but 
"PATH_TRANSLATED" is a poor solution to the problem, IMHO. In order to 
explain what I mean, I'm going to go through an example.

Say I have an Apache installation, running CGI scripts. Assume that my 
cgi-bin directory is at the root level of my document root, so my 
document root looks like this (I'm using DOS path names, to illustrate a 
point)

DOCUMENT_ROOT = "c:\\htdocs\\"
CGI_BIN       = "c:\\htdocs\\cgi-bin\\"

Now, say I receive a request for the following URI

http://localhost/cgi-bin/my_application.py/images/stars.jpg

The CGI variables for this request would be set as follows:-

SCRIPT_NAME     = "/cgi-bin/my_application.py"
PATH_INFO       = "/images/stars.jpg"
PATH_TRANSLATED = "c:\\htdocs\\images\\stars.jpg"

And I want to introduce another variable, giving the path to the actual 
script

CONTEXT_PATH    = "c:\\htdocs\\cgi-bin\\my_application.py"

There are a few points to make here

1. The contents of the PATH_TRANSLATED variable are not necessarily what 
I want. The standard definition for PATH_TRANSLATED is

PATH_TRANSLATED = DOCUMENT_ROOT + PATH_INFO, i.e.
PATH_TRANSLATED = 'c:\\htdocs\\' + '/images/stars.jpg', i.e.
PATH_TRANSLATED = 'c:\\htdocs\\images\\stars.jpg'

But what happens if I really want the path translated to a point 
relative to my cgi script, for example, not relative to the document 
root, i.e. what I really want is

PATH_TRANSLATED = CONTEXT_PATH + PATH_INFO, i.e.
PATH_TRANSLATED = 'c:\\htdocs\\cgi-bin\\my_application.py' + \
     '/images/stars.jpg', i.e.
PATH_TRANSLATED = 'c:\\htdocs\\cgi-bin\\images\\stars.jpg'

2. Because of the platform (i.e. windoze, *nix) specific path names 
returned for PATH_TRANSLATED, it is a hassle to write path manipulation 
functions which will reliably deliver the final path name that I am 
seeking. I could take the content of the PATH_TRANSLATED variable, 
subtract PATH_INFO from it again (being careful to deal correctly with 
"\" vs. "/"), and then work out my own path to the physical resource.

But this is just going to cause all kinds of portability problems.

Therefore I propose that WSGI somehow attempt to standardise access to 
local resources on the disk. This could be done, perhaps, by providing a 
function which resolves a logical URI to a physical resource. J2EE has 
just such a function (surprise ;-), called ServletContext.getRealPath(), 
which returns a file-system path name which is relative to the 
CONTEXT_PATH mentioned above.

Without WSGI providing such local mapping functions, I don't see how 
WSGI applications/middleware can map URIs to files, without undertaking 
platform specific tricks.
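
A minimal sketch of what such a mapping function might look like, assuming the server tells the application the directory its resources live in. The name get_real_path and its signature are hypothetical, loosely modelled on the idea behind ServletContext.getRealPath(); this is not part of WSGI.

```python
import os
import posixpath


def get_real_path(context_dir, path_info):
    """Map a URI path onto a file-system path below context_dir.

    Hypothetical helper; not defined by WSGI or PEP 333.
    """
    # Normalise using URI rules first, so "." and ".." are collapsed
    # before any platform-specific separator is introduced.
    clean = posixpath.normpath('/' + path_info.lstrip('/'))
    parts = [p for p in clean.split('/') if p]
    for part in parts:
        # Refuse segments that would smuggle in a native separator.
        if os.sep in part or (os.altsep and os.altsep in part):
            raise ValueError('unsafe path segment: %r' % part)
    return os.path.join(context_dir, *parts)
```

Because the URI is split on "/" and rejoined with os.path.join(), the caller never has to reason about "\" versus "/" itself, and a request path containing ".." cannot climb above context_dir.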

I know it might look like I'm trying to drag WSGI into being more 
container-oriented, more like J2EE for example. But I think the above 
issues are sufficiently commonplace/universal that it is worth dealing 
with them in a standardised way.

Regards,

Alan.
From brsizer at kylotan.eidosnet.co.uk  Mon Sep  6 15:23:18 2004
From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer)
Date: Mon Sep  6 15:21:01 2004
Subject: [Web-SIG] Standardising containment.
In-Reply-To: <413C5C15.6030003@xhaus.com>
References: <413B8B7A.4090401@xhaus.com> <413C5C15.6030003@xhaus.com>
Message-ID: <413C64C6.2020408@kylotan.eidosnet.co.uk>

Alan Kennedy wrote:
> The other main one that springs to mind is how WSGI applications 
> discover the file-system path name that corresponds to an URI.

I thought that one of the major features of most of these Python web
frameworks is that a URI doesn't map to a file but to an object or a
function, several of which might be in one physical file. Since WSGI
seems to be promoted as a minimal system that applies equally to almost
any system, I'd think that such a mapping falls entirely out of its scope.

I agree that it might be useful to have this functionality. I think a
standard way to map URIs to Python files would be beneficial for Python
web development. I just don't see it fitting into what people here have
told me about WSGI.

-- 
Ben Sizer.


From pje at telecommunity.com  Mon Sep  6 15:23:59 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Sep  6 15:23:19 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 
In-Reply-To: <04Sep2.124712pdt."58612"@synergy1.parc.xerox.com>
References: <Your message of "Wed, 01 Sep 2004 20:25:56 PDT."
	<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>

[skipping stuff that Ian answered]

At 12:47 PM 9/2/04 -0700, Bill Janssen wrote:

>I'm not familiar with all the ins and outs of files on Python and
>Jython and IronPython, so I'll just say, reasonable enough.  Though
>I'd prefer to say, a file-like object (whatever that means).

File-like is out of scope; there were only ever two kinds of objects 
intended to be returnable:

1) Iterables (the initial scope)

2) Objects that map to an operating system file descriptor, as an optional 
special case to increase performance (added later per user request)

I think that perhaps because files (under 2.2+ at least) meet *both* of 
these criteria, some folks have construed this to mean that we really 
should allow any file-like object, when "file-like" never had anything to 
do with anything.  It's a total red herring that has nothing to do with the 
spec's intent.

I will add something to the Q&A section about this.


> > These restrictions are intended to simplify servers and middleware; nobody
> > has yet presented an example of a scenario where this imposed any practical
> > limitation.
>
>Here's a scenario for you: I want to return a valid HTTP header that
>your WSGI layer doesn't allow!  For example, accented Latin-1
>characters, which are valid in the Reason-Phrase.

Technically, you could use the MIME header encoding support to put them in, 
encoded in 7-bit ASCII, as is allowed by RFC 2616.
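
For illustration only, here is how the standard library's email.header module produces such an RFC 2047 encoded word. The sample reason phrase is invented, and whether a given HTTP client will decode it again is a separate question that this sketch does not settle.

```python
from email.header import Header, decode_header, make_header

# Invented example of a reason phrase containing Latin-1 characters.
reason = 'Requ\u00eate accept\u00e9e'

# Encode as an RFC 2047 "encoded word"; the result is pure 7-bit ASCII.
encoded = Header(reason, charset='iso-8859-1').encode()

# A recipient that understands RFC 2047 can recover the original text.
decoded = str(make_header(decode_header(encoded)))
```

The encoded form can be passed through any layer that only accepts ASCII, at the cost of being unreadable to recipients that do not decode it.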

OTOH, I could see allowing 8-bit strings in ISO-8859-1 encoding as per RFC 
2616, and don't see significant practical problems in doing so.


>   Or for another
>example, a multi-line header value, which I actually use quite a bit,
>and which is perfectly valid in HTTP, and which your prohibition on
>control characters in header values breaks.
>
> > The fallback position would be that the status string and headers must not
> > be CR or CRLF terminated.
>
>The fallback position would be fine.

I'm currently still strongly -1 on allowing folding; the only thing that's 
going to budge me is use cases.  I only accept "on general principle" 
arguments when they *simplify* compliance and make the spec more robust, 
not when they make compliance more difficult.

Header folding adds repetitive boilerplate processing to all middleware 
that processes headers: boilerplate that can and will be written 
incorrectly or sometimes omitted because somebody forgot that headers are 
allowed to be folded.  Before too long, the practical advice to WSGI 
application authors will be, "don't fold headers because it breaks a lot of 
middleware", and we'll be right back where we could've been in the first 
place if we just banned folding from the get-go.  Meanwhile, the people who 
will have paid the price for this are all the conscientious implementors who 
tried to write code that would work properly with header folding.
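
The boilerplate in question would look roughly like the sketch below, which every header-inspecting middleware would need to apply before comparing or parsing a value. The helper names unfold and find_header are made up for the example; the folding rule (a line break followed by a space or tab) is the one from RFC 2616.

```python
import re

# A folded header continues on the next line, which starts with a space or tab.
_FOLD = re.compile(r'\r?\n[ \t]+')


def unfold(value):
    """Collapse a folded header value onto a single line."""
    return _FOLD.sub(' ', value)


def find_header(headers, name):
    """Return the unfolded value of the first header called name, or None."""
    for key, value in headers:
        if key.lower() == name.lower():
            return unfold(value)
    return None
```

Middleware that forgets the unfold() step would see the raw line break inside the value and could mis-parse it, which is exactly the failure mode described above.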

From py-web-sig at xhaus.com  Mon Sep  6 15:30:00 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Mon Sep  6 15:25:16 2004
Subject: [Web-SIG] Standardised configuration.
In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D18CAA5@100nooslmsg005.common.alpharoot.net>
References: <89DE0F3E9781C048A14DC88C06D9F93D18CAA5@100nooslmsg005.common.alpharoot.net>
Message-ID: <413C6658.6070007@xhaus.com>

[Alan Kennedy]
 >> I originally thought that option B was the best, but now I think
 >> differently. And from what I read from your post, Paul, I think
 >> we're in agreement.

[Paul Boddie]
 > Are we? ;-) Certain things like sessions are most likely to be
 > configured in the server environment. In Tomcat, for example, that
 > would be in one of the XML configuration files, but for something
 > like Apache/mod_python it would be nicest to use httpd.conf or a
 > related file, and Webware and Zope store sessions in their own
 > particular way - note that Zope uses its own special mechanisms
 > which might not correspond exactly with the conceptual model you
 > envisage.

Ah, now we're getting somewhere.

I think that session handling is an excellent example against which to 
have this discussion. Note however that I am *not* advocating 
standardising session management under WSGI.

J2EE session handling is generally a huge PITA, primarily because the 
base unit of session management is the servlet context, i.e. every 
servlet context gets its own "session space". For example

'/forms' may map to one session space, while
'/news' may map to a different session space.

Any given user may have multiple sessions on a server, depending on the 
number of servlets they have interacted with. It is generally not 
possible, except using container specific methods, to have a single 
"uber-session" which concentrates all user session data into a single 
object. This "hierarchy problem" makes it difficult, and extremely 
container-specific, to manage a single set of users across a set of J2EE 
servlets.

Most J2EE containers support both cookies and URL rewriting for session 
management, i.e. if the user-agent has cookies disabled, then all urls 
are rewritten to contain session IDs. Which means that the url 
rewriting algorithm has to be aware of multiple servlet contexts, and 
rewrite local urls to contain the session ID which is specific to the 
target context/servlet.

Some J2EE containers support a "Single Sign On" facility, where the 
container manages the multiple session objects on the application's 
behalf, and makes it easy for the user (but not the programmer) by only 
making them sign on to a server once. Tomcat does this using an extra 
cookie, the SSO cookie, which is transmitted to user-agents *as well as* 
the per-servlet cookie, i.e. the user-agent receives two cookies from 
the container. Worse, the Tomcat Single-Sign-On facility does not 
support URL rewriting: the user-agent *must* have cookies enabled in 
order for single sign on to work. Which sucks.

I think that if WSGI applications were to rely on the local 
platform/container session management facilities, it is extremely 
unlikely that they would be portable. It's difficult enough to get 
coherent cross-servlet session-handling working on J2EE when writing in 
java, as these pages show

http://jakarta.apache.org/tomcat/tomcat-5.0-doc/config/host.html#Single%20Sign%20On
http://jakarta.apache.org/tomcat/tomcat-5.0-doc/config/valve.html#Single%20Sign%20On%20Valve
http://www.fwd.at/tomcat/sharing-session-data-howto.html

Imagine the complications if the application code were originally 
written to work with, say, Webware under CPython?

To me, session handling is one of those things that is done in so many 
different ways by so many different platforms/containers that it is 
impractical to achieve application portability once a particular 
methodology has been chosen.

So, IMHO, session handling is one of those "should be simple" areas of 
web programming that gets horrifically complicated when trying to move 
applications between platforms/containers: in fact I'd go so far as to 
say that the multiplicity of session handling techniques is one of the 
primary reasons why the python web world is currently so fragmented: every 
framework author thinks they know best, although some do it much better 
than others. I like Webware's method of using URL path parameters, with an 
auto-refresh if a request is received that doesn't contain a session ID. 
But IIRC, this method is quite Apache specific, and requires 
modification of the Apache httpd.conf to get working. I could be wrong 
though.

It is important to note that I am *not* advocating standardising session 
management under WSGI: far from it. But what I am advocating is trying 
to make it as easy as possible for session-handling middleware 
components to be as portable between WSGI servers as possible. WSGI, as 
it currently stands, makes it far easier to do this than any other 
approach: I'm just trying to foresee and eliminate the last few % that 
stands in the way of 100% portability.

Regards,

Alan.
From paul.boddie at ementor.no  Mon Sep  6 15:32:39 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Mon Sep  6 15:32:42 2004
Subject: [Web-SIG] Standardising containment.
Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CAF4@100nooslmsg005.common.alpharoot.net>

Ben Sizer wrote:
> 
> Alan Kennedy wrote:
> > The other main one that springs to mind is how WSGI applications 
> > discover the file-system path name that corresponds to an URI.
> 
> I thought that one of the major features of most of these Python web
> frameworks is that a URI doesn't map to a file but to an object or a
> function, several of which might be in one physical file. Since WSGI
> seems to be promoted as a minimal system that applies equally to almost
> any system, I'd think that such a mapping falls entirely out of its scope.

It probably does for WSGI, although I wonder how such issues (and the many
others out there) can be simultaneously avoided and yet anticipated by the
specification in order to avoid incompatibilities later on.

> I agree that it might be useful to have this functionality. I think a
> standard way to map URIs to Python files would be beneficial for Python
> web development. I just don't see it fitting into what people here have
> told me about WSGI.

I suppose that Alan is moving slowly up the stack. It's an interesting issue
that existing frameworks have addressed in their own ways (the getRealPath
that Alan mentioned, Webware's getServerSidePath, and so on). One can wonder
whether application data (which the image example could almost be considered
as being) should be configured within or with reference to the server
environment or not. Still, if you consider having to specify the filenames
of resources within an application, it's much nicer to be able to make those
filenames relative to some deployment variable (eg. where the application
ends up when deployed) and to keep those resources bundled with the
application than to have to manually configure the application to use
absolute paths before/during/after deployment.

I hope that made sense. ;-)

Paul
From pje at telecommunity.com  Mon Sep  6 15:38:13 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Sep  6 15:37:27 2004
Subject: [Web-SIG] Bill's comments on WSGI draft 1.4
In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3022EB0@exchange.hqamor.amo
	rhq.net>
Message-ID: <5.1.1.6.0.20040906092441.02e8a610@mail.telecommunity.com>

At 01:36 PM 9/2/04 -0700, Robert Brewer wrote:
>Phillip J. Eby wrote:
> > > I'd like to at least hear the rationale behind
> > > favoring iterables so heavily over write().
> >
> > One important reason: the server can suspend an iterable's execution
> > without tying up a thread.  It can therefore potentially use
> > a much smaller thread pool to handle a given number of connections,
> > because the threads are only tied up while they're executing an
> > iterator 'next()' call.
> >
> > By contrast, 'write()' occurs *within* the application execution,
> > so the only way to suspend execution is to suspend the thread (e.g.
> > waiting for a lock).
>
>Hmm. I still don't get it--why would the server not simply "suspend
>execution" of the framework within the write() call? In my naive
>estimation, it would be the difference between:
>
>for chunk in framework.data:
>     output(chunk)
>     do_out_of_band_stuff()
>
>..and:
>
>def write(chunk):
>     output(chunk)
>     do_out_of_band_stuff()

Because now you've moved the server code into the application thread; many 
Python web servers (pretty much all of the async ones including Medusa, 
Twisted, and ZServer) have a single thread for all I/O operations, distinct 
from the threads that run application requests.

So, if you want to perform I/O from an app thread, you need lock 
synchronization code that didn't exist before...  and the design rapidly 
becomes more complicated.

Anyway, such servers' write() methods will probably look more like:

    def write(self,data):
        self.output_queue.put(data)

and they'll then return to the caller.  However, this has new issues of its 
own: specifically, if a program transmits a large file, it will consume 
lots of memory if it produces data faster than the client can accept 
it.  (Because the output queue will back up.)

Of course, one can throttle the output queue to some set maximum size, but 
then you end up right where I began this discussion: the application thread 
has to hang, tying up that thread's availability until the app's execution 
is complete, and thus reducing the concurrent request throughput of the server.

If, however, the application is structured as an iterable, these problems 
all go away.  Application threads are only tied up for computation, not 
waiting for I/O, output a client isn't going to receive is never produced, 
large memory buffers aren't needed, and so on.

So, on purely technical grounds, the iterable approach is immensely 
superior; it should be used wherever practical to do so.
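
A toy sketch of the difference, under stated assumptions: file_app stands in for an application that returns an iterable, and serve is a made-up stand-in for a server loop that asks for the next chunk only when the client can take it. Neither function comes from any real server; client_ready and send are hypothetical callbacks.

```python
def file_app(environ, start_response):
    """WSGI application that yields its body one chunk at a time."""
    start_response('200 OK', [('Content-Type', 'application/octet-stream')])

    def body():
        for _ in range(4):
            # Each chunk is produced only when the server asks for it.
            yield b'x' * 1024

    return body()


def serve(app, environ, client_ready, send):
    """Toy server loop: pull a chunk only when the client can accept one."""
    responses = []
    result = iter(app(environ, lambda s, h: responses.append((s, h))))
    sent = 0
    while client_ready():
        try:
            chunk = next(result)
        except StopIteration:
            break
        send(chunk)
        sent += 1
    # If the client was not ready, the iterator is simply left suspended;
    # a real async server would resume it later from its I/O loop, and no
    # application thread is blocked in the meantime.
    return sent
```

With a write()-style application the same back-pressure could only be applied by blocking inside write(), which is the thread-hogging behaviour described above.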


>..and in fact, I see most existing servers having to do both when they
>grow WSGI interfaces, since both are allowed in the WSGI spec (even if
>one is deprecated).

Yes, servers will have to support both; but it should be understood that, 
for many important servers (especially ones written in Python), 
applications using 'write()' may have detrimental effects on the server's 
overall throughput, even if the application seems to run quite well on, say, 
a local connection to an unloaded server.

So, that's why people should be discouraged from using 'write()' outside of 
necessity.


>Maybe you could add a line or two of pseudocode to
>help me understand...? (Assuming you're not fleeing for your life from
>hurricanes, that is ;)

Hurricane's past me now; I just got power and 'net back this morning.

From pje at telecommunity.com  Mon Sep  6 15:43:24 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Sep  6 15:42:39 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 
In-Reply-To: <04Sep2.161513pdt."58612"@synergy1.parc.xerox.com>
References: <Your message of "Thu,
	02 Sep 2004 13:38:31 PDT." <413784C7.9040708@colorstudy.com>
Message-ID: <5.1.1.6.0.20040906093835.02e804e0@mail.telecommunity.com>

At 04:15 PM 9/2/04 -0700, Bill Janssen wrote:
 >[Ian Bicking]
> > There's no crippling, it [streaming] is specifically allowed for.  It's
> > not the primary interface that frameworks require, so Phillip wants to
> > encourage those frameworks to use the iterable when they can.
>
>Why?  Why is an editorial opinion in the technology spec?

Why do you think it's a technology spec?  I thought I was previously quite 
clear on this list that PEP 333 is "an attempt at market manipulation by 
social engineering mind control" (or something to that general effect), so 
that puts editorial opinion well within its scope, IMO.  :)


>  And, which
>frameworks are you talking about?  Isn't this on the "server" or
>"socket" side of things, rather than the "application" or "plug" or
>"framework" side of things?

Ian was speaking of application frameworks.  Specifically, we wish to 
discourage use of 'write()' because it's "bad citizenship" for an 
application to hog the thread it's running in.  Being iterable allows the 
server to control multitasking better, and thus improve the server's 
overall throughput.  While 'write()' has to be available to support legacy 
streaming APIs, it's not at all efficient for the typical asynchronous web 
server written in Python.

From pje at telecommunity.com  Mon Sep  6 15:47:03 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Sep  6 15:46:18 2004
Subject: [Web-SIG] Iterators, generators and threads.
In-Reply-To: <41385E70.20507@xhaus.com>
Message-ID: <5.1.1.6.0.20040906094347.02e8cec0@mail.telecommunity.com>

At 01:07 PM 9/3/04 +0100, Alan Kennedy wrote:
>Offhand, I can't think of scenarios where a WSGI server or application 
>would *need* to iterate over an iterable across multiple threads. But I 
>can certainly think of multiple server architectures where the request and 
>its related response will pass through multiple threads before completion. 
>Whether or not it would make sense for such architectures to iterate an 
>iterable from multiple threads: well, I don't know: is it possible some 
>server designer might attempt something like this?
>
>Which would probably work as long as the iterable is not a generator. But 
>if it is: *boom*, the generator could be resumed simultaneously from 
>multiple threads, thus resulting in a ValueError.

Generators don't actually add a new problem here.  Pretend we're talking 
about a list object instead.  If you were to "resume it simultaneously from 
multiple threads", what would happen?  Well, you'd send items twice, or out 
of order.  So, obviously, you can't iterate over *any* iterable returned by 
a WSGI app from multiple threads unless you serialize the access.
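
One way to "serialize the access" could look like the sketch below. It is only an illustration; the SerializedIterator name is made up, and nothing like it is defined by the PEP. A lock is held around each step of the underlying iterator, so a generator can never be resumed by two threads at once:

```python
import threading


class SerializedIterator:
    """Wrap any iterable so that only one thread at a time can advance it."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only one thread may resume the underlying iterator at a time.
        with self._lock:
            return next(self._iterator)
```

Note that this only guarantees each item is handed out exactly once. If the items must also reach the client in order, the sending itself has to happen under the same lock (or in a single thread).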

From paul.boddie at ementor.no  Mon Sep  6 15:48:37 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Mon Sep  6 15:48:41 2004
Subject: [Web-SIG] Standardised configuration.
Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CB05@100nooslmsg005.common.alpharoot.net>

Alan Kennedy wrote:
> 
> I think that session handling is an excellent example against which to
> have this discussion. Note however that I am *not* advocating
> standardising session management under WSGI.

There will be plenty of other places to standardise it, I'm sure. ;-)

> J2EE session handling is generally a huge PITA, primarily because the 
> base unit of session management is the servlet context, i.e. every 
> servlet context gets its own "session space". For example
> 
> '/forms' may map to one session space, while
> '/news' may map to a different session space.
> 
> Any given user may have multiple sessions on a server, depending on the
> number of servlets they have interacted with. It is generally not
> possible, except using container specific methods, to have a single
> "uber-session" which concentrates all user session data into a single
> object. This "hierarchy problem" makes it difficult, and extremely
> container-specific, to manage a single set of users across a set of J2EE
> servlets.

Session sharing sounds like a great idea, and I've seen some pretty
unfortunate workarounds to achieve such things, but then overreliance on
such mechanisms can be very restrictive if you change the "topology" of
your system architecture (ie. relocate one application to another server).

> Most J2EE containers support both cookies and URL rewriting for session
> management, i.e. if the user-agent has cookies disabled, then all urls
> are rewritten to contain session IDs. Which means that the url
> rewriting algorithm has to be aware of multiple servlet contexts, and
> rewrite local urls to contain the session ID which is specific to the
> target context/servlet.

This is a pretty nasty problem that WSGI and other technologies could
solve relatively cleanly for once.

> Some J2EE containers support a "Single Sign On" facility, where the
> container manages the multiple session objects on the application's
> behalf, and makes it easy for the user (but not the programmer) by only
> making them sign on to a server once. Tomcat does this using an extra
> cookie, the SSO cookie, which is transmitted to user-agents *as well as*
> the per-servlet cookie, i.e. the user-agent receives two cookies from
> the container. Worse, the Tomcat Single-Sign-On facility does not
> support URL rewriting: the user-agent *must* have cookies enabled in
> order for single sign on to work. Which sucks.

I guess you haven't seen other SSO solutions, then, or are too polite to
mention them. ;-)

> I think that if WSGI applications were to rely on the local
> platform/container session management facilities, it is extremely
> unlikely that they would be portable. It's difficult enough to get
> coherent cross-servlet session-handling working on J2EE when writing in
> java, as these pages show
>
> http://jakarta.apache.org/tomcat/tomcat-5.0-doc/config/host.html#Single%20Sign%20On
> http://jakarta.apache.org/tomcat/tomcat-5.0-doc/config/valve.html#Single%20Sign%20On%20Valve
> http://www.fwd.at/tomcat/sharing-session-data-howto.html
>
> Imagine the complications if the application code were originally
> written to work with say, WebWare under cpython?

Sharing sessions between completely different framework implementations
(eg. Webware and mod_python) within some kind of WSGI infrastructure is
going to be an extremely difficult thing to achieve, mostly because the
session store implementations are probably not interoperable - I haven't
checked, but the chances of interoperability are fairly low, I would
think. My opinion is that as soon as you're sharing session information,
you're moving towards some kind of shared database situation, anyway.

> To me, session handling is one of those things that is done in so many
> different ways by so many different platforms/containers that it is
> impractical to achieve application portability once a particular
> methodology has been chosen.

You'll have to clarify that. I've been working on WebStack functionality
which at least allows applications to treat sessions in the same way, and
it shouldn't be surprising that this is possible given the narrow range of
operations that most session implementations expose. Of course, were it
possible for an application running on Webware to suddenly, between HTTP
requests, find itself "migrated" to Twisted, it would be a bit much to
expect that application to find its sessions intact after the move.

> So, IMHO, session handling is one of those "should be simple" areas of
> web programming that gets horrifically complicated when trying to move
> applications between platforms/containers: in fact I'd go so far as to
> say the multiple session handling techniques are one of the primary
> reasons why the python web world is currently so fragmented: every
> framework author thinks they know best: although some do it much better
> than others. I like Webware's method of using URL path parameters, with
> an auto-refresh if a request is received that doesn't contain a session
> ID. But IIRC, this method is quite Apache specific, and requires
> modification of the Apache httpd.conf to get working. I could be wrong
> though.

Are you advocating a common session manager? I can see some major benefits
with something like that.

Paul
From pje at telecommunity.com  Mon Sep  6 15:53:27 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Sep  6 15:52:41 2004
Subject: [Web-SIG] Standardised configuration and temporary directories.
In-Reply-To: <413B8B7A.4090401@xhaus.com>
Message-ID: <5.1.1.6.0.20040906094818.02e8a4d0@mail.telecommunity.com>

At 10:56 PM 9/5/04 +0100, Alan Kennedy wrote:
>2. Standardised parameter configuration and specification.
>[snip]
>
>Perhaps a simple solution would be to add wording like the following to 
>the PEP:
>
>"""
>WSGI compliant servers must provide a simple mechanism for users to place 
>name/value pairs in the WSGI environment, without modification or 
>transformation. This is to make it easy for users to gather all middleware 
>(i.e. server-independent) configuration under one centralized 
>configuration mechanism.
>"""

I could go for something like this as a *should*, as long as it was 
explained that the simplest possible implementation is to simply include 
operating system environment variables in 'environ'.  (And at that point, 
your desire for temporary directory info might be met by simply using 
'environ["TMP"]'!)
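
A minimal sketch of that "simplest possible implementation" follows. The make_environ and scratch_dir helpers are hypothetical names, and the "/tmp" fallback is an assumption, not something any spec promises:

```python
import os


def make_environ(request_vars):
    """Build a WSGI-style environ that starts from the OS environment."""
    environ = dict(os.environ)    # OS variables act as the configuration
    environ.update(request_vars)  # per-request keys take precedence
    return environ


def scratch_dir(environ, default="/tmp"):
    """Look up a temporary directory via the 'TMP' key, if it is set."""
    return environ.get("TMP", default)
```

A deployer could then set TMP (or any other name/value pair) in the process environment and have it show up in 'environ' untouched.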

From pje at telecommunity.com  Mon Sep  6 16:00:40 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Sep  6 15:59:54 2004
Subject: [Web-SIG] Standardised configuration and temporary directories.
In-Reply-To: <413C51E5.2090107@xhaus.com>
References: <89DE0F3E9781C048A14DC88C06D9F93D18C9E5@100nooslmsg005.common.alpharoot.net>
	<89DE0F3E9781C048A14DC88C06D9F93D18C9E5@100nooslmsg005.common.alpharoot.net>
Message-ID: <5.1.1.6.0.20040906095537.02e8e020@mail.telecommunity.com>

At 01:02 PM 9/6/04 +0100, Alan Kennedy wrote:

>Configuring the middleware stack is really the entire purpose of a python 
>WSGI server. The platform in which the server and application reside, e.g. 
>Apache, CGI, Tomcat, etc, should not be relevant. Instead, in an ideal 
>scenario, the entire python application, i.e. server + middleware + 
>configuration, should be portable to another platform(+WSGI layer).
>
>If this is to be the case, then the middleware and its configuration would 
>be best kept under centralised python control, which would facilitate 
>maximum portability between platforms.

[snip]

This is starting to get into the area of portable deployment standards for 
WSGI developers, which is mostly out of scope for the current PEP.  I'd 
like to see us get some field experience in various ways to do it before we 
choose the "one obvious" way to do it.

That being said, don't let me stop y'all from discussing various ways to do 
it, because if nobody does that we'll never get to the "one way" 
part.  :)  I just don't expect the discussion to yield anything that would 
convince me to "bless" a single option before PEP 333 is finalized.

From pje at telecommunity.com  Mon Sep  6 16:11:02 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Sep  6 16:10:18 2004
Subject: [Web-SIG] Standardising containment.
In-Reply-To: <413C5C15.6030003@xhaus.com>
References: <413B8B7A.4090401@xhaus.com>
 <413B8B7A.4090401@xhaus.com>
Message-ID: <5.1.1.6.0.20040906100050.02e8f840@mail.telecommunity.com>

At 01:46 PM 9/6/04 +0100, Alan Kennedy wrote:
>[Alan Kennedy]
> >>1. Temporary storage/scratch directory.
>
>On thinking further about the temp directory issue, I see now that it is 
>but one example of a class of problems relating to accessing physical 
>resources on the local machine.
>
>The other main one that springs to mind is how WSGI applications discover 
>the file-system path name that corresponds to an URI.

*boggle*  Why do you think that URIs have anything to do with file 
paths?  In the general case, they are entirely unrelated.


>[snip]
>Therefore I propose that WSGI somehow attempt to standardise access to 
>local resources on the disk. This could be done, perhaps, by providing a 
>function which resolves a logical URI to a physical resource. J2EE has 
>just such a function (surprise ;-), called ServletContext.getRealPath(), 
>which returns a file-system path name which is relative to the 
>CONTEXT_PATH mentioned above.
>
>Without WSGI providing such local mapping functions, I don't see how WSGI 
>applications/middleware can map URIs to files, without undertaking 
>platform specific tricks.

Well-written Python applications make this sort of thing part of their 
configuration today already, because in the general case (e.g. mod_rewrite) 
this stuff just plain isn't guessable.

Also, if you need access to local resources, relative to some Python 
module, just grab the '__file__' attribute/variable of that module, and 
then use 'os.path' functions to portably manipulate it.  E.g.:

     import os
     my_dir = os.path.dirname(__file__)
     target = os.path.join(my_dir, "images", "stars.jpg")

This is simple and portable.  If you need something more complex, you 
should probably have configuration specific to the application that spells 
out what it needs to know.

From paul.boddie at ementor.no  Mon Sep  6 16:12:00 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Mon Sep  6 16:12:04 2004
Subject: [Web-SIG] Standardised configuration and temporary directories.
Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CB0F@100nooslmsg005.common.alpharoot.net>

Phillip J. Eby wrote:
> 
> This is starting to get into the area of portable deployment standards
> for WSGI developers, which is mostly out of scope for the current PEP.
> I'd like to see us get some field experience in various ways to do it
> before we choose the "one obvious" way to do it.
>
> That being said, don't let me stop y'all from discussing various ways to
> do it, because if nobody does that we'll never get to the "one way"
> part.  :)  I just don't expect the discussion to yield anything that
> would convince me to "bless" a single option before PEP 333 is finalized.

Well, this is the Web-SIG mailing list (as opposed to the WSGI mailing
list), so there will hopefully be a bit more discussion, some
experimentation and eventually some results to point to by the time any
other PEPs get written. It has been a while since there was this much
focus on Python Web standardisation on any mailing list.

Paul
From pje at telecommunity.com  Mon Sep  6 16:21:27 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Sep  6 16:20:40 2004
Subject: [Web-SIG] Standardised configuration.
In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D18CB05@100nooslmsg005.comm
	on.alpharoot.net>
Message-ID: <5.1.1.6.0.20040906101509.02e8f670@mail.telecommunity.com>

At 03:48 PM 9/6/04 +0200, Paul Boddie wrote:
>Alan Kennedy wrote:
> >
 > > I think that session handling is an excellent example against which to
 > > have this discussion. Note however that I am *not* advocating
 > > standardising session management under WSGI.
>
>There will be plenty of other places to standardise it, I'm sure. ;-)
>
> > J2EE session handling is generally a huge PITA, primarily because the
> > base unit of session management is the servlet context, i.e. every
> > servlet context gets its own "session space". For example
> >
> > '/forms' may map to one session space, while
> > '/news' may map to a different session space.
> >
 > > Any given user may have multiple sessions on a server, depending on the
 > > number of servlets they have interacted with. It is generally not
 > > possible, except using container specific methods, to have a single
 > > "uber-session" which concentrates all user session data into a single
 > > object. This "hierarchy problem" makes it difficult, and extremely
 > > container-specific, to manage a single set of users across a set of
 > > J2EE servlets.
>
>Session sharing sounds like a great idea, and I've seen some pretty
>unfortunate workarounds to achieve such things, but then overreliance on
>such mechanisms can be very restrictive if you change the "topology" of
>your system architecture (ie. relocate one application to another server).


Just to throw another thought in here, keep in mind that one could write a 
"cookie consolidator" WSGI component that would send its own 
session-management cookie to the client after removing application-sent 
cookies from the responses and saving them somewhere locally.  When a 
request comes in, the "cookie consolidator" would read its own cookie from 
HTTP_COOKIE, and then add the stored cookie data before passing it on to 
the application.  So, from the app's point of view, it's as if all the 
cookies are going to the client, but in reality there's only one, with the 
rest of the data stored server-side.

One could presumably also extend this cookie consolidator to manage other 
kinds of session keys as well, such as ones embedded in the URL.  Or, for 
that matter, you could write one that embeds its session key in the URL 
instead of in a cookie, but still makes it look to the application as if 
cookies are being used.
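
A very rough sketch of such a "cookie consolidator" is below. It is an illustration under several assumptions, not a finished design: the CookieConsolidator name and the "consolidated_sid" cookie name are invented, the store is a plain in-memory dict, and cookie attributes such as expiry, path and domain (and cookie deletion) are ignored entirely.

```python
import uuid
from http.cookies import SimpleCookie

SESSION_COOKIE = "consolidated_sid"  # hypothetical cookie name


class CookieConsolidator:
    """Keep application cookies server-side; send the client one cookie."""

    def __init__(self, app):
        self.app = app
        self.store = {}  # session id -> {cookie name: encoded value}

    def __call__(self, environ, start_response):
        incoming = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        morsel = incoming.get(SESSION_COOKIE)
        sid = morsel.value if morsel is not None else None
        if sid not in self.store:
            # Unknown or missing key: start a fresh server-side session.
            sid = uuid.uuid4().hex
            self.store[sid] = {}
        saved = self.store[sid]

        # Show the stored cookies to the app as if the client had sent them.
        environ["HTTP_COOKIE"] = "; ".join(
            "%s=%s" % (name, value) for name, value in saved.items())

        def consolidating_start_response(status, headers, exc_info=None):
            kept = []
            for name, value in headers:
                if name.lower() == "set-cookie":
                    # Remember the app's cookie instead of sending it on.
                    for key, m in SimpleCookie(value).items():
                        saved[key] = m.coded_value
                else:
                    kept.append((name, value))
            kept.append(
                ("Set-Cookie", "%s=%s; Path=/" % (SESSION_COOKIE, sid)))
            return start_response(status, kept, exc_info)

        return self.app(environ, consolidating_start_response)
```

Wrapping an application is then just `app = CookieConsolidator(app)`; the application keeps reading HTTP_COOKIE and emitting Set-Cookie headers as before, while the client only ever sees the one consolidated cookie.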

From david at sundayta.com  Mon Sep  6 16:22:44 2004
From: david at sundayta.com (David Warnock)
Date: Mon Sep  6 16:22:54 2004
Subject: [Web-SIG] Standardised configuration.
In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D18CB05@100nooslmsg005.common.alpharoot.net>
References: <89DE0F3E9781C048A14DC88C06D9F93D18CB05@100nooslmsg005.common.alpharoot.net>
Message-ID: <413C72B4.3020507@sundayta.com>

Paul,

> You'll have to clarify that. I've been working on WebStack functionality
> which at least allows applications to treat sessions in the same way, and
> it shouldn't be surprising that this is possible given the narrow range of
> operations that most session implementations expose. Of course, were it
> possible for an application running on Webware to suddenly, between HTTP
> requests, find itself "migrated" to Twisted, it would be a bit much to
> expect that application to find its sessions intact after the move.

But I, for one, can certainly imagine an "application" consisting of 
multiple servers, so that parts of the "application/site" are webware, 
part twisted, part quixote. If all these supported wsgi and if there 
were a wsgi session add-on then surely this heads towards the possible, 
and that makes lots of things much easier to assemble/develop/extend.

Dave
From py-web-sig at xhaus.com  Mon Sep  6 16:38:26 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Mon Sep  6 16:33:31 2004
Subject: [Web-SIG] Standardised configuration.
In-Reply-To: <5.1.1.6.0.20040906101509.02e8f670@mail.telecommunity.com>
References: <5.1.1.6.0.20040906101509.02e8f670@mail.telecommunity.com>
Message-ID: <413C7662.8010800@xhaus.com>

[Phillip J. Eby]
> Just to throw another thought in here, keep in mind that one could write 
> a "cookie consolidator" WSGI component that would send its own 
> session-management cookie to the client after removing application-sent 
> cookies from the responses and saving them somewhere locally.  When a 
> request comes in, the "cookie consolidator" would read its own cookie 
> from HTTP_COOKIE, and then add the stored cookie data before passing it 
> on to the application.  So, from the app's point of view, it's as if all 
> the cookies are going to the client, but in reality there's only one, 
> with the rest of the data stored server-side.
> 
> One could presumably also extend this cookie consolidator to manage 
> other kinds of session keys as well, such as ones embedded in the URL.  
> Or, for that matter, you could write one that embeds its session key in 
> the URL instead of in a cookie, but still makes it look to the 
> application as if cookies are being used.

That's an excellent idea, and could solve the problem of multiple 
session handling techniques very well, and in a portable manner.

However, it would only work for WSGI middleware components that are 
above that session component in the middleware stack.

If the application administrator had configured session management in 
the platform configuration file, e.g. tomcat server.xml, then that 
session management would be run *after* the entire WSGI middleware stack 
had completed.

But that's not a problem according to my view of things: avoiding 
platform managed sessions is the whole point.

Were I running a WSGI middleware stack inside Apache or Tomcat, I'd want 
to disable the "native" session handling completely, and instead take 
care of it entirely within the WSGI stack.

Regards,

Alan.

From paul.boddie at ementor.no  Mon Sep  6 16:33:58 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Mon Sep  6 16:34:01 2004
Subject: [Web-SIG] Standardised configuration.
Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CB17@100nooslmsg005.common.alpharoot.net>

David Warnock wrote:
> 
> But I for 1 can certainly imagine an "application" consisting of multiple
> servers, so that parts of the "application/site" are webware, part
> twisted, part quixote. If all these supported wsgi and if there were a
> wsgi session add-on then surely this heads towards the possible, and that
> makes lots of things much easier to assemble/develop/extend.

Yes, once you've discarded the Webware session mechanisms (or most likely
swapped them out within Webware itself), and once you've done the same
with Quixote and Twisted (or quite probably added sessions to Twisted
unless it comes with session support these days), you could have a session
manager of some kind under the applications. It might even have to happen
under the frameworks, since I suppose you would need to define how best to
make these servers co-exist and then add this session manager so that all
server environments are affected in the same way.

What I've done so far with certain WebStack examples is to provide a
resource which deals with authentication and then to add the actual
application functionality as a resource within that resource. I imagine
that the chaining of WSGI components would be done in a similar fashion,
although WebStack doesn't address the issue of dispatching through
different server environments, whereas your example situation would have
to tackle that issue.

Paul
From py-web-sig at xhaus.com  Mon Sep  6 16:56:33 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Mon Sep  6 16:51:38 2004
Subject: [Web-SIG] Standardising containment.
In-Reply-To: <5.1.1.6.0.20040906100050.02e8f840@mail.telecommunity.com>
References: <413B8B7A.4090401@xhaus.com> <413B8B7A.4090401@xhaus.com>
	<5.1.1.6.0.20040906100050.02e8f840@mail.telecommunity.com>
Message-ID: <413C7AA1.8010702@xhaus.com>

[Alan Kennedy]
>> The other main one that springs to mind is how WSGI applications 
>> discover the file-system path name that corresponds to an URI.

[Phillip J. Eby]
> *boggle*  Why do you think that URIs have anything to do with file 
> paths?  In the general case, they are entirely unrelated.

Well, perhaps it's just that pretty much every web 
server/harness/framework I ever used has support for mapping URIs to 
files. How silly of me to try to apply my experience of other web 
systems to WSGI.

In the *general* case, yes, such a mapping has no meaning.

But there are specific cases, e.g. static file serving, where it is 
required.

[Phillip J. Eby]
> Well-written Python applications make this sort of thing part of their 
> configuration today already, because in the general case (e.g. 
> mod_rewrite) this stuff just plain isn't guessable.

It doesn't even have to be guessable: it could be standardised.

[Phillip J. Eby]
> Also, if you need access to local resources, relative to some Python 
> module, just grab the '__file__' attribute/variable of that module, and 
> then use 'os.path' functions to portably manipulate it.  E.g.:
> 
>     my_dir = os.path.dirname(__file__)
>     target = os.path.join(os.path.join(my_dir,"images"),"stars.jpg")
> 
> This is simple and portable.  If you need something more complex, you 
> should probably have configuration specific to the application that 
> spells out what it needs to know.

And that is a nice (python-specific) solution to the problem.

Perhaps it's worth adding something to the Q&A about how to map URIs to 
files in the local file system, based on the above pythonic, i.e. 
module.__file__, approach?

Alan.

From david at sundayta.com  Mon Sep  6 16:59:03 2004
From: david at sundayta.com (David Warnock)
Date: Mon Sep  6 16:59:10 2004
Subject: [Web-SIG] wsgi layers
Message-ID: <413C7B37.7020305@sundayta.com>

Hi,

Is my understanding correct in terms of layers

A web browser sends requests to a WSGI enabled web server (eg mod_python 
under apache, or medusa or twisted) which passes them through installed 
WSGI middleware layers (eg session management, gzip, cookie consolidator 
etc) to an application hosted inside a WSGI enabled application 
framework (eg quixote).

So the intention is that the application is written within the features 
of a specific WSGI enabled application framework while it can be hosted 
(via the way its framework is WSGI compliant) in any WSGI server 
environment.

If all this is so, then I am confused about which projects are currently 
implementing/planning to implement wsgi as servers and as application 
frameworks. My assumption is that the servers being pluggable don't need 
to be my first concern as long as there is something that can be used 
for testing. But the application framework is the critical one for 
application developers. What is the state of play here?

Thanks

Dave
-- 
David Warnock, Sundayta Ltd. http://www.sundayta.com
iDocSys for Document Management. VisibleResults for Fundraising.
Development and Hosting of Web Applications and Sites.

From paul.boddie at ementor.no  Mon Sep  6 17:00:09 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Mon Sep  6 17:00:13 2004
Subject: [Web-SIG] Standardising containment.
Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CB1F@100nooslmsg005.common.alpharoot.net>

Alan Kennedy wrote:
> 
> [Phillip J. Eby]
> > *boggle*  Why do you think that URIs have anything to do with file 
> > paths?  In the general case, they are entirely unrelated.
> 
> Well, perhaps it's just that pretty much every web
> server/harness/framework I ever used has support for mapping URIs to
> files. How silly of me to try to apply my experience of other web
> systems to WSGI.

Perhaps WSGI is too "low-level" for such considerations. I don't know.

> In the *general* case, yes, such a mapping has no meaning.
> 
> But there are specific cases, e.g. static file serving, where it is 
> required.

Coming from a J2EE background, as I guess you are, there's a fairly strong
tradition that resources are sort of "mounted" within the context of the
application, isn't there? In other words, if my application refers to
somedir/somefile, the framework will have done the necessary directory
changing such that the reference translates to $CONTEXT/somedir/somefile.

It actually doesn't matter what the URL is and whether you're mapping that
or something else to a filename, or whether you're mapping anything to a
filename at all. It could just be a nice idea to define the behaviour when
some component uses a non-absolute path in order to access some resource.

[__file__]

> And that is a nice (python-specific) solution to the problem.
> 
> Perhaps it's worth adding something to the Q&A about how to map URIs to
> files in the local file system, based on the above pythonic, i.e.
> module.__file__, approach?

I've seen some strange stuff with __file__ in my time, however. Moreover,
how does all this map to things like Zope where resources aren't
necessarily related to the filesystem?

Paul
From py-web-sig at xhaus.com  Mon Sep  6 17:26:08 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Mon Sep  6 17:21:12 2004
Subject: [Web-SIG] Standardising containment.
In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D18CAF4@100nooslmsg005.common.alpharoot.net>
References: <89DE0F3E9781C048A14DC88C06D9F93D18CAF4@100nooslmsg005.common.alpharoot.net>
Message-ID: <413C8190.8090902@xhaus.com>

[Alan Kennedy]
 >>>The other main one that springs to mind is how WSGI applications
 >>>discover the file-system path name that corresponds to an URI.

[Ben Sizer]
 >>I thought that one of the major features of most of these Python web
 >>frameworks is that a URI doesn't map to a file but to an object or a
 >>function, several of which might be in one physical file. Since WSGI
 >>seems to be promoted as a minimal system that applies equally to
 >>almost any system, I'd think that such a mapping falls entirely out
 >>of its scope.

[Paul Boddie]
 > It probably does for WSGI, although I wonder how such issues (and the
 > many others out there) can be simultaneously avoided and yet
 > anticipated by the specification in order to avoid incompatibilities
 > later on.

And avoiding incompatibility is what I am trying to do.

[Ben Sizer]
 >>I agree that it might be useful to have this functionality. I think a
 >>standard way to map URIs to Python files would be beneficial for Python
 >>web development. I just don't see it fitting into what people here
 >> have told me about WSGI.

[Paul Boddie]
 > I suppose that Alan is moving slowly up the stack.

I'm sorry if I appear not to be as au fait with these matters as you. I 
see that you've been addressing all of these problems for years with 
WebStack.

 > It's an interesting issue that existing frameworks have addressed
 > in their own ways (the getRealPath that Alan mentioned, Webware's
 > getServerSidePath, and so on), and although one can wonder whether
 > application data (which the image example could almost be considered
 > as being) should be configured within or with reference to the server
 > environment or not, if you consider having to specify the filenames
 > of resources within an application, it's much nicer to be able to
 > make those filenames relative to some deployment variable (eg. where
 > the application ends up when deployed) and to keep those resources
 > bundled with the application than to have to manually configure the
 > application to use absolute paths before/during/after deployment.
 >
 > I hope that made sense. ;-)

Yes, it does make sense.

To summarise: it is *sometimes* the case that static resources and the 
functionality that renders them are *deployed* together, i.e. in the 
directory structure, which can make for simplicity of deployment and 
administration.

And as Phillip has suggested, the python module.__file__ attribute can 
be used to support location of such resources.

Regards,

Alan.
From pythonTutor at venix.com  Mon Sep  6 17:55:00 2004
From: pythonTutor at venix.com (Lloyd Kvam)
Date: Mon Sep  6 17:55:04 2004
Subject: [Web-SIG] Making HEAD request using urllib2 module
Message-ID: <1094486100.3107.26.camel@laptop.venix.com>

I wrote a URL checker to verify that a website is up and responding. 
For this, a HEAD request rather than GET seems better.  The urllib2
module provides a Request class with a get_method method.  I derived my
HeadRequest class overriding get_method to return HEAD if there was no
POST data.  Then I discovered that AbstractHTTPHandler.do_open did not
use Request.get_method, but simply used GET if there was no POST data.

I changed do_open to use Request.get_method and that's working for me. 
Should I be reporting a bug and offering a patch?  Or am I missing the
boat on other issues?  Is anyone making changes to urllib2?
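
For comparison, the subclass described above might look roughly like this in the Python 3 standard library, where urllib2 became urllib.request. This is a sketch under that assumption, not the original urllib2 code; only the HeadRequest name comes from the message.

```python
import urllib.request


class HeadRequest(urllib.request.Request):
    """A Request whose HTTP method is HEAD when there is no POST data."""

    def get_method(self):
        if self.data is None:
            return "HEAD"
        # With POST data present, defer to the default behaviour.
        return super().get_method()
```

In Python 3.3 and later the same effect is available without subclassing, via `urllib.request.Request(url, method="HEAD")`.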


Lloyd Kvam
Venix Corp.
1 Court Street, Suite 378
Lebanon, NH 03766-1358

voice:	603-653-8139
fax:	320-210-3409 (changed Aug 26, 2004)

From floydophone at gmail.com  Mon Sep  6 18:35:13 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Mon Sep  6 18:35:21 2004
Subject: [Web-SIG] RE: Standardised configuration.
Message-ID: <6654eac404090609351a5ebd83@mail.gmail.com>

I was actually thinking about this a week ago. First of all, I think
the configuration should be implemented as middleware. It will read a
configuration file or resource and stick it into environ["config"].
This way, we can have pluggable middleware which could, perhaps, take
their configuration from a remote server, local file, or other data
source.

The configuration middleware should be extensible and allow each
middleware to be configured (i.e.
environ["config"]["mymodule.gzip_middleware"]). Now I'm not a huge fan
of XML, but I think this would work okay:

<wsgi-config>
<middleware name="mymodule.gzip_middleware">
<!-- the data here can be arbitrary depending on how the middleware
wants to deal with it (plaintext or structured XML) -->
</middleware>
</wsgi-config>
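
A rough sketch of how such a configuration middleware could look. It assumes
the XML layout above, uses the present-day xml.etree.ElementTree parser, and
treats each middleware's data as plain text. The "level=6" setting and the
names parse_config, ConfigMiddleware and demo_app are made up for
illustration:

```python
import xml.etree.ElementTree as ET

CONFIG_XML = """<wsgi-config>
<middleware name="mymodule.gzip_middleware">level=6</middleware>
</wsgi-config>"""

def parse_config(text):
    """Map each middleware name to its (plain-text) configuration data."""
    root = ET.fromstring(text)
    return {node.get("name"): (node.text or "").strip()
            for node in root.findall("middleware")}

class ConfigMiddleware:
    """Stick the parsed configuration into environ["config"]."""

    def __init__(self, app, config):
        self.app = app
        self.config = config

    def __call__(self, environ, start_response):
        environ["config"] = self.config
        return self.app(environ, start_response)

def demo_app(environ, start_response):
    # A downstream component reads its own section of the configuration.
    setting = environ["config"]["mymodule.gzip_middleware"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [setting.encode("ascii")]

configured_app = ConfigMiddleware(demo_app, parse_config(CONFIG_XML))
```

Because the configuration source is hidden behind parse_config, a variant
could fetch it from a remote server or another data source instead.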

Finally, I see a need for at least two different types of
configuration files. One has to be a "gateway" configuration. It sets
up general settings used by all applications on the server. This is
analogous to an httpd.conf file. For example, this is needed so shared
webhosting providers can set up generic services, such as storing
sessions on their RDBMS for speed purposes.

There also needs to be an "application" configuration file, for those
who want to set up application-specific services, such as gzip
encoding. My simple XML configuration format allows both configuration
of middleware, AND picking which middleware will be installed for a
request.

We also have to remember that applications may not have a working
directory. They might simply exist as Python functions inside of
BaseHTTPServer. Thus, the _gateway_ must instantiate the configuration
middleware for the gateway, AND it must instantiate the configuration
middleware for the application (if it exists).

i.e. mod_python would pick the gateway configuration file as the one
installed in the mod_python directory, and it would pick the
application configuration file as the one in the working directory of
the current script.

What do you think?
From tony at lownds.com  Mon Sep  6 18:44:56 2004
From: tony at lownds.com (tony@lownds.com)
Date: Mon Sep  6 19:05:24 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4
In-Reply-To: <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
References: <Your message of "Wed, 01 Sep 2004 20:25:56
	PDT."<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
Message-ID: <54134.67.127.185.114.1094489096.squirrel@*>

> [skipping stuff that Ian answered]
>
> At 12:47 PM 9/2/04 -0700, Bill Janssen wrote:
>
>>I'm not familiar with all the ins and outs of files on Python and
>>Jython and IronPython, so I'll just say, reasonable enough.  Though
>>I'd prefer to say, a file-like object (whatever that means).
>
> File-like is out of scope; there were only ever two kinds of objects
> intended to be returnable:
>
> 1) Iterables (the initial scope)
>
> 2) Objects that map to an operating system file descriptor, as an optional
> special case to increase performance (added later per user request)
>

But using a file object as an iterable is going to give terrible
performance, and fileno() isn't good enough for Jython and IronPython. I
don't see why allowing a file-like object is unreasonable. If an
application returns a file-like object, it should render the same data
whether accessed through read(), or fileno(), or next().

-Tony

From ianb at colorstudy.com  Mon Sep  6 19:22:19 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Sep  6 19:22:24 2004
Subject: [Web-SIG] Standardising containment.
In-Reply-To: <413C7AA1.8010702@xhaus.com>
References: <413B8B7A.4090401@xhaus.com>
	<413B8B7A.4090401@xhaus.com>	<5.1.1.6.0.20040906100050.02e8f840@mail.telecommunity.com>
	<413C7AA1.8010702@xhaus.com>
Message-ID: <413C9CCB.1070005@colorstudy.com>

Alan Kennedy wrote:
> [Alan Kennedy]
> 
>>> The other main one that springs to mind is how WSGI applications 
>>> discover the file-system path name that corresponds to an URI.
> 
> 
> [Phillip J. Eby]
> 
>> *boggle*  Why do you think that URIs have anything to do with file 
>> paths?  In the general case, they are entirely unrelated.
> 
> 
> Well, perhaps it's just that pretty much every web 
> server/harness/framework I ever used has support for mapping URIs to 
> files. How silly of me to try to apply my experience of other web 
> systems to WSGI.

I guess it depends how you're looking at it.  Zope, for instance, is 
exactly the opposite -- files are an extension, not a native concept 
(with respect to URLs).  Quixote and Twisted both prominently feature 
ways to parse the URL to find a resource, which is not a file.  At some 
level, most frameworks allow for this kind of URL manipulation.  And I 
would assume the same is true in Java, somehow...?  At least among 
Python frameworks, URIs cannot generally be mapped to files.

Of course, there is an issue -- if not a file, it would be nice to find 
the terminal application for a particular URL.  But that's very vague, 
and something that WSGI does not facilitate.  If we have a bunch of 
middleware, is there any way to say "give me the last one"?  Is that 
even meaningful, as the middleware is not necessarily pass-through?  So 
maybe if you think you need the terminal application, it might be better 
to reconsider and refactor the problem.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From pje at telecommunity.com  Mon Sep  6 20:14:31 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Sep  6 20:13:51 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4
In-Reply-To: <54134.67.127.185.114.1094489096.squirrel@*>
References: <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<Your message of "Wed, 01 Sep 2004 20:25:56
	PDT."<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>

At 09:44 AM 9/6/04 -0700, tony@lownds.com wrote:

>But using a file object as an iterable is going to give terrible
>performance, and fileno() isn't good enough for Jython and IronPython. I
>don't see why allowing a file-like object is unreasonable.

Because explicit is better than implicit.  Returning a "file-like" object 
can mean, "read all the data and send it as one block", or "read the data 
in arbitrary-size blocks and send them".  The application should say what 
it means!  Either:

     return [filelike.read()]

or:
     yield filelike.read()

or:
     return iter(lambda: filelike.read(bufsize), '')

or something else, according to the results it intends.  The server 
shouldn't have to *guess* which of these is meant.
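
To make the difference concrete, here is a small sketch of two of the forms
above, run against an in-memory file. The function names are illustrative
only, and the '' sentinel matches text data (with byte strings in Python 3
it would be b''):

```python
import io

def whole_file(filelike):
    """Send everything as a single block."""
    return [filelike.read()]

def in_blocks(filelike, bufsize):
    """Send the data in blocks of at most bufsize characters."""
    return iter(lambda: filelike.read(bufsize), "")

data = "x" * 10
one_block = whole_file(io.StringIO(data))
many_blocks = list(in_blocks(io.StringIO(data), 4))
```

Both carry the same data, but the server sees one write in the first case
and three in the second, which is exactly what it could not guess.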


From tony at lownds.com  Mon Sep  6 20:41:43 2004
From: tony at lownds.com (tony@lownds.com)
Date: Mon Sep  6 21:02:15 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4
In-Reply-To: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
References: <5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>PDT."<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com><5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
Message-ID: <54588.67.127.185.114.1094496103.squirrel@*>

> Because explicit is better than implicit.  Returning a "file-like" object
> can mean, "read all the data and send it as one block", or "read the data
> in arbitrary-size blocks and send them".  The application should say what
> it means!  Either:
>
>      return [filelike.read()]
>
> or:
>      yield filelike.read()
>
> or:
>      return iter(lambda: filelike.read(bufsize), '')
>
> or something else, according to the results it intends.  The server
> shouldn't have to *guess* which of these is meant.
>

Wouldn't servers be better equipped to send a file efficiently, rather
than the application?
The recipe for sending decent-sized chunks instead of line-sized chunks
just obliterated the fileno() optimization.

I'm specifically advocating that servers be required to use read() if they
can't use fileno(). When an application returns an open file object,
servers that send it out line by line (ie, as an iterator) would be far
far slower than servers that use fileno(). So that technique wouldn't
really be portable across WSGI implementations. Using read() would make
returning an open file a viable technique on all WSGI servers.
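
A small illustration of the cost being described, using an in-memory file
and an arbitrary block size of 512. Both numbers are assumptions chosen
only to show the contrast:

```python
import io

text = "a\n" * 1000            # 2000 characters in 1000 short lines

# Plain iteration over a file object yields one chunk per line.
by_line = list(io.StringIO(text))

# Explicit read() calls yield a handful of larger chunks.
source = io.StringIO(text)
by_block = list(iter(lambda: source.read(512), ""))
```

The data is identical either way; only the number of chunks handed to the
server (1000 versus 4 here) changes.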

-Tony

From py-web-sig at xhaus.com  Mon Sep  6 21:10:03 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Mon Sep  6 21:05:09 2004
Subject: [Web-SIG] Standardising containment.
In-Reply-To: <413C9CCB.1070005@colorstudy.com>
References: <413B8B7A.4090401@xhaus.com>
	<413B8B7A.4090401@xhaus.com>	<5.1.1.6.0.20040906100050.02e8f840@mail.telecommunity.com>
	<413C7AA1.8010702@xhaus.com> <413C9CCB.1070005@colorstudy.com>
Message-ID: <413CB60B.6090504@xhaus.com>

[Alan Kennedy]
 >>>> The other main one that springs to mind is how WSGI applications
 >>>> discover the file-system path name that corresponds to an URI.

[Phillip J. Eby]
 >>> *boggle*  Why do you think that URIs have anything to do with file
 >>> paths?  In the general case, they are entirely unrelated.

[Alan Kennedy]
 >> Well, perhaps it's just that pretty much every web
 >> server/harness/framework I ever used has support for mapping URIs to
 >> files. How silly of me to try to apply my experience of other web
 >> systems to WSGI.

[Ian Bicking]
 > I guess it depends how you're looking at it.  Zope, for instance, is
 > exactly the opposite -- files are an extension, not a native concept
 > (with respect to URLs).  Quixote and Twisted both prominently feature
 > ways to parse the URL to find a resource, which is not a file.  At some
 > level, most frameworks allow for this kind of URL manipulation.  And I
 > would assume the same is true in Java, somehow...?  At least among
 > Python frameworks, URIs cannot generally be mapped to files.

Just a couple of quick points.

1. I am fully aware that there is not necessarily a mapping from URLs to 
files. It's just that sometimes it does have a meaning, with serving 
static files being the obvious example, and I think we need to keep that 
in mind.

Though perhaps it should remain a server-specific thing. Perhaps it's 
worth adding a note to the spec to explain why such facilities are *not* 
available.

2. Phillip has already proposed a pythonic solution: the python 
module.__file__ attribute.

3. I am *not* holding up J2EE as the be-all-and-end-all of models for 
web development: it has substantial problems, IMO. It's just that A: I 
happen to be implementing a WSGI server on J2EE at the moment, B: it is 
a very mature web architecture that provides a lot of useful facilities. 
I think WSGI should at least be informed by as many such architectures 
as possible, and C: I've used J2EE often enough to know reasonably well 
what it can and can't do.

4. J2EE does not provide particularly good facilities for incrementally 
mapping URL sub-components to application objects, although it does 
provide all the information required should one desire to do so oneself.

 > Of course, there is an issue -- if not a file, it would be nice to find
 > the terminal application for a particular URL.  But that's very vague,
 > and something that WSGI does not facilitate.  If we have a bunch of
 > middleware, is there any way to say "give me the last one"?  Is that
 > even meaningful, as the middleware is not necessarily pass-through?  So
 > maybe if you think you need the terminal application, it might be better
 > to reconsider and refactor the problem.

I'm not sure I see a direct connection between the terminal application 
and uri->file mapping.

Another example that springs to mind is a middleware component that 
takes care of, say "media downgrading", i.e. removing image references 
for aural/tactile/textual user-agents, and replacing it with a 
textual/metadata equivalent.

Such a component may not live at the top of the middleware stack. Quite 
possibly some higher up component will be generating some form of 
markup, which contains image references. The rendering component, 
further down the stack, would rewrite those references in the markup to 
contain whatever textual equivalent is appropriate.

Now, when the downgrading component is doing its job, simply knowing a 
URI reference to each image may not be enough. If it is going to 
transform a reference to an image, it may need to actually find, open 
and parse that image, in order to extract its metadata, e.g. width, 
height, textual description, etc.

Let's further assume that requests for the images URIs are *not* handled 
by a WSGI component. Let's say for example instead that URIs for such 
static asset files are served by the platform (e.g. Apache) directly, 
for (perhaps dubious, perhaps valid) performance reasons.

So how does the component actually get its mitts on the physical image 
when it is needed? All it has is an URI for the image. It could crank up 
httplib, make an HTTP request to the platform for the image, and examine 
the returned contents. But that's significantly more expensive than 
asking the platform to construct a file-system pathname for the image 
file, based solely on its URI, and then accessing it through the filesystem.
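
As a sketch of what such a platform facility might compute, assuming the
simplest possible case where static assets live under a single document
root. The function name and the /var/www root are hypothetical, and real
servers apply aliases and rewrite rules that this ignores:

```python
import posixpath
from urllib.parse import unquote, urlsplit

def uri_to_path(document_root, uri):
    """Translate a URI path into a file-system path under document_root."""
    path = posixpath.normpath(unquote(urlsplit(uri).path))
    # Drop empty and dot segments so the result cannot escape the root.
    parts = [p for p in path.split("/") if p not in ("", ".", "..")]
    return posixpath.join(document_root, *parts)

image_file = uri_to_path("/var/www", "/images/logo%20small.png")
```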

This example is perhaps overly contrived, but I'm trying to explain 
examples of why I think it is sometimes necessary to refer to the 
platform in order to find physical locations of other content served by 
that platform. This other content may not be under the control of WSGI 
applications.

Either way, I think it's a good thing for us to thrash all of these 
issues out. It's better that we sort it out as much as possible now 
rather than after the WSGI PEP has been finalised.

Maybe my approach has been wrong over the last few days. I've been 
writing to the SIG about issues that I have seen during my 
implementation phase. When I write about a particular issue, or feature 
of another language/framework, that doesn't mean that I'm demanding for 
such to be added to WSGI. It just means "Hey Folks, here's something 
that occurred to me that may need some consideration for WSGI".

And judging by many of the responses to my posts, e.g. along the lines 
of "I see what you're saying, *but* .... ", and "Well I think it's 
outside the spec, but yes you're right, it would be really nice to 
standardise X", I seem to be identifying the boundaries of WSGI pretty well.

I'm happy to be shot down by good arguments: we're all trying to achieve 
the same thing here: the best possible pythonic web architecture. And 
I'll never be too old to learn ;-)

Regards,

Alan.
From jjl at pobox.com  Mon Sep  6 23:44:59 2004
From: jjl at pobox.com (John J Lee)
Date: Mon Sep  6 23:45:04 2004
Subject: [Web-SIG] Making HEAD request using urllib2 module
In-Reply-To: <1094486100.3107.26.camel@laptop.venix.com>
References: <1094486100.3107.26.camel@laptop.venix.com>
Message-ID: <Pine.LNX.4.58.0409062235070.443@alice>

On Mon, 6 Sep 2004, Lloyd Kvam wrote:

> I wrote a URL checker to verify that a website is up and responding.
> For this, a HEAD request rather than GET seems better.  The urllib2
> module provides a Request class with a get_method method.  I derived my
> HeadRequest class overriding get_method to return HEAD if there was no
> POST data.  Then I discovered that AbstractHTTPHandler.do_open did not
> use Request.get_method, but simply used GET if there was no POST data.
>
> I changed do_open to use Request.get_method and that's working for me.
> Should I be reporting a bug and offering a patch?  Or am I missing the
> boat on other issues?

Don't think it's a bug -- it's simply not implemented.  But do go ahead
and upload it as a patch to the SF patch tracker!

I don't like the idea of a subclass just for HEAD requests (but then I
don't much like the Request class at all).  How about an additional
optional arg to the Request constructor, named 'method', instead?


> Is anyone making changes to urllib2?

Yes, me.


John
From pje at telecommunity.com  Tue Sep  7 02:29:32 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Sep  7 02:28:49 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4
In-Reply-To: <54588.67.127.185.114.1094496103.squirrel@*>
References: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>

At 11:41 AM 9/6/04 -0700, tony@lownds.com wrote:
>I'm specifically advocating that servers be required to use read() if they
>can't use fileno().

But with what block size?  If the block size is the whole file, why not 
just use:

     return [filelike.read()]

If it's some other block size, why not be explicit?


>  When an application returns an open file object,
>servers that send it out line by line (ie, as an iterator) would be far
>far slower than servers that use fileno(). So that technique wouldn't
>really be portable across WSGI implementations. Using read() would make
>returning an open file a viable technique on all WSGI servers.

Okay, you've convinced me: the fileno() optimization (as it's currently 
specified) needs to be removed, and I need to strip out all mention of 
returning files from the application.  (Except maybe to mention that it's a 
bad idea!)

Instead of using 'fileno' as an extension attribute on the iterable, we'll 
add a 'wsgi.file_wrapper' key, usable as follows by an application:

     return environ['wsgi.file_wrapper'](something,blksize)

The 'file_wrapper' may introspect "something" in order to do a fileno() 
check, or other "I know how to send this kind of object quickly" 
optimizations.  It must return an iterable, that the application may return 
back to the server.  The server *must not* assume that the application 
*will* return the iterable; it is perfectly legal to do something like this:

     an_iter = environ['wsgi.file_wrapper'](something,blksize)

     for block in an_iter:
          yield block.replace('\n', '\r\n')

In this case, the application iterates over the file, but the original 
iterator's contents are not yielded.  In the same way, middleware may 
transform or ignore data yielded by the iterator.  So, in effect 
'file_wrapper' should just wrap the original file-like object in an 
iterator that the server can recognize and perform an optimization on, in 
the event that it *actually* is returned by the application.

Here's the simplest possible conforming implementation of 'file_wrapper', 
that works for any modern (1.5.2+) Python:

     class file_wrapper:

         def __init__(self,readable,blocksize=8192):
             self.readable, self.blocksize = readable, blocksize
             self.close = readable.close

         def __getitem__(self,index):
             data = self.readable.read(self.blocksize)
             if data:
                 return data
             raise IndexError

     environ['wsgi.file_wrapper'] = file_wrapper

     result = application(environ, start_response)

     if isinstance(result, file_wrapper):
         # check result.readable for fileno() or other optimizations
         pass
     else:
         # do normal iteration over 'result'
         pass

Unfortunately, this is a lot more boilerplate than I'd like to impose on 
server authors.  But, if we don't, then the same boilerplate is effectively 
imposed on all application/framework/middleware authors who want to return 
file-like objects.

The other hassle here is going to be adjusting the PEP's presentation 
sequence so that this complication doesn't obscure the simplicity of the 
"CGI Gateway" example.  :(

The other alternative is to check for a 'read()' method as an alternative 
to iterability, but it leaves open the question of appropriate block 
size.  I suppose we could say that this is up to the server.

But, no matter how the introspection works, it's going to work strongly 
against the appearance of simplicity in the examples.  :(
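
For readers on current Python, the same idea can be sketched with the
iterator protocol instead of __getitem__. This is an illustration under
assumptions, not the PEP's text: the names FileWrapper, application and run
are invented, and the "fast path" is only detected, not implemented:

```python
import io

class FileWrapper:
    """Wrap a readable object in an iterator the server can recognise."""

    def __init__(self, readable, blocksize=8192):
        self.readable, self.blocksize = readable, blocksize
        if hasattr(readable, "close"):
            self.close = readable.close

    def __iter__(self):
        return self

    def __next__(self):
        data = self.readable.read(self.blocksize)
        if data:
            return data
        raise StopIteration

def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return environ["wsgi.file_wrapper"](io.BytesIO(b"abcdefghij"), 4)

def run(app):
    """Server side: note whether an optimisation would have been possible."""
    environ = {"wsgi.file_wrapper": FileWrapper}
    result = app(environ, lambda status, headers: None)
    try:
        # Only a wrapper returned unchanged qualifies for a fast path.
        fast_path_possible = isinstance(result, FileWrapper)
        return fast_path_possible, list(result)
    finally:
        if hasattr(result, "close"):
            result.close()
```

If middleware had consumed or transformed the wrapper, the isinstance check
would fail and the server would simply iterate as usual.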

From ianb at colorstudy.com  Tue Sep  7 02:31:52 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue Sep  7 02:31:57 2004
Subject: [Web-SIG] Standardising containment.
In-Reply-To: <413CB60B.6090504@xhaus.com>
References: <413B8B7A.4090401@xhaus.com>	<413B8B7A.4090401@xhaus.com>	<5.1.1.6.0.20040906100050.02e8f840@mail.telecommunity.com>	<413C7AA1.8010702@xhaus.com>
	<413C9CCB.1070005@colorstudy.com> <413CB60B.6090504@xhaus.com>
Message-ID: <413D0178.4010406@colorstudy.com>

Alan Kennedy wrote:
>  > Of course, there is an issue -- if not a file, it would be nice to find
>  > the terminal application for a particular URL.  But that's very vague,
>  > and something that WSGI does not facilitate.  If we have a bunch of
>  > middleware, is there any way to say "give me the last one"?  Is that
>  > even meaningful, as the middleware is not necessarily pass-through?  So
>  > maybe if you think you need the terminal application, it might be better
>  > to reconsider and refactor the problem.
> 
> I'm not sure I see a direct connection between the terminal application 
> and uri->file mapping.
> 
> Another example that springs to mind is a middleware component that 
> takes care of, say "media downgrading", i.e. removing image references 
> for aural/tactile/textual user-agents, and replacing it with a 
> textual/metadata equivalent.
> 
> Such a component may not live at the top of the middleware stack. Quite 
> possibly some higher up component will be generating some form of 
> markup, which contains image references. The rendering component, 
> further down the stack, would rewrite those references in the markup to 
> contain whatever textual equivalent is appropriate.
> 
> Now, when the downgrading component is doing its job, simply knowing a 
> URI reference to each image may not be enough. If it is going to 
> transform a reference to an image, it may need to actually find, open 
> and parse that image, in order to extract its metadata, e.g. width, 
> height, textual description, etc.

This is reasonable.  My initial suggestion would be to create an 
artificial request; creating a new environ and re-calling the 
application, fetching the object at that location.  Then, if it is a 
file object you can find it on disk (file objects have some attribute, I 
forget what), or if not you can read the data in and find its width and 
such.  But that might not work...
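
A rough sketch of such an artificial request, assuming the wrapped
application can serve the image itself. The helper name subrequest and the
toy image_app are hypothetical, and only PATH_INFO and REQUEST_METHOD are
adjusted, which a real component would likely need to extend:

```python
def subrequest(app, environ, path):
    """Re-call the application for another path and collect the response."""
    sub_environ = dict(environ)          # copy, so the original is untouched
    sub_environ["REQUEST_METHOD"] = "GET"
    sub_environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    result = app(sub_environ, start_response)
    try:
        body = b"".join(result)
    finally:
        if hasattr(result, "close"):
            result.close()
    return captured.get("status"), captured.get("headers"), body

def image_app(environ, start_response):
    # Stand-in for an application that serves images.
    start_response("200 OK", [("Content-Type", "image/png")])
    return [b"image data for " + environ["PATH_INFO"].encode("ascii")]

status, headers, body = subrequest(image_app, {"PATH_INFO": "/page"}, "/logo.png")
```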

> Let's further assume that requests for the images URIs are *not* handled 
> by a WSGI component. Let's say for example instead that URIs for such 
> static asset files are served by the platform (e.g. Apache) directly, 
> for (perhaps dubious, perhaps valid) performance reasons.

Obviously, this is much more complex, as the middleware can't call its 
application, since the application doesn't actually have access to the 
object, rather some parent server handles the object.

If you wanted to do the same sort of recursive request, a server could 
provide an extension to allow this.  Presumably you would get back 
another iterable, which may be a file object, which would contain the 
necessary information.

But, in both cases, there's a limit to what you can do -- you only get 
access to the public information stored in that particular image.  Maybe 
there's text files alongside the image, which mean that you need access 
to the filename.  E.g., image.jpg and image.jpg.desc, in the same 
directory.  If you get back the original file object, you can do this -- 
but it seems likely in many circumstances that you won't get back the 
file object at all, you'll get some wrapped version, and you won't be 
able to find the filename.

This is also where it would be nice if the response had more structure 
(or at least potential for structure) than what we currently have in 
WSGI.  If there were an (optional) attribute .fileobj (or something) 
wrappers could use this to expose the underlying file object, useful 
when you want to do this kind of server introspection.  It's not 
impossible that the application iterator could have these methods, but 
it's not an extension that WSGI really talks about.  Maybe it should.

Another extension that a server could implement is a URL resolver; if 
the server actually resolved URLs, to applications/resources/files, then 
it might expose this.  But as an extension it's not uniform, and I don't 
think it could be very uniform.  But I think there's a genuine need 
there, as I encounter things like this myself.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From tony at lownds.com  Tue Sep  7 05:21:23 2004
From: tony at lownds.com (tony@lownds.com)
Date: Tue Sep  7 05:41:52 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4
In-Reply-To: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
References: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com><5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com><5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com><5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com><5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
Message-ID: <55542.67.127.185.114.1094527283.squirrel@*>

> The other alternative is to check for a 'read()' method as an alternative
> to iterability, but it leaves open the question of appropriate block
> size.  I suppose we could say that this is up to the server.
>

Yes, much simpler, and servers can come up with a good block size.

> But, no matter how the introspection works, it's going to work strongly
> against the appearance of simplicity in the examples.  :(
>
>

Here's the tail end of the CGI example.

    result = application(environ, start_response)
    try:
        blocks = result
        if hasattr(result, 'read'):
            # keep 'result' bound to the file so read() and close() still work
            blocks = iter(lambda: result.read(BLOCKSIZE), '')
        for data in blocks:
            write(data)
    finally:
        if hasattr(result,'close'):
            result.close()

-Tony
From paul.boddie at ementor.no  Tue Sep  7 12:45:07 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Tue Sep  7 12:45:12 2004
Subject: [Web-SIG] Standardising containment.
Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D18CC2B@100nooslmsg005.common.alpharoot.net>

Alan Kennedy wrote:
>
> Perhaps it's worth adding something to the Q&A about how to map URIs to
> files in the local file system, based on the above pythonic, i.e.
> module.__file__, approach?

Another note about this here:

http://www.google.com/groups?selm=ur7pfb6uz.fsf%40fitlinxx.com

I've been most interested in having applications represented by modules or
packages which get imported by various adapters, but in schemes where
applications are just executed programs, you might run into an issue with
__file__ and older Python releases.

Paul

P.S. Although __file__ is supposedly Pythonic, it's quite possible that the
resources associated with an application don't always reside in an easily
discoverable location relative to the application's modules - ie. they get
installed in some opaquely-named directory which might vary with the
framework being used, even if it is located relative to those modules in the
filesystem. Perhaps an explicit resource path (or context path) needs
defining somewhere.
From py-web-sig at xhaus.com  Tue Sep  7 13:06:20 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Tue Sep  7 13:01:22 2004
Subject: [Web-SIG] wsgi layers
In-Reply-To: <413C7B37.7020305@sundayta.com>
References: <413C7B37.7020305@sundayta.com>
Message-ID: <413D962C.3020906@xhaus.com>

[David Warnock]
 > Is my understanding correct in terms of layers
 >
 > A web browser sends requests to a WSGI enabled web server (eg
 > mod_python under apache, or medusa or twisted) which passes them
 > through installed WSGI middleware layers (eg session management,
 > gzip, cookie consolidator etc) to an application hosted inside
 > a WSGI enabled application framework (eg quixote).

I understand it differently, though perhaps wrongly.

I see that the request arrives to a web server, which is either a pure 
python server, or a native-code server with a python interpreter and a 
very thin WSGI adapter. This transforms the request into a WSGI 
compatible request, and calls a single python application callable with 
it, as specified under WSGI.

Possible server+adapter combinations would be

Apache + mod_python_wsgi
AnyServer + CGI + wsgi.py
SimpleHttpServerWSGI + Very little
Tomcat + modjy
Factored-out Medusa request dispatcher
PyWx+, FastCGI+, SCGI+, etc, etc, etc.

The single python application callable, I see as *being* the python 
framework, e.g. WebWare. So all those server+adapter combinations listed 
above basically become the bootstrap process by which HTTP requests are 
fed to WSGI frameworks.

Hence, a fully-refactored-for-WSGI WebWare would then be portable to all 
of the above server+adapter combinations (python 2.2+ accepted).

The WSGIWebWare application would then be responsible for driving the 
request through a stack (more likely a tree) of middleware components, 
based on its configuration.
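
As a sketch of that idea, a framework could build its stack by wrapping the
innermost application in each configured component in turn. The two toy
middleware factories and the build_stack helper are invented for
illustration, and a real framework would read the list from its own
configuration:

```python
def uppercase_middleware(app):
    def wrapped(environ, start_response):
        return [chunk.upper() for chunk in app(environ, start_response)]
    return wrapped

def exclaim_middleware(app):
    def wrapped(environ, start_response):
        return list(app(environ, start_response)) + [b"!"]
    return wrapped

def build_stack(app, middleware):
    """Wrap app so that the first listed component is the outermost."""
    for factory in reversed(middleware):
        app = factory(app)
    return app

def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

# The "configuration" here is just an ordered list of factories.
stack = build_stack(hello_app, [uppercase_middleware, exclaim_middleware])
```

Since every layer has the same callable signature, a component written for
one framework's stack could in principle be dropped into another's.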

So I suppose that I see middleware stacks/trees as the generic class of 
python frameworks, and individual frameworks as instances of that class, 
each with their own specific mechanisms for specifying configuration of 
the stack/tree of middleware components.

To me, the portability of middleware would be ideally between 
frameworks. For example, I could take the WebWare session management 
middleware component and plug it into a Snakelets middleware stack. Or 
more appropriately: don't make the Snakelets guy have to bend his brain 
about session management and all of its horrors: just borrow and reuse 
an existing quality and field-tested component. So, when I write about 
middleware portability, this is what I mean, although that seems to 
conflict with your picture of middleware happening outside the framework.

The difference between your picture and mine is that I don't see where 
the middleware configuration happens in your processing model, i.e. how 
is the stack of middleware components before the framework configured?

In the case of twisted or zope, I have to say that I'm not familiar 
enough with the structure of either to know how exactly they would fit in.

But I know that an asynchronous WSGI server could be fairly easily put 
together simply using asyncore. In this case, the application callable 
could then be a simple dispatcher that sends WSGI requests down queues 
into processing objects in other threads (which have been created by the 
application callable at initialization time). The other-thread objects 
receiving those requests from the queues could themselves drive the 
requests through a stack of WSGI middleware. So the queues down which 
requests are sent would simply be a mechanism for extending middleware 
trees/stacks across thread boundaries (and potentially processor 
boundaries in jython and ironpython).
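
A simplified sketch of the queue idea, using one worker thread and the
Python 3 queue module. The names worker and make_dispatcher are made up, and
unlike a real asynchronous server the dispatcher here simply blocks until
the worker replies:

```python
import queue
import threading

def worker(requests, app):
    """Run in another thread: take requests off the queue and call app."""
    while True:
        item = requests.get()
        if item is None:                 # sentinel: shut the worker down
            break
        environ, reply = item
        captured = {}

        def start_response(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, headers

        body = b"".join(app(environ, start_response))
        reply.put((captured["status"], captured["headers"], body))

def make_dispatcher(app):
    """Create the worker at initialisation time; return the WSGI callable."""
    requests = queue.Queue()
    threading.Thread(target=worker, args=(requests, app), daemon=True).start()

    def dispatcher(environ, start_response):
        reply = queue.Queue(maxsize=1)
        requests.put((environ, reply))
        status, headers, body = reply.get()   # wait for the other thread
        start_response(status, headers)
        return [body]

    return dispatcher

def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from " + threading.current_thread().name.encode("ascii")]

dispatcher = make_dispatcher(hello_app)
```

The application passed to make_dispatcher could itself be a stack of
middleware, which is how the stack would extend across the thread boundary.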

 > So the intention is that the application is written within the
 > features of a specific WSGI enabled application framework while
 > it can be hosted (via the way its framework is WSGI compliant)
 > in any WSGI server environment.
 >
 > If all this is so, then I am confused about which projects are
 > currently implementing/planning to implement wsgi as servers and
 > as application frameworks. My assumption is that the servers being
 > pluggable don't need to be my first concern as long as there is
 > something that can be used for testing. But the application
 > framework is the critical one for application developers. What
 > is the state of play here?

The above is my outline view of the topic. I think it would be great if 
we could standardize on some terminology to be discussing these matters. 
I found myself considering replacing the word "framework" with 
"WebWare-like" up above, because the "f-word" is potentially 
inappropriately used. From the middleware components point of view, the 
framework is the WSGI server. From the server point of view, e.g. 
mod_python, the framework is the WSGI application.

Regards,

Alan.
From pythonTutor at venix.com  Tue Sep  7 14:50:00 2004
From: pythonTutor at venix.com (Lloyd Kvam)
Date: Tue Sep  7 14:50:27 2004
Subject: [Web-SIG] Making HEAD request using urllib2 module
In-Reply-To: <Pine.LNX.4.58.0409062235070.443@alice>
References: <1094486100.3107.26.camel@laptop.venix.com>
	<Pine.LNX.4.58.0409062235070.443@alice>
Message-ID: <1094561400.4811.9.camel@laptop.venix.com>

Thanks for the response.

On Mon, 2004-09-06 at 17:44, John J Lee wrote:
> On Mon, 6 Sep 2004, Lloyd Kvam wrote:
> 
> > I wrote a URL checker to verify that a website is up and responding.
> > For this, a HEAD request rather than GET seems better.  The urllib2
> > module provides a Request class with a get_method method.  I derived my
> > HeadRequest class overriding get_method to return HEAD if there was no
> > POST data.  Then I discovered that AbstractHTTPHandler.do_open did not
> > use Request.get_method, but simply used GET if there was no POST data.
> >
> > I changed do_open to use Request.get_method and that's working for me.
> > Should I be reporting a bug and offering a patch?  Or am I missing the
> > boat on other issues?
> 
> Don't think it's a bug -- it's simply not implemented.  But do go ahead
> and upload it as a patch to the SF patch tracker!
> 
> I don't like the idea of a subclass just for HEAD requests (but then I
> don't much like the Request class at all).  How about an additional
> optional arg to the Request constructor, named 'method', instead?

I started down the subclass path on the assumption that overriding
get_method was all that was necessary and I could avoid changing the
urllib2 module.
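
For reference, a sketch of the subclass approach. It assumes Python 3, where urllib2 became 'urllib.request'; the class name 'HeadRequest' follows the description above, but the body is a guess at its shape, not the original code.

```python
import urllib.request


class HeadRequest(urllib.request.Request):
    """Request that asks for headers only when there is no POST data."""

    def get_method(self):
        # Same rule as described above: HEAD unless the request carries data.
        if self.data is None:
            return "HEAD"
        return "POST"


# Building the request does not touch the network.
req = HeadRequest("http://www.python.org/")
```

In Python 3.3 and later the Request constructor also accepts a 'method' argument, so 'urllib.request.Request(url, method="HEAD")' gives the same result without a subclass.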

> 
> 
> > Is anyone making changes to urllib2?
> 
> Yes, me.
> 
> 
> John
-- 
Lloyd Kvam
Venix Corp

From janssen at parc.com  Tue Sep  7 20:31:16 2004
From: janssen at parc.com (Bill Janssen)
Date: Tue Sep  7 20:32:38 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4 
In-Reply-To: Your message of "Mon, 06 Sep 2004 17:29:32 PDT."
	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com> 
Message-ID: <04Sep7.113120pdt."58613"@synergy1.parc.xerox.com>

> But, no matter how the introspection works, it's going to work strongly 
> against the appearance of simplicity in the examples.  :(

Luckily, the WSGI spec is for server and framework implementors, who
are used to a lack of simplicity :-).

Bill
From py-web-sig at xhaus.com  Tue Sep  7 21:12:27 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Tue Sep  7 21:07:27 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4
In-Reply-To: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
References: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>	<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
Message-ID: <413E081B.70609@xhaus.com>

[Phillip J. Eby]
 > Instead of using 'fileno' as an extension attribute on the iterable,
 > we'll add a 'wsgi.file_wrapper' key, usable as follows by an
 > application:
 >
 >     return environ['wsgi.file_wrapper'](something,blksize)
 >
 > The 'file_wrapper' may introspect "something" in order to do a
 > fileno() check, or other "I know how to send this kind of object
 > quickly" optimizations.  It must return an iterable, that the
 > application may return back to the server.

[tony@lownds.com]
 > Here's the tail end of the CGI example.
 >
 >     result = application(environ, start_response)
 >     try:
 >         if hasattr(result, 'read'):
 >             result = iter(lambda: result.read(BLOCKSIZE), '')
 >         for data in result:
 >             write(data)
 >     finally:
 >         if hasattr(result,'close'):
 >             result.close()

Since I am just about to implement "wsgi.file_wrapper", I just wanted to 
check that my understanding of it is correct.

I think Tony's example above is not correct: the hasattr(result, 'read') 
should not be necessary, since the 'file_wrapper' class should implement 
its own iterator? I think it should read simply

result = application(environ, start_response)
try:
   for data in result:
     write(data)
finally:
   if hasattr(result,'close'):
     result.close()

Only the application has to change in this case, to return any file like 
object, wrapped in a 'file_wrapper'?

Is this correct?
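
To make the question concrete, here is a sketch of that reading. The 'FileWrapper' class below is a hypothetical stand-in written for this example (it is not the definition posted to the list), and 'serve' is an invented name for the tail end of a server.

```python
import io


class FileWrapper:
    """Hypothetical server-supplied wrapper; not the definition from the draft."""

    def __init__(self, filelike, blksize=8192):
        self.filelike = filelike
        self.blksize = blksize

    def __iter__(self):
        return self

    def __next__(self):
        # The wrapper implements its own iteration, block by block.
        data = self.filelike.read(self.blksize)
        if not data:
            raise StopIteration
        return data

    def close(self):
        if hasattr(self.filelike, "close"):
            self.filelike.close()


def application(environ, start_response):
    """Application side: wrap the file-like object before returning it."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return environ["wsgi.file_wrapper"](io.BytesIO(b"abcdefghij"), 4)


def serve(app, environ, write):
    """Server side: plain iteration, no hasattr(result, 'read') check."""
    result = app(environ, lambda status, headers, exc_info=None: None)
    try:
        for data in result:
            write(data)
    finally:
        if hasattr(result, "close"):
            result.close()
```

With this shape the server loop never inspects the result for a 'read' method; only the application changes.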

Regards,

Alan.
From pje at telecommunity.com  Tue Sep  7 21:12:04 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Sep  7 21:11:17 2004
Subject: [Web-SIG] Standardising containment.
In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D18CC2B@100nooslmsg005.comm
	on.alpharoot.net>
Message-ID: <5.1.1.6.0.20040907150800.028698c0@mail.telecommunity.com>

At 12:45 PM 9/7/04 +0200, Paul Boddie wrote:

>P.S. Although __file__ is supposedly Pythonic, it's quite possible that
>the
>resources associated with an application don't always reside in an
>easily
>discoverable location relative to the application's modules - ie. they
>get
>installed in some opaquely-named directory which might vary with the
>framework being used, even it is located relative to those modules in
>the
>filesystem. Perhaps an explicit resource path (or context path) needs
>defining somewhere.

Note that existing frameworks and applications already have lots of ways to 
handle this.  For example, applications like Roundup, MoinMoin, and 
Pyblosxom either have prescribed layouts or use configuration files that 
indicate where things are.

Thus, this is an area where adding a facility to WSGI is just creating 
"choice N+1" instead of actually reducing unnecessary choice.

From tony at lownds.com  Tue Sep  7 22:16:50 2004
From: tony at lownds.com (tony@lownds.com)
Date: Tue Sep  7 22:37:27 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4
In-Reply-To: <413E081B.70609@xhaus.com>
References: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com><5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com><5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com><5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com><5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com><5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<413E081B.70609@xhaus.com>
Message-ID: <64786.204.162.121.54.1094588210.squirrel@*>

> [Phillip J. Eby]
>  > Instead of using 'fileno' as an extension attribute on the iterable,
>  > we'll add a 'wsgi.file_wrapper' key, usable as follows by an
>  > application:
>  >
>  >     return environ['wsgi.file_wrapper'](something,blksize)
>  >
>  > The 'file_wrapper' may introspect "something" in order to do a
>  > fileno() check, or other "I know how to send this kind of object
>  > quickly" optimizations.  It must return an iterable, that the
>  > application may return back to the server.
>
> [tony@lownds.com]
>  > Here's the tail end of the CGI example.
>  >
>  >     result = application(environ, start_response)
>  >     try:
>  >         if hasattr(result, 'read'):
>  >             result = iter(lambda: result.read(BLOCKSIZE), '')
>  >         for data in result:
>  >             write(data)
>  >     finally:
>  >         if hasattr(result,'close'):
>  >             result.close()
>
> Since I am just about to implement "wsgi.file_wrapper", I just wanted to
> check that my understanding of it is correct.
>
> I think Tony's example above is not correct: the hasattr(result, 'read')
> should not be necessary, since the 'file_wrapper' class should implement
> its own iterator?

My change is not correct, wrt using a file_wrapper. I was showing the change
needed for WSGI server to simply use a file-like object. Sorry for any
confusion.

Which do you think is better? That servers should understand file objects
as return values, or that applications should be careful to wrap files?

-Tony

From py-web-sig at xhaus.com  Tue Sep  7 23:35:05 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Tue Sep  7 23:30:04 2004
Subject: [Web-SIG] Re: Bill's comments on WSGI draft 1.4
In-Reply-To: <64786.204.162.121.54.1094588210.squirrel@*>
References: <5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com><5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com><5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com><5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com><5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com><5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<413E081B.70609@xhaus.com>
	<64786.204.162.121.54.1094588210.squirrel@*>
Message-ID: <413E2989.3030100@xhaus.com>

[tony@lownds.com]
> Which do you think is better? That servers should understand file objects
> as return values, or that applications should be careful to wrap files?

I really like the wsgi.file_wrapper solution, because it is neither of 
the above. I see it as the server telling the application how files 
should be wrapped, but in a platform independent way.

I think that Phillip's posted definition of the FileWrapper class should 
be included in the spec, as an example of what is expected. Many server 
authors can just drop that standard FileWrapper definition into their 
code, and all will be well.

Although the definition of the file_wrapper may need to vary between 
servers, the overhead is not large. And any server author who really 
needs to get fancy with file_wrappers will probably have a very good 
idea of what they are doing anyway.

 From the efficiency point of view, it is important to note that the 
server is free to implement the FileWrapper class in whatever way it 
sees fit, e.g. ignoring the buffer size parameter, or supplying its own 
optimal default value for the parameter, etc, etc.

Phillip, am I off-base by requesting that there be a 'pathname' 
attribute on file_wrapper instances? Fair enough if the file_wrapper 
gets hidden by some component of the middleware stack: in that case the 
pathname loses its meaning anyway because the component has obviously 
transformed the content of the file in some way. In cases where the 
file_wrapper does not wrap an OS file, e.g. sockets, pipes, etc, the 
pathname could be set/defaulted to None.

One use case for this is, for example, a page templating middleware 
component. While parsing the text of a page template (wrapped in a 
file_wrapper) passed down from higher up the stack, it could use the 
pathname as a starting point to resolve relative pathnames in the page 
template source, e.g. include files, etc. Though it could perhaps be 
argued that the higher-up component should be responsible for resolving 
such relative references, because it is the component which actually 
knows where the template file came from?

Regards,

Alan.
From pje at telecommunity.com  Wed Sep  8 00:55:08 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep  8 00:54:24 2004
Subject: [Web-SIG] Use cases for file-like objects (was Re: Bill's
	comments on WSGI draft 1.4)
In-Reply-To: <413E081B.70609@xhaus.com>
References: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>

At 08:12 PM 9/7/04 +0100, Alan Kennedy wrote:
>[Phillip J. Eby]
> > Instead of using 'fileno' as an extension attribute on the iterable,
> > we'll add a 'wsgi.file_wrapper' key, usable as follows by an
> > application:
> >
> >     return environ['wsgi.file_wrapper'](something,blksize)
> >
> > The 'file_wrapper' may introspect "something" in order to do a
> > fileno() check, or other "I know how to send this kind of object
> > quickly" optimizations.  It must return an iterable, that the
> > application may return back to the server.
>
>[tony@lownds.com]
> > Here's the tail end of the CGI example.
> >
> >     result = application(environ, start_response)
> >     try:
> >         if hasattr(result, 'read'):
> >             result = iter(lambda: result.read(BLOCKSIZE), '')
> >         for data in result:
> >             write(data)
> >     finally:
> >         if hasattr(result,'close'):
> >             result.close()
>
>Since I am just about to implement "wsgi.file_wrapper", I just wanted to 
>check that my understanding of it is correct.

Before you implement it, I should warn you that I'm thinking 'file_wrapper' 
was a bad idea, and that there's a better way to do all this.

As I understand them, the current use cases for file-like objects are:

  1. sendfile(fileno()) for fast file-descriptor copying (Unix-like OSes 
only, and only single-thread synchronous servers like Apache 1.x or CGI)

  2. Convenience in returning an open file or pipe

  3. Convenience in returning a StringIO or other "file-like" object

By the way, as far as I know, none of these use cases are especially common 
in today's existing web frameworks.  Anyway, use cases 2 and 3 can be 
grouped into cases where the object is "large", "small", or "pipe-like":

     "Small" case:

        return [filelike.read()]

     "Large" case:

        return iter(lambda: filelike.read(SIZE), '')

     "Pipe-like" case:

        return iter(filelike.read, '')

These are all very simple, one-line solutions (at least for 2.2+) and have 
the advantage of being explicit, and refusing the temptation to guess.  The 
application is in total control of how the resource will be transmitted.
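
The same three idioms as a runnable sketch. It assumes Python 3 and binary file-like objects, so the sentinel is b'' rather than ''; the function names and the tiny SIZE are made up for the demonstration.

```python
import io

SIZE = 4  # deliberately tiny block size so the chunking is visible


def small(filelike):
    """Read everything at once and return a one-element list."""
    return [filelike.read()]


def large(filelike):
    """Yield fixed-size blocks until read() returns the empty sentinel."""
    return iter(lambda: filelike.read(SIZE), b"")


def pipe_like(filelike):
    """Yield whatever each read() call returns, until it returns b''."""
    return iter(filelike.read, b"")


blocks = list(large(io.BytesIO(b"abcdefghij")))
```

For an in-memory object the "pipe-like" form yields a single block, since read() with no argument returns everything that is available.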

That leaves only use case 1, which is a fairly limited use case and isn't 
even applicable to most web servers written in Python, as most such servers 
are asynchronous and can't take advantage of the 'sendfile()' system call 
(which Python doesn't expose as an 'os' facility anyway).

Therefore, my current thinking is to relegate use case 1 to a WSGI 
extension, 'wsgi.fd_wrapper', which can be used like this (if the application 
is returning an object with a working 'fileno()' method):

     if 'wsgi.fd_wrapper' in environ:
         return environ['wsgi.fd_wrapper'](fd.fileno())
     else:
         # return a normal iterable

In other words, 'wsgi.fd_wrapper' would be sort of like my earlier 
'wsgi.file_wrapper', but it would be *optional* to implement and 
use.  (Meaning it can be relegated to an application note, instead of 
having to be introduced in-line.)
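
Spelled out as a helper, the opt-in check might look like the sketch below. 'send_file' and SIZE are hypothetical names, and the fallback branch is just one choice of "normal iterable" (the "large" idiom, with a b'' sentinel on the assumption of Python 3 binary files).

```python
SIZE = 8192  # fallback block size; an arbitrary choice for this sketch


def send_file(environ, fd):
    """Return a response iterable for the open file object 'fd'."""
    wrapper = environ.get("wsgi.fd_wrapper")
    if wrapper is not None:
        # The server offers the optional extension: hand over the descriptor.
        return wrapper(fd.fileno())
    # Otherwise fall back to a normal iterable of blocks.
    return iter(lambda: fd.read(SIZE), b"")
```

An application that never looks for the key keeps working unchanged, which is the point of making the extension optional.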

For Alan's attempt to support Jython 2.1, he could write an 'iter' function 
or class and put it in __builtin__, so that programs written to this idiom 
would still work.
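
A rough sketch of such a shim, relying only on the old '__getitem__' iteration protocol. 'SentinelIter' is an invented name, and this has not been tried on Jython 2.1; installing it into __builtin__ is left out.

```python
class SentinelIter:
    """Emulates two-argument iter(func, sentinel) via __getitem__."""

    def __init__(self, func, sentinel):
        self.func = func
        self.sentinel = sentinel

    def __getitem__(self, index):
        # 'index' is ignored: each call simply fetches the next value.
        value = self.func()
        if value == self.sentinel:
            # IndexError is how a for-loop over __getitem__ learns to stop.
            raise IndexError(index)
        return value
```

A for-loop over 'SentinelIter(filelike.read, "")' then behaves like the "pipe-like" idiom above.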

After thinking about the 'file_wrapper' idea some more, I'm thinking that 
this way works better for everything but the issue of closing 
files.  However, my example 'file_wrapper' class should maybe be included 
in the PEP under an application note about sending files and file-like objects.

From jjl at pobox.com  Wed Sep  8 11:09:46 2004
From: jjl at pobox.com (John J Lee)
Date: Wed Sep  8 11:06:12 2004
Subject: [Web-SIG] Making HEAD request using urllib2 module
In-Reply-To: <1094561400.4811.9.camel@laptop.venix.com>
References: <1094486100.3107.26.camel@laptop.venix.com> 
	<Pine.LNX.4.58.0409062235070.443@alice>
	<1094561400.4811.9.camel@laptop.venix.com>
Message-ID: <Pine.WNT.4.58.0409071812020.1704@vernon>

On Tue, 7 Sep 2004, Lloyd Kvam wrote:
> On Mon, 2004-09-06 at 17:44, John J Lee wrote:
[...]
> > I don't like the idea of a subclass just for HEAD requests (but then I
> > don't much like the Request class at all).  How about an additional
> > optional arg to the Request constructor, named 'method', instead?
>
> I started down the subclass path on the assumption that overriding
> get_method was all that was necessary and I could avoid changing the
> urllib2 module.
[...]

Right.  So, are you going to upload a modified version, then? :-)


John
From py-web-sig at xhaus.com  Wed Sep  8 13:56:54 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep  8 13:51:53 2004
Subject: [Web-SIG] Modjy status.
Message-ID: <413EF386.3060407@xhaus.com>

Dear Sig,

I just wanted to quickly let ye know the current status of modjy: my 
j2ee implementation of a WSGI server.

Since I reduced the amount of java used to a minimum, and rewrote most 
of the code in jython, things have gone much quicker. Basically, about

95% of the code
75% of the documentation
50% of the testing

is now complete.

On the code front, I have to add one or two more extra features, but 
modjy already does pretty much all that it needs to.

On the testing side, I'm currently only focussed on testing the WSGI 
compliance: I've not put much effort into testing the "server" as a 
whole, because it is likely to change shape significantly over time. 
However, I will endeavour to test modjy as much as possible.

Lastly: documentation. I had originally said in this forum that I would 
publish modjy last weekend, no matter what state it was in. But I 
couldn't bring myself to publish it without some decent documentation to 
support it: users would find it hard to work with, and just be confused 
and disappointed. I've written most of the configuration, installation, etc, 
documentation. I still have to work on documenting the WSGI compliance, 
and a few other bits and pieces.

For the next few days, other work has to take higher priority than 
modjy. But I will get back to it at the weekend. Presuming that I can 
get all of the above finished on Sunday, I'll hopefully be releasing it 
on Sunday evening.

Just wanted to keep y'all informed.

Kind regards,

Alan.
From exarkun at divmod.com  Wed Sep  8 15:28:29 2004
From: exarkun at divmod.com (Jp Calderone)
Date: Wed Sep  8 15:28:33 2004
Subject: [Web-SIG] Use cases for file-like objects (was Re: Bill's	comments
	on WSGI draft 1.4)
In-Reply-To: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>
References: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>	<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>
Message-ID: <413F08FD.2090805@divmod.com>

Phillip J. Eby wrote:
> [snip]
> 
> Before you implement it, I should warn you that I'm thinking 
> 'file_wrapper' was a bad idea, and that there's a better way to do all 
> this.
> 
> As I understand them, the current use cases for file-like objects are:
> 
>  1. sendfile(fileno()) for fast file-descriptor copying (Unix-like OSes 
> only, and only single-thread synchronous servers like Apache 1.x or CGI)

   FWIW, there's a non-zero probability Twisted will support this at 
some point in the future.  A (horrible, hackish) proof of concept 
already exists.

   Jp

From py-web-sig at xhaus.com  Wed Sep  8 16:25:12 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep  8 16:21:17 2004
Subject: [Web-SIG] Use cases for file-like objects (was Re: Bill's comments
	on WSGI draft 1.4)
In-Reply-To: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>
References: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>
Message-ID: <413F1648.3040303@xhaus.com>

[Phillip J. Eby]
 >>> Instead of using 'fileno' as an extension attribute on the iterable,
 >>> we'll add a 'wsgi.file_wrapper' key, usable as follows by an
 >>> application:
 >>>
 >>>     return environ['wsgi.file_wrapper'](something,blksize)
 >>>
 >>> The 'file_wrapper' may introspect "something" in order to do a
 >>> fileno() check, or other "I know how to send this kind of object
 >>> quickly" optimizations.  It must return an iterable, that the
 >>> application may return back to the server.

and

 > [...] I should warn you that I'm thinking
 > 'file_wrapper' was a bad idea, and that there's a better way to do all
 > this.
 >
 > As I understand them, the current use cases for file-like objects are:
 >
 >  1. sendfile(fileno()) for fast file-descriptor copying (Unix-like
 >  OSes only, and only single-thread synchronous servers like Apache 1.x
 >  or CGI)


Well, I see sendfile functionality as being much more widespread 
than that. Java.nio, for example, has excellent support for fast 
"channel transfers" between file channels and other writable channel types.

http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/FileChannel.html

This support goes right down to the level of allocating "direct 
buffers", which use DMA to bypass the CPU when transferring the 
bytestream to/from the destination channel. On OSes where such DMA 
facilities are not supported, the exact same code still works, but just 
isn't as fast.

For an excellent discussion of how these facilities work in java.nio, 
and more importantly why they work and are high performance, I recommend 
Ron Hitchens' comprehensive book "Java NIO"

http://www.oreilly.com/catalog/javanio/

And I'd be surprised if the .Net CLR doesn't soon develop such 
functionality, if it isn't already supported.

 > [other cases snipped]
 >
 > These are all very simple, one-line solutions (at least for 2.2+) and
 > have the advantage of being explicit, and refusing the temptation to
 > guess.  The application is in total control of how the resource will
 > be transmitted.

Well, I suppose the key question here is "should the application be in 
total control of how the resource is transmitted"? Can we rely on all 
WSGI applications behaving correctly across all server platforms? Should 
the server not have some say in how the resource can be optimally 
transmitted, in its environment?

 > That leaves only use case 1, which is a fairly limited use case and
 > isn't even applicable to most web servers written in Python, as most
 > such servers are asynchronous and can't take advantage of the
 > 'sendfile()' system call (which Python doesn't expose as an 'os'
 > facility anyway).

A pity that cpython doesn't implement sendfile as a native C method 
that is layered on top of a native OS implementation if available, or a 
generic C implementation if not. The current lack of the call means that 
people tend to implement their own sendfile in pure python, meaning that 
they end up acquiring and releasing the GIL between every chunk sent.
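
The kind of pure-Python loop meant here looks roughly like the sketch below ('send_in_chunks' is a made-up name). Every block passes through the interpreter, which is exactly the overhead a native sendfile avoids.

```python
def send_in_chunks(source, write, blksize=65536):
    """Copy 'source' to 'write' one block at a time; return bytes copied."""
    total = 0
    while True:
        block = source.read(blksize)   # one interpreter round trip per block
        if not block:
            break
        write(block)
        total += len(block)
    return total
```

Later Python versions (3.3 onwards) do expose 'os.sendfile' on platforms that provide the system call, so this loop is only needed as a fallback there.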

Also, I don't think we should restrict ourselves to thinking solely in 
terms of single-threaded asynchronous architectures. When I think about 
asynchronous, high-performance and high-throughput server architectures, 
I tend to think in terms of hybrid asynchronous/threaded architectures, 
of the type described by Welsh et al. in the excellent and readable 
14-page overview paper "A Design Framework for Highly Concurrent 
Systems" (highly recommended reading, for those who might be interested)

http://www.eecs.harvard.edu/~mdw/papers/events.pdf

More details on Welsh's work can be obtained from his publications page.

http://www.eecs.harvard.edu/~mdw/pubs.html

Welsh describes the use of thread-pools of a fixed "width" to service 
particular request types, with requests shunted between those (otherwise 
isolated) thread pools using queues. For example, if the server hardware 
is capable of processing 50 disk requests simultaneously, then the 
"width" of the thread pool serving resources from disk should be 50: any 
more is a waste, any less will underperform the theoretical maximum.

It is important to note that those 50 threads would be threads which 
continually block while waiting for disk read completions. When the disk 
I/O has completed, they could either "sendfile" the data back to the 
client, or more likely pass it onto a dedicated thread-pool that does 
nothing but transfer disk byte streams to client sockets. Meaning that 
they need some way to record/represent the fact that the bytestream 
is coming from a file.
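
As a toy illustration of fixed-width stages joined by queues (my own sketch, not code from the paper; 'start_stage' and its parameters are invented names):

```python
import queue
import threading


def start_stage(width, handler, inbox, outbox=None):
    """Start 'width' threads that apply 'handler' to items from 'inbox'.

    Results go to 'outbox' when one is given, so stages can be chained.
    """
    def run():
        while True:
            item = inbox.get()          # blocks until work arrives
            result = handler(item)
            if outbox is not None:
                outbox.put(result)

    threads = [threading.Thread(target=run, daemon=True) for _ in range(width)]
    for thread in threads:
        thread.start()
    return threads
```

A "disk" stage of width 50 feeding a "send" stage of width 1 would then be two calls to 'start_stage' sharing one queue, with the item passed between them recording where the bytestream came from.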

This file->socket transfer could also conceivably be done by a single 
thread, which continually watches the readiness status of large sets of 
both socket and file channels/descriptors, and transfers blocks 
between them as appropriate. And "blocks" is the key word here. Data 
comes from disks in fixed size chunks, the size of which is optimised 
for maximum throughput at all levels of the OS. Many modern operating 
systems come with specialised high-performance support for transferring 
data from one channel/descriptor to another. Such support can radically 
increase throughput on a server.

So I suppose my real concern is that by relegating disk-originating byte 
streams to being second-class citizens under WSGI, we might hinder the 
portability of some highly-desirable server architectural approaches.

 > Therefore, my current thinking is to relegate use case 1 to a WSGI
 > extension, 'wsgi.fd_wrapper', which can be used like this (if the
 > application is returning an object with a working 'fileno()' method):
 >
 >     if 'wsgi.fd_wrapper' in environ:
 >         return environ['wsgi.fd_wrapper'](fd.fileno())
 >     else:
 >         # return a normal iterable
 >
 > In other words, 'wsgi.fd_wrapper' would be sort of like my earlier
 > 'wsgi.file_wrapper', but it would be *optional* to implement and use.
 > (Meaning it can be relegated to an application note, instead of having
 > to be introduced in-line.)

Well, I suppose that that makes sense too. After all, all of this talk 
of "highly-concurrent" architectures doesn't really apply to Apache + 
CGI/WSGI, for example.

 > For Alan's attempt to support Jython 2.1, he could write an 'iter'
 > function or class and put it in __builtin__, so that programs written
 > to this idiom would still work.
 >
 > After thinking about the 'file_wrapper' idea some more, I'm thinking
 > that this way works better for everything but the issue of closing
 > files.  However, my example 'file_wrapper' class should maybe be
 > included in the PEP under an application note about sending files and
 > file-like objects.

Perhaps a "finalise" method might be appropriate?

Just thinking through some scenarios here:

What happens if the server is just about to start serving a 
multi-megabyte PDF file back to a client socket, and then the client 
closes the socket, i.e. the user cancelled their request. What should 
the server do in that case? Should it continue to iterate through the 
iterable right until the end, discarding the results? Or should it just 
drop the iterable on the floor, to be sorted out by GC (and thus 
potentially wasting file-descriptors)? Or should it attempt to finalise 
the iterable, so that all related resources are freed?

Do these considerations also apply when the bytestream being 
transferred is not "physical", i.e. coming from a 
file-descriptor/channel? What if the bytestream is coming from an 
iterable yielding several megabytes of python strings, from a page 
rendering component, for example? How does the server tell the 
application to stop, because the client is no longer interested? Does it 
simply drop the iterable on the floor and forget about it?

Might the application have a need to know that the client aborted the 
request, for example in E-commerce scenarios? If the application did 
need to know, how could the server inform the application?
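
One possible answer to the first two scenarios, sketched as a server loop. This is not something the draft specifies: 'run_response' is an invented name, and treating an OSError from 'write' as "the client went away" is an assumption.

```python
def run_response(result, write):
    """Send 'result' via 'write'; return True if everything was sent."""
    try:
        for data in result:
            try:
                write(data)
            except OSError:
                # Assumed to mean the client closed the socket:
                # stop iterating instead of discarding the rest.
                return False
        return True
    finally:
        # close(), when present, is the application's chance to free
        # file descriptors and to notice that the response ended.
        if hasattr(result, "close"):
            result.close()
```

The application still cannot tell from 'close()' alone whether the client aborted, so the e-commerce question stays open.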

Kind regards,

Alan.
From neel at mediapulse.com  Wed Sep  8 17:12:20 2004
From: neel at mediapulse.com (Michael C. Neel)
Date: Wed Sep  8 17:24:21 2004
Subject: Matt Welsh WAS: Re: [Web-SIG] Use cases for file-like objects (was
	Re: Bill's comments on WSGI draft 1.4)
In-Reply-To: <413F1648.3040303@xhaus.com>
References: <5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>
	<413F1648.3040303@xhaus.com>
Message-ID: <1094656340.29095.5.camel@mike.mediapulse.com>


> Also, I don't think we should restrict ourselves to thinking solely in 
> terms of single-threaded asynchronous architectures. When I think about 
> asynchronous, high-performance and high-throughput server architectures, 
> I tend to think in terms of hybrid asynchronous/threaded architectures, 
> of the type described by Welsh et al. in the excellent and readable 
> 14-page overview paper "A Design Framework for Highly Concurrent 
> Systems" (highly recommended reading, for those who might be interested)
> 
> http://www.eecs.harvard.edu/~mdw/papers/events.pdf
> 
> More details on Welsh's work can be obtained from his publications page.
> 
> http://www.eecs.harvard.edu/~mdw/pubs.html

I just grabbed this and I've only started it, but this looks to be a
very interesting read, thank you for the reference.  I also agree on
its relevance to WSGI, and encourage others on this list to take a
moment and read it as well.

Mike

From pje at telecommunity.com  Wed Sep  8 17:55:20 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep  8 17:54:38 2004
Subject: [Web-SIG] Use cases for file-like objects (was Re: Bill's
	comments on WSGI draft 1.4)
In-Reply-To: <413F1648.3040303@xhaus.com>
References: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>
	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040908113837.03058c20@mail.telecommunity.com>

At 03:25 PM 9/8/04 +0100, Alan Kennedy wrote:
>Well, I see sendfile functionality as being much more widespread than 
>that. Java.nio, for example, has excellent support for fast "channel 
>transfers" between file channels and other writable channel types.

Well, I see a few options here, then.  We can use 'wsgi.file_wrapper' to 
wrap Python 'file' objects, allowing each platform to dig into the file 
object and get at the file descriptor, nio, or what-have-you in a platform 
specific way.  As long as it remains an optional extension, I'm fine with that.

Another option is to have separate 'wsgi.nio_wrapper', 'wsgi.fd_wrapper', 
and so on, for different physical backend types.


> > [other cases snipped]
> >
> > These are all very simple, one-line solutions (at least for 2.2+) and
> > have the advantage of being explicit, and refusing the temptation to
> > guess.  The application is in total control of how the resource will
> > be transmitted.
>
>Well, I suppose the key question here is "should the application be in 
>total control of how the resource is transmitted"?

Yes, because of the need for backward compatibility.  I realize that most 
people discussing WSGI here on the Web-SIG seem more interested in new 
applications than old, but backward compatibility is critical and that 
means apps must have control that's comparable to what they have today.


>Welsh describes the use of thread-pools of a fixed "width" to service 
>particular request types, with requests shunted between those (otherwise 
>isolated) thread pools using queues.

The description you use here sounds exactly like typical Python async 
servers today: they have fixed-size threadpools for running "application" 
code, and another fixed size thread pool (width=1) for I/O.


>So I suppose my real concern is that by relegating disk-originating byte 
>streams to being second-class citizens under WSGI, we might hinder the 
>portability of some highly-desirable server architectural approaches.

We're not; we're simply requiring that any functionality more sophisticated 
than an iterable be treated as an optional extension, that the application 
has to check for and opt to use.  The application developer is motivated to 
do this because of the promise of extra performance when run on platforms 
that support the boost.  But middleware developers don't have to think 
about it because they always have access to the data in iterable form.


> > After thinking about the 'file_wrapper' idea some more, I'm thinking
> > that this way works better for everything but the issue of closing
> > files.  However, my example 'file_wrapper' class should maybe be
> > included in the PEP under an application note about sending files and
> > file-like objects.
>
>Perhaps a "finalise" method might be appropriate?
>
>Just thinking through some scenarios here:
>
>What happens if the server is just about to start serving a multi-megabyte 
>PDF file back to a client socket, and then the client closes the socket, 
>i.e. the user cancelled their request. What should the server do in that 
>case? Should it continue to iterate through the iterable right until the 
>end, discarding the results? Or should it just drop the iterable on the 
>floor, to be sorted out by GC (and thus potentially wasting 
>file-descriptors)? Or should it attempt to finalise the iterable, so that 
>all related resource is freed?

The current spec requires that the iterable's 'close()' method be called at 
the termination of the request, whether the iterator was exhausted or 
not.  So, the server is free to cancel iteration when a client connection 
is lost.


>Do these considerations also apply when the bytestream being transferred 
>is not "physical", i.e. coming from a file-descriptor/channel? What if the 
>bytestream is coming from an iterable yielding several megabytes of python 
>strings, from a page rendering component, for example. How does the server 
>tell the application to stop, because the client is no longer interested? 
>Does it simply drop the iterable on the floor and forget about it?
>
>Might the application have a need to know that the client aborted the 
>request, for example in E-commerce scenarios? If the application did need 
>to know, how could the server inform the application?

By calling 'close()' on the iterable, as the spec requires.  Until PEP 325 
is implemented, though, generators have to be wrapped in a custom iterable 
in order to support this functionality, e.g.:

     class MyApp:

         def __init__(self,environ,start_response):
             # setup code here

         def __iter__(self):
             # generator yielding results


         def close(self):
             # cleanup code here

There are of course other ways to do the same basic thing, such as my 
file_wrapper example class.  But, once PEP 325 is implemented, you'll be 
able to use try/finally in the generator body, and the finally block will 
be executed when close() is called or the generator is garbage 
collected.  (PEP 325 was written by Samuele Pedroni, so I assume he intends 
to implement it in Jython, too.)
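
A rough sketch of the close() contract described above, assuming a modern
Python 3 interpreter.  The names ClosingIterable, serve, send and
client_connected are hypothetical and are not part of the WSGI draft; the
point is only that the server may stop iterating early, provided it still
calls close().  On Python versions where generators have a close() method,
a try/finally in the generator body gets the same treatment.

```python
class ClosingIterable:
    """Hypothetical wrapper: iterate over `chunks`, run `cleanup` on close()."""

    def __init__(self, chunks, cleanup):
        self._chunks = iter(chunks)
        self._cleanup = cleanup
        self.closed = False

    def __iter__(self):
        return self._chunks

    def close(self):
        # Idempotent, so calling it twice does no harm.
        if not self.closed:
            self.closed = True
            self._cleanup()


def generator_body(log):
    # Generator variant: cleanup lives in the finally block.
    try:
        yield b'first'
        yield b'second'
    finally:
        log.append('finally ran')


def serve(result, send, client_connected):
    """Hypothetical server loop: abandon iteration when the client is gone,
    but always call close() if the iterable provides one."""
    try:
        for chunk in result:
            if not client_connected():
                break
            send(chunk)
    finally:
        close = getattr(result, 'close', None)
        if close is not None:
            close()
```

The application never has to poll for a lost connection in this arrangement;
it just finds out through close().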

From py-web-sig at xhaus.com  Wed Sep  8 18:29:59 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep  8 18:24:55 2004
Subject: [Web-SIG] Asynchronous architectures, abstract and concrete.
In-Reply-To: <5.1.1.6.0.20040908113837.03058c20@mail.telecommunity.com>
References: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>
	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>
	<5.1.1.6.0.20040908113837.03058c20@mail.telecommunity.com>
Message-ID: <413F3387.9040908@xhaus.com>

[Phillip J. Eby]
 > [lots of excellent stuff snipped]

Thanks for the great explanations Phillip, and I agree with your 
positions on these issues.

There is just one area that I wanted to address.

[Alan Kennedy]
>> Welsh describes the use of thread-pools of a fixed "width" to service 
>> particular request types, with requests shunted between those 
>> (otherwise isolated) thread pools using queues.

[Phillip J. Eby]
> The description you use here sounds exactly like typical Python async 
> servers today: they have fixed-size threadpools for running 
> "application" code, and another fixed size thread pool (width=1) for I/O.

In reply

1. Welsh's architecture is much more abstract and high level, in that it 
discusses clustering, multiply redundant hardware pools, failover, 
isolation, load-balancing, etc, and no specific implementation technology.

2. The existing cpython frameworks are all still limited by the cpython 
GIL. Which gives all the more reason for pushing as much as possible 
down closer to the operating system, and outside of pure python.

3. Welsh's architecture discusses isolation of multiple IO subsystems 
into different thread groups. For example, there could be a thread group 
holding a pool of (blocking) database connections, which would be the 
appropriate "width" to process as many requests as can be concurrently 
supported by the RDBMS. Since there are blocking sockets/pipes/fifos 
between the application and the database, such database operations also 
count as a form of IO, which has to be managed. It could potentially be 
managed in an asynchronous fashion. Do any of the cpython frameworks 
support an asynchronous database API?

Just some thoughts.

I really think Welsh's paper is worth a read. In fact, it's been 6 
months since I read it: I'm going to read it now again, in light of my 
newly gained WSGI knowledge. Should only take 30 to 40 mins to read it 
again.

Regards,

Alan.
From exarkun at divmod.com  Wed Sep  8 19:39:18 2004
From: exarkun at divmod.com (Jp Calderone)
Date: Wed Sep  8 19:39:23 2004
Subject: [Web-SIG] Asynchronous architectures, abstract and concrete.
In-Reply-To: <413F3387.9040908@xhaus.com>
References: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>	<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>	<5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>	<5.1.1.6.0.20040908113837.03058c20@mail.telecommunity.com>
	<413F3387.9040908@xhaus.com>
Message-ID: <413F43C6.3080900@divmod.com>

Alan Kennedy wrote:
> [snip]
> 
> 3. Welsh's architecture discusses isolation of multiple IO subsystems 
> into different thread groups. For example, there could be a thread group 
> holding a pool of (blocking) database connections, which would be the 
> appropriate "width" to process as many requests as can be concurrently 
> supported by the RDBMS. Since there are blocking sockets/pipes/fifos 
> between the application and the database, such database operations also 
> count as a form of IO, which has to be managed. It could potentially be 
> managed in an asynchronous fashion. Do any of the cpython frameworks 
> support an asynchronous database API?

   Yes, http://twistedmatrix.com/documents/current/howto/enterprise

   Jp
From janssen at parc.com  Thu Sep  9 01:40:48 2004
From: janssen at parc.com (Bill Janssen)
Date: Thu Sep  9 01:41:20 2004
Subject: [Web-SIG] Use cases for file-like objects (was Re: Bill's
	comments on WSGI draft 1.4) 
In-Reply-To: Your message of "Wed, 08 Sep 2004 07:25:12 PDT."
	<413F1648.3040303@xhaus.com> 
Message-ID: <04Sep8.164056pdt."58612"@synergy1.parc.xerox.com>

> A pity that cpython doesn't implement sendfile as a native C method 
> that is layered on top of a native OS implementation if available, or a 
> generic C implementation if not. The current lack of the call means that 
> people tend to implement their own sendfile in pure python, meaning that 
> they end up acquiring and releasing the GIL between every chunk sent.

Is this something that should be added to the standard library
(probably as part of the socket module)?

Bill
From pje at telecommunity.com  Thu Sep  9 02:43:09 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep  9 02:42:37 2004
Subject: [Web-SIG] Reviewing WSGI open issues, again...
Message-ID: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>

* HTTP_AUTHENTICATION -- I haven't seen a concrete proposal for this yet, 
and I don't personally consider it a high priority.  If something is to go 
in for this, somebody needs to put together a proposal, preferably in the 
form of a patch to the PEP.

* Byte strings: so far the only discussion here has centered on character 
sets required by HTTP RFCs.  I'm going to loosen up the ASCII status/header 
requirement slightly, to indicate that ISO-8859-1 is an acceptable encoding, 
per RFC 2616.  Any other comments regarding byte string issues?

* Error handling -- I'm assuming the SIG consensus is +1 on the 
'wsgi.fatal_errors' key, but haven't seen any feedback on my ideas for 
'start_response', except that I seem to recall someone saying they didn't 
want the body passed to start_response.  Taking that part out, we end up 
with something like this:

'start_response()' doesn't actually transmit the status or headers until 
the first write() call occurs or the first string is yielded from the 
returned iterable.  'start_response' simply stores the status or headers 
for future use, and may therefore be called more than once.  However, 
calling 'start_response()' *after* a write(), or after the first string is 
yielded, is a fatal error.  Top-level servers/gateways should log detailed 
information about errors that occur after a partial result is 
transmitted.  They may also attempt to send error information to the client 
if the content type is text (e.g. text/html, text/xml, text/plain).

Feedback, anyone?

* File-like objects -- I think anything we offer for file-like objects 
should be optional.  The big question is whether to offer a single, 
introspection-based extension for all file-like things, or whether to use 
separate extensions for different sorts of things, like 'wsgi.fd_wrapper' 
for file descriptors and 'wsgi.nio_wrapper' for Java NIO objects, 
etc.  Does anybody have any arguments/use cases one way or the other?

* Configuration -- I'm going to mention that servers *should* provide an 
easy way to configure name-value pairs to be supplied to an application's 
'environ', and that one way to do that is simply to include OS environment 
variables in 'environ'.

Am I missing anything else that's been discussed recently?  (E.g. just 
before I went into hiding from the hurricane...)
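
For the error-handling item above, here is a minimal sketch of the deferred
'start_response()' behaviour, assuming Python 3.  ResponseState and its
'send' callable are hypothetical names invented for illustration, and the
HTTP/1.0 status line is just a placeholder; this is one possible reading of
the proposal, not the PEP's wording.

```python
class ResponseState:
    """Hypothetical per-request bookkeeping on the server side."""

    def __init__(self, send):
        self._send = send            # callable that pushes bytes to the client
        self._pending = None         # (status, headers) not yet transmitted
        self._headers_sent = False

    def start_response(self, status, headers):
        if self._headers_sent:
            # Output has already gone out, so this is the fatal-error case.
            raise RuntimeError('start_response() called after output was sent')
        # Only store the values; a later call may replace them.
        self._pending = (status, list(headers))
        return self.write

    def write(self, data):
        if self._pending is None:
            raise RuntimeError('write() called before start_response()')
        if not self._headers_sent:
            status, headers = self._pending
            lines = ['HTTP/1.0 ' + status]
            lines += ['%s: %s' % (name, value) for name, value in headers]
            self._send(('\r\n'.join(lines) + '\r\n\r\n').encode('iso-8859-1'))
            self._headers_sent = True
        self._send(data)
```

An application that fails before producing output can call start_response()
a second time with an error status, and only the second status is ever sent.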

From py-web-sig at xhaus.com  Thu Sep  9 13:20:51 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Thu Sep  9 13:15:46 2004
Subject: [Web-SIG] CPU cache locality.
In-Reply-To: <413F43C6.3080900@divmod.com>
References: <5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>	<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>	<5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>	<5.1.1.6.0.20040908113837.03058c20@mail.telecommunity.com>	<413F3387.9040908@xhaus.com>
	<413F43C6.3080900@divmod.com>
Message-ID: <41403C93.8080509@xhaus.com>

[Alan Kennedy]
>> 3. Welsh's architecture discusses isolation of multiple IO subsystems 
>> into different thread groups. For example, there could be a thread 
>> group holding a pool of (blocking) database connections, which would 
>> be the appropriate "width" to process as many requests as can be 
>> concurrently supported by the RDBMS. Since there are blocking 
>> sockets/pipes/fifos between the application and the database, such 
>> database operations also count as a form of IO, which has to be 
>> managed. It could potentially be managed in an asynchronous fashion. 
>> Do any of the cpython frameworks support an asynchronous database API?

[Jp Calderone]
>   Yes, http://twistedmatrix.com/documents/current/howto/enterprise

Thanks for the reply Jp.

I've been thinking further about multi-threading, CPU cache locality and 
iterators. While I was thinking about it in relation to twisted 
enterprise at first, it's really an issue that applies to WSGI as well.

But let's take twisted enterprise as an example. I'm not intimately 
familiar with Twisted, so please forgive me if I get something wrong.

So twisted has a pool of threads which carries out synchronous database 
operations on behalf of clients, but in an asynchronous manner from the 
client's perspective. This is done by receiving the "database requests" 
from a queue, processing each synchronously using blocking DB-API calls, 
and then returning the result to the client asynchronously, either using 
a callback function or sending the results back on a queue. Is this how 
twisted "deferred"s work?
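
The queue-and-callback arrangement I have in mind can be sketched roughly as
follows, assuming Python 3.  This is emphatically not Twisted's API;
BlockingCallPool and its methods are hypothetical names, and the "database
call" is any blocking function.

```python
import queue
import threading


class BlockingCallPool:
    """Hypothetical pool: `width` threads run blocking calls from a queue."""

    def __init__(self, width):
        self._requests = queue.Queue()
        self._threads = [threading.Thread(target=self._worker, daemon=True)
                         for _ in range(width)]
        for thread in self._threads:
            thread.start()

    def _worker(self):
        while True:
            item = self._requests.get()
            if item is None:         # shutdown sentinel
                break
            func, args, callback = item
            try:
                result, error = func(*args), None
            except Exception as exc:
                result, error = None, exc
            callback(result, error)

    def submit(self, func, args, callback):
        # Returns immediately; callback(result, error) runs on a pool thread.
        self._requests.put((func, args, callback))

    def shutdown(self):
        for _ in self._threads:
            self._requests.put(None)
        for thread in self._threads:
            thread.join()
```

The pool's width would be chosen to match the number of requests the RDBMS
can usefully handle at once.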

So, for the sake of argument, let's say that a similar structure is in 
place in a WSGI framework. Further, let's say that database "results", 
i.e. strings, ints, blobs, etc, from database columns will be yielded as 
iterable data by some middleware component. These values will be 
processed further down the middleware stack by some other component, 
which, for example, is generating HTML pages containing the data.

Let's assume that there is a single I/O thread which is responsible for 
communicating final results back to the user, i.e. through the client 
socket. Due to the on-demand nature of the iterator which middleware 
uses to return values, it is possible that the I/O thread could end up 
executing database code. For example, say that the database data is 
accessed through a python descriptor, meaning that accessing the data 
may cause execution of python code in whatever python object retrieved 
the data from the database.

Which will be detrimental to CPU cache locality.

Because the I/O thread will potentially execute code from every 
component in the middleware stack, its thread of execution could meander 
all over several megabytes of python bytecode. Which is pretty much 
guaranteed to eliminate any benefit that may be provided by CPU caches. 
In the worst case, this could cause significant cache "thrashing", as 
lots of different pieces of bytecode clash and "fight" for space in the 
CPU cache.

Welsh[1] states the problem like this: "In a thread-per-task system, the 
instruction cache tends to take many misses as the thread's control 
passes through many unrelated code modules to process the task. In 
addition, whenever a context switch occurs (due to thread preemption or 
blocking I/O call, say), other threads will invariably push the waiting
thread's state out of the cache. When the original thread resumes 
execution, it will need to take many cache misses in order to bring its 
code and state back into the cache. In this situation, all of the 
threads in the system are competing for limited cache space."

The solution to this problem is for middleware components to only return 
references to passive data, and never to return iterators that cause the 
execution of python code.
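
To illustrate the difference between a lazy iterator and passive data, here
is a small sketch, assuming Python 3.  The thread names 'db' and 'io' and
the helper run_in_thread are hypothetical.  It only shows which thread ends
up running the iterator body; it does not measure cache behaviour.

```python
import threading


def rows():
    # Stand-in for a lazy, database-backed iterator (hypothetical).
    for n in range(3):
        yield (n, threading.current_thread().name)


def run_in_thread(name, func):
    # Run func() on a named thread and hand back its return value.
    box = []
    thread = threading.Thread(target=lambda: box.append(func()), name=name)
    thread.start()
    thread.join()
    return box[0]


# Lazy: the 'db' thread only creates the generator; its body runs in 'io'.
lazy = run_in_thread('db', rows)
lazy_names = run_in_thread('io', lambda: [name for _, name in lazy])

# Passive: the 'db' thread builds a list; 'io' merely reads finished data.
passive = run_in_thread('db', lambda: list(rows()))
passive_names = run_in_thread('io', lambda: [name for _, name in passive])
```

In the lazy case every row is produced on the 'io' thread, which is exactly
the meandering described above; in the passive case the 'io' thread touches
only finished data.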

I notice that Phillip has included a statement in PEP-0333 which states 
in the section under "Buffering and Streaming":

"""
Generally speaking, applications will achieve the best throughput by 
buffering their (modestly-sized) output and sending it all at once. When 
this is the case, applications should simply return a single-element 
iterable containing their entire output as a single string.

[snip]

For large files, however, or for specialized uses of HTTP streaming 
(such as multipart "server push"), an application may need to provide 
output in smaller blocks (e.g. to avoid loading a large file into 
memory). It's also sometimes the case that part of a response may be 
time-consuming to produce, but it would be useful to send ahead the 
portion of the response that precedes it.
"""

Phillip, when you wrote about "performance" here, did you have CPU 
caches in mind?

Regards,

Alan.

1. A Design Framework for Highly Concurrent Systems
http://www.eecs.harvard.edu/~mdw/papers/events.pdf
From pje at telecommunity.com  Thu Sep  9 15:24:06 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep  9 15:23:43 2004
Subject: [Web-SIG] CPU cache locality.
In-Reply-To: <41403C93.8080509@xhaus.com>
References: <413F43C6.3080900@divmod.com>
	<5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>
	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040901224135.0210b620@mail.telecommunity.com>
	<5.1.1.6.0.20040906091122.02e78400@mail.telecommunity.com>
	<5.1.1.6.0.20040906140852.02af3c00@mail.telecommunity.com>
	<5.1.1.6.0.20040906194918.034400e0@mail.telecommunity.com>
	<5.1.1.6.0.20040907151219.0287c070@mail.telecommunity.com>
	<5.1.1.6.0.20040908113837.03058c20@mail.telecommunity.com>
	<413F3387.9040908@xhaus.com> <413F43C6.3080900@divmod.com>
Message-ID: <5.1.1.6.0.20040909091926.020cb020@mail.telecommunity.com>

At 12:20 PM 9/9/04 +0100, Alan Kennedy wrote:
>I notice that Phillip has included a statement in PEP-0333 which states in 
>the section under "Buffering and Streaming":
>
>"""
>Generally speaking, applications will achieve the best throughput by 
>buffering their (modestly-sized) output and sending it all at once. When 
>this is the case, applications should simply return a single-element 
>iterable containing their entire output as a single string.
>
>[snip]
>
>For large files, however, or for specialized uses of HTTP streaming (such 
>as multipart "server push"), an application may need to provide output in 
>smaller blocks (e.g. to avoid loading a large file into memory). It's also 
>sometimes the case that part of a response may be time-consuming to 
>produce, but it would be useful to send ahead the portion of the response 
>that precedes it.
>"""
>
>Phillip, when you wrote about "performance" here, did you have CPU caches 
>in mind?

Actually, the word "performance" doesn't appear anywhere in the above; I 
referred only to "throughput".  Performance can affect throughput, but not 
really the other way around.

The reason that returning a single-element iterable improves throughput in 
async architectures like Twisted and ZServer is that they use a thread pool 
for application code.   If the application object returns an iterable 
containing the whole response body, then the application thread is now free 
to run a new application instance.  This allows greater "throughput" at the 
application level, because more requests can be run in a given period of 
time than if an application thread had to continue to be used.
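
To make the two return styles concrete, here is a minimal sketch, assuming
Python 3.5 or later (for bytes formatting).  The names buffered_app and
streaming_app are hypothetical.  The first hands the server a finished
single-element iterable, so the application thread is done as soon as it
returns; the second leaves application code to be run by whoever iterates.

```python
def buffered_app(environ, start_response):
    # Build the whole (modestly sized) body, then hand it over in one piece.
    body = b''.join(b'line %d\n' % n for n in range(3))
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


def streaming_app(environ, start_response):
    # Yield blocks on demand; application code keeps running during iteration.
    start_response('200 OK', [('Content-Type', 'text/plain')])

    def blocks():
        for n in range(3):
            yield b'line %d\n' % n

    return blocks()
```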

From py-web-sig at xhaus.com  Thu Sep  9 18:01:51 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Thu Sep  9 17:57:28 2004
Subject: [Web-SIG] Reviewing WSGI open issues, again...
In-Reply-To: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
Message-ID: <41407E6F.4050809@xhaus.com>

[Phillip J. Eby]
 > * File-like objects -- I think anything we offer for file-like objects
 > should be optional.  The big question is whether to offer a single,
 > introspection-based extension for all file-like things, or whether to
 > use separate extensions for different sorts of things, like
 > 'wsgi.fd_wrapper' for file descriptors and 'wsgi.nio_wrapper' for Java
 > NIO objects, etc.  Does anybody have any arguments/use cases one way
 > or the other?

Optionality is fine by me.

But I don't understand what reasons there might be to have separate 
class names per platform?

It's always been my understanding that the intention for this capability 
is so that applications can give "hints", to servers that support 
high-performance methods of file transmission, that the resource being 
returned is a candidate for bulk transfer. So, as an application author, 
I'll surely want that hinting process to work on as many servers as 
possible, regardless of the platform.

So, if there is a choice of multiple such hinting processes, and I have 
to look for each one of them at runtime, my code is longer and less 
efficient than it could be, e.g.

def app_object(environ, start_response):
   start_response('200 AuQuay', [ ('content-type', 'x-humungous-pdf') ] )
   result = open('humungous.pdf')
   for cname in ['fd','nio','dotnet','stackless','pypy','smalltalk']:
     try:
       return environ['wsgi.%s_wrapper' % cname](result)
     except KeyError:
       pass
   return result

Instead, if a single class is used, the definition of which is different 
per server, then I have only to look at that one class.

def app_object(environ, start_response):
     start_response('200 AuQuay', [ ('content-type', 'x-humungous-pdf') ] )
     result = open('humungous.pdf')
     if environ.has_key('wsgi.file_wrapper'):
         return environ['wsgi.file_wrapper'](result)
     return result

One reason I can see for having multiple classes is if they really 
represent fundamentally different concepts.

For example, there are possibly more types of optimisations available, 
e.g. return a stream of bytes from a shared memory partition, if the 
platform supported DMA access to that shared memory, which would then be 
bulk-transferable, i.e. bypassing the CPU. Since shared memory is a 
concept whose implementation varies subtly between platforms, should we 
be trying to abstract that concept into one class with a single 
interface, whose implementation differs between platforms, or into 
separate classes, one for each platform?

What about an optimised transfer from an RDBMS, say a BLOB stored in a 
database row. Should that be wrapped with a file_wrapper (because it's 
really coming from a file descriptor?), or with a special 
db_blob_wrapper class? Would these db_blob_wrappers differ between 
different database platforms? Because it is quite possible that the 
RDBMS data is also coming through the network subsystem, this bulk 
transfer could potentially be arranged at the network level, conceivably 
on a sophisticated network-card/router/etc, and thus never even reach 
the bus on the serving machine. OK, that's a bit wild and unlikely :-), 
but I'm just trying to foresee as many scenarios for bulk transfers as I 
can, to see if the proposed WSGI model fits.

I suppose it's about recording enough meta-information for the server to 
recognise such optimisable scenarios. So the question has to be asked: 
how portable do we need these optimisations to be between servers? Is 
medusa likely to have its middleware component dedicated to sendfile, 
for example? And twisted have its own, thread-pool based, 
implementation, for example. In which case portability of, say the 
sendfile optimisation, becomes an issue of server configuration, not 
support classes.

Or might it be that we need to facilitate the application at two levels 
in the server? Take the example of shared memory :-

1. In the middleware stack, a component maps a certain URL space into 
the shared memory partition, and returns a specialised wrapper class 
that contains a shared memory reference, i.e. a handle, start/end/len, etc.

2. The application also needs to plug into the server, below the 
middleware stack, so that it can implement the actual bulk transfer from 
the shared memory (assuming that the shared_memory_wrapper wasn't 
obscured by some component below it in the stack). Since shared memory 
support, and probably DMA support, would vary between platforms, this is 
where the platform specific element comes in: there would be different 
versions of that "server plug-in"  for different platforms/servers.

Lastly, I should also point out that, with the current jython I/O 
subsystem, the sendfile/transferTo optimisation is not currently 
possible, inside most existing J2EE containers anyway. This is because 
sockets created using the old java.net APIs, do not by default have 
nio.channels associated with them. Most existing J2EE containers, which 
must support blocking servlets by definition, don't bother to handle 
sockets using java.nio, because it's more work, not necessary, and not 
portable to older versions of the platform. So it's not possible to use 
the sockets they create for bulk transfers.

A container could be redesigned to use the java.nio APIs, completely in 
a blocking fashion, if desired. Which still wouldn't be any use in 
existing jython, because jython's current socket modules are entirely 
based on old java.net classes. Which means that jython code couldn't 
access the channel nature of the sockets, even if those sockets 
supported it, without modification of the standard library.

I have a (~60% complete) side-project to develop asynchronous socket 
support on jython 2.1, by porting the socket, select and (maybe) 
asyncore modules to java.nio. When that is complete (timescale==months, 
v busy), I hope to see experimentation, from myself and others, on 
running python asynchronous models on jython.

Here is what the jython file_wrapper code might look like.

class jython_file_wrapper:

     def __init__(self, wrapped):
         self.wrapped = wrapped

     def sendfile(self, jynio_socket):
         if hasattr(self.wrapped, 'getChannel') :
             # FileChannel.transferTo takes (position, count, target)
             channel = self.wrapped.getChannel()
             channel.transferTo(0, channel.size(), jynio_socket)
         else:
             self.send_in_chunks_instead(jynio_socket)

Regards,

Alan.
From pje at telecommunity.com  Thu Sep  9 18:31:27 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep  9 18:31:16 2004
Subject: [Web-SIG] Reviewing WSGI open issues, again...
In-Reply-To: <41407E6F.4050809@xhaus.com>
References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
	<5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040909122054.02e63570@mail.telecommunity.com>

At 05:01 PM 9/9/04 +0100, Alan Kennedy wrote:
>[Phillip J. Eby]
> > * File-like objects -- I think anything we offer for file-like objects
> > should be optional.  The big question is whether to offer a single,
> > introspection-based extension for all file-like things, or whether to
> > use separate extensions for different sorts of things, like
> > 'wsgi.fd_wrapper' for file descriptors and 'wsgi.nio_wrapper' for Java
> > NIO objects, etc.  Does anybody have any arguments/use cases one way
> > or the other?
>
>Optionality is fine by me.
>
>But I don't understand what reasons there might be to have separate class 
>names per platform?
>
>It's always been my understanding that the intention for this capability 
>is so that applications can give "hints", to servers that support 
>high-performance methods of file transmission, that the resource being 
>returned is a candidate for bulk transfer. So, as an application author, 
>I'll surely want that hinting process to work on as many servers as 
>possible, regardless of the platform.

You may want that, but it's going to be platform-dependent whether you can 
do that.  A trivial example: Java doesn't have file descriptors, so you're 
not going to be able to use 'sendfile()' in Java.  So, what's the point of 
having 'fd_wrapper' available there?

>So, if there is a choice of multiple such hinting processes, and I have to 
>look for each one of them at runtime, my code is longer and less efficient 
>than it could be, e.g.
>
>def app_object(environ, start_response):
>   start_response('200 AuQuay', [ ('content-type', 'x-humungous-pdf') ] )
>   result = open('humungous.pdf')
>   for cname in ['fd','nio','dotnet','stackless','pypy','smalltalk']:
>     try:
>       return environ['wsgi.%s_wrapper' % cname](result)
>     except KeyError:
>       pass
>   return result
>
>Instead, if a single class is used, the definition of which is different 
>per server, then I have only to look at that one class.

An object that works with 'fd' isn't going to work with 'nio', or vice 
versa, is it?  Or am I missing something about how nio works?

I suppose the alternative is to specify 'wsgi.file_wrapper' such that it's 
required to always return *something* usable, even if it can't figure out 
any way to optimize it.  Objects passed to 'file_wrapper' would have to 
have a 'read', optionally a 'close', and optionally 'fileno'.  (A Jython 
WSGI server would ignore fileno, of course.)
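
As a sketch of what "always return *something* usable" could mean, assuming
Python 3: FallbackFileWrapper and its 'blksize' parameter are hypothetical
names.  A server with a faster path would check for 'fileno' (or whatever
its platform offers) before falling back to plain iteration like this.

```python
class FallbackFileWrapper:
    """Hypothetical 'wsgi.file_wrapper' that always returns something usable."""

    def __init__(self, filelike, blksize=8192):
        self.filelike = filelike
        self.blksize = blksize
        # Expose the optional methods only when the wrapped object has them.
        if hasattr(filelike, 'close'):
            self.close = filelike.close
        if hasattr(filelike, 'fileno'):
            self.fileno = filelike.fileno

    def __iter__(self):
        # Unoptimized path: read fixed-size blocks until the file is exhausted.
        while True:
            data = self.filelike.read(self.blksize)
            if not data:
                break
            yield data
```

Middleware that knows nothing about files can still treat the result as an
ordinary iterable.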

From tony at lownds.com  Thu Sep  9 20:09:16 2004
From: tony at lownds.com (tony@lownds.com)
Date: Thu Sep  9 20:30:25 2004
Subject: [Web-SIG] Reviewing WSGI open issues, again...
In-Reply-To: <5.1.1.6.0.20040909122054.02e63570@mail.telecommunity.com>
References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com><5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
	<5.1.1.6.0.20040909122054.02e63570@mail.telecommunity.com>
Message-ID: <60305.204.162.121.54.1094753356.squirrel@*>

[Phillip]
> I suppose the alternative is to specify 'wsgi.file_wrapper' such that it's
> required to always return *something* usable, even if it can't figure out
> any way to optimize it.  Objects passed to 'file_wrapper' would have to
> have a 'read', optionally a 'close', and optionally 'fileno'.  (A Jython
> WSGI server would ignore fileno, of course.)
>

I like this option. As long as the file_wrapper does not initiate any
actions until the server gets it, the results of file_wrapper can be
opaque to middleware. Other methods might be useful too, for instance,
tell() - if an application passes a file that has been seeked to a certain
point, that's where reading of data should start.

I'm assuming the new "combined" wsgi.file_wrapper key would be optional.
This puts a burden on applications that need to send back data from files,
because they'd need fallback logic if the wsgi.file_wrapper key isn't
present. But that seems better on the whole than putting the burden on
servers, all the time.
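
A minimal sketch of the application-side fallback described above, assuming the wrapper is called as wrapper(filelike, block_size). That call signature and the helper name send_file are assumptions for illustration; the thread has not settled either.

```python
import io


def send_file(environ, filelike, block_size=8192):
    """Return an iterable for the response body.

    Uses the server's wrapper when the key is present; otherwise falls
    back to reading fixed-size blocks until read() returns b''.
    """
    wrapper = environ.get('wsgi.file_wrapper')
    if wrapper is not None:
        # Assumed signature: wrapper(filelike, block_size)
        return wrapper(filelike, block_size)
    return iter(lambda: filelike.read(block_size), b'')


# No wrapper offered by the server: plain block iteration.
chunks = list(send_file({}, io.BytesIO(b'abcdefghij'), block_size=4))
```

The fallback is a single line, so the burden on applications is small as long as they only need read().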

-Tony

From tony at lownds.com  Thu Sep  9 20:32:10 2004
From: tony at lownds.com (tony@lownds.com)
Date: Thu Sep  9 20:53:17 2004
Subject: [Web-SIG] Reviewing WSGI open issues, again...
In-Reply-To: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
Message-ID: <60470.204.162.121.54.1094754730.squirrel@*>

> * Error handling -- I'm assuming the SIG consensus is +1 on the
> 'wsgi.fatal_errors' key, but haven't seen any feedback on my ideas for
> 'start_response', except that I seem to recall someone saying they didn't
> want the body passed to start_response.  Taking that part out, we end up
> with something like this:
>
> 'start_response()' doesn't actually transmit the status or headers until
> the first write() call occurs or the first string is yielded from the
> returned iterable.  'start_response' simply stores the status or headers
> for future use, and may therefore be called more than once.  However,
> calling 'start_response()' *after* a write(), or after the first string is
> yielded, is a fatal error.  Top-level servers/gateways should log detailed
> information about errors that occur after a partial result is
> transmitted.  They may also attempt to send error information to the
> client
> if the content type is text (e.g. text/html, text/xml, text/plain).
>
> Feedback, anyone?
>

I still like the idea of having an exception that servers will always
catch and send back to the user. If an application doesn't know whether a
server can display an error page, it will tend to include its own
error-displaying logic (made simpler by the start_response() above). But,
if applications take care of displaying those exceptions, then exception
catching middleware won't really be useful for those applications.

As long as exceptions get logged, I think it is fine for there to be no
requirement about sending error data back to the client, after the
response is started.

Without some other way for applications to send errors, the
additional requirements on start_response do make sense, even though they
complicate some pretty tricky logic.

How does wsgi.fatal_errors help servers? Wouldn't servers have to make up
specialized exceptions for inclusion in wsgi.fatal_errors, in order to
avoid interfering with catching other exceptions? Now write() and
start_response() need more logic, to throw only errors in
wsgi.fatal_errors. And servers can't rely on applications adhering to the
rules in the specs.

-Tony

From py-web-sig at xhaus.com  Thu Sep  9 21:05:06 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Thu Sep  9 21:00:25 2004
Subject: [Web-SIG] Reviewing WSGI open issues, again...
In-Reply-To: <5.1.1.6.0.20040909122054.02e63570@mail.telecommunity.com>
References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
	<5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
	<5.1.1.6.0.20040909122054.02e63570@mail.telecommunity.com>
Message-ID: <4140A962.2050602@xhaus.com>

[Phillip J. Eby]
 > Java doesn't have file descriptors, so
 > you're not going to be able to use 'sendfile()' in Java.  So, what's
 > the point of having 'fd_wrapper' available there?

and

 > An object that works with 'fd' isn't going to work with 'nio', or vice
 > versa is it?  Or am I missing something about how nio works?
 >
 > I suppose the alternative is to specify 'wsgi.file_wrapper' such that
 > it's required to always return *something* usable, even if it can't
 > figure out any way to optimize it.  Objects passed to 'file_wrapper'
 > would have to have a 'read', optionally a 'close', and optionally
 > 'fileno'.  (A Jython WSGI server would ignore fileno, of course.)

Ah, I think I see where the confusion lies. Perhaps I should have taken 
more time to explain a certain issue earlier than this.

Jython files *may* have the local analogue of a file descriptor, i.e. a 
channel, but only when the jython code is running on a jvm that supports 
java.nio, which means 1.4 or greater.

I could define the fileno method of jython files like this

class file:

     def fileno(self):
         if hasattr(self.java_file, 'getChannel'):
             # java >= 1.4 behaviour: hand back the nio channel
             return self.java_file.getChannel()
         else:
             # java < 1.4 behaviour: no channel is available
             raise NotImplementedError("fileno() requires java.nio")

The current jython 2.1 library only raises the exception, because 
java.nio didn't exist when it was written.

Now, I could suggest a patch to the jython runtime to redefine fileno as 
above, but that's not a safe thing to do: existing python code that is 
expecting a cpython file descriptor will almost certainly break if it 
gets passed a java.nio.channels.FileChannel instead.

Unless the entire I/O subsystem has been rewritten, as I am doing for 
jynio sockets, which *do* have a useful fileno() method which each of 
the new modules knows how to use properly. The returned object confers 
identical semantics to cpython file descriptors, when passed to jynio 
socket modules. For example, when jynio is finished, this code will run 
identically on cpython and jython

import select
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
fd = s.fileno()
po = select.poll()
po.register(fd)

http://www.xhaus.com/alan/python/jynio/socket.html#socketvschannel

A very similar set of operations can be carried out on both file 
descriptors and channels: i.e. selectability/event notification, bulk 
transfers, etc.

http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/SelectableChannel.html
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/InterruptibleChannel.html
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/WritableByteChannel.html
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/GatheringByteChannel.html

So, when running on java 1.4+, I *can* get the local equivalent of a 
file descriptor, and do meaningful things with it, in terms of bulk 
transfer, etc.

But that doesn't work on older JVMs where only java.io is available: 
There is no other way (without examining private file object data, i.e. 
the java.io.FileInputStream encapsulated in the jython file) that I can 
determine if something is file-like, other than to do this

if type(app_object) is types.FileType:
     do_high_performance_file_stuff(app_object)

That's why I pushed for permitting return of file-likes from 
applications: because it's the only "safe" way to recognise files on 
pre-1.4 jvms. And it's also portable to all other python platforms.

It would still definitely be useful to recognise the optimisation on 
older VMs, because I could still have a fast native-java 
loop-while-sending-blocks implementation of "sendfile", which would be 
substantially faster than a jython one, because it would avoid the 
unnecessary transformation of the file data into jython data structures 
(i.e. binary strings) and then back again.

But your solution of a single server-provided file_wrapper class solves 
the problem nicely. Because the application has hinted that the 
application object is a file, I now have a simple way of checking, that 
works across all jvms. So I can now very simply provide the bulk 
transfer optimisation, and implement it differently, depending on the 
availability of the java.nio classes, e.g.

try:
     import java.nio

     class file_wrapper:

         def send_file(self, dest):
             use_nio_transfer_to(dest)

except ImportError:
     import java.io

     class file_wrapper:

         def send_file(self, dest):
             use_looping_sendfile(dest)

Also, the 'file_wrapper' solution removes the need for me to look at 
private data inside jython file objects, to see if the underlying 
java.io.FileInputStream has a getChannel method. So it's definitely the 
cleanest solution.
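
A rough, platform-neutral sketch of the same idea, assuming a server that recognises only its own wrapper type and then picks a transfer strategy. The names FileWrapper and transmit are hypothetical, and the 'descriptor' branch only reports the choice; a real server would call its bulk-transfer primitive there.

```python
import io


class FileWrapper:
    """Hypothetical server-provided wrapper.

    It only records the file-like object and block size; no I/O
    happens until the server iterates over it.
    """

    def __init__(self, filelike, block_size=8192):
        self.filelike = filelike
        self.block_size = block_size
        if hasattr(filelike, 'close'):
            self.close = filelike.close

    def __iter__(self):
        return iter(lambda: self.filelike.read(self.block_size), b'')


def transmit(result, write):
    """Send 'result' through write() and return the strategy chosen."""
    if isinstance(result, FileWrapper):
        try:
            result.filelike.fileno()
        except (AttributeError, OSError):
            strategy = 'block-loop'    # no descriptor: copy block by block
        else:
            strategy = 'descriptor'    # a bulk transfer would be used here
    else:
        strategy = 'iterate'           # ordinary iterable from the application
    # This sketch always copies blocks, whichever strategy was reported.
    for block in result:
        write(block)
    return strategy


out = []
choice = transmit(FileWrapper(io.BytesIO(b'hello world'), 4), out.append)
```

Because the check is an isinstance test against the server's own class, middleware can pass the object through without knowing what is inside it.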

As for the 'file_wrapper' class name across platforms, as you can see 
from the above, having different class names for each platform would not 
change the above considerations one bit: it would just make the 
application author's life more difficult.

I hope that makes the situation clearer!

Regards,

Alan.
From pje at telecommunity.com  Thu Sep  9 21:30:27 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep  9 21:30:25 2004
Subject: [Web-SIG] Reviewing WSGI open issues, again...
In-Reply-To: <60470.204.162.121.54.1094754730.squirrel@*>
References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
	<5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com>

At 11:32 AM 9/9/04 -0700, tony@lownds.com wrote:

>I still like the idea of having an exception that servers will always
>catch and send back to the user.

Currently, isn't that *every* exception?  I'm making the assumption that 
the server will want to log and display every non-fatal error.  (Except 
those occurring after the headers are sent, which can only be logged in the 
general case.)


>If an application doesn't know whether a
>server can display an error page, it will tend to include its own
>error-displaying logic (made simpler by the start_response() above). But,
>if applications take care of displaying those exceptions, then exception
>catching middleware won't really be useful for those applications.

This seems circular to me: if the application throws an error that's 
actually an application-defined error message, then why is middleware going 
to be *useful* here?

You must have some other use case in mind besides the middleware presenting 
a friendly message, since presumably the application can produce a 
friendlier message (at least in the sense of being specific to the app and 
looking like the app).  Could you elaborate on your use case?


>As long as exceptions get logged, I think it is fine for there to be no
>requirement about sending error data back to the client, after the
>response is started.
>
>Without some other way for applications to send errors, then the
>additional requirements on start_response do make sense, even though it
>complicates some pretty tricky logic.

I'm not sure it's *that* bad...

     headers_set = []
     headers_sent = []

     def write(data):

         if not headers_set:
             raise AssertionError("write() before start_response()")

         elif not headers_sent:
             status, headers = headers_sent[:] = headers_set
             write(status+'\r\n')
             for header in headers:
                 write('%s: %s\r\n' % header)
             write('\r\n')

         # actual write() code goes here...

     def start_response(status,headers):
         if headers_sent:
             raise AssertionError("Headers already sent!")
         headers_set[:] = [status,headers]
         return write

     # ...
     result = application(environ, start_response)
     try:
         try:
             for data in result:
                 write(data)
             if not headers_sent:
                 write('')   # force headers to be sent
         except:
             if not headers_sent:
                 # call start_response() with a 500 error
                 # status, then write out an error message
                 pass

             # re-raise the error
             raise

     finally:

         # XXX ensure client connection is closed first

         if hasattr(result,'close'):
             result.close()


Of course, all of the above should be wrapped in a try-except that logs any 
errors and continues the server.
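
For reference, a self-contained version of the outline above that runs under Python 3, assuming output goes to a bytes buffer and the status is emitted as a CGI-style "Status:" line. The names run_once and demo_app and the buffer are assumptions for illustration only.

```python
import io


def run_once(application, environ, out):
    """Run one request, writing the raw response bytes to 'out'."""
    headers_set = []
    headers_sent = []

    def write(data):
        if not headers_set:
            raise AssertionError("write() before start_response()")
        if not headers_sent:
            # First write: commit whatever start_response() stored last.
            status, headers = headers_sent[:] = headers_set
            out.write(('Status: %s\r\n' % status).encode('latin-1'))
            for name, value in headers:
                out.write(('%s: %s\r\n' % (name, value)).encode('latin-1'))
            out.write(b'\r\n')
        out.write(data)

    def start_response(status, headers):
        if headers_sent:
            raise AssertionError("Headers already sent!")
        headers_set[:] = [status, headers]
        return write

    result = application(environ, start_response)
    try:
        for data in result:
            write(data)
        if not headers_sent:
            write(b'')   # force headers out for an empty body
    finally:
        if hasattr(result, 'close'):
            result.close()


def demo_app(environ, start_response):
    # Two calls before any output: only the second status is transmitted.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'nothing here\n']


buf = io.BytesIO()
run_once(demo_app, {}, buf)
```

The demo shows the property under discussion: a status set early can be replaced until the first byte of body goes out.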


>How does wsgi.fatal_errors help servers? Wouldn't servers have to make up
>specialized exceptions for inclusion in wsgi.fatal_errors, in order to
>avoid interfering with catching other exceptions? Now write() and
>start_response() need more logic, to throw only errors in
>wsgi.fatal_errors.

Hm.  Well, the alternative would be that the server has to track state to 
know its state is hosed.  That is, if you try to write() when a client 
connection is lost, subsequent write() calls should fail.  Similarly, 
start_response() after write() should fail, but then so should subsequent 
write() calls.

It seemed to me that it was simpler to raise a fatal error in that case, 
which the application would allow to pass through.  But, if the server has 
to consider the possibility that the app might not be able to enforce this 
(e.g. because of bare 'except:' clauses), then I suppose we might as well 
just have the complexity of state checking and ignore the fatal errors issue.

OTOH, the purpose of fatal_errors is to allow the *app* to know that it's 
pointless to go on, and that it *should* abort.  This still seems somewhat 
useful to me, although it could also be argued that virtually *any* 
exception raised by start_response() and write() should be considered fatal.

Cascading errors are also a potential problem.  Let's say the application 
doesn't propagate a fatal error, but instead "converts" it to a different 
kind of error.  Now, the server must catch the application's error, while 
still knowing that it erred internally first.  Sigh.

This suggests to me that start_response() and write() must have exception 
handlers that set a flag when they have an uncaught exception, so that they 
know to ignore the application's later errors if the problem originated 
within the server.  Ugh.
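
A small sketch of that flag, assuming one state object per request. ServerState, handle and the two demo callables are hypothetical names, not part of any proposal in this thread.

```python
class ServerState:
    """Hypothetical per-request state holding the internal-error flag."""

    def __init__(self, raw_write):
        self.raw_write = raw_write
        self.internal_error = None

    def write(self, data):
        if self.internal_error is not None:
            # Already broken: fail again rather than touch the connection.
            raise self.internal_error
        try:
            self.raw_write(data)
        except Exception as exc:
            self.internal_error = exc   # remember that the server erred first
            raise


def handle(app_body, state):
    """Return 'ok', 'server' or 'application' according to who failed."""
    try:
        app_body(state.write)
    except Exception:
        return 'server' if state.internal_error is not None else 'application'
    return 'ok'


def broken_pipe(data):
    raise IOError("client went away")


def converting_app(write):
    # Converts the server's error into a different exception type.
    try:
        write(b'x')
    except IOError:
        raise ValueError("converted")


origin = handle(converting_app, ServerState(broken_pipe))
```

Even though the application raised ValueError, the flag lets the server attribute the failure to itself.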

I suppose the bright side is that we wouldn't need 'wsgi.fatal_errors' any 
more, but my "not so bad" code above now needs some additional error 
handling and an 'internal_errors' state variable.


>And servers can't rely on applications adhering to the
>rules in the specs.

I'm not sure what you mean here, but maybe it's what I just said 
above?  (about apps maybe being broken in their handling of fatal_errors).

From tony at lownds.com  Thu Sep  9 21:45:57 2004
From: tony at lownds.com (tony@lownds.com)
Date: Thu Sep  9 22:07:06 2004
Subject: [Web-SIG] Reviewing WSGI open issues, again...
In-Reply-To: <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com>
References: <5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com><5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
	<5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com>
Message-ID: <61044.204.162.121.54.1094759157.squirrel@*>

> At 11:32 AM 9/9/04 -0700, tony@lownds.com wrote:
>
>>I still like the idea of having an exception that servers will always
>>catch and send back to the user.
>
> Currently, isn't that *every* exception?  I'm making the assumption that
> the server will want to log and display every non-fatal error.  (Except
> those occurring after the headers are sent, which can only be logged in
> the
> general case.)
>

No, I mean that the server will send back a document that was sent as part
of the exception, not a document derived from the exception and/or
traceback. It is a mechanism that applications can rely on to get an error
notice to the user.

>
>>If an application doesn't know whether a
>>server can display an error page, it will tend to include its own
>>error-displaying logic (made simpler by the start_response() above). But,
>>if applications take care of displaying those exceptions, then exception
>>catching middleware won't really be useful for those applications.
>
> This seems circular to me: if the application throws an error that's
> actually an application-defined error message, then why is middleware
> going
> to be *useful* here?
>
> You must have some other use case in mind besides the middleware
> presenting
> a friendly message, since presumably the application can produce a
> friendlier message (at least in the sense of being specific to the app and
> looking like the app).  Could you elaborate on your use case?
>

Middleware can use the exception to provide side-effects, like notifying
developers, or displaying diagnostics to certain IPs.

Mainly the use case is that raising an exception with an HTML page is less
error prone for applications and middleware than invoking write from
within an except clause. The server can decide whether it will be able to
send out the error page, rather than the application or middleware having
to try and figure out if it can successfully start a response from
scratch.

> OTOH, the purpose of fatal_errors is to allow the *app* to know that it's
> pointless to go on, and that it *should* abort.  This still seems somewhat
> useful to me, although it could also be argued that virtually *any*
> exception raised by start_response() and write() should be considered
> fatal.
>

Yes, I would have thought so.

>>And servers can't rely on applications adhering to the
>>rules in the specs.
>
> I'm not sure what you mean here, but maybe it's what I just said
> above?  (about apps maybe being broken in their handling of fatal_errors).
>
>

Yep

-Tony

From pje at telecommunity.com  Thu Sep  9 22:31:56 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep  9 22:31:54 2004
Subject: [Web-SIG] Reviewing WSGI open issues, again...
In-Reply-To: <61044.204.162.121.54.1094759157.squirrel@*>
References: <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com>
	<5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
	<5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
	<5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com>

At 12:45 PM 9/9/04 -0700, tony@lownds.com wrote:
> > At 11:32 AM 9/9/04 -0700, tony@lownds.com wrote:
> >
> >>I still like the idea of having an exception that servers will always
> >>catch and send back to the user.
> >
> > Currently, isn't that *every* exception?  I'm making the assumption that
> > the server will want to log and display every non-fatal error.  (Except
> > those occurring after the headers are sent, which can only be logged in
> > the
> > general case.)
> >
>
>No, I mean that the server will send back a document that was sent as part
>of the exception, not a document derived from the exception and/or
>traceback. It is a mechanism that applications can rely on to get an error
>notice to the user.

I'm still not seeing how this is different from the application simply 
catching the exception at its highest level, and doing:

       start_response("500 Error occurred", [('Content-type','text/plain')])
       return ["error body here"]


> > You must have some other use case in mind besides the middleware
> > presenting
> > a friendly message, since presumably the application can produce a
> > friendlier message (at least in the sense of being specific to the app and
> > looking like the app).  Could you elaborate on your use case?
> >
>
>Middleware can use the exception to provide side-effects, like notifying
>developers, or displaying diagnostics to certain IPs.

In that case, why not have the application simply not catch the error, and 
let middleware do it?

I'm still confused as to how having a special exception helps.


>Mainly the use case is that raising an exception with an HTML page is less
>error prone for applications and middleware than invoking write from
>within an except clause. The server can decide whether it will be able to
>send out the error page, rather than the application or middleware having
>to try and figure out if it can successfully start a response from
>scratch.

Ah.  ISTM that use case is effectively handled: use start_response()+return 
[body] as I described above.  If start_response fails, you're in basically 
the same position you'd have been if you were raising a special 
error.  (I.e., your error wasn't going to get reported anyway.)
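
As a sketch of the start_response()+return [body] approach, assuming the application wraps its own top-level callable. The names catch_errors and bad_app are hypothetical, and the body is bytes because this is written for Python 3.

```python
def catch_errors(app):
    """Wrap 'app' so that an exception raised by the call becomes a 500."""
    def wrapped(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            # If output has already started, start_response() itself
            # fails and the error propagates to the server instead.
            start_response('500 Error occurred',
                           [('Content-Type', 'text/plain')])
            return [b'error body here']
    return wrapped


def bad_app(environ, start_response):
    raise RuntimeError('boom')


statuses = []
body = catch_errors(bad_app)({}, lambda status, headers: statuses.append(status))
```

This only covers exceptions raised while the application callable is running, not ones raised later while the server iterates over the result.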

Of course, it could be argued that the server in that case doesn't have 
anything of interest to log regarding the error.  But that could be handled 
by adding a 'body' argument to 'start_response' as I previously proposed.

Let me see if I understand your actual use case...  you want to be able to 
write an application that, although it handles its own errors, also gives 
users the option of placing error-handling middleware over it to change how 
its errors are rendered, logged, etc.  And, you want that mechanism to be 
based on Python exception information (type, value, traceback) rather than 
on HTTP information (status, headers, content).  Finally, you want this to 
be unconditionally available, rather than having to first check whether the 
exception handling middleware is installed.  Is this correct?

From gabriel.cooper at mediapulse.com  Thu Sep  9 22:41:15 2004
From: gabriel.cooper at mediapulse.com (Gabriel Cooper)
Date: Thu Sep  9 22:39:37 2004
Subject: [Web-SIG] [ANNOUNCE] SnakeSkin: Python Application Toolkit
In-Reply-To: <4140BCE4.8000101@mediapulse.com>
References: <1094756605.12825.25.camel@mike.mediapulse.com>
	<4140BB34.4000200@mediapulse.com> <4140BCE4.8000101@mediapulse.com>
Message-ID: <4140BFEB.8060702@mediapulse.com>


We are proud to announce the release of SnakeSkin, a python application 
toolkit released under an Open Source BSD-Style license, newly available 
at http://snakeskin-tools.sourceforge.net/

In SnakeSkin, developers can customize the framework to the application, 
unlike in traditional frameworks, such as PHP. For example, adding 
custom tags to the templating system is quick and easy. The goal of the 
project is to have a framework that scales down as well as up--a 
"Zope-lite" framework. SnakeSkin can scale down to be useful in a simple 
form-to-email or just to apply a clean-cut design skin. The toolkit can 
just as easily scale up to handle complex content management systems, B2B 
extranets, and full-fledged e-commerce engines.  We do it all the time.

SnakeSkin, based upon the existing Albatross project maintained by 
Object Craft, runs under several webservers, including CGI based, 
Apache, FastCGI, and its own included webserver (used mainly for 
development).

SnakeSkin has several built in capabilities:

* Dynamic Macro Features (think server-side includes on steroids)
* SQL support in both the application and the template
* Support for Apache 2.0 Filters

... and includes Albatross features ...

* Clean separation of logic and design
* A simple-yet-robust templating system that is Web Designer-friendly 
(Plays nice with Dreamweaver)
* Secure Session Management in hidden fields, server-side data-stores, 
or through a session server

We are ready to consider the current version, 0.9, as a candidate for 
the 1.0 release. Anyone who has feedback on the current design or finds 
bugs, please send information in through the mailing list ( 
http://lists.sourceforge.net/lists/listinfo/snakeskin-tools-discuss ) or 
file a bug report on sourceforge.net.

Thank You,

The SnakeSkin team.


From andrew at andreweland.org  Fri Sep 10 13:45:25 2004
From: andrew at andreweland.org (Andrew Eland)
Date: Fri Sep 10 13:58:11 2004
Subject: [Web-SIG] Adding status code constants to httplib
Message-ID: <414193D5.6010405@andreweland.org>

Hi,

Over in web-sig, we're discussing PEP 333, the Web Server Gateway 
Interface. Rather than defining our own set of constants for the HTTP 
status code integers, we thought it would be a good idea to add them to 
httplib, allowing other applications to benefit. I've uploaded a 
patch[1] to httplib.py and the corresponding documentation. Do people 
think this is a good idea?

   -- Andrew Eland (http://www.andreweland.org)

[1] 
http://sourceforge.net/tracker/index.php?func=detail&aid=1025790&group_id=5470&atid=305470
From pje at telecommunity.com  Fri Sep 10 17:01:08 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Sep 10 17:01:37 2004
Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib
In-Reply-To: <414193D5.6010405@andreweland.org>
Message-ID: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com>

At 12:45 PM 9/10/04 +0100, Andrew Eland wrote:
>Over in web-sig, we're discussing PEP 333, the Web Server Gateway 
>Interface. Rather than defining our own set of constants for the HTTP 
>status code integers, we thought it would be a good idea to add them to 
>httplib, allowing other applications to benefit. I've uploaded a patch[1] 
>to httplib.py and the corresponding documentation. Do people think this is 
>a good idea?

I would also put the statuses in a dictionary, such that:

     status_code[BAD_GATEWAY] = "Bad Gateway"

This could be accomplished via something like:

     status_code = dict([
        (val, key.replace('_',' ').title())
            for key,val in globals().items()
                if key==key.upper() and not key.startswith('HTTP')
                    and not key.startswith('_')
     ])
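
A runnable sketch of that construction, assuming a handful of constants stand in for the full patch and the namespace is passed explicitly rather than read from globals(). build_status_table is a hypothetical helper name.

```python
# A few of the proposed constants, standing in for the full patch.
OK = 200
NOT_FOUND = 404
BAD_GATEWAY = 502
HTTP_PORT = 80   # existing httplib constant that must be skipped


def build_status_table(namespace):
    """Map integer codes to title-cased names derived from constant names."""
    return dict(
        (val, key.replace('_', ' ').title())
        for key, val in namespace.items()
        if key == key.upper()
        and not key.startswith('HTTP')
        and not key.startswith('_')
        and isinstance(val, int)
    )


status_code = build_status_table({
    'OK': OK,
    'NOT_FOUND': NOT_FOUND,
    'BAD_GATEWAY': BAD_GATEWAY,
    'HTTP_PORT': HTTP_PORT,
})
```

Note that title() turns OK into "Ok", so a table derived this way would need a few hand-written overrides to match the usual reason phrases.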

From andrew at andreweland.org  Fri Sep 10 17:12:02 2004
From: andrew at andreweland.org (Andrew Eland)
Date: Fri Sep 10 17:24:54 2004
Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib
In-Reply-To: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com>
References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com>
Message-ID: <4141C442.8050005@andreweland.org>

Phillip J. Eby wrote:

> I would also put the statuses in a dictionary, such that:
> 
>     status_code[BAD_GATEWAY] = "Bad Gateway"

There's a table mapping status codes to messages on 
BaseHTTPRequestHandler at the moment. It could be moved into httplib to 
make it more publicly visible.

   -- Andrew
From py-web-sig at xhaus.com  Fri Sep 10 17:45:35 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Fri Sep 10 17:41:14 2004
Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib
In-Reply-To: <4141C442.8050005@andreweland.org>
References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com>
	<4141C442.8050005@andreweland.org>
Message-ID: <4141CC1F.4000207@xhaus.com>

[Phillip J. Eby]
>> I would also put the statuses in a dictionary, such that:
>>
>>     status_code[BAD_GATEWAY] = "Bad Gateway"

[Andrew Eland]
> There's a table mapping status codes to messages on 
> BaseHTTPRequestHandler at the moment. It could be moved into httplib to 
> make it more publicly visible.

And that mapping has 2 levels of human readable messages on it, for example

304: ('Not modified', 'Document has not changed singe given time'),

I think that, since the human readable versions are seldom heeded 
anyway, perhaps a single message is all we need?

And I'm -1 on forcing servers, particularly CGI servers, to import the 
client-side httplib (2.3 httplib.pyc == 42K) just to get this mapping.

If the changes are not going to make it in until the next release of 
cpython anyway, then maybe we should just aim for a new module? Or is 
some version of 2.4 the target, in which case minimal patches might make 
it in, whereas new modules won't?

Just my 0,02 euro.

Alan.
From andrew at andreweland.org  Fri Sep 10 17:46:44 2004
From: andrew at andreweland.org (Andrew Eland)
Date: Fri Sep 10 17:59:36 2004
Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib
In-Reply-To: <4141CC1F.4000207@xhaus.com>
References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com>	<4141C442.8050005@andreweland.org>
	<4141CC1F.4000207@xhaus.com>
Message-ID: <4141CC64.2090205@andreweland.org>

Alan Kennedy wrote:


> And that mapping has 2 levels of human readable messages on it, for example
> 304: ('Not modified', 'Document has not changed singe given time'),
> I think that, since the human readable versions are seldom heeded 
> anyway, perhaps a single message is all we need?

A simple move would mean we'd have to keep both, for backwards 
compatibility. I guess BaseHTTPRequestHandler could mix its long 
messages in with those in a httplib table, but it sounds ugly.

> And I'm -1 on forcing servers, particularly CGI servers, to import the 
> client-side httplib (2.3 httplib.pyc == 42K) just to get this mapping.

I think the number of people who wouldn't import httplib on 
speed/process size grounds is very small. If they're that worried about 
efficiency, they could copy and paste the table, and manage the extra 
development complexity.

   -- Andrew

From pje at telecommunity.com  Fri Sep 10 18:08:37 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Sep 10 18:09:10 2004
Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib
In-Reply-To: <4141C442.8050005@andreweland.org>
References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com>
	<5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040910120714.032b5b80@mail.telecommunity.com>

At 04:12 PM 9/10/04 +0100, Andrew Eland wrote:
>Phillip J. Eby wrote:
>
>>I would also put the statuses in a dictionary, such that:
>>     status_code[BAD_GATEWAY] = "Bad Gateway"
>
>There's a table mapping status codes to messages on BaseHTTPRequestHandler 
>at the moment. It could be moved into httplib to make it more publicly 
>visible.


It doesn't appear to include HTTP/1.1 status codes.


From py-web-sig at xhaus.com  Fri Sep 10 18:25:53 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Fri Sep 10 18:20:58 2004
Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib
In-Reply-To: <5.1.1.6.0.20040910120714.032b5b80@mail.telecommunity.com>
References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com>	<5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com>
	<5.1.1.6.0.20040910120714.032b5b80@mail.telecommunity.com>
Message-ID: <4141D591.2090903@xhaus.com>

[Andrew Eland]
>> There's a table mapping status codes to messages on 
>> BaseHTTPRequestHandler at the moment. It could be moved into httplib 
>> to make it more publicly visible.

[Phillip J. Eby]
> It doesn't appear to include HTTP/1.1 status codes.

Hmm. The version I'm seeing, python23/Lib, has all the codes from RFC 2616.

Are you looking at the python 2.1 version, by any chance?

Regards,

Alan.


From mnot at mnot.net  Sat Sep 11 07:24:29 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Sat Sep 11 07:24:44 2004
Subject: [Web-SIG] Adding status code constants to httplib
In-Reply-To: <414193D5.6010405@andreweland.org>
References: <414193D5.6010405@andreweland.org>
Message-ID: <DAB8C847-03B2-11D9-A26E-000A95BD86C0@mnot.net>

FYI: status codes as exceptions:
   http://www.mnot.net/python/http/status.py


On Sep 10, 2004, at 9:45 PM, Andrew Eland wrote:

> Hi,
>
> Over in web-sig, we're discussing PEP 333, the Web Server Gateway  
> Interface. Rather than defining our own set of constants for the HTTP  
> status code integers, we thought it would be a good idea to add them  
> to httplib, allowing other applications to benefit. I've uploaded a  
> patch[1] to httplib.py and the corresponding documentation. Do people  
> think this is a good idea?
>
>   -- Andrew Eland (http://www.andreweland.org)
>
> [1]  
> http://sourceforge.net/tracker/index.php?func=detail&aid=1025790&group_id=5470&atid=305470
>

--
Mark Nottingham     http://www.mnot.net/

From tony at lownds.com  Sat Sep 11 18:24:00 2004
From: tony at lownds.com (tony@lownds.com)
Date: Sat Sep 11 18:45:41 2004
Subject: [Web-SIG] Reviewing WSGI open issues, again...
In-Reply-To: <5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com>
References: <5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com><5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com><5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com><5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com>
	<5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com>
Message-ID: <55666.68.122.33.37.1094919840.squirrel@*>

>>No, I mean that the server will send back a document that was sent as
>> part
>>of the exception, not a document derived from the exception and/or
>>traceback. It is a mechanism that applications can rely on to get an
>> error
>>notice to the user.
>
> I'm still not seeing how this is different from the application simply
> catching the exception at its highest level, and doing:
>
>        start_response("500 Error occurred",
> [('Content-type','text/plain')])
>        return ["error body here"]
>
>


Servers need additional logic to try and support calling start_response
twice. Calling start_response again could still be an error for the
application, masking the error. That code doesn't work from the iterator.

> Let me see if I understand your actual use case...  you want to be able to
> write an application that, although it handles its own errors, also gives
> users the option of placing error-handling middleware over it to change
> how
> its errors are rendered, logged, etc.  And, you want that mechanism to be
> based on Python exception information (type, value, traceback) rather than
> on HTTP information (status, headers, content).  Finally, you want this to
> be unconditionally available, rather than having to first check whether
> the
> exception handling middleware is installed.  Is this correct?

Yes, with the addition of a server-provided exception class that holds the
error document payload.

-Tony

From pje at telecommunity.com  Sat Sep 11 19:13:22 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Sep 11 19:12:30 2004
Subject: [Web-SIG] Reviewing WSGI open issues, again...
In-Reply-To: <55666.68.122.33.37.1094919840.squirrel@*>
References: <5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com>
	<5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com>
	<5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
	<5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
	<5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com>
	<5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040911125331.0264abd0@mail.telecommunity.com>

At 09:24 AM 9/11/04 -0700, tony@lownds.com wrote:
> >>No, I mean that the server will send back a document that was sent as
> >> part
> >>of the exception, not a document derived from the exception and/or
> >>traceback. It is a mechanism that applications can rely on to get an
> >> error
> >>notice to the user.
> >
> > I'm still not seeing how this is different from the application simply
> > catching the exception at its highest level, and doing:
> >
> >        start_response("500 Error occurred",
> > [('Content-type','text/plain')])
> >        return ["error body here"]
> >
> >
>
>
>Servers need additional logic to try and support calling start_response
>twice.

They'll need it in any case.  What are the odds that all errors will occur 
before start_response happens?


>Calling start_response again could still be an error for the
>application, masking the error.

True.  This is probably the strongest argument for having a special 
exception.  That is, that an exception-in-progress could be masked by the 
error of calling start_response again.  OTOH, there's always:

     try:
         try:
             t,v,tb = sys.exc_info()
             start_response("500 Error occurred", headers)
         except:
             raise t,v,tb   # reraise the original
         else:
             return ["error body here"]
     finally:
          t = v = tb = None

but admittedly, this is "guru-level" coding.  OTOH, we could simply have an 
optional third argument to start_response:

     start_response(status,headers,sys.exc_info())

the idea being that 'start_response' should reraise the exc_info tuple (or 
some private exception type) if the response has already been started.  It 
can also optionally log the error information.

Note that this also allows middleware to trivially intercept error reports 
by overriding start_response.  If it decides to handle the error itself, 
the middleware can simply throw an exception that it then catches as the 
app aborts.
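
A rough sketch of the third-argument idea, seen from the server side, might 
look like the following.  It is written for present-day Python 3 rather than 
the Python of this thread, and the ResponseState class, its attributes and 
the demo() helper are all invented for illustration; none of it is text from 
the PEP.

```python
import sys


class ResponseState:
    """Hypothetical per-request bookkeeping kept by a server (assumed name)."""

    def __init__(self):
        self.status = None
        self.headers = None
        self.headers_sent = False
        self.body = []

    def start_response(self, status, headers, exc_info=None):
        if exc_info is not None:
            try:
                if self.headers_sent:
                    # Output has already gone out, so the response cannot be
                    # replaced: re-raise the application's original error.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # drop the traceback reference
        elif self.status is not None:
            # A second call is only legal when reporting an error.
            raise AssertionError("start_response() already called")
        self.status = status
        self.headers = list(headers)
        return self.write

    def write(self, data):
        # In this sketch the first body write is what commits the headers.
        self.headers_sent = True
        self.body.append(data)


def demo():
    """Error raised before any output: the 500 response replaces the 200."""
    state = ResponseState()
    state.start_response("200 OK", [("Content-type", "text/plain")])
    try:
        raise ValueError("application broke")
    except ValueError:
        state.start_response("500 Error occurred", [], sys.exc_info())
    return state.status
```

Middleware could wrap such a callable and inspect exc_info before passing 
the call along, which is the interception point described above.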


>That code doesn't work from the iterator.

It only would have worked if it was in the first iteration, anyway.  The 
server is probably in the best position to attempt recovery following the 
first iteration.  However, in most cases where such code would *be* in the 
iterator, it's likely a generator that can simply yield the error 
body.  Using the third-argument strategy above, it's going to get an error 
if it wouldn't work.


> > Let me see if I understand your actual use case...  you want to be able to
> > write an application that, although it handles its own errors, also gives
> > users the option of placing error-handling middleware over it to change
> > how
> > its errors are rendered, logged, etc.  And, you want that mechanism to be
> > based on Python exception information (type, value, traceback) rather than
> > on HTTP information (status, headers, content).  Finally, you want this to
> > be unconditionally available, rather than having to first check whether
> > the
> > exception handling middleware is installed.  Is this correct?
>
>Yes, with the addition of a server-provided exception class that holds the
>error document payload.

I think that we can meet this use case without a server-provided exception 
class; the server (or middleware) just needs to know that you're starting 
an error response, and what the error is.  Adding an argument to 
start_response seems like a good, clean way to do this, and it looks easy 
to use/implement on all sides.  What do you think?

From tony at lownds.com  Sat Sep 11 19:38:55 2004
From: tony at lownds.com (tony@lownds.com)
Date: Sat Sep 11 20:00:35 2004
Subject: [Web-SIG] Reviewing WSGI open issues, again...
In-Reply-To: <5.1.1.6.0.20040911125331.0264abd0@mail.telecommunity.com>
References: <5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com><5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com><5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com><5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com><5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com><5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com>
	<5.1.1.6.0.20040911125331.0264abd0@mail.telecommunity.com>
Message-ID: <56402.68.122.33.37.1094924335.squirrel@*>

>>Servers need additional logic to try and support calling start_response
>>twice.
>
> They'll need it in any case.  What are the odds that all errors will occur
> before start_response happens?
>

Not good, hence the requirement that servers support re-starting the
response. With exception handling, they just need a little bit of logic to
decide whether to send the payload of the exception. They don't HAVE to
support re-starting the response. Hmm, except then there would be a lot of
"200 Ok" responses that actually ended in an error.

>
>>Calling start_response again could still be an error for the
>>application, masking the error.
>
> True.  This is probably the strongest argument for having a special
> exception.  That is, that an exception-in-progress could be masked by the
> error of calling start_response again.  OTOH, there's always:
>
>      try:
>          try:
>              t,v,tb = sys.exc_info()
>              start_response("500 Error occurred", headers)
>          except:
>              raise t,v,tb   # reraise the original
>          else:
>              return ["error body here"]
>      finally:
>           t = v = tb = None
>
> but admittedly, this is "guru-level" coding.  OTOH, we could simply have
> an
> optional third argument to start_response:
>
>      start_response(status,headers,sys.exc_info())
>
> the idea being that 'start_response' should reraise the exc_info tuple (or
> some private exception type) if the response has already been started.  It
> can also optionally log the error information.
>
> Note that this also allows middleware to trivially intercept error reports
> by overriding start_response.  If it decides to handle the error itself,
> the middleware can simply throw an exception that it then catches as the
> app aborts.
>

That reasonably handles the exception case. Applications and middleware
should never catch exceptions from start_response then, correct?

> I think that we can meet this use case without a server-provided exception
> class; the server (or middleware) just needs to know that you're starting
> an error response, and what the error is.  Adding an argument to
> start_response seems like a good, clean way to do this, and it looks easy
> to use/implement on all sides.  What do you think?
>

I'm beginning to think that re-startability is important. It makes it much
less likely that a successful HTTP code is returned when the application
actually broke. Given that, I don't
see much of an advantage to the exception.

-Tony

From pje at telecommunity.com  Sat Sep 11 22:34:15 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Sep 11 22:33:21 2004
Subject: [Web-SIG] Reviewing WSGI open issues, again...
In-Reply-To: <56402.68.122.33.37.1094924335.squirrel@*>
References: <5.1.1.6.0.20040911125331.0264abd0@mail.telecommunity.com>
	<5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com>
	<5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com>
	<5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
	<5.1.1.6.0.20040908202255.020f2260@mail.telecommunity.com>
	<5.1.1.6.0.20040909145728.02adaec0@mail.telecommunity.com>
	<5.1.1.6.0.20040909161308.02bd7040@mail.telecommunity.com>
	<5.1.1.6.0.20040911125331.0264abd0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040911162653.0213fd20@mail.telecommunity.com>

At 10:38 AM 9/11/04 -0700, tony@lownds.com wrote:
> > OTOH, we could simply have
> > an
> > optional third argument to start_response:
> >
> >      start_response(status,headers,sys.exc_info())
> >
> > the idea being that 'start_response' should reraise the exc_info tuple (or
> > some private exception type) if the response has already been started.  It
> > can also optionally log the error information.
> >
> > Note that this also allows middleware to trivially intercept error reports
> > by overriding start_response.  If it decides to handle the error itself,
> > the middleware can simply throw an exception that it then catches as the
> > app aborts.
> >
>
>That reasonably handles the exception case. Applications and middleware
>should never catch exceptions from start_response then, correct?

Not if the call to start_response() was made from an error handler, 
no.  But I think it's acceptable to catch errors from normal (2-argument) 
calls to start_response().


> > I think that we can meet this use case without a server-provided exception
> > class; the server (or middleware) just needs to know that you're starting
> > an error response, and what the error is.  Adding an argument to
> > start_response seems like a good, clean way to do this, and it looks easy
> > to use/implement on all sides.  What do you think?
> >
>
>I'm beginning to think that re-startability is important. It makes it much
>less likely that a successful HTTP code is returned when the application
>actually broke. Given that, I don't
>see much of an advantage to the exception.

Good.  We'll take the "third argument" approach, then.  It's going to 
expand the PEP quite a bit, but every error handling proposal so far was 
going to do that.  But this one handles your use case without adding much 
overhead for the more common cases.

I can't believe we've managed to get away without having *any* special 
environ keys for error handling or any custom exception classes (except 
when you want to do something special, of course).

I'm going to try and get all the updates into the PEP this weekend, 
hopefully before the next hurricane goes by.  If it comes too close and we 
lose power here, I don't expect we'll get it back for a week or two, as 
there are too many crews still out fixing power outages from the *last* 
hurricane!

Once all the pending updates are in, I think we'll be almost ready to 
finalize the PEP, and we should plan on another posting to python-list and 
python-dev, giving a finalization deadline and requesting that all 
remaining change requests be submitted in the form of a patch.

Hm.  Actually, that might be premature, since I recall we were planning to 
get the HTTP/1.1 stuff in order first.  Mark, how's that coming along?  :)

From py-web-sig at xhaus.com  Sun Sep 12 18:48:53 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Sun Sep 12 18:43:54 2004
Subject: [Web-SIG] Modjy status.
Message-ID: <41447DF5.1060001@xhaus.com>

Dear Sig,

Just a quick message to let y'all know that modjy is 95% ready, but not 
the 98% or 99% I would like. Thus far:

1. The code is pretty stable. I've had it under version control for 
several days now, and the changes are fewer and fewer.

2. I have pretty much finished the documentation, (including all those 
nitty gritty little details!)

3. I've tried to make it as friendly to non-J2EE people as possible.

But there are still not enough tests.

All that's being tested at the moment is cases where the application 
causes an exception. There are plenty of cases that need to be checked, 
including many positive ones, e.g. returning a variety of different 
iterable types to the server. Only when a reasonably comprehensive test 
suite is passing will I feel totally comfortable with people trying it out.

So I'll be finalizing those tests over the next day or two, (after I've 
had a little rest: been working non-stop on this), and then I'll be 
happy for people to download modjy and try it out, safe in the knowledge 
that it's less likely to fall at the first fence, and thus put people 
off modjy for good.

Per ardua ad astra,

Alan.
From pje at telecommunity.com  Mon Sep 13 21:59:54 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Sep 13 21:59:01 2004
Subject: [Web-SIG] PEP 333 Update
Message-ID: <5.1.1.6.0.20040913154519.0228d750@mail.telecommunity.com>

I'm about to check in a major update to PEP 333; it should be available on 
the PEPs page within about an hour, and from SF CVS some time 
thereafter.  Here is a summary of the changes:

* Added 'wsgi.url_scheme', and updated sections relating thereto (such as 
the "URL Reconstruction" algorithm)

* Replaced the old "Optional Platform-Specific File Handling" section with 
a new one based on 'wsgi.file_wrapper', and expunged all references in the 
rest of the PEP that so much as suggest that returning a file or file-like 
object from an application is something you should ever do.

* Significantly expanded the "Error Handling" section, and other sections 
that relate to the new 'exc_info' parameter to 'start_response()'.

* Changed the definition of 'start_response' such that headers are not 
immediately sent to the client.

* Revised the "CGI gateway" example to include error handling and delayed 
header-sending.

* Miscellaneous explanatory clean-ups, such as linking from the 
specification regarding the use of 'len()' on the returned iterable, to the 
section of the spec that explains why using 'len()' is sometimes helpful.

* Added a (very brief) explanation of why returning an iterable is 
preferable to using 'write()', if the latter can be avoided, and noted that 
'write()' must not be invoked from within the returned iterable.

* Removed requirement that status and headers be pure 7-bit ASCII, 
referring instead to the RFC 2616 definitions.  (But left in the no-folding 
requirement that's specific to the PEP.)

* Added notes on using 'environ' to supply an application with limited 
configuration data

* Removed open issues that are now closed; added an open issue for 
reviewing the currently-required CGI variables, as it may be that some of 
them don't really need to be required.

* Added more kudos for Tony and Alan in the acknowledgements section.
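
For readers wondering what 'wsgi.url_scheme' is for, here is a sketch of URL 
reconstruction along the lines of the algorithm the first item refers to.  It 
is written for Python 3 (urllib.parse.quote), the function name 
reconstruct_url is an assumption made for this example, and it should be 
read as an illustration rather than as the PEP's own wording.

```python
from urllib.parse import quote


def reconstruct_url(environ):
    """Rebuild the request URL from a WSGI-style environ dict (sketch)."""
    scheme = environ['wsgi.url_scheme']
    url = scheme + '://'
    if environ.get('HTTP_HOST'):
        # The Host header already carries any non-default port.
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']
        default_port = '443' if scheme == 'https' else '80'
        if environ['SERVER_PORT'] != default_port:
            url += ':' + environ['SERVER_PORT']
    url += quote(environ.get('SCRIPT_NAME', ''))
    url += quote(environ.get('PATH_INFO', ''))
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']
    return url
```

Without the scheme key, an application would have to guess between http and 
https from the port number or from server-specific variables.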

We are now getting very close to finalization, I think.  There are just two 
more open issues to cover, plus some possible re-organization for 
HTTP/1.1-specific stuff.  After that, I think we should post to python-list 
and python-dev one last time, then finalize the PEP.  After that, the 
semantics would be frozen, and only changes to e.g. the Q&A section, or 
edits for clarity would be allowed.  At that point, framework and server 
developers can then feel comfortable releasing something and calling it PEP 
333-compatible, if in fact it is.  :)

From pje at telecommunity.com  Tue Sep 14 18:59:30 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Sep 14 18:59:22 2004
Subject: [Web-SIG] bytes, strings, and Unicode in Jython, IronPython,
 and CPython 3.0
Message-ID: <5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com>

I've reviewed last month's Python-Dev discussion about the future Python 
'bytes()' type, and the eventual transition away from Python's current 
8-bit strings.

Mainly, the impression I get is that significant change in this respect 
really can't happen until Python 3.0, because too many things have to 
change at once for it to work.

So, here's what I propose to do about the open issue in PEP 333.  Servers 
and gateways that run under Python implementations where all strings are 
Unicode (e.g. Jython) *may*:

  * accept Unicode statuses and headers, so long as they properly encode 
them for transmission (latin-1 + RFC 2047)

  * accept Unicode for response body segments, so long as each segment may 
be encoded as latin-1 (i.e. only uses chars 0-255)

  * produce Unicode input headers and body strings by decoding from 
latin-1, as long as the produced values are considered type 'str' for that 
Python implementation.

I think that these rules allow conformance with the "letter of the law" for 
the rest of the WSGI spec, since servers, gateways, and applications are 
still required to use 'str' instances in all of the above cases.  The issue 
here is that non-CPython implementations may be able to place arbitrary 
Unicode characters in a 'str' instance, so the encoding rules need to be clear.
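
As a sketch of what those rules could mean in code, assume a Python where 
'str' is Unicode (Python 3 stands in for that case here).  The two function 
names below are hypothetical helpers, not part of any proposed API.

```python
def body_segment_to_bytes(segment):
    """Encode an all-Unicode 'str' body segment for the wire.

    Only code points 0-255 are acceptable; anything else is rejected.
    """
    try:
        return segment.encode('latin-1')
    except UnicodeEncodeError as exc:
        raise ValueError(
            "body segment contains a character outside 0-255 at index %d"
            % exc.start
        ) from exc


def input_bytes_to_str(data):
    """Decode incoming bytes so each byte becomes the same-numbered char."""
    return data.decode('latin-1')
```

Because latin-1 maps byte values 0-255 straight onto code points 0-255, the 
two helpers are exact inverses for any input the first one accepts.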

I think this is probably the right thing to do, leaving the adoption of any 
"byte array" usage to Python 3.0 and WSGI 2.0 or 3.0 or whatever we're on 
by then.  But I am not a Unicode guru, and I'm definitely not familiar with 
the details of non-CPython 'str' vs. Unicode issues.  So, I hope that there 
are some folks out there (Alan?) who can comment on this.  Thanks.

From paul.boddie at ementor.no  Wed Sep 15 12:33:31 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Wed Sep 15 12:33:35 2004
Subject: [Web-SIG] bytes, strings, and Unicode in Jython, IronPython,
	and CPython 3.0
Message-ID: <0F4BD34E02639E428B4654DCBAB4502D03E17B@100NOOSLMSG004.common.alpharoot.net>

Phillip J. Eby wrote:
>
> I've reviewed last month's Python-Dev discussion about the future Python
> 'bytes()' type, and the eventual transition away from Python's current
> 8-bit strings.
>
> Mainly, the impression I get is that significant change in this respect
> really can't happen until Python 3.0, because too many things have to
> change at once for it to work.

I think there was (and perhaps still is) a runtime option to force Python
to treat all strings as Unicode objects.

> So, here's what I propose to do about the open issue in PEP 333.  Servers
> and gateways that run under Python implementations where all strings are
> Unicode (e.g. Jython) *may*:
>
>   * accept Unicode statuses and headers, so long as they properly encode
> them for transmission (latin-1 + RFC 2047)

I think I encode all Unicode objects used in this area as US-ASCII in
WebStack.

>   * accept Unicode for response body segments, so long as each segment may
> be encoded as latin-1 (i.e. only uses chars 0-255)

It should be possible to be more intelligent about response bodies, but you
can argue that it isn't up to something like WSGI to go through the
necessary gymnastics to make sure that Unicode objects presented to the
response stream become encoded appropriately.

>   * produce Unicode input headers and body strings by decoding from
> latin-1, as long as the produced values are considered type 'str' for that
> Python implementation.

I think I've left incoming headers as plain strings, but I suppose a similar
translation could be performed in WebStack.

Paul
From py-web-sig at xhaus.com  Wed Sep 15 16:28:14 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep 15 16:22:59 2004
Subject: [Web-SIG] bytes, strings, and Unicode in Jython, IronPython,
	and CPython 3.0
In-Reply-To: <5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com>
References: <5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com>
Message-ID: <4148517E.7040701@xhaus.com>

[Phillip J. Eby]
 > I've reviewed last month's Python-Dev discussion about the future
 > Python  'bytes()' type, and the eventual transition away from Python's
 > current 8-bit strings.
 >
 > Mainly, the impression I get is that significant change in this
 > respect really can't happen until Python 3.0, because too many
 > things have to  change at once for it to work.
 >
 > So, here's what I propose to do about the open issue in PEP 333.
 > Servers and gateways that run under Python implementations where all
 > strings are Unicode (e.g. Jython) *may*:

Encoding issues? "Oh no", screams Alan, turning tail and sprinting away!

;-)

Before starting my response, I just want to point out two things:

1. I'm no bot when it comes to python and character encodings.

2. The text below may come across a little cold. I've spent a few 
hours thinking through the issues, checking code, rewriting text, 
rewriting, rewriting, .... I think the below is the most accurate 
picture I can present: it won't win any poetry competitions.

Before getting into the WSGI parameter encoding issues, just a quick 
overview of character strings vs. binary strings in jython.

Strings in jython: textual vs. binary
=====================================

Java stores all textual strings as unicode strings, i.e. sequences of 
2-byte characters. These strings can be transcoded to any encoding: when 
they are so transcoded, that delivers a sequence of bytes.

Java keeps the concept of textual unicode strings and byte sequences 
separate, through the use of (rigidly enforced) method signatures. This 
ensures both static type correctness and memory efficiency.

Jython blends the two concepts, by using java.lang.String's to store 
both python text strings and python binary strings, i.e. byte arrays. It 
stores the latter by the trick of only using the lower byte of each 
two-byte unicode character to store data, leaving the upper byte unused. 
You can see this by running this code on jython.

#--------------------------------------------
s = u'\u00E1\u00E9\u00ED\u00F3\u00FA'

u8 = s.encode('utf-8')
u16 = s.encode('utf-16')

for x in [s, u8, u16]:
	print "%d:%s:%s" % (len(x), str(type(x)), `x`)
#--------------------------------------------

which outputs

"""
5:org.python.core.PyString:'\xE1\xE9\xED\xF3\xFA'
10:org.python.core.PyString:'\xC3\xA1\xC3\xA9\xC3\xAD\xC3\xB3\xC3\xBA'
12:org.python.core.PyString:'\xFE\xFF\x00\xE1\x00\xE9\x00\xED\x00\xF3\x00\xFA'
"""

The only way to create binary strings in jython is to create them 
explicitly, for example, by transcoding text strings as above, or by 
reading from a byte-oriented stream like a socket, or binary file. These 
binary strings do not have their encoding metadata associated with them, 
in common with cpython: the programmer must know the encoding of the 
byte-array/binary-string they're handling.

When these binary strings are created, and stored as textual unicode 
strings, they look like latin-1 textual strings, since all of the 
upper-bytes of the characters are zero. So on jython, a binary encoded 
latin-1 string and a unicode string containing only latin-1 characters 
are represented identically.
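
The same equivalence can be sketched on a Python with separate text and 
byte types (Python 3 is assumed here, not jython), by carrying bytes inside 
a text string one code point per byte. The helper names are made up for 
this example.

```python
def carry_bytes_in_str(data):
    """Store each byte in a str as the code point of the same value."""
    return data.decode('latin-1')


def recover_bytes(carrier):
    """Reverse carry_bytes_in_str(); fails if any code point exceeds 255."""
    return carrier.encode('latin-1')


text = '\u00e1\u00e9\u00ed\u00f3\u00fa'

# UTF-8 bytes carried in a str: ten "characters", all below 256.
utf8_carrier = carry_bytes_in_str(text.encode('utf-8'))

# latin-1 bytes carried in a str: indistinguishable from the original text.
latin1_carrier = carry_bytes_in_str(text.encode('latin-1'))
```

Here latin1_carrier compares equal to text, which is the ambiguity described 
above: nothing in the string itself says whether it holds text or bytes.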

In jython, any other time a string is created, by assignment to a string 
literal ('', "", """ """), or by reading from a text file, text stream, 
etc, the result is always a textual unicode string.

So, on to WSGI


[Phillip J. Eby]
 >  * accept Unicode statuses and headers, so long as they properly encode
 > them for transmission (latin-1 + RFC 2047)

String parameters in jython are always passed as unicode strings, 
containing either textual strings or the binary-string/byte-arrays 
described above. So the strings received by the jython 
start_response_callable will be either textual or binary unicode strings.

The start_response_callable has to be able to operate on these strings 
regardless, i.e. transform them using standard python functions, e.g. 
.split(' '), int(), etc. If these functions fail to operate correctly on 
a binary string, then there is little the start_response_callable can 
do, without knowing the encoding of the binary string so that it can 
decode to a textual string. If the operations fail on a textual string, 
it is because the string contains invalid data for the operation.

Note that this is common with cpython, under which code must also simply 
assume that .split() and int() will simply work on the string passed, 
without knowing its encoding.

Status
======
So, in the case of the http status value, as long as 
int(status_str.split(' ')[0]) returns an integer, that's fine. Which should 
be the case all of the time, as long as what was passed really was a 
string containing an ascii integer followed by a space.
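
A minimal sketch of that check, assuming Python 3 and a hypothetical helper 
name (parse_status is not part of WSGI):

```python
def parse_status(status):
    """Split a status string such as '200 OK' into (code, reason)."""
    code, _, reason = status.partition(' ')
    # int() raises ValueError if the first token is not an integer.
    return int(code), reason
```

The function never needs to know how the string was encoded; it only relies 
on the digits and the space being what they appear to be.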

Headers
=======
In the case of the header list, both header names and header values 
could also be passed as either textual or binary strings. There are 
three scenarios for the content of those strings

1. They are binary strings, i.e. have zero upper-bytes, and are 
presumably suitable (application knows best) for use as http headers 
without transformation.
2. They are latin-1 strings, i.e. have zero upper-bytes, and are thus 
suitable for use as http headers without transformation.
3. They are non latin-1 strings, i.e. have non-zero upper-bytes, and so 
will have to be encoded before transmission, according to RFC 2047.

What jython should do
=====================

So any jython middleware, gateway or server that receives a Unicode 
string for a header value must

A: Send it without transformation if all upper-bytes are zero.
B: Encode it according to RFC 2047 if there are non-zero upper-bytes, 
then send it.

In the case of B, how should the jython code know which iso-8859-X 
charset to use for RFC 2047? Is there library code? Is mimify the right 
module to use?

A couple of notes about J2EE
============================

1. Under J2EE, the HttpServletResponse method signatures specify that a 
java.lang.String, i.e. 2-byte unicode, value must be given for header 
names and values (although see next point).

2. The most recent 2.4 version of the servlet specification now permits 
header strings to be an "octet string ... encoded according to RFC 
2047". This was not specified in previous versions of the spec, i.e. 2.3 
or 2.2.

http://java.sun.com/j2ee/1.4/docs/api/javax/servlet/http/HttpServletResponse.html#addHeader(java.lang.String,%20java.lang.String)

3. Which indicates to me that J2EE expects that you have completely 
taken care of encoding yourself, i.e. that you will have RFC-2047 
encoded your header, if required, before passing it to J2EE.

4. So if a jython start_response_callable receives a binary string, it 
should simply transmit it directly. If it receives a unicode string with 
non-zero upper-bytes, it should attempt to encode it in RFC-2047 before 
transmission. This could be done like so

unicode_header = "my value"
try:
   wire_string = unicode_header.encode('latin-1')
except UnicodeError:
   wire_string = encode_in_rfc2047(unicode_header)
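
On the question of library code: one possible shape for the 
encode_in_rfc2047 step, sketched for Python 3 CPython rather than jython, 
uses the standard email.header module. The function name header_to_wire and 
the choice of utf-8 as the RFC 2047 charset are assumptions made for the 
example.

```python
from email.header import Header, decode_header, make_header


def header_to_wire(value):
    """Return header bytes: latin-1 as-is, anything else as an encoded-word."""
    try:
        return value.encode('latin-1')
    except UnicodeEncodeError:
        # Header.encode() produces an ASCII-only RFC 2047 encoded-word.
        return Header(value, 'utf-8').encode().encode('ascii')


def header_from_wire(wire):
    """Decode an RFC 2047 encoded header back to text (for checking)."""
    return str(make_header(decode_header(wire.decode('ascii'))))
```

Header.encode() may wrap long values across several lines, which this 
sketch does not try to prevent.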

Standalone pure jython server
=============================

When running a standalone pure jython WSGI server, jython code will be 
writing header values directly to the client socket. In this case, the 
jython start_response_callable/server needs latin-1/RFC2047 strings to 
transmit down the socket. The same rules as J2EE above apply to the 
treatment of strings in this case.

So, in regards to the WSGI requirement above, the application *must* 
transmit Unicode statuses and headers to a jython 
start_response_callable, which will attempt to appropriately RFC-2047 
encode the strings if they contain anything other than latin-1 characters.

Which I think completely agrees with your requirement as stated, just 
with different wording.


[Phillip J. Eby]
 >  * accept Unicode for response body segments, so long as each segment
 > may be encoded as latin-1 (i.e. only uses chars 0-255)

I would say "jython servers can *only* accept unicode strings for 
response body segments", since this is the jython mechanism for passing 
binary strings.

As you (kind-of) specify, the response body segment is not really a 
latin-1 encoded textual string, it is really a binary string of varying 
encoding, depending on the application. But treating it as a latin-1 
string has the effect of preserving its content as a binary string.

So again, I think that this meets with your requirement, except stated 
differently.

If WSGI response bodies "crossed over" somehow from a cpython 
application to a jython application, through either swig-style linkage 
or through some form of http relay protocol such as FastCGI, the jython 
receiving end of that would have to produce a response body encoded as a 
jython binary string. Which is exactly what jython socket operations, 
etc, produce. So pure python middleware code that distributes WSGI 
requests over, say a network socket, should run identically between 
jython and cpython. Which is nice to know.

And which would probably be true for IronPython too: That Jim Hugunin is a 
clever lad. Jython really does all this stuff pretty seamlessly in 
relation to cpython.


[Phillip J. Eby]
 >  * produce Unicode input headers and body strings by decoding from
 > latin-1, as long as the produced values are considered type 'str' for
 > that Python implementation.

On jython, there is no point in decoding latin-1 strings to unicode 
strings, because their representations are identical: both are 
types.StringType, both take 2 bytes per character/byte, with the upper 
byte as zero.

If the recipient is another jython component, all string types will be 
received correctly.

If the recipient is a cpython component, then it will still receive the 
correct string, because whatever interface lies between the cpython and 
the jython will have correctly converted the data (if it was latin-1 data).

So perhaps this requirement could be stated as "jython 
components/applications must produce unicode input headers and body 
strings, which must only contain latin-1 characters"?

Whew! That turned out to be not so bad after all! (Alan crosses his 
fingers behind his back :-)

Regards,

Alan.
From pje at telecommunity.com  Wed Sep 15 18:01:25 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 15 18:02:11 2004
Subject: [Web-SIG] bytes, strings, and Unicode in Jython,
	IronPython, and CPython 3.0
In-Reply-To: <0F4BD34E02639E428B4654DCBAB4502D03E17B@100NOOSLMSG004.comm
	on.alpharoot.net>
Message-ID: <5.1.1.6.0.20040915115842.033643b0@mail.telecommunity.com>

At 12:33 PM 9/15/04 +0200, Paul Boddie wrote:
>Phillip J. Eby wrote:
> > So, here's what I propose to do about the open issue in PEP 333.  Servers
> > and gateways that run under Python implementations where all strings are
> > Unicode (e.g. Jython) *may*:
> >
> >   * accept Unicode statuses and headers, so long as they properly encode
> > them for transmission (latin-1 + RFC 2047)
>
>I think I encode all Unicode objects used in this area as US-ASCII in
>WebStack.
>
> >   * accept Unicode for response body segments, so long as each segment may
> > be encoded as latin-1 (i.e. only uses chars 0-255)
>
>It should be possible to be more intelligent about response bodies, but you
>can argue that it isn't up to something like WSGI to go through the
>necessary gymnastics to make sure that Unicode objects presented to the
>response stream become encoded appropriately.
>
> >   * produce Unicode input headers and body strings by decoding from
> > latin-1, as long as the produced values are considered type 'str' for that
> > Python implementation.
>
>I think I've left incoming headers as plain strings, but I suppose a similar
>translation could be performed in WebStack.

You only need to worry about these things in WebStack if it's running under 
conditions where 'str' objects may contain any Unicode 
character.  Currently that's only Jython, and maybe IronPython.  As far as 
I know, CPython's -U option is broken; that is, not all of the Python 
stdlib works correctly with Unicode 'str' objects, so for the time being 
it's unlikely you'll need to worry about any of this under CPython.

From pje at telecommunity.com  Wed Sep 15 18:23:41 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 15 18:24:29 2004
Subject: [Web-SIG] bytes, strings, and Unicode in Jython,
	IronPython, and CPython 3.0
In-Reply-To: <4148517E.7040701@xhaus.com>
References: <5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com>
	<5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040915120132.03364c30@mail.telecommunity.com>

At 03:28 PM 9/15/04 +0100, Alan Kennedy wrote:
>String parameters in jython are always passed as unicode strings, 
>containing either textual strings or the binary-string/byte-arrays 
>described above. So the strings received by the jython 
>start_response_callable will be either textual or binary unicode strings.
>
>The start_response_callable has to be able to operate on these strings 
>regardless, i.e. transform them using standard python functions, e.g. 
>.split(' '), int(), etc. If these functions fail to operate correctly on a 
>binary string, then there is little the start_response_callable can do, 
>without knowing the encoding of the binary string so that it can decode to 
>a textual string. If the operations fail on a textual string, it is 
>because the string contains invalid data for the operation.

The point here is that a Jython WSGI server should either invoke 
'.encode("latin1")' on all strings supplied to it (whether in 
'start_response()', 'write()', or yielded by the iterable), or otherwise 
verify that there are either no non-latin1 characters, or (optionally) 
transcode any non-latin1 characters as prescribed by RFC 2047 
(status/headers only).  It should be a fatal error to send a non-latin1 
string to 'write()' or yield one from the iterable, however.


>What jython should do
>=====================
>
>So any jython middleware, gateway or server that receives a Unicode string 
>for a header value must
>
>A: Send it without transformation if all upper-bytes are zero.
>B: Encode it according to RFC 2047 if there are non-zero upper-bytes, then 
>send it.
>
>In the case of B, how should the jython code know which iso-8859-X charset 
>to use for RFC 2047? Is there library code? Is mimify the right module to use?

Actually, 'B' is optional.  (Note that my proposal said a server *may* 
accept Unicode, not that it was required to do so.)  It is also perfectly 
valid for a server or gateway to reject Unicode that can't be rendered as 
latin1.  In other words, only 'A' is required.  That's because applications 
are already required to do their own latin1/RFC 2047 encoding.

But after looking at all of your comments and thinking this over a bit, I'm 
thinking that there's a simpler way to specify the intent of my proposal; 
something like:

"""On Python platforms where the 'str' or 'StringType' type is 
Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all strings 
must contain only characters representable in ISO-8859-1 encoding (\u0000 
through \u00FF, inclusive).  It should be considered a fatal error for an 
application to supply strings containing any other Unicode character, 
whether the string is in the 'headers', the 'status', supplied to 
'write()', or is produced by the application's returned iterable."""

Adding this to the current "Unicode" section would suffice, I think, to 
deal with the current and future platforms in a cleanly compatible way.  It 
also makes it clear that there is no additional burden on either the 
server/gateway or application sides: it's just a clarification of what it 
means to be a 'str' for WSGI's purposes.
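
A minimal sketch of the check this wording implies, assuming a Python
where 'str' is Unicode-based (it runs as-is under Python 3).  The helper
name 'check_latin1' and the use of ValueError are assumptions made for
illustration; the proposed text only says the condition should be a
fatal error, not how a server reports it.

```python
def check_latin1(value, where):
    """Return 'value' encoded as ISO-8859-1 bytes, or raise ValueError.

    'where' is only a label for the error message, e.g. "status",
    "headers" or "write()".
    """
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first unencodable character.
        raise ValueError(
            "%s contains a character outside the ISO-8859-1 range at index %d"
            % (where, exc.start)
        )
```

A server could run each status, header value, 'write()' argument and
yielded block through such a helper before transmission.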

From pje at telecommunity.com  Wed Sep 15 19:00:36 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 15 19:01:29 2004
Subject: [Web-SIG] Loosening the CGI variable requirements in PEP 333
Message-ID: <5.1.1.6.0.20040915125541.0293c6e0@mail.telecommunity.com>

Currently, the requirement for CGI variables reads like this:

"""``environ`` Variables
---------------------

The ``environ`` dictionary is required to contain these CGI
environment variables, as defined by the Common Gateway Interface
specification [2]_.  The following variables **must** be present, but
**may** be an empty string, if there is no more appropriate value for
them:"""

I'd like to change that last sentence to:

"""The following variables **must** be present (unless their value
would be an empty string, in which case they may be omitted):"""

This means that other parts of the spec would need to use e.g. 
'environ.get("PATH_INFO","")'.  But, I think this change will make it a 
little bit easier on servers or gateways that already have some sort of CGI 
basis or support, without substantially affecting anything else.
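
As a rough illustration of what that means for code reading the
environment, here is a small sketch.  The function name 'request_path'
is made up for this example; only the 'environ.get(..., "")' idiom comes
from the proposal above.

```python
def request_path(environ):
    """Join SCRIPT_NAME and PATH_INFO, treating a missing key as ''."""
    script_name = environ.get("SCRIPT_NAME", "")
    path_info = environ.get("PATH_INFO", "")
    return script_name + path_info
```

With this idiom a server that omits an empty PATH_INFO and one that
supplies it as "" look the same to the application.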

Comments, anyone?

(By the way, as far as I can tell, this is the very last open issue for PEP 
333, so once this one's decided, I think it's time to begin the 
finalization process.)

From py-web-sig at xhaus.com  Wed Sep 15 20:56:25 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Wed Sep 15 20:52:11 2004
Subject: [Web-SIG] bytes, strings, and Unicode in Jython,  IronPython,
	and CPython 3.0
In-Reply-To: <5.1.1.6.0.20040915120132.03364c30@mail.telecommunity.com>
References: <5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com>
	<5.1.1.6.0.20040914122858.03208bc0@mail.telecommunity.com>
	<5.1.1.6.0.20040915120132.03364c30@mail.telecommunity.com>
Message-ID: <41489059.4090904@xhaus.com>

[Phillip J. Eby]
 > But after looking at all of your comments and thinking this over a
 > bit, I'm thinking that there's a simpler way to specify the intent
 > of my  proposal; something like:
 >
 > """On Python platforms where the 'str' or 'StringType' type is
 > Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all
 > strings must contain only characters representable in ISO-8859-1
 > encoding (\u0000 through \u00FF, inclusive).  It should be considered
 > a fatal error for an application to supply strings containing any
 > other Unicode character, whether the string is in the 'headers', the
 > 'status', supplied to 'write()', or is produced by the application's
 > returned iterable."""

Great: Says it all, in a neat and concise way. Nice job!

+1

Regards,

Alan.

From floydophone at gmail.com  Thu Sep 16 00:48:11 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Thu Sep 16 00:48:21 2004
Subject: [Web-SIG] WSGI woes
Message-ID: <6654eac40409151548295fd2d9@mail.gmail.com>

It looks like WSGI is not well received over at twisted.web.

http://twistedmatrix.com/pipermail/twisted-web/2004-September/000644.html

I thought the blocking call was handled by the iterator, but maybe I'm wrong.
From pje at telecommunity.com  Thu Sep 16 01:12:49 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 16 01:14:26 2004
Subject: [Web-SIG] WSGI woes
In-Reply-To: <6654eac40409151548295fd2d9@mail.gmail.com>
Message-ID: <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>

At 06:48 PM 9/15/04 -0400, Peter Hunt wrote:
>It looks like WSGI is not well received over at twisted.web.
>
>http://twistedmatrix.com/pipermail/twisted-web/2004-September/000644.html

Excerpting from that post:

"""The WSGI spec is unsuitable for use with asynchronous servers and
applications. Basically, once the application callable returns, the
server (or "gateway" as wsgi calls it) must consider the page finished
rendering."""

This is incorrect.  Here is a simple WSGI application that demonstrates 
yielding 50 data blocks for transmission *after* the "application callable 
returns".

     def an_application(environ, start_response):
         start_response("200 OK", [('Content-Type','text/plain')])
         for i in range(1,51):
             yield "Block %d" % i

This has been a valid WSGI application since the August 8th posting of the 
WSGI pre-PEP.

It may be, however, that Mr. Preston means that applications which want to 
use 'write()' or a similar push-oriented approach to produce data cannot do 
so after the application returns.  If so, we should discuss that use case 
further, preferably on the Web-SIG.


>I thought the blocking call was handled by the iterator, but maybe I'm wrong.

I'm not sure what you mean, but if you're asking whether the iterable is 
allowed to create output blocks after the application callable returns, 
then yes.
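
To make the ordering concrete, here is a toy driver, not a real server,
that records when the application callable returns and then pulls the
blocks.  The name 'run_once' and the log format are assumptions for
illustration.  The body blocks are kept as plain strings as in the
example above; a present-day Python 3 server would expect bytes.

```python
def an_application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    for i in range(1, 51):
        yield "Block %d" % i


def run_once(application, environ):
    """Toy driver: call the application, then pull its output blocks."""
    log = []

    def start_response(status, headers):
        log.append(("start_response", status))

    result = application(environ, start_response)
    log.append(("callable returned", None))
    # For a generator, no application code has run yet at this point;
    # each next() made by list() resumes it and produces one block.
    blocks = list(result)
    return log, blocks
```

Because 'an_application' is a generator, the callable returns before
even 'start_response()' has been invoked, and all 50 blocks are produced
afterwards, during iteration.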

From floydophone at gmail.com  Thu Sep 16 05:06:04 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Thu Sep 16 05:06:11 2004
Subject: [Web-SIG] WSGI - alternate ideas, part II
Message-ID: <6654eac4040915200673ed116e@mail.gmail.com>

I know we've come a long way fleshing out WSGI, so remember, these are
just ideas. I'm not saying we should trash what we have, but I just
wanted to throw this out there.

I've been programming my own "web development kit", that is, a
platform (i.e. cgi, fastcgi, mod_python) independent templating and
controller system. Basically, it, like lots of other efforts,
simply requires a standard "request" and "response" object. In
addition, I think the application should call the gateway, instead of
the other way around. I also propose that the API be simple and use as
much standard, prewritten code as possible. Finally, it should be
extensible, such that we don't load, say, sessions if they aren't
needed for a certain application.

Thus, here is my "WSGI-X" proposal. The application will call the
gateway, opposite of WSGI. For example, a CGI WSGI-X application may
begin with:

#!/usr/bin/env python
if __name__ == "__main__":
      from wsgix import cgi
      req = cgi.get_request()

The req object is the core of the interface. In essence, it's
extremely simple. The Request class has four attributes:
fs - an object which mimics cgi.FieldStorage
environ - a dictionary corresponding to the CGI environment
stdout - the raw, unbuffered direct output stream to the client
finish_hooks - list or iterable of functions that are called when
finish() is called

It also declares one method, which may or may not need to be
overridden by subclasses specific to the gateway:
finish() - finish the request

Now we have a basic interface to interact with HTTP. If one wants to
write an extension to provide services like simplified cookie
handling, sessions, or buffered headers and content, they write an
extension function. A simple one for cookies would look like:

def cookie_extension(req):
      if not hasattr(req, "cookie"):
            req.cookie = Cookie.SimpleCookie(req.environ.get("HTTP_COOKIE",""))

It modifies the request object if it hasn't already been modified.
This saves us a bit of overhead so we won't need to parse the cookie
again in case it is called twice (as it will if other extensions
depend on it). Finish hooks have the same signature and execute when
the finish() method is called. For example, a buffering extension
would flush the buffer. Extensions can also add methods to the request
object, for items such as add_header().

There's my proposal. Tear it apart :) I'm going to post some example
code tomorrow or the day after, most likely.
From pje at telecommunity.com  Thu Sep 16 05:27:18 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 16 05:26:28 2004
Subject: [Web-SIG] WSGI - alternate ideas, part II
In-Reply-To: <6654eac4040915200673ed116e@mail.gmail.com>
Message-ID: <5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com>

At 11:06 PM 9/15/04 -0400, Peter Hunt wrote:

>Thus, here is my "WSGI-X" proposal. The application will call the
>gateway, opposite of WSGI. For example, a CGI WSGI-X application may
>begin with:
>
>#!/usr/bin/env python
>if __name__ == "__main__":
>       from wsgix import cgi
>       req = cgi.get_request()

That's pretty hard to implement correctly in any number of 
servers.  Really, pretty much every server wants to call the application, 
rather than the other way around, because servers want to use their own 
event loop.


>stdout - the raw, unbuffered direct output stream to the client

So, header parsing is required?  Or are only 'nph-' CGI scripts allowed?


>Now we have a basic interface to interact with HTTP. If one wants to
>write an extension to provide services like simplified cookie
>handling, sessions, or buffered headers and content, they write an
>extension function. A simple one for cookies would look like:
>
>def cookie_extension(req):
>       if not hasattr(req, "cookie"):
>             req.cookie = Cookie.SimpleCookie(req.environ.get("HTTP_COOKIE",""))

Note that this can easily be accomplished in WSGI, by changing 
'hasattr(req,"cookie")' to '"my_extension.cookie" in environ' and 
'req.cookie' to 'environ["my_extension.cookie"]'.
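
Spelled out, that substitution might look like the sketch below.  It is
written for Python 3, where the 'Cookie' module is 'http.cookies'; the
key "my_extension.cookie" is the placeholder name used above, not a
standard one.

```python
from http.cookies import SimpleCookie  # the 'Cookie' module in Python 2


def cookie_extension(environ):
    """Parse HTTP_COOKIE once and cache the result in environ."""
    if "my_extension.cookie" not in environ:
        environ["my_extension.cookie"] = SimpleCookie(
            environ.get("HTTP_COOKIE", "")
        )
    return environ["my_extension.cookie"]
```

A second call with the same 'environ' returns the cached object rather
than parsing the header again.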


>It modifies the request object if it hasn't already been modified.
>This saves us a bit of overhead so we won't need to parse the cookie
>again in case it is called twice (as it will if other extensions
>depend on it).

Also achievable within 'environ'.


>Finish hooks have the same signature and execute when
>the finish() method is called. For example, a buffering extension
>would flush the buffer. Extensions can also add methods to the request
>object, for items such as add_header().

Under WSGI, such "finish" hooks can be rendered as a 'close()' method on an 
iterable by a piece of middleware.
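
One way such middleware could look, as a sketch only: the names
'FinishHooks' and 'with_finish_hooks' are invented here, and the
approach relies on the server calling 'close()' on the returned iterable
when the request is done, as PEP 333 requires.

```python
class FinishHooks:
    """Wrap an application's iterable and run hooks from close()."""

    def __init__(self, iterable, hooks):
        self._iterable = iterable
        self._hooks = hooks

    def __iter__(self):
        return iter(self._iterable)

    def close(self):
        try:
            # Pass close() through to the wrapped iterable if it has one.
            inner_close = getattr(self._iterable, "close", None)
            if inner_close is not None:
                inner_close()
        finally:
            for hook in self._hooks:
                hook()


def with_finish_hooks(application, hooks):
    """Middleware: same call signature as the wrapped application."""
    def wrapper(environ, start_response):
        return FinishHooks(application(environ, start_response), hooks)
    return wrapper
```

A buffering extension, for example, would register its flush function
as one of the hooks.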


From dp at ulaluma.com  Thu Sep 16 07:13:52 2004
From: dp at ulaluma.com (Donovan Preston)
Date: Thu Sep 16 07:14:18 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
References: <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
Message-ID: <335B4CE1-079F-11D9-A6FD-000A95864FC4@ulaluma.com>


On Sep 15, 2004, at 7:12 PM, Phillip J. Eby wrote:

> At 06:48 PM 9/15/04 -0400, Peter Hunt wrote:
>> It looks like WSGI is not well received over at twisted.web.
>>
>> http://twistedmatrix.com/pipermail/twisted-web/2004-September/000644.html
>
> Excerpting from that post:
>
> """The WSGI spec is unsuitable for use with asynchronous servers and
> applications. Basically, once the application callable returns, the
> server (or "gateway" as wsgi calls it) must consider the page finished
> rendering."""
>
> This is incorrect.

As I said in my original post, I hadn't mentioned anything about this  
yet because I didn't have a solution or proposal to fix the problem,  
which I maintain remains. I will attempt to suggest solutions, but I am  
unsure whether they will work or make sense in all environments. Allow  
me to explain:

>   Here is a simple WSGI application that demonstrates yielding 50 data  
> blocks for transmission *after* the "application callable returns".
>
>     def an_application(environ, start_response):
>         start_response("200 OK", [('Content-Type','text/plain')])
>         for i in range(1,51):
>             yield "Block %d" % i
>
> This has been a valid WSGI application since the August 8th posting of  
> the WSGI pre-PEP.

According to the spec, """The application object must return an  
iterable yielding strings.""" Whether the application callable calls  
write before returning or yields strings to generate content, the  
effect is the same -- there is no way for the application callable to  
say "Wait, hang on a second, I'm not ready to generate more content  
yet. I'll tell you when I am." This means the only way the application  
can pause for network activity is by blocking. For example, a page  
which performed an XML-RPC call and transformed the output into HTML  
would be required to perform the XML-RPC call synchronously. Or a page  
which initiated a telnet session and streamed the results into a web  
page would be required to perform reads on the socket synchronously.  
The server or gateway, by calling next(), is assuming that the call  
will yield a string value, and only a string value.

Of course, Twisted has a canonical way of indicating that a result is  
not yet ready, the Deferred. An asynchronous application could yield a  
Deferred and an asynchronous server would attach a callback to this  
Deferred which invoked the next() method upon resolution. This is how  
Nevow handles Deferreds (in Nevow SVN head at  
nevow.flat.twist.deferflatten).

However, the WSGI spec says nothing about Deferred and indeed, Deferred  
would be useless in the case of another asynchronous server such as  
Medusa. I would suggest that WSGI include a simple Deferred  
implementation, but WSGI is simply a spec which is not intended to have  
any actual code. Thus, one solution would be for the WSGI spec to be  
amended to state:

"""The application object must return an iterable yielding strings or  
objects implementing the following interface:

def addCallback(callable):
	'''Add 'callable' to the list of callables to be invoked when a string
	is available. Callable should take a single argument, which will be a
	string.'''

The application object must invoke the callable passed to addCallback,  
passing a string which will be written to the request.
"""

This places additional burdens upon implementors of WSGI servers or  
gateways. In the case of a threaded HTTP server which uses blocking  
writes, implementing support for these promises would have to look  
something like this:

import Queue

def handle_request(inSocket, outSocket):
     ... read inSocket, parse the request and dispatch ...

     iterable = application(environ, start_response)

     try:
         while True:
             val = iterable.next()
             if isinstance(val, str):
                 outSocket.write(val)
             else:
                 result = Queue.Queue()
                 val.addCallback(result.put)
                 outSocket.write(result.get())
     except StopIteration:
         outSocket.close()
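
For experimenting with the idea, the loop above can be reduced to a
self-contained sketch (Python 3 spelling: 'queue' instead of 'Queue').
The 'Promise' class is a toy stand-in for any object implementing
addCallback, and 'drain' is a hypothetical name; neither is part of WSGI
or of Twisted.

```python
import queue
import threading


class Promise:
    """Toy stand-in for an object implementing addCallback()."""

    def __init__(self):
        self._lock = threading.Lock()
        self._callbacks = []
        self._resolved = False
        self._value = None

    def addCallback(self, callback):
        with self._lock:
            if not self._resolved:
                self._callbacks.append(callback)
                return
        # Already resolved: deliver the value immediately.
        callback(self._value)

    def resolve(self, value):
        with self._lock:
            self._value, self._resolved = value, True
            callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(value)


def drain(iterable, write):
    """Write strings directly; block the calling thread on promises."""
    for val in iterable:
        if isinstance(val, str):
            write(val)
        else:
            result = queue.Queue()
            val.addCallback(result.put)
            write(result.get())  # blocks until the promise resolves
```

The blocking 'result.get()' is the cost a threaded server pays for each
promise it is handed.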

> It may be, however, that Mr. Preston means that applications which  
> want to use 'write()' or a similar push-oriented approach to produce  
> data cannot do so after the application returns.  If so, we should  
> discuss that use case further, preferably on the Web-SIG.

And now we come to my other half-baked proposal.

Instead of merely returning a write callable, start_response could  
return a tuple of (write, finish) callables. The application would be  
free to call write at any time until it calls finish, at which point  
calling either callable becomes illegal. Again, the synchronous server  
support for this would have to use spin locking in a fashion such as  
this:

import threading

def handle_request(inSocket, outSocket):
     ... read request, dispatch ...
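     # Start the count at 0 so that acquire() below blocks until the
     # application calls the 'finish' callable (finished.release).
     finished = threading.Semaphore(0)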

     def start_response(...):
         ... write headers ...
         return outSocket.write, finished.release

     iterable = application(environ, start_response)
     if iterable is None:
         finished.acquire()
         # Once we get here, the application is done with the request.

Finally, we come to the task of implementing a server or gateway which  
can asynchronously support either asynchronous or blocking  
applications. Since there is no way for the server or gateway to know  
whether the application object it is about to invoke will block,  
starving the main loop and preventing network activity from being  
serviced, it must invoke all applications in a new thread or process. A  
solution to this would be to require application callables to provide  
additional metadata, perhaps via function or object attributes, which  
indicate whether they are capable of running in asynchronous, threaded,  
or multiprocess environments. Since it's getting late and this message  
is getting long I will leave this discussion for another day.

dp

From pje at telecommunity.com  Thu Sep 16 08:37:06 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 16 08:36:05 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <335B4CE1-079F-11D9-A6FD-000A95864FC4@ulaluma.com>
References: <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>

At 01:13 AM 9/16/04 -0400, Donovan Preston wrote:

>On Sep 15, 2004, at 7:12 PM, Phillip J. Eby wrote:
>
>>At 06:48 PM 9/15/04 -0400, Peter Hunt wrote:
>>>It looks like WSGI is not well received over at twisted.web.
>>>
>>>http://twistedmatrix.com/pipermail/twisted-web/2004-September/000644.html
>>
>>Excerpting from that post:
>>
>>"""The WSGI spec is unsuitable for use with asynchronous servers and
>>applications. Basically, once the application callable returns, the
>>server (or "gateway" as wsgi calls it) must consider the page finished
>>rendering."""
>>
>>This is incorrect.
>
>As I said in my original post, I hadn't mentioned anything about this
>yet because I didn't have a solution or proposal to fix the problem,
>which I maintain remains.

Reading the rest of your post, I see that you are actually addressing the 
issue of asynchronous *applications*, and I have only been addressing 
asynchronous *servers* in the spec to date.  (Technically "half-async" 
servers, since to be properly portable, a WSGI server *must* support 
synchronous applications, and therefore an async WSGI server must have a 
thread pool for running applications, even if it contains only one thread.)

However, I'm not certain that it's actually possible to support *portable* 
asynchronous  applications under WSGI, since such asynchrony requires 
additional support such as an event loop service.  As a practical matter, 
asynchronous applications today require a toolset such as Twisted or 
peak.events in addition to the web server, and I don't really know of a way 
to make such applications portable across web servers, since the web server 
might use a different toolset that insists on having its own event 
loop.  Or it might be like mod_python or CGI, and not really have any 
meaningful way to create an event loop: it could be utterly synchronous in 
nature and impossible to make otherwise.

Thus, as a practical matter, applications that make use of asynchronous I/O 
*may* be effectively outside WSGI's scope, if they have no real chance of 
portability.  As I once said on the Web-SIG, the idea of WSGI is more aimed 
at allowing non-Twisted apps to run under a Twisted web server, than at 
allowing Twisted applications to run under other web servers!  The latter, 
obviously, is much more ambitious than the former.

But I'm happy to nonetheless explore whether there is any way to support 
such applications without unduly complicating middleware.  I don't expect 
it would complicate servers much, but middleware can be quite difficult, 
because middleware currently isn't even required to return when the 
application does!  It's not recommended, but a middleware component can sit 
there and iterate over the return value and call its parent's write() 
method all it wants.  In the presence of this kind of behavior, there isn't 
any real way to guarantee that a thread isn't going to be tied up with 
processing.  But realistically, that's what an async server's thread pool 
is *for*.

Anyway, as you'll see below, WSGI can actually run async apps with minimal 
blocking even without any modifications to the spec, and with 
server-specific extensions you can eliminate *all* the blocking, as long as 
middleware doesn't do anything pathological.  In practice, of course, I 
think the spec *should* be updated so that middleware is prohibited from 
interfering with the control flow, and I'll give some thought as to how 
that should be phrased.


>According to the spec, """The application object must return an
>iterable yielding strings.""" Whether the application callable calls
>write before returning or yields strings to generate content, the
>effect is the same -- there is no way for the application callable to
>say "Wait, hang on a second, I'm not ready to generate more content
>yet. I'll tell you when I am." This means the only way the application
>can pause for network activity is by blocking.

That is correct.  The application must block for such activities.  However, 
as a practical matter, this isn't a problem for e.g. database access, since 
using Twisted's adbapi would still tie up *some* thread with the exact same 
blocking I/O, so there's actually no loss in simply doing unadorned DBAPI 
access from within the application.


>  For example, a page
>which performed an XML-RPC call and transformed the output into HTML
>would be required to perform the XML-RPC call synchronously. Or a page
>which initiated a telnet session and streamed the results into a web
>page would be required to perform reads on the socket synchronously.

Technically, it could perform these tasks asynchronously, as long as the 
data were queued such that the application's return iterable simply 
retrieved results from the queue.  However, this would naturally block 
whenever the client was ready for I/O, but no results were available yet.

However, an asynchronous server isn't going to sit there in a loop calling 
next()!  Presumably, it's going to wait until the previous string gets sent 
to the client, before calling next() again.  And, it's presumably going to 
round-robin the active iterables through the threadpool, so that it doesn't 
keep blocking on iterables that aren't likely to have any data to produce 
as yet.

Yes, this arrangement can still block threads sometimes, if there are only 
a few iterables active and they are waiting for some very slow async 
I/O.  But the frequency of such blockages can be further reduced with a 
couple of extensions.  Suppose there was an 'environ["async.sleep"]' and 
'environ["async.wake"]'.  A call to 'sleep' would mean, "don't bother 
iterating over me again until you get a 'wake' call".

This *still* wouldn't prevent some item of middleware from hogging a thread 
in the threadpool, but I suppose you could actually make the 'sleep' 
function sit in a loop and run active iterables' next() methods until one 
of the suspended iterables in the current thread "wakes", at which point it 
would return control to whatever iterable it was called from.  Or, if you 
want to use Greenlets, you can always return control directly to the 
iterable that needs to "wake up".

Anyway, my point here is that it's possible to get a pretty decent setup 
for async applications, without any need to actually modify the base WSGI 
spec.  And, if you add some optional extensions, you can get an even 
smoother setup for async I/O.

Finally, I'm open to trying to define the 'sleep/wake' facilities as 
"standard options" in WSGI, as well as clarifying the middleware control 
flow to support this better.


>The server or gateway, by calling next(), is assuming that the call
>will yield a string value, and only a string value.

The spec doesn't rule out empty strings, however, which would be the 
natural way to indicate that no data is available.  So, the protocol in an 
async app's iterator would be something like:

      while queue.empty():
          if 'async.wake' in environ:
              someDeferred.addCallback(environ['async.wake'])
              environ['async.sleep']()
              yield ""
              # We should only get to this line once environ['async.wake']
              # has been called
          else:
              yield ""
              # delay an exponentially increasing period if queue is still
              # empty

If middleware is required to match the control flow of the application it 
wraps (e.g. write()=>write(), yield=>yield), then this would result in 
complete non-blockingness when the server supports the 'async' extensions.

Of course, a blocking delay *is* required when running in a server that 
doesn't support the async extensions, but that's unavoidable in that 
case.  (Technically, you might be better off just doing synchronous I/O if 
you're being run in a synchronous server, but that's of course optional.)


>"""The application object must return an iterable yielding strings or
>objects implementing the following interface:
>
>def addCallback(callable):
>         '''Add 'callable' to the list of callables to be invoked when a
>         string is available. Callable should take a single argument,
>         which will be a string.'''
>
>The application object must invoke the callable passed to addCallback,
>passing a string which will be written to the request.
>"""
>
>This places additional burdens upon implementors of WSGI servers or
>gateways.

And a near-intolerable burden on middleware, which would have to have a way 
to "pass through" this facility.  It would be much better to limit the 
pass-through requirements to covering write and yield, rather than 
requiring middleware to implement addCallback facilities as well.


>Finally, we come to the task of implementing a server or gateway which
>can asynchronously support either asynchronous or blocking
>applications. Since there is no way for the server or gateway to know
>whether the application object it is about to invoke will block,
>starving the main loop and preventing network activity from being
>serviced, it must invoke all applications in a new thread or process.

But *some* thread is going to be working on it, and this is true whether 
you use a thread pool or the server is purely synchronous.  And, because a 
WSGI server *must* support synchronous applications, it *must* have some 
thread available that is amenable to blocking.

Of course "new" threads are not required.  I assume that in the case of 
Twisted, something like reactor.deferToThread() will be used to wrap a WSGI 
application's initial invocation, and each individual 'next()' call.

From wilk-ml at flibuste.net  Thu Sep 16 10:58:34 2004
From: wilk-ml at flibuste.net (William Dode)
Date: Thu Sep 16 10:58:36 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	(Phillip J. Eby's message of "Thu, 16 Sep 2004 02:37:06 -0400")
References: <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
Message-ID: <874qly3805.fsf@blakie.riol>

"Phillip J. Eby" <pje@telecommunity.com> writes:

> At 01:13 AM 9/16/04 -0400, Donovan Preston wrote:
>
>>On Sep 15, 2004, at 7:12 PM, Phillip J. Eby wrote:
>>
>>>At 06:48 PM 9/15/04 -0400, Peter Hunt wrote:
>>>>It looks like WSGI is not well received over at twisted.web.
>>>>
>>>>http://twistedmatrix.com/pipermail/twisted-web/2004-September/000644.html
>>>
>>>Excerpting from that post:
>>>
>>>"""The WSGI spec is unsuitable for use with asynchronous servers and
>>>applications. Basically, once the application callable returns, the
>>>server (or "gateway" as wsgi calls it) must consider the page finished
>>>rendering."""
>>>
>>>This is incorrect.
>>
>>As I said in my original post, I hadn't mentioned anything about this
>>yet because I didn't have a solution or proposal to fix the problem,
>>which I maintain remains.
>
> Reading the rest of your post, I see that you are actually addressing
> the issue of asynchronous *applications*, and I have only been
> addressing asynchronous *servers* in the spec to date.  (Technically
> "half-async" servers, since to be properly portable, a WSGI server
> *must* support synchronous applications, and therefore an async WSGI
> server must have a thread pool for running applications, even if it
> contains only one thread.)
>
> However, I'm not certain that it's actually possible to support
> *portable* asynchronous  applications under WSGI, since such
> asynchrony requires additional support such as an event loop service.

Like others, I wrote my own little framework that can run on top of Twisted,
CGI or BaseHTTPServer. So it's possible ;-)
But that doesn't mean I want to run my application on any
server. Generally I use the Twisted server when I have special needs, like
telnet, irc... So that application will not run under CGI. But I like
being able to quickly reuse some little CGI applications under Twisted.
I need the same framework for all the servers so that I can share 90% of my
API: mapping the URL to a resource, sessions, cookies...

So, I hope we can find a solution that runs simple applications anywhere
and stays open to very specific uses.

Sorry, because of my poor English, I cannot help much in the
discussion...

-- 
William Dodé - http://flibuste.net
From floydophone at gmail.com  Thu Sep 16 14:02:38 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Thu Sep 16 14:02:44 2004
Subject: [Web-SIG] WSGI - alternate ideas, part II
In-Reply-To: <5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com>
References: <6654eac4040915200673ed116e@mail.gmail.com>
	<5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com>
Message-ID: <6654eac4040916050254f7297f@mail.gmail.com>

I think that the application should be passed a finish() method as a
parameter or start_response return value. If the WSGI application is
not a generator and returns wsgi.NOT_DONE_YET (similar to
Twisted.web's NOT_DONE_YET), it is required to call finish().
Otherwise, the gateway will call finish() after the generator is
finished or a string value is returned.

That way, one could do all of the deferred calls they want, and simply
return NOT_DONE_YET and call finish().

How does that sound?
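
A minimal sketch of how a gateway might treat such a return value. Everything here is hypothetical: the NOT_DONE_YET sentinel, the ToyGateway class and the three-argument application signature are assumptions made for illustration, and the code is written for current Python 3 rather than any existing WSGI server.

```python
NOT_DONE_YET = object()  # hypothetical sentinel, modelled on twisted.web's


class ToyGateway:
    """Hypothetical gateway: collects output and records whether finish() ran."""

    def __init__(self):
        self.chunks = []
        self.finished = False

    def write(self, data):
        self.chunks.append(data)

    def finish(self):
        self.finished = True

    def run(self, app, environ):
        result = app(environ, self.write, self.finish)
        if result is NOT_DONE_YET:
            return  # the application has promised to call finish() itself
        # Otherwise the gateway drains the result and finishes on its behalf.
        if isinstance(result, bytes):
            result = [result]
        for chunk in result:
            self.write(chunk)
        self.finish()


def sync_app(environ, write, finish):
    return b"hello"


pending = []  # stands in for deferred calls that fire later


def deferred_app(environ, write, finish):
    def later():
        write(b"late data")
        finish()
    pending.append(later)
    return NOT_DONE_YET


sync_gateway = ToyGateway()
sync_gateway.run(sync_app, {})

async_gateway = ToyGateway()
async_gateway.run(deferred_app, {})
finished_early = async_gateway.finished  # False: nothing has fired yet
pending.pop()()                          # the deferred work completes
```

Note that the gateway has to special-case the sentinel on every request, and any middleware in between would have to pass it through untouched.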

On Wed, 15 Sep 2004 23:27:18 -0400, Phillip J. Eby
<pje@telecommunity.com> wrote:
> At 11:06 PM 9/15/04 -0400, Peter Hunt wrote:
> 
> >Thus, here is my "WSGI-X" proposal. The application will call the
> >gateway, opposite of WSGI. For example, a CGI WSGI-X application may
> >begin with:
> >
> >#!/usr/bin/env python
> >if __name__ == "__main__":
> >       from wsgix import cgi
> >       req = cgi.get_request()
> 
> That's pretty hard to implement correctly in any number of
> servers.  Really, pretty much every server wants to call the application,
> rather than the other way around, because servers want to use their own
> event loop.
> 
> 
> >stdout - the raw, unbuffered direct output stream to the client
> 
> So, header parsing is required?  Or are only 'nph-' CGI scripts allowed?
> 
> 
> >Now we have a basic interface to interact with HTTP. If one wants to
> >write an extension to provide services like simplified cookie
> >handling, sessions, or buffered headers and content, they write an
> >extension function. A simple one for cookies would look like:
> >
> >def cookie_extension(req):
> >       if not hasattr(req, "cookie"):
> >             req.cookie =
> > Cookie.SimpleCookie(req.environ.get("HTTP_COOKIE",""))
> 
> Note that this can easily be accomplished in WSGI, by changing
> 'hasattr(req,"cookie")' to '"my_extension.cookie" in environ' and
> 'req.cookie' to 'environ["my_extension.cookie"]'.
> 
> 
> >It modifies the request object if it hasn't already been modified.
> >This saves us a bit of overhead so we won't need to parse the cookie
> >again in case it is called twice (as it will if other extensions
> >depend on it).
> 
> Also achievable within 'environ'.
> 
> 
> >Finish hooks have the same signature and execute when
> >the finish() method is called. For example, a buffering extension
> >would flush the buffer. Extensions can also add methods to the request
> >object, for items such as add_header().
> 
> Under WSGI, such "finish" hooks can be rendered as a 'close()' method on an
> iterable by a piece of middleware.
> 
>
From neel at mediapulse.com  Thu Sep 16 15:41:07 2004
From: neel at mediapulse.com (Michael C. Neel)
Date: Thu Sep 16 15:40:48 2004
Subject: [Web-SIG] WSGI - alternate ideas, part II
In-Reply-To: <5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com>
References: <5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com>
Message-ID: <1095342067.30862.9.camel@mike.mediapulse.com>

On Wed, 2004-09-15 at 23:27, Phillip J. Eby wrote:
> At 11:06 PM 9/15/04 -0400, Peter Hunt wrote:
> 
> >Thus, here is my "WSGI-X" proposal. The application will call the
> >gateway, opposite of WSGI. For example, a CGI WSGI-X application may
> >begin with:
> >
> >#!/usr/bin/env python
> >if __name__ == "__main__":
> >       from wsgix import cgi
> >       req = cgi.get_request()
> 
> That's pretty hard to implement correctly in any number of 
> servers.  Really, pretty much every server wants to call the application, 
> rather than the other way around, because servers want to use their own 
> event loop.

To insert my highly unqualified 2 cents: this is similar to the way
SnakeSkin/Albatross work:

import snakeskin
from snakeskin.cgiapp import Request

app = snakeskin.SimpleApp(...)
app.run(Request())

....
which allows me a chance to do something like:

app = snakeskin.SimpleApp(...)
myReq = Request()
myReq.custom_data = {...}
app.run(myReq)

To change to mod_python, change line two to

from snakeskin.apacheapp import Request

the rest is the same.  I don't think there is an issue with the current
wsgi where app is callable; calling the object would just imply a:

from snakeskin.wsgiapp import Request

Mike


From py-web-sig at xhaus.com  Thu Sep 16 16:59:15 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Thu Sep 16 16:54:40 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
References: <5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
Message-ID: <4149AA43.6000803@xhaus.com>

[Phillip J. Eby]
 > However, an asynchronous server isn't going to sit there in a loop
 > calling next()!  Presumably, it's going to wait until the previous
 > string gets sent to the client, before calling next() again.  And,
 > it's presumably going to round-robin the active iterables through the
 > threadpool, so that it doesn't keep blocking on iterables that aren't
 > likely to have any data to produce as yet.
 >
 > Yes, this arrangement can still block threads sometimes, if there are
 > only a few iterables active and they are waiting for some very slow
 > async I/O.  But the frequency of such blockages can be further reduced
 > with a couple of extensions.  Suppose there was an
 > 'environ["async.sleep"]' and 'environ["async.wake"]'.  A call to
 > 'sleep' would mean, "don't bother iterating over me again until you
 > get a 'wake' call".

and

 > Anyway, my point here is that it's possible to get a pretty decent
 > setup for async applications, without any need to actually modify the
 > base WSGI spec.  And, if you add some optional extensions, you can get
 > an even smoother setup for async I/O.
 >
 > Finally, I'm open to trying to define the 'sleep/wake' facilities as
 > "standard options" in WSGI, as well as clarifying the middleware
 > control flow to support this better.

What would be really nice would be if there were some way for the 
application to return, to event-based servers or gateways, an object 
that could be included in the server's event loop, e.g. its select/poll 
loop.

For example, if an application were waiting on return data from a 
database, through a network socket, it could return that 
database-connection-socket descriptor to the server. The server would 
then check for activity on the database socket in its event loop, i.e. 
select.poll.POLLIN. When this event, i.e. database data, appears, the 
server can have *reasonable* confidence that a call to the applications 
iterator will then yield data. Of course, it is not guaranteed that the 
application will have data available (e.g. the database socket contains 
half the data required by the app, or the database connection is shared 
between multiple apps). But it's better than the application blocking.
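
A rough sketch of the descriptor idea, assuming a hypothetical arrangement in which the server registers the application's socket together with its iterable. It uses the standard selectors module and a socketpair() in place of a real database connection; none of this is part of WSGI.

```python
import selectors
import socket


def db_backed_app(db_sock):
    """Hypothetical application iterable: yields whatever the database sends."""
    while True:
        yield db_sock.recv(4096)


def pump(selector, timeout):
    """One pass of the server's event loop: advance only the applications
    whose registered descriptor is readable."""
    produced = []
    for key, _events in selector.select(timeout):
        produced.append(next(key.data))
    return produced


# socketpair() stands in for a real database connection.
app_side, db_side = socket.socketpair()
app_side.setblocking(False)

sel = selectors.DefaultSelector()
sel.register(app_side, selectors.EVENT_READ, data=db_backed_app(app_side))

idle = pump(sel, timeout=0)      # no database data yet: the app is not called
db_side.sendall(b"row 1")
ready = pump(sel, timeout=1.0)   # socket is readable: the app yields the data

sel.close()
app_side.close()
db_side.close()
```

As noted above, readability of the socket only makes it likely, not certain, that the application can produce a complete chunk.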

But I can't think of any unified way to generalise this solution to 
non-descriptor based event loops or applications. For example, what if 
the application is waiting for data on a Queue.Queue? Or a 
threading.Event? How could the application enable the server to check 
for the Queue.Queue or threading.Event it awaits?

Perhaps the server could maintain an extra event loop for checking such 
threaded event notification mechanisms? Or it could associate an "app 
ready" flag with each client connection? It could go something like this:-

1. The application returns to the server an instance of a class that 
indicates it will only generate content when a thread notification 
primitive is set. Or perhaps the thread notification primitive is an 
optional attribute of the returned iterable, e.g. if hasattr(iterable, 
'ready_to_go'): etc

2. The server adds this thread notification primitive to its 
lists/"event loop", or associates the notification primitive with the 
descriptor for the incoming/outgoing client socket.

3. When the client socket becomes ready for output, the server checks 
the ready_to_go flag on the application. If the flag is not set, it 
simply passes over that individual socket to the next.

4. When the client socket is ready to consume output *and* the 
application is ready to produce output, i.e. its ready flag is set, the 
server gets the data from the app's iterator and transmits it down the 
client socket. The server could conceivably loop until either the client 
socket is full or the application iterator is empty, and then just 
suspend that client/application pair. Or it could spin that app->client 
transfer into a separate dedicated thread.
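
The four steps above could be sketched roughly as follows. The FlaggedIterable class, its feed() method and the serve_once() helper are hypothetical names chosen for illustration; only the 'ready_to_go' attribute name comes from the text, and a threading.Event is assumed as the notification primitive.

```python
import threading


class FlaggedIterable:
    """Application result carrying a hypothetical 'ready_to_go' flag."""

    def __init__(self):
        self.ready_to_go = threading.Event()
        self._chunks = []
        self._lock = threading.Lock()

    def feed(self, chunk):
        # Called by the application (possibly from another thread) when
        # it has produced some output.
        with self._lock:
            self._chunks.append(chunk)
            self.ready_to_go.set()

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:
            chunk = self._chunks.pop(0) if self._chunks else b""
            if not self._chunks:
                self.ready_to_go.clear()
        return chunk


def serve_once(connections):
    """One pass over client sockets that are ready for output (steps 3 and 4)."""
    sent = []
    for client_name, iterable in connections:
        flag = getattr(iterable, "ready_to_go", None)
        if flag is not None and not flag.is_set():
            continue  # step 3: the application has nothing yet, pass over it
        sent.append((client_name, next(iterable)))  # step 4: transfer data
    return sent


quiet = FlaggedIterable()
busy = FlaggedIterable()
busy.feed(b"data")
first_pass = serve_once([("a", quiet), ("b", busy)])
second_pass = serve_once([("a", quiet), ("b", busy)])
```

An iterable without the attribute is simply iterated as usual, so plain synchronous applications are unaffected.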

I don't like the idea of adding callbacks to WSGI: that's too 
Twisted-specific. I can picture, for example, a very simple coroutine-based 
async server whose applications would not need callbacks. Instead, they 
would simply yield a NO-OP state to the server/scheduler/dispatcher, 
indicating they have no data ready right now.

And, of course, that's what we're really discussing here: server 
scheduling, and how servers ensure that application output gets 
transmitted to clients with maximum efficiency and timeliness. IMHO, 
asynchronous server scheduling algorithms and concerns have no place in 
core WSGI, although a well-designed optional extension to support 
efficiency might have a nice unification effect on Python asynchronous 
server architectures.

Just my €0,02

Regards,

Alan.
From pje at telecommunity.com  Thu Sep 16 17:14:47 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 16 17:13:44 2004
Subject: [Web-SIG] WSGI - alternate ideas, part II
In-Reply-To: <6654eac4040916050254f7297f@mail.gmail.com>
References: <5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com>
	<6654eac4040915200673ed116e@mail.gmail.com>
	<5.1.1.6.0.20040915232134.0214fe20@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040916111212.021585b0@mail.telecommunity.com>

At 08:02 AM 9/16/04 -0400, Peter Hunt wrote:
>I think that the application should be passed a finish() method as a
>parameter or start_response return value. If the WSGI application is
>not a generator and returns wsgi.NOT_DONE_YET (similar to
>Twisted.web's NOT_DONE_YET), it is required to call finish().
>Otherwise, the gateway will call finish() after the generator is
>finished or a string value is returned.
>
>That way, one could do all of the deferred calls they want, and simply
>return NOT_DONE_YET and call finish().
>
>How does that sound?

Way too complicated in the general case.  I'd prefer a solution that 
doesn't excessively complicate middleware or synchronous servers, just to 
support asynchronous applications that are unlikely to be portable anyway.

From pje at telecommunity.com  Thu Sep 16 17:22:36 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 16 17:21:33 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <4149AA43.6000803@xhaus.com>
References: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>

At 03:59 PM 9/16/04 +0100, Alan Kennedy wrote:
>And, of course, that's what we're really discussing here: server 
>scheduling, and how servers ensure that application output gets 
>transmitted to clients with maximum efficiency and timeliness. IMHO, 
>asynchronous server scheduling algorithms and concerns have no place in 
>core WSGI, although a well-designed optional extension to support efficiency 
>might have a nice unification effect on python asynchronous server 
>architectures.

Right.  I'd encourage people to experiment with async extensions like my 
sleep/wake idea, and if there's sufficient consensus we could add a 
"standard extension" to the spec.  But I don't want to disturb the 
write()+iterable model, since that allows middleware to be mostly oblivious 
to the sync/async issue, and only apps or servers that care have to deal 
with it.  While asynchronous servers are fairly common, most existing 
asynchronous applications are going to be tied to a particular async server 
architecture no matter what we do in WSGI.

From pje at telecommunity.com  Thu Sep 16 17:27:44 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 16 17:26:41 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <874qly3805.fsf@blakie.riol>
References: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040916112253.026bdd30@mail.telecommunity.com>

At 10:58 AM 9/16/04 +0200, William Dode wrote:
>But that doesn't mean I want to run my application on any
>server. Generally I use the Twisted server when I have special needs, like
>telnet, irc... So that application will not run under CGI. But I like
>being able to quickly reuse some little CGI applications under Twisted.
>I need the same framework for all the servers so that I can share 90% of my
>API: mapping the URL to a resource, sessions, cookies...
>
>So, I hope we can find a solution that runs simple applications anywhere
>and stays open to very specific uses.

As I said, WSGI should let any WSGI application run under more 
sophisticated architectures like Twisted; it's just that an application 
that uses Twisted-specific features isn't going to be able to move to a 
server that's not Twisted-compatible.

And, if you're using Twisted-specific features in a WSGI app (as opposed to 
just writing a pure Twisted app), you'll have some additional work needed 
to deal with the asynchrony.  However, the only reason I can think of why 
you'd want to make such an application use the WSGI interface is if you 
wanted to be able to use WSGI-based middleware features.  At some point, 
that may be attractive, but I really doubt that in the short term anybody 
using Twisted-specific features in an application would want to bother with 
making it WSGI-compatible.

From pje at telecommunity.com  Thu Sep 16 18:18:26 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 16 18:17:24 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <15672B46-07F3-11D9-AC9C-000A95A50FB2@fuhm.net>
References: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040916113124.025e97d0@mail.telecommunity.com>

At 11:14 AM 9/16/04 -0400, James Y Knight wrote:
>On Sep 16, 2004, at 2:37 AM, Phillip J. Eby wrote:
>>Reading the rest of your post, I see that you are actually addressing the 
>>issue of asynchronous *applications*, and I have only been addressing 
>>asynchronous *servers* in the spec to date.  (Technically "half-async" 
>>servers, since to be properly portable, a WSGI server *must* support 
>>synchronous applications, and therefore an async WSGI server must have a 
>>thread pool for running applications, even if it contains only one thread.)
>
> From the point of view of Twisted as the server, running a WSGI 
> application, the big question is:
>Can you (as a host server) assume WSGI applications will run non-blocking?
>
>The answer is clearly No and I don't imagine that would change.

Right, because the ability to wrap existing applications is a must, and 
most existing applications are synchronous.


>  (well, right now it's currently not even possible to write a 
> non-blocking WSGI application, but even if it were..)

That depends on what you define as "non-blocking".  :)


>The only sensible thing is to assume a WSGI app will block for some 
>arbitrarily long amount of time. Therefore, the only solution is to spawn 
>threads for simultaneous WSGI applications.

Right; this has been in the discussions of WSGI since day one, last 
December.  The assumption is that async servers would have to use a thread 
pool (e.g. via reactor.deferToThread) to run WSGI applications.  Since the 
point was to allow non-Twisted applications and frameworks (e.g. Zope) to 
run under Twisted or any other web server, this was the only possible approach.
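
As a rough illustration of the thread-pool approach, the sketch below runs the application's initial call and each next() on a pool thread. It uses the standard library's concurrent.futures pool as a stand-in for the server's own pool; run_in_pool and simple_app are hypothetical names, and a real async server would attach callbacks to each future instead of blocking on result().

```python
from concurrent.futures import ThreadPoolExecutor


def simple_app(environ, start_response):
    """An ordinary synchronous WSGI application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, ", b"world\n"]


def run_in_pool(pool, app, environ):
    """Run the initial call and every next() on a pool thread."""
    response = {}

    def start_response(status, headers, exc_info=None):
        response["status"] = status
        response["headers"] = headers

    result = pool.submit(app, environ, start_response).result()
    iterator = iter(result)
    exhausted = object()
    chunks = []
    try:
        while True:
            # Each step may block, so it is handed to the pool separately.
            chunk = pool.submit(next, iterator, exhausted).result()
            if chunk is exhausted:
                break
            chunks.append(chunk)
    finally:
        close = getattr(result, "close", None)
        if close is not None:
            close()
    return response["status"], chunks


# Even a pool of one thread is enough to host a synchronous application.
with ThreadPoolExecutor(max_workers=1) as pool:
    status, chunks = run_in_pool(pool, simple_app, {"REQUEST_METHOD": "GET"})
```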


>So, basically, I concur: WSGI is implementable for async servers, but only 
>to implement blocking applications.

If by "blocking" you mean, you can't absolutely guarantee that no operation 
will tie up the current thread, then yes.  If you mean "tie up the current 
thread for the entire request", then no, since it's possible to pause the 
output with a few minor changes to the spec.


>>However, I'm not certain that it's actually possible to support 
>>*portable* asynchronous  applications under WSGI, since such asynchrony 
>>requires additional support such as an event loop service.
>>As a practical matter, asynchronous applications today require a toolset 
>>such as Twisted or peak.events in addition to the web server, and I don't 
>>really know of a way to make such applications portable across web 
>>servers, since the web server might use a different toolset that insists 
>>on having its own event loop.  Or it might be like mod_python or CGI, and 
>>not really have any meaningful way to create an event loop: it could be 
>>utterly synchronous in nature and impossible to make otherwise.
>>
>>Thus, as a practical matter, applications that make use of asynchronous 
>>I/O *may* be effectively outside WSGI's scope, if they have no real 
>>chance of portability.  As I once said on the Web-SIG, the idea of WSGI 
>>is more aimed at allowing non-Twisted apps to run under a Twisted web 
>>server, than at allowing Twisted applications to run under other web 
>>servers!  The latter, obviously, is much more ambitious than the former.
>
>Yes, there is no way that I can see to make WSGI suitable for writing 
>async applications without significant work.  There are two obvious 
>issues: the input stream only provides blocking read(), not a selectable 
>fd, and there is no way to pause output.

The sleep/wake extensions I proposed would allow pausing output.  I hadn't 
thought about the input stream issue.


>If the write callback was extended into a write/finish callback, it 
>wouldn't completely fix the second problem. Twisted would have to call the 
>write() callback from its reactor loop (having no access to the original 
>request thread). Especially if there is any middleware, the *write* might 
>block! There's also the question of whether the write() and finish() 
>methods are threadsafe or not -- would it even be safe to call from a 
>separate thread from that in which the request was started?

That's one reason why sleep/wake over iterables is a better solution than 
write/finish for the "pausing output" issue.


>Writing an async application *is* an interesting question, because then, 
>possibly, you could take the framework half of twisted web and run it as a 
>WSGI application. However, if this question is punted by WSGI (as I think 
>is likely a good idea..), twisted web framework can continue to work with 
>other servers by using HTTP proxying -- which is a _perfectly good_ 
>solution, and something major webservers already support. HTTP is a pretty 
>good protocol for talking between webservers and webapps.
>
>Also, if WSGI becomes really popular on servers that cannot do HTTP 
>proxying natively, twisted could provide a WSGI "application" that simply 
>proxies the requests over a socket to a separate twisted web server 
>process. This would provide essentially no advantage to HTTP proxying 
>where that works, however.

No *technical* advantage, true, but if WSGI becomes a popular buzzword, the 
mere existence of such a solution allows you to boast that Twisted Web can 
be used with any WSGI-compliant server, as well as any server that supports 
HTTP proxying, which makes it sound like you have twice as many deployment 
options from a "marketecture" perspective.  :)

From py-web-sig at xhaus.com  Thu Sep 16 18:45:36 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Thu Sep 16 18:40:24 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
References: <5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
Message-ID: <4149C330.8060009@xhaus.com>

[Phillip J. Eby]
 > I'd encourage people to experiment with async extensions like my
 > sleep/wake idea,

Actually, the more I think about it, the more I like your idea.

My solution of using a thread-safe condition variable as an optional 
attribute of application return objects is too heavyweight, whereas your 
solution can be implemented with complexity appropriate to the server. For 
example, on a single-process server, wsgi.sleep could be defined like this:

def sleep():
	# return a closure wrapping a method which sets a simple binary

whereas a threaded server might use

import threading

def sleep():
	# return a closure wrapping a threading.Event().set()

Also, having the wrapper in the environment means that its meaning can 
be changed by middleware.
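
To make the two variants concrete, here is one possible sketch. The factory names make_simple_sleep and make_threaded_sleep are hypothetical, and the threaded version assumes threading.Event, since that is the standard primitive that offers set() and clear().

```python
import threading


def make_simple_sleep():
    """Single-process flavour: the 'asleep' state is a plain boolean."""
    state = {"asleep": False}

    def sleep():
        state["asleep"] = True

        def wake():
            state["asleep"] = False

        return wake  # the closure the application calls when it has output

    return sleep, state


def make_threaded_sleep():
    """Threaded flavour: the 'awake' state is a thread-safe Event."""
    awake = threading.Event()
    awake.set()

    def sleep():
        awake.clear()
        return awake.set  # bound method acts as the wake closure

    return sleep, awake


simple_sleep, simple_state = make_simple_sleep()
simple_wake = simple_sleep()
asleep_after_sleep = simple_state["asleep"]
simple_wake()

threaded_sleep, awake_event = make_threaded_sleep()
threaded_wake = threaded_sleep()
awake_after_sleep = awake_event.is_set()
threaded_wake()
```

Either function could be placed in the environment by the server, and the application cannot tell which one it was given.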

The only thing I disagree on are the names "sleep" and "wake", which 
IMHO come with too many semantic hangovers from the threading world. 
When an application calls wsgi.sleep(), it's not really sleeping, it's 
just declaring that it currently has no output: a call to its iterator 
will succeed, but the returned value will be an empty string.

So basically, WSGI is providing an on/off indicator for every instance 
of a middleware stack, which indicates to the server if there is 
currently output available.

Thinking afresh.
================

The server is just acting as a mediator between the client and 
application. When the application has data, and the client is ready to 
receive data, the server transfers data between the two. But that client 
to application conversation is full-duplex, i.e. the client may be 
sending input to the application.

In an asynchronous situation, the application cannot simply do a 
blocking read on the input: that will tie up the server thread. So we 
need a way for the application to be notified/called when input becomes 
available from the client.

Perhaps we need to add an environment entry, e.g. "wsgi.input_handler", 
which the app uses to pass a callable to the server. This callable would 
be called whenever data became available on the input stream.

So how would that work in the middleware stack?

Would the first application in the stack set the callback for the 
input_stream, and perhaps not even invoke the next component up in the 
stack until some input has arrived? Does this mean that input handling 
will have to be separated out into a new state in the server->application 
state model?

Or would each component in the stack set its own callback?

I'm beginning to think that we may have to treat output and input 
identically in WSGI: i.e. from the server's point of view, there is no 
difference between the application->client stream and the 
client->application stream: there is symmetry between server's 
connection to the client and server's "connection" to the application. 
Hmmm: Must think some more about this.

Regards,

Alan.
From pje at telecommunity.com  Thu Sep 16 19:41:54 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 16 19:40:52 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <4149C330.8060009@xhaus.com>
References: <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>

At 05:45 PM 9/16/04 +0100, Alan Kennedy wrote:
>The only thing I disagree on are the names "sleep" and "wake", which IMHO 
>come with too many semantic hangovers from the threading world. When an 
>application calls wsgi.sleep(), it's not really sleeping, it's just 
>declaring that it currently has no output: a call to its iterator will 
>succeed, but the returned value will be an empty string.
>
>So basically, WSGI is providing an on/off indicator for every instance of 
>a middleware stack, which indicates to the server if there is currently 
>output available.

Well, I'm proposing it as an optional extension, not a required 
feature.  And, I think I'd like to streamline it to a single 
'wsgi.pause_output' function, e.g.:

     resume = environ['wsgi.pause_output']()

Where 'resume' is then a callback function that can be invoked to resume 
iteration.  This keeps it to a single extension key, helps ensure the 
correct sequence of actions, and makes it easier to implement in some 
cases, while not making other cases any harder.
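
A toy illustration of how this might look from both sides. The 'wsgi.pause_output' key is the proposal above; ToyServer, its step() method and the callbacks list are hypothetical scaffolding, not an existing server API.

```python
class ToyServer:
    """Hypothetical server that honours the proposed pause_output extension."""

    def __init__(self):
        self.paused = False

    def pause_output(self):
        self.paused = True

        def resume():
            self.paused = False

        return resume

    def step(self, app_iter, out):
        """Advance the iterable once, unless it has asked to be paused."""
        if self.paused:
            return False
        out.append(next(app_iter))
        return True


callbacks = []  # stands in for whatever event source will call resume()


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])

    def body():
        yield b"first"
        resume = environ["wsgi.pause_output"]()
        callbacks.append(resume)   # hand 'resume' to the awaited event
        yield b""                  # nothing to send until resume() is called
        yield b"second"

    return body()


server = ToyServer()
environ = {"wsgi.pause_output": server.pause_output}
out = []
body = app(environ, lambda status, headers, exc_info=None: None)

server.step(body, out)                 # yields b"first"
server.step(body, out)                 # app pauses itself and yields b""
skipped = not server.step(body, out)   # the paused iterable is left alone
callbacks.pop()()                      # the awaited event fires: resume
server.step(body, out)                 # yields b"second"
```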


>In an asynchronous situation, the application cannot simply do a blocking 
>read on the input: that will tie up the server thread.

What do you mean by "server thread"?  A truly asynchronous server (one 
using "no threads") cannot serve multiple WSGI requests simultaneously.  In 
the general case, a WSGI server can only serve as many requests 
simultaneously as it has available threads for.  However, WSGI applications 
that use iteration in place of 'write()' can sometimes be run with fewer 
than one thread per simultaneous request -- that's why iteration is 
recommended for applications that can be implemented that way.


>  So we need a way for the application to be notified/called when input 
> becomes available from the client.
>
>Perhaps we need to add an environment entry, e.g. "wsgi.input_handler", 
>which the app uses to pass a callable to the server. This callable would 
>be called whenever data became available on the input stream.
>
>So how would that work in the middleware stack?

You would have to pass either 'environ' or 'wsgi.input' *into* this input 
handler request function, so that the server can verify it hasn't been 
replaced by any middleware.  This is the standard way in WSGI of providing 
enhanced communication facilities that could "bypass" middleware.  See:

     http://www.python.org/peps/pep-0333.html#server-extension-apis

So, in principle, if the spec is modified to require middleware to honor 
child applications' block boundaries, then you could use an extension API 
to pause iteration until input is available, in much the same way that you 
would pause iteration for any other reason.

Neither of these "pause iteration" solutions is especially elegant, at 
least from the POV of an async application author.  But my objective here 
is only to make it *possible*, not necessarily pretty.  I imagine that if 
there's actual demand for async apps to run under WSGI, it should be 
possible to create wrappers to let an application written in Twisted's 
continuation-passing style be run as a WSGI app.

Such a wrapper would basically be just a function returning an iterator, 
with a bunch of pausing logic and a queue to communicate with the actual 
asynchronous app.  And, such wrappers should only need to be written once 
for each asynchronous API, which as a practical matter probably means only 
Twisted, anyway, as (IMO) it has no real competitors in the Python async 
framework space.
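
Such a wrapper might look roughly like the sketch below. It is only an outline under stated assumptions: make_wsgi_wrapper and the start_async_job(send, finish) calling convention are invented for illustration, and the pausing logic is reduced to yielding an empty string whenever the queue is empty.

```python
import queue


def make_wsgi_wrapper(start_async_job):
    """Adapt a callback-style job into a WSGI application.

    'start_async_job(send, finish)' is a hypothetical convention: the job
    calls send(chunk) for each piece of output and finish() when done.
    """

    def wsgi_app(environ, start_response):
        chunks = queue.Queue()
        done = object()
        start_async_job(chunks.put, lambda: chunks.put(done))
        start_response("200 OK", [("Content-Type", "text/plain")])

        def body():
            while True:
                try:
                    item = chunks.get_nowait()
                except queue.Empty:
                    # Nothing yet. A server offering a pause extension
                    # could be asked to stop iterating here instead.
                    yield b""
                    continue
                if item is done:
                    return
                yield item

        return body()

    return wsgi_app


jobs = []  # records the callbacks so the example can fire them by hand


def fake_async_job(send, finish):
    jobs.append((send, finish))


wrapped = make_wsgi_wrapper(fake_async_job)
iterator = wrapped({}, lambda status, headers, exc_info=None: None)

first = next(iterator)       # the job has produced nothing yet
send, finish = jobs[0]
send(b"result")
finish()
rest = list(iterator)
```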

From foom at fuhm.net  Thu Sep 16 19:57:16 2004
From: foom at fuhm.net (James Y Knight)
Date: Thu Sep 16 19:57:20 2004
Subject: [Web-SIG] WSGI & transfer-encodings
Message-ID: <D8A70FC4-0809-11D9-AC9C-000A95A50FB2@fuhm.net>

It is unclear to me from the WSGI spec what parts of HTTP a WSGI 
application is responsible for handling, and what the host server or 
middleware has to expect from the app. Sorry if this has been discussed 
previously, but it doesn't appear in the PEP.

1) Does the server need to decode incoming chunked encoding? The CGI 
spec essentially forbids incoming requests with chunked (and thus all 
others as well) transfer-encoding, as the CONTENT_LENGTH header is 
required to be present when there is incoming content. Does WSGI do the 
same thing?

I would suggest the answer should be that WSGI does *not* require 
CONTENT_LENGTH to be present when there is incoming data. This requires 
at least the modification of:

> The server is not required to read past the client's specified 
> Content-Length, and is allowed to simulate an end-of-file condition if 
> the application attempts to read past that point. The application 
> should not attempt to read more data than is specified by the 
> CONTENT_LENGTH variable.

This would have to state something like: "The server must simulate an 
end-of-file condition if the application attempts to read more data 
than is specified by the Content-Length or the incoming 
Transfer-Encoding."

The only way to tell if there's incoming data is therefore to attempt 
to read() the input stream. read() will either immediately return an 
EOF condition (returning '') or else read the data. Also, it seems that 
read() with no args isn't allowed? Perhaps it should be.
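
A minimal sketch of the end-of-file simulation for the Content-Length case. LimitedInput and wrap_input are hypothetical helper names; decoding an incoming chunked Transfer-Encoding would need a different wrapper and is not shown.

```python
import io


class LimitedInput:
    """Wrap a request body stream and report EOF after 'length' bytes."""

    def __init__(self, stream, length):
        self._stream = stream
        self._remaining = max(0, length)

    def read(self, size=-1):
        # Never read past the declared length, even if asked to.
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        if size == 0:
            return b""  # simulated end-of-file
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data


def wrap_input(environ):
    """Replace wsgi.input with a stream bounded by CONTENT_LENGTH."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    environ["wsgi.input"] = LimitedInput(environ["wsgi.input"], length)
    return environ


# The raw stream also holds the start of the next pipelined request.
raw = io.BytesIO(b"name=valueGET /next HTTP/1.1")
environ = wrap_input({"CONTENT_LENGTH": "10", "wsgi.input": raw})

head = environ["wsgi.input"].read(4)
tail = environ["wsgi.input"].read()   # read() with no args stops at the limit
after = environ["wsgi.input"].read()  # further reads return the EOF marker
```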


2) The server is responsible for connection-oriented headers, and the 
spec states it may override the client's headers in this case. I would 
take this to mean I should just ignore the client provided Connection 
and Transfer-Encoding headers and supply those myself according to HTTP 
spec.

But what about transfer-encoding? The spec says the server is allowed 
to add a chunked encoding. But,
- Is an application allowed to yield data that has already been encoded 
into chunked form?
- What if it does so and you're talking to a HTTP 1.0 client? Should 
the server decode the chunking? Or should it just let the application 
produce bogus output?
- May the application provide data with a gzip transfer-encoding?
- What if the server already handles all connection-oriented behavior 
transparently and doesn't even pass on the Connection, Keep-Alive, TE, 
Trailers, Transfer-Encoding, Upgrade headers to the client? Is that 
okay?
- Wouldn't providing pre-encoded data screw up middleware that is 
expecting to do something useful with the data going through it?

I would suggest that the correct answer is: the application should 
have nothing to do with any connection oriented behavior. It should not 
send a Connection or Transfer-Encoding header and should not expect to 
receive the Connection, Keep-Alive, TE, Trailers, Transfer-Encoding, or 
Upgrade headers, although it is optional for the server to strip them. 
The application should not apply a transfer-encodng to its?output and 
the server should not give it a transfer-encoded input.

James
From pje at telecommunity.com  Thu Sep 16 20:03:27 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 16 20:02:36 2004
Subject: [Web-SIG] Draft language for WSGI to forbid blocking by middleware
Message-ID: <5.1.1.6.0.20040916135706.021445a0@mail.telecommunity.com>

I'm proposing the following language to be added to PEP 333, as a 
subsection under "Buffering and Streaming", just before the subsection 
entitled, "The write() callable".  It doesn't address pausing or resuming 
iteration (which I don't have a PEP-able proposal for yet), but it should 
ensure that middleware doesn't introduce any additional blocking issues:

===excerpt start===

Middleware Handling of Block Boundaries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to better support asynchronous applications and servers,
middleware components **must not** block iteration waiting for
multiple values from an application iterable.  If the middleware
needs to accumulate more data from the application before it can
produce any output, it **must** yield an empty string.

To put this requirement another way, a middleware component **must
yield at least one value** each time its underlying application
yields a value.  If the middleware cannot yield any other value,
it must yield an empty string.

This requirement ensures that asynchronous applications and servers
can conspire to reduce the number of threads that are required
to run a given number of application instances simultaneously.

Note also that this requirement means that middleware **must**
return an iterable as soon as its underlying application returns
an iterable.  It is also forbidden for middleware to use the
``write()`` callable to transmit data that is yielded by an
underlying application.  Middleware may only use their parent
server's ``write()`` callable to transmit data that the
underlying application sent using a middleware-provided ``write()``
callable.

===excerpt end===

In addition to this insertion, I would modify the 'start_response()' 
specification to note that HTTP headers should not be sent until the first 
*non-empty* string is yielded from the iterable.
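
As a rough illustration of the block-boundary rule (a sketch only, not 
part of the proposed PEP text), the generator below regroups an 
application's output into fixed-size blocks while still yielding exactly 
once per application value. The name `rechunk` and its `size` parameter 
are made up for the example; it uses Python 3 byte strings, and 
`close()` handling is left out.

```python
def rechunk(app_iter, size):
    """Regroup output from app_iter into runs of whole size-byte blocks.

    One value is yielded for every value pulled from app_iter, so the
    middleware never blocks waiting for several application values.
    """
    pending = b""
    for piece in app_iter:
        pending += piece
        # Number of bytes that form complete blocks right now.
        cut = len(pending) - len(pending) % size
        ready, pending = pending[:cut], pending[cut:]
        # `ready` is b"" while a block is still being accumulated.
        yield ready
    if pending:
        # Flush the final partial block once the application is done.
        yield pending
```

With `size=4` and an application yielding `b"ab"`, `b"cd"`, `b"e"`, the 
first and third values produced are empty: the middleware had nothing 
complete to send but still gave control back to the server.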

Comments?

From foom at fuhm.net  Thu Sep 16 20:04:59 2004
From: foom at fuhm.net (James Y Knight)
Date: Thu Sep 16 20:05:03 2004
Subject: [Web-SIG] Re: WSGI & transfer-encodings
In-Reply-To: <D8A70FC4-0809-11D9-AC9C-000A95A50FB2@fuhm.net>
References: <D8A70FC4-0809-11D9-AC9C-000A95A50FB2@fuhm.net>
Message-ID: <EC97C57D-080A-11D9-AC9C-000A95A50FB2@fuhm.net>

On Sep 16, 2004, at 1:57 PM, James Y Knight wrote:
> 2) The server is responsible for connection-oriented headers, and the 
> spec states it may override the client's headers in this case. I would 
> take this to mean I should just ignore the client provided Connection 
> and Transfer-Encoding headers and supply those myself according to HTTP 
> spec.

I said "client" here a few times, but I meant WSGI "application" 
instead for all of them but the phrase "HTTP 1.0 client". Sorry for any 
confusion.

James

From pje at telecommunity.com  Thu Sep 16 20:30:28 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 16 20:29:32 2004
Subject: [Web-SIG] WSGI & transfer-encodings
In-Reply-To: <D8A70FC4-0809-11D9-AC9C-000A95A50FB2@fuhm.net>
Message-ID: <5.1.1.6.0.20040916140415.025f6bd0@mail.telecommunity.com>

At 01:57 PM 9/16/04 -0400, James Y Knight wrote:
>It is unclear to me from the WSGI spec what parts of HTTP a WSGI 
>application is responsible for handling, and what the host server or 
>middleware has to expect from the app.

The general section for such issues is:

     http://www.python.org/peps/pep-0333.html#other-http-features

The advice is that in general, a WSGI server should consider itself an HTTP 
proxy server, and should consider the application an HTTP origin server.

However, this doesn't fully cover the two issues you've brought up, so 
thanks for bringing them to my attention!


>1) Does the server need to decode incoming chunked encoding? The CGI spec 
>essentially forbids incoming requests with chunked (and thus all others as 
>well) transfer-encoding, as the CONTENT_LENGTH header is required to be 
>present when there is incoming content. Does WSGI do the same thing?
>
>I would suggest the answer should be that WSGI does *not* require 
>CONTENT_LENGTH to be present when there is incoming data.

Hm.  An interesting conundrum.  Do any Python servers or applications exist 
today that *work* when there's no content-length?

Personally, I'm thinking that WSGI should follow CGI here, and decode 
incoming transfer encodings.  If this means HTTP/1.1 servers have to dump 
the incoming data to a file first, so be it.


>The only way to tell if there's incoming data is therefore to attempt to 
>read() the input stream. read() will either immediately return an EOF 
>condition (returning '') or else read the data. Also, it seems that read() 
>with no args isn't allowed? Perhaps it should be.

A no-argument read would be problematic in some environments -- CGI for 
example.


>2) The server is responsible for connection-oriented headers, and the spec 
>states it may override the client's headers in this case. I would take 
>this to mean I should just ignore the client provided Connection and 
>Transfer-Encoding headers and supply those myself according to HTTP spec.
>
>But what about transfer-encoding? The spec says the server is allowed to 
>add a chunked encoding. But,
>- Is an application allowed to yield data that has already been encoded 
>into chunked form?
>- What if it does so and you're talking to a HTTP 1.0 client? Should the 
>server decode the chunking? Or should it just let the application produce 
>bogus output?
>- May the application provide data with a gzip transfer-encoding?
>- What if the server already handles all connection-oriented behavior 
>transparently and doesn't even pass on the Connection, Keep-Alive, TE, 
>Trailers, Transfer-Encoding, Upgrade headers to the client? Is that okay?

The answer to all these questions, according to the current spec, is yes, 
absolutely.  (Per the "server=proxy server, application=origin server" model).


>- Wouldn't providing pre-encoded data screw up middleware that is 
>expecting to do something useful with the data going through it?

Yes, it would.  There are at least two ways to handle it, though:

1. Don't use middleware that's not smart enough to handle your app's output

2. Have the server or middleware munge HTTP_ACCEPT_ENCODING or other 
parameters on the way in to the application, so that the application (if 
written correctly) won't send data the server or middleware can't handle.


>I would suggest that the correct answer is: the application should 
>have nothing to do with any connection-oriented behavior. It should not 
>send a Connection or Transfer-Encoding header and should not expect to 
>receive the Connection, Keep-Alive, TE, Trailers, Transfer-Encoding, or 
>Upgrade headers, although it is optional for the server to strip them. The 
>application should not apply a transfer-encoding to its output and the 
>server should not give it a transfer-encoded input.

I like most of this, *except* that I'd like to leave open the option of an 
application providing transfer-encoding on its output.  I'd rather have 
servers and middleware set HTTP_ACCEPT_ENCODING to "identity;q=1.0, *;q=0" 
(or an empty string, or delete the entry), if they interpret content, and 
have applications be required to respect this.  Specifically, an 
application can only apply a content-encoding if it matches a non-zero 
quality in HTTP_ACCEPT_ENCODING.
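
The "non-zero quality" rule in the last sentence could be checked along 
these lines. This is a simplified sketch: the function name 
`encoding_allowed` is hypothetical, and the parsing covers only the 
common `name;q=value` form, not every corner of the RFC 2616 grammar.

```python
def encoding_allowed(accept_encoding, coding):
    """Return True if `coding` has a non-zero quality in the header value."""
    qvalues = {}
    for item in accept_encoding.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        quality = 1.0  # default when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unparsable q as "not acceptable"
        qvalues[name.strip().lower()] = quality

    coding = coding.lower()
    if coding in qvalues:
        return qvalues[coding] > 0
    if "*" in qvalues:
        return qvalues["*"] > 0
    # Nothing listed for this coding: only "identity" is assumed acceptable.
    return coding == "identity"
```

With the value suggested above, `"identity;q=1.0, *;q=0"`, only 
`identity` passes; an empty string gives the same result.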

From foom at fuhm.net  Thu Sep 16 21:03:53 2004
From: foom at fuhm.net (James Y Knight)
Date: Thu Sep 16 21:03:56 2004
Subject: [Web-SIG] WSGI & transfer-encodings
In-Reply-To: <5.1.1.6.0.20040916140415.025f6bd0@mail.telecommunity.com>
References: <5.1.1.6.0.20040916140415.025f6bd0@mail.telecommunity.com>
Message-ID: <26AD023A-0813-11D9-AC9C-000A95A50FB2@fuhm.net>


On Sep 16, 2004, at 2:30 PM, Phillip J. Eby wrote:

> Hm.  An interesting conundrum.  Do any Python servers or applications 
> exist today that *work* when there's no content-length?

Unknown.

> Personally, I'm thinking that WSGI should follow CGI here, and decode 
> incoming transfer encodings.  If this means HTTP/1.1 servers have to 
> dump the incoming data to a file first, so be it.

Following CGI means: do not allow requests without a Content-Length. No 
servers I know of will dump the data to a file to determine the length 
first before sending to a CGI. I would not ask them to either: that's 
like saying "Pleeease denial of service me!". And, really, the only 
place I've seen incoming chunked requests used is for streaming data -- 
and that will "never" finish.

>> The only way to tell if there's incoming data is therefore to attempt 
>> to read() the input stream. read() will either immediately return an 
>> EOF condition (returning '') or else read the data. Also, it seems 
>> that read() with no args isn't allowed? Perhaps it should be.
>
> A no-argument read would be problematic in some environments -- CGI 
> for example.

No -- CGI requires CONTENT_LENGTH, so in the CGI environment it is 
perfectly possible to simulate EOF at the end of the data. read could 
look something like this:

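import os
import sys

class CGIReq:
   def __init__(self):
     # Treat a missing or empty CONTENT_LENGTH as "no request body".
     self.maxlength = int(os.environ.get('CONTENT_LENGTH') or 0)

   def read(self, length=None):
     # Never read past the declared length, so EOF is simulated there.
     if length is None:
       length = self.maxlength
     else:
       length = min(self.maxlength, length)
     data = sys.stdin.read(length)
     self.maxlength -= len(data)
     return data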

>> - Wouldn't providing pre-encoded data screw up middleware that is 
>> expecting to do something useful with the data going through it?
>
> Yes, it would.  There are at least two ways to handle it, though:
>
> 1. Don't use middleware that's not smart enough to handle your app's 
> output
>
> 2. Have the server or middleware munge HTTP_ACCEPT_ENCODING or other 
> parameters on the way in to the application, so that the application 
> (if written correctly) won't send data the server or middleware can't 
> handle.

You've confused Content-Encoding with Transfer-Encoding. TE is the 
request header that goes with Transfer-Encoding response header. And 
according to HTTP 1.1, chunked is always acceptable, so no amount of 
header munging can change that. So under the "WSGI application is a 
HTTP origin server" interpretation, all pieces of middleware must be 
prepared to deal with chunked output. I think that's silly -- there is 
no reason for a WSGI application to produce chunked-encoded strings, as 
it already has a way to produce chunks via the iterator.

>> I would suggest that the correct answer is: the application 
>> should have nothing to do with any connection-oriented behavior. It 
>> should not send a Connection or Transfer-Encoding header and should 
>> not expect to receive the Connection, Keep-Alive, TE, Trailers, 
>> Transfer-Encoding, or Upgrade headers, although it is optional for 
>> the server to strip them. The application should not apply a 
>> transfer-encoding to its output and the server should not give it a 
>> transfer-encoded input.
>
> I like most of this, *except* that I'd like to leave open the option 
> of an application providing transfer-encoding on its output.  I'd 
> rather have servers and middleware set HTTP_ACCEPT_ENCODING to 
> "identity;q=1.0, *;q=0" (or an empty string, or delete the entry), if 
> they interpret content, and have applications be required to respect 
> this.  Specifically, an application can only apply a content-encoding 
> if it matches a non-zero quality in HTTP_ACCEPT_ENCODING.

Again: I'm talking only about Transfer-Encoding, not Content-Encoding. 
Content-Encoding is an end-to-end function and thus properly belongs to 
the application. Transfer-Encoding is a hop-by-hop header, and properly 
belongs to the server. If you want a transfer-encoded output, you can 
always request it via a server-specific extension or configuration 
mechanism.

Both Transfer-Encoding and Content-Encoding have a gzip argument, but 
these mean significantly different things. The first is connection 
compression, the second is transferring a compressed file over an 
uncompressed connection.
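
To make the split concrete, here is a small sketch of an application 
that applies gzip as a Content-Encoding and leaves every transfer-level 
decision to the server. It is Python 3, the response body is invented, 
and the substring test on HTTP_ACCEPT_ENCODING is deliberately crude 
(a real application would look at the q-values).

```python
import gzip


def application(environ, start_response):
    body = b"hello, world\n" * 100
    headers = [("Content-Type", "text/plain")]
    # End-to-end compression: the application's own business.
    if "gzip" in environ.get("HTTP_ACCEPT_ENCODING", ""):
        body = gzip.compress(body)
        headers.append(("Content-Encoding", "gzip"))
    headers.append(("Content-Length", str(len(body))))
    start_response("200 OK", headers)
    # No Transfer-Encoding or Connection header: hop-by-hop behavior,
    # including any chunking, is left to the server.
    return [body]
```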

James

From py-web-sig at xhaus.com  Thu Sep 16 21:29:31 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Thu Sep 16 21:24:30 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
References: <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
Message-ID: <4149E99B.3060003@xhaus.com>

[Alan Kennedy]
 >> In an asynchronous situation, the application cannot simply do a
 >> blocking read on the input: that will tie up the server thread.

[Phillip J. Eby]
 > What do you mean by "server thread"?  A truly asynchronous server (one
 > using "no threads") cannot serve multiple WSGI requests
 > simultaneously.  In the general case, a WSGI server can only serve as
 > many requests simultaneously as it has available threads for.

Sorry, I should have paid more attention to phrasing in this context.

By  "server thread" I mean the thread of execution that is running the 
select/poll operation in the server (which needs at least *one* thread). 
If the application did a blocking read of the input running in a simple, 
single-threaded asyncore-style server, that single thread would block, 
holding up event processing.

[Phillip J. Eby]
 >
 > [About asynchronous input handlers]
 >
 > Such a wrapper would basically be just a function returning an
 > iterator, with a bunch of pausing logic and a queue to communicate
 > with the actual asynchronous app.  And, such wrappers should only
 > need to be written once for each asynchronous API, which as a
 > practical matter probably means only Twisted, anyway, as (IMO) it has
 > no real competitors in the Python async framework space.

I see the need for returning an iterator: the application processing the 
input has to produce a response as well: for a form-processing app 
returning a "thank you for your submission" page.

But I don't see the need for pausing logic or queues? Why can't the 
server simply call directly into the application, e.g. using a 
"process_input" method, in effect saying "you have some input ready".

And I'm not sure I see the need for the application to check that the 
wsgi.input hasn't been replaced: if there were middleware further down 
that stack that was intercepting and transforming the input stream, then 
*it* should be the one receiving the asynchronous notification from the 
server. This lower level component would then read some input, process 
it, and then call a "process_input" method on the next component up in 
the stack, etc, etc.

I suppose I'm talking about the server "pushing" the input through the 
middleware stack, whereas you're talking about the application at the 
top of the stack "pulling" the data up through the stack. Is that right?

And I'd be interested to see how your approach would handle a situation 
where there is both streaming input and output. For example, a server 
that takes strings of any length, say 10**9 bytes, and 
.encode('rot13')'s each byte in turn, before sending it back to the client.

I'll be thinking about this some more.

Regards,

Alan.


From pje at telecommunity.com  Thu Sep 16 22:08:48 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 16 22:07:47 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <4149E99B.3060003@xhaus.com>
References: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com>

At 08:29 PM 9/16/04 +0100, Alan Kennedy wrote:
>[Alan Kennedy]
> >> In an asynchronous situation, the application cannot simply do a
> >> blocking read on the input: that will tie up the server thread.
>
>[Phillip J. Eby]
> > What do you mean by "server thread"?  A truly asynchronous server (one
> > using "no threads") cannot serve multiple WSGI requests
> > simultaneously.  In the general case, a WSGI server can only serve as
> > many requests simultaneously as it has available threads for.
>
>Sorry, I should have paid more attention to phrasing in this context.
>
>By  "server thread" I mean the thread of execution that is running the 
>select/poll operation in the server (which needs at least *one* thread). 
>If the application did a blocking read of the input running in a simple, 
>single-threaded asyncore-style server, that single thread would block, 
>holding up event processing.

Right, which is (one reason) why a WSGI server can in the general case only 
serve as many WSGI requests simultaneously as it has available threads for, 
although it's possible to improve on that worst-case condition by 
appropriate use of iterators.


>But I don't see the need for pausing logic or queues? Why can't the server 
>simply call directly into the application, e.g. using a "process_input" 
>method, in effect saying "you have some input ready".
>
>And I'm not sure I see the need for the application to check that the 
>wsgi.input hasn't been replaced: if there were middleware further down 
>that stack that was intercepting and transforming the input stream, then 
>*it* should be the one receiving the asynchronous notification from the 
>server. This lower level component would then read some input, process it, 
>and then call a "process_input" method on the next component up in the 
>stack, etc, etc.
>
>I suppose I'm talking about the server "pushing" the input through the 
>middleware stack, whereas you're talking about the application at the top 
>of the stack "pulling" the data up through the stack. Is that right?

That's correct, and that's what I'm trying to avoid if at all possible, 
because it enormously complicates middleware, to the sole benefit of 
asynchronous apps -- that mostly aren't going to be portable anyway.

So, going by STASCTAP theory (Simple Things Are Simple, Complex Things Are 
Possible), the pause/resume approach makes asynchronous applications 
*possible*, while keeping the nominal synchronous cases and middleware 
*simple*.


>And I'd be interested to see how your approach would handle a situation 
>where there is both streaming input and output. For example, a server that 
>takes strings of any length, say 10**9 bytes, and .encode('rot13')'s each 
>byte in turn, before sending it back to the client.

Presumably, the function to pause for input needs to take a minimum length, 
or have some way to communicate available length to the application.

I don't pretend to fully understand the needed use cases here, because I 
have little experience writing web applications that need to wait on other 
network services (other than databases) while a client is waiting.  And if 
I were writing an asynchronous server, I'd probably at least consider using 
Greenlets to context-switch blocking operations so that they wouldn't tie 
up an active thread.  Such an approach is conceptually easier to deal with, 
IMO, than writing everything in continuation-passing style.

But I *do* want WSGI to make it *possible* to meet async apps' use cases, 
which is why I'm seeking input from those that do have the relevant 
experience.  The trade-off is that it shouldn't excessively complicate 
nominal compliance with WSGI.  In particular, I'd prefer that the current 
"example CGI gateway" in PEP 333 not require any major changes or 
significant expansion.

From pje at telecommunity.com  Thu Sep 16 22:22:04 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 16 22:21:03 2004
Subject: [Web-SIG] WSGI & transfer-encodings
In-Reply-To: <26AD023A-0813-11D9-AC9C-000A95A50FB2@fuhm.net>
References: <5.1.1.6.0.20040916140415.025f6bd0@mail.telecommunity.com>
	<5.1.1.6.0.20040916140415.025f6bd0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040916160858.0277e1a0@mail.telecommunity.com>

At 03:03 PM 9/16/04 -0400, James Y Knight wrote:

>On Sep 16, 2004, at 2:30 PM, Phillip J. Eby wrote:
>
>>Hm.  An interesting conundrum.  Do any Python servers or applications 
>>exist today that *work* when there's no content-length?
>
>Unknown.
>
>>Personally, I'm thinking that WSGI should follow CGI here, and decode 
>>incoming transfer encodings.  If this means HTTP/1.1 servers have to dump 
>>the incoming data to a file first, so be it.
>
>Following CGI means: do not allow requests without a Content-Length. No 
>servers I know of will dump the data to a file to determine the length 
>first before sending to a CGI. I would not ask them to either: that's like 
>saying "Pleeease denial of service me!". And, really, the only place I've 
>seen incoming chunked requests used is for streaming data -- and that will 
>"never" finish.

Hm.  I suppose it's in theory possible that one could write some kind of 
streaming-over-HTTP application with WSGI.  So I guess we should consider 
allowing it.


>>>The only way to tell if there's incoming data is therefore to attempt to 
>>>read() the input stream. read() will either immediately return an EOF 
>>>condition (returning '') or else read the data. Also, it seems that 
>>>read() with no args isn't allowed? Perhaps it should be.
>>
>>A no-argument read would be problematic in some environments -- CGI for 
>>example.
>
>No -- CGI requires CONTENT_LENGTH, so in the CGI environment it is 
>perfectly possible to simulate EOF at the end of the data.

I mainly meant that environments like CGI already have a suitable file-like 
object for use as 'wsgi.input', and that supporting 'read()' with no 
arguments requires implementing a replacement 'wsgi.input'.


>>>- Wouldn't providing pre-encoded data screw up middleware that is 
>>>expecting to do something useful with the data going through it?
>>
>>Yes, it would.  There are at least two ways to handle it, though:
>>
>>1. Don't use middleware that's not smart enough to handle your app's output
>>
>>2. Have the server or middleware munge HTTP_ACCEPT_ENCODING or other 
>>parameters on the way in to the application, so that the application (if 
>>written correctly) won't send data the server or middleware can't handle.
>
>You've confused Content-Encoding with Transfer-Encoding. TE is the request 
>header that goes with Transfer-Encoding response header. And according to 
>HTTP 1.1, chunked is always acceptable, so no amount of header munging can 
>change that. So under the "WSGI application is a HTTP origin server" 
>interpretation, all pieces of middleware must be prepared to deal with 
>chunked output. I think that's silly -- there is no reason for a WSGI 
>application to produce chunked-encoded strings, as it already has a way to 
>produce chunks via the iterator.

Fair enough; the only part that has any business reading or writing 
chunked encoding is the "real" server; I'll update the PEP 333 "Other HTTP 
Features" section accordingly.


>>I like most of this, *except* that I'd like to leave open the option of 
>>an application providing transfer-encoding on its output.  I'd rather 
>>have servers and middleware set HTTP_ACCEPT_ENCODING to "identity;q=1.0, 
>>*;q=0" (or an empty string, or delete the entry), if they interpret 
>>content, and have applications be required to respect 
>>this.  Specifically, an application can only apply a content-encoding if 
>>it matches a non-zero quality in HTTP_ACCEPT_ENCODING.
>
>Again: I'm talking only about Transfer-Encoding, not Content-Encoding. 
>Content-Encoding is an end-to-end function and thus properly belongs to 
>the application. Transfer-Encoding is a hop-by-hop header, and properly 
>belongs to the server. If you want a transfer-encoded output, you can 
>always request it via a server-specific extension or configuration mechanism.
>
>Both Transfer-Encoding and Content-Encoding have a gzip argument, but 
>these mean significantly different things. The first is connection 
>compression, the second is transferring a compressed file over an 
>uncompressed connection.

Thanks for clearing up my confusion; between your explanation and RFC 2616 
I think I can now see how to clarify this.  In effect, WSGI applications 
*must not* send hop-by-hop headers or interpret them, and servers *should 
not* provide them to applications.  And WSGI middleware *must* follow RFC 
2616, section 13.5, regarding what headers may be changed in transit when.

One way of looking at it is that WSGI servers and middleware are like HTTP 
proxy servers, but using a private inter-server transport mechanism that 
effectively replaces any normal HTTP hop-by-hop control mechanisms.
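
A sketch of how a server or middleware might apply the two rules. It 
relies on `wsgiref.util.is_hop_by_hop` from the Python standard 
library; the function names `scrub_environ` and 
`check_response_headers` are hypothetical, chosen for this example.

```python
from wsgiref.util import is_hop_by_hop


def scrub_environ(environ):
    """Return a copy of environ without hop-by-hop request headers."""
    clean = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            # HTTP_KEEP_ALIVE -> keep-alive
            header = key[5:].replace("_", "-").lower()
            if is_hop_by_hop(header):
                continue
        clean[key] = value
    return clean


def check_response_headers(headers):
    """Raise ValueError if the application sent a hop-by-hop header."""
    for name, _value in headers:
        if is_hop_by_hop(name):
            raise ValueError("hop-by-hop header not allowed: %s" % name)
```

A server would call `scrub_environ` before invoking the application 
and `check_response_headers` inside its `start_response`.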

From py-web-sig at xhaus.com  Thu Sep 16 23:41:31 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Thu Sep 16 23:36:17 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com>
References: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
	<5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com>
Message-ID: <414A088B.7040601@xhaus.com>

[Alan Kennedy]
 >> I suppose I'm talking about the server "pushing" the input through
 >> the middleware stack, whereas you're talking about the application at
 >> the top of the stack "pulling" the data up through the stack. Is
 >> that right?

[Phillip J. Eby]
 > That's correct, and that's what I'm trying to avoid if at all
 > possible, because it enormously complicates middleware, to the sole
 > benefit of asynchronous apps -- that mostly aren't going to be
 > portable anyway.

Hmmm. Perhaps I'll resort to explaining my idea through code rather than 
text. Here is my take on a putative blocking *and* asynchronous rot-13 
stream encoder.

But before showing you the blocking and async one, I want to show what I 
think the blocking one would look like

class blocking_rot13_streamer:

   def __init__(self, environ, start_response):
     self.in_stream = environ['wsgi.input']
     start_response("200 OK", [('content-type', 'text/plain-rot13')])

   def __iter__(self):
     return self

   def next(self):
     # read() returns '' once the input is exhausted
     data = self.in_stream.read()
     if not data:
       raise StopIteration
     return data.encode('rot-13')

This looks nice and simple to me. The one that works in both async mode 
and blocking mode looks like this

class rot13_streamer:

   def __init__(self, environ, start_response):
     self.environ = environ
     self.in_stream = environ['wsgi.input']
     self.buffer = []
     self.end_of_stream = False
     self.resume = None
     if 'wsgi.async_input_handler' in environ:
       self.is_async = True
       environ['wsgi.async_input_handler'](self.input_handler)
     else:
       self.is_async = False
     # May be absent on a purely blocking server
     self.pause_output = environ.get('wsgi.pause_output')
     start_response("200 OK", [('content-type', 'text/plain-rot13')])

   def input_handler(self):
     # Called by the server when input is ready; '' means end of input
     data = self.in_stream.read()
     if data:
       self.buffer.append(data)
     else:
       self.end_of_stream = True
     if self.resume:
       # Wake the output side in both cases so it can yield or finish
       resume, self.resume = self.resume, None # Are resumes one-hit or "re-entrant"?
       resume()

   def __iter__(self):
     return self

   def next(self):
     if self.is_async:
       if self.buffer:
         # pop(0) so data goes out in arrival order
         return self.buffer.pop(0).encode('rot-13')
       elif self.end_of_stream:
         raise StopIteration
       else:
         self.resume = self.pause_output()
         return ""
     else:
       data = self.in_stream.read()
       if not data:
         raise StopIteration
       return data.encode('rot-13')

In this way, there could be a middleware component below the 
rot13_streamer in the stack that, say, does chunked_transfer encoding 
and decoding. It would be the same in form as the above, except that it 
would

1. Change the environ entry for 'wsgi.async_input_handler' to be its own 
callable that records the callback for the next layer up in the stack, 
the rot13_streamer.input_handler.

2. Create its own buffer, into which it will store chunks decoded from 
the input stream. This buffer, e.g. a StringIO, then replaces 
'wsgi.input' in the environ passed to the next middleware component up.

3. When chunks arrive from the client, the server calls the dechunker 
input_handler. This reads the (possibly partial) chunk from the stream, 
decodes it and stores it in its StringIO buffer.

4. When it has a complete chunk it calls the input_handler of the next 
component in the stack, which will then read the decoded chunk from its 
wsgi.input stream, i.e. the dechunker's StringIO.
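
The decoding part of steps 3 and 4 might look roughly like the 
following. This is a sketch under several assumptions: the class name 
`Dechunker` and its `feed()` method are invented for illustration, it is 
written for Python 3 byte strings, chunk extensions are discarded, and 
trailers after the final chunk are ignored. Wiring it to the server's 
callback and to the next component's input_handler is left out.

```python
class Dechunker:
    """Incrementally decode an HTTP/1.1 chunked body."""

    def __init__(self):
        self._raw = b""          # bytes received but not yet decoded
        self._remaining = None   # bytes left in current chunk; None = need size line
        self.finished = False    # True once the zero-length chunk is seen

    def feed(self, data):
        """Add raw bytes from the socket; return newly decoded payload."""
        self._raw += data
        out = []
        while not self.finished:
            if self._remaining is None:
                line, sep, rest = self._raw.partition(b"\r\n")
                if not sep:
                    break  # size line not complete yet
                size = int(line.split(b";", 1)[0], 16)
                self._raw = rest
                if size == 0:
                    self.finished = True
                else:
                    self._remaining = size
            elif self._remaining > 0:
                if not self._raw:
                    break  # chunk data has not arrived yet
                take = self._raw[:self._remaining]
                out.append(take)
                self._raw = self._raw[len(take):]
                self._remaining -= len(take)
            else:
                if len(self._raw) < 2:
                    break  # CRLF that ends the chunk not here yet
                self._raw = self._raw[2:]
                self._remaining = None
        return b"".join(out)
```

Note that `feed()` can return an empty result even though bytes 
arrived, for instance when only a chunk's size line has been received, 
so the component above must cope with getting no data after a 
notification.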

I think that this proposed approach is clean, and not overly complex for 
async or blocking programmers to handle.

But I think we do have to cleanly separate the two. I think there are 
problems associated with trying to run *all* components seamlessly 
across async or blocking servers. I think that middleware components 
that are always going to behave correctly in an async situation will 
have to be designed like that from the ground up. It's dangerous to take 
components written in a blocking environment and run them in an async 
environment.

And lastly, if it is desired to spin jobs into a different thread, e.g. 
the rot-13 job above, then that should be a middleware concern, not the 
WSGI server's. So if a twisted component wants to pass a job to a 
service thread, some other twisted component lower down the stack, 
possibly the framework itself, must have already created the 
threads/queues to enable this. The twisted rot-13 component would then 
have very thin methods (run from the server's main thread) which 
interact with the twisted space i.e. transferring data and receiving 
data back through queues, and layer WSGI semantics on those 
interactions, i.e. pause_output, yield result, yield empty_string, etc.

When I described your approach as "pulling data up the stack", I saw a 
bigger difference between the two approaches. I'm thinking now that 
there is little difference between our proposals, except that in mine 
it's the bottom component that gets notified of the input by the server, 
and in yours it's the top component. Though I suppose having the top 
component pulling input from an iterator chain mirrors nicely the 
situation where the server pulls output from an iterator chain.

And my approach basically entails a bunch of nested calls, which might 
be less efficient or elegant than if, say, generators were used in an 
input processing chain.

You're right again Phillip :-)

Regards,

Alan.
From py-web-sig at xhaus.com  Fri Sep 17 00:12:28 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Fri Sep 17 00:07:04 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <414A088B.7040601@xhaus.com>
References: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>	<5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com>
	<414A088B.7040601@xhaus.com>
Message-ID: <414A0FCC.50502@xhaus.com>

[Alan Kennedy]
> Though I suppose having the top 
> component pulling input from an iterator chain mirrors nicely the 
> situation where the server pulls output from an iterator chain.

Which also means that the top component must be prepared to receive "" 
from the component below it in the input chain.

Say, for example, that the headers for a new chunk arrive on the 
client socket, but the chunk body itself has not arrived yet.

The top iterator, e.g. the uploaded-file processor, pulls data from the 
component below it, which is say the dechunker. The dechunker will read 
the headers and get the relevant metadata for the chunk. But since there 
is no actual data available now, it must yield "" to the next component up.

I was wondering if we might need to mirror the pause/resume facility on 
the input stream. But it's not required, because the application is 
getting a callback directly from the server when there is data 
available. It's just that the data on the socket that gave rise to the 
notification may not translate to actual data for the called application.
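
A minimal sketch of what "being prepared to receive an empty string" 
could look like, assuming the input really is an iterator chain. All 
names here (socket_blocks, upper_case_filter, collect) are hypothetical 
and not from the PEP, and it uses Python 3 bytes rather than the plain 
strings of the time:

```python
def socket_blocks():
    # Hypothetical stand-in for the server's input iterator: an empty
    # block means "nothing available right now", not end of input.
    yield b""
    yield b"hello "
    yield b""
    yield b"world"


def upper_case_filter(blocks):
    # A component higher in the input chain.  It passes empty blocks
    # through unchanged so the "no data yet" signal reaches the top.
    for block in blocks:
        yield block.upper() if block else b""


def collect(blocks):
    # The top component: it must tolerate empty blocks and simply
    # ignore them instead of treating them as end of input.
    received = []
    for block in blocks:
        if not block:
            continue
        received.append(block)
    return b"".join(received)


body = collect(upper_case_filter(socket_blocks()))
print(body)
```

End of input is signalled by the iterator finishing, so an empty block 
never has to double as an end-of-stream marker.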

Regards,

Alan.

From pje at telecommunity.com  Fri Sep 17 00:37:39 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Sep 17 00:36:43 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <414A088B.7040601@xhaus.com>
References: <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
	<5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040916180735.02fa5e80@mail.telecommunity.com>

At 10:41 PM 9/16/04 +0100, Alan Kennedy wrote:

>In this way, there could be a middleware component below the 
>rot13_streamer in the stack that, say, does chunked_transfer encoding and 
>decoding. It would be the same in form as the above, except that it would

FYI, middleware and apps are now banned from dealing in any kind of 
transfer-encodings, per James' very valuable input on that subject.  Like 
connection properties, these should be the exclusive province of the actual 
web server.


>1. Change the environ entry for 'wsgi.async_input_handler' to be its own 
>callable that records the callback for the next layer up in the stack, the 
>rot13_streamer.input_handler.

This would lead to the unacceptable situation of every middleware component 
having to know in principle about extensions.  The "Server Extension APIs" 
section of the PEP demands that any "bypass" API verify replacement for 
this very reason.


>I think that this proposed approach is clean, and not overly complex for 
>async or blocking programmers to handle.

Unless of course they're writing middleware that does something with the input.


>But I think we do have to cleanly separate the two. I think there are 
>problems associated with trying to run *all* components seamlessly across 
>async or blocking servers. I think that middleware components that are 
>always going to behave correctly in an async situation will have to be 
>designed like that from the ground up. It's dangerous to take components 
>written in a blocking environment and run them in an async environment.

It is a non-goal for WSGI to support running multiple requests 
simultaneously in a single-threaded asynchronous server, so the issue 
doesn't really come up.  A WSGI server *must* allow for the fact that WSGI 
apps use up a thread while they're running or producing a value: that's the 
price of being able to run "traditional" web applications under WSGI.


>And lastly, if it is desired to spin jobs into a different thread, e.g. 
>the rot-13 job above, then that should be a middleware concern, not the 
>WSGI server's.

I agree with you -- for *asynchronous* applications.  Synchronous web 
applications are the default case in WSGI and the world in general, so 
servers *must* use a thread pool to start applications and to run 'next()' 
calls, if they are asynchronous.  But, asynchronous applications wish to 
yield control, to avoid hogging resources in that thread pool, so they need 
to delegate the work to their I/O thread, and then yield an empty string to 
pause output, freeing up that thread for another iterable next(), or 
application start.

Notice, however, that if the server is *synchronous* (e.g. CGI, 
single-threaded FastCGI containers, mod_python under Apache 1.x, etc.), 
then this is a complete waste of time, because you'll only be running one 
simultaneous request in this process anyway, so you're spinning off a 
second thread to keep from tying up the first thread, but all the first 
thread is doing is waiting for the second thread to finish!  This is 
wasteful, to say the least.

The only case where pausing output (whether for unrelated network I/O, or 
because of a need to read from the input stream) is actually useful is when 
the server is *also* asynchronous -- hence the value of making such pausing 
an optional extension API.  The application can then detect when it's 
*useful* to pause, and synchronous applications needn't worry about it.

Of course, even if the server and application are *both* asynchronous, 
that's no guarantee that they're using compatible event loops!  If you try 
to run a Twisted app under asyncore or vice versa, you're going to be 
spinning off an extra thread to run a second event loop, so there's a bit 
of a trade-off to determining whether your asynchrony is going to actually 
*gain* anything.  But that's a separate question.  WSGI will allow you to 
be asynchronous if you really want to, no matter how bad an idea it might 
be in some cases.  :)


>The twisted rot-13 component would then have very thin methods (run from 
>the server's main thread) which interact with the twisted space i.e. 
>transferring data and receiving data back through queues, and layer WSGI 
>semantics on those interactions, i.e. pause_output, yield result, yield 
>empty_string, etc.

You're pretty much describing what I suggested earlier: that async app 
frameworks like Twisted may want to have a model whereby a generic "thin 
wrapper" WSGI application object is used to communicate with an application 
that's written using the underlying framework's async idioms.  So, for 
example, one might perhaps design a Twisted "Transport" that was 
implemented as a WSGI application.  (I don't know if "Transport" is really 
the correct abstraction to use, I'm just giving an example here.)

Anyway, for such a thing to really work, I think you might need 
server-specific reactor plugins, to integrate Twisted's event loop with 
that of the server.


>When I described your approach as "pulling data up the stack", I saw a 
>bigger difference between the two approaches. I'm thinking now that there 
>is little difference between our proposals, except that in mine it's the 
>bottom component that gets notified of the input by the server, and in 
>yours it's the top component. Though I suppose having the top component 
>pulling input from an iterator chain mirrors nicely the situation where 
>the server pulls output from an iterator chain.

Actually, I'm saying you pull data *down* the stack.  The bottom-most 
application iterator calls 'read()' on an input stream provided by a parent 
middleware component, which then calls read on a higher-level component, 
and so on.


>And my approach basically entails a bunch of nested calls, which might be 
>less efficient or elegant than if, say, generators were used in an input 
>processing chain.
>
>You're right again Phillip :-)

Not entirely, actually.  For my approach to really work, the middleware 
would have to be guaranteed to return something from read(), as long as the 
parent's read() returns something.  Otherwise, the resumption would block, 
unless the middleware were much smarter.  I've got to think about it some 
more, because right now I'm still not happy with the specifics of any of 
the proposals for pausing and resuming output.

From pje at telecommunity.com  Fri Sep 17 00:59:28 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Sep 17 00:58:31 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <5.1.1.6.0.20040916180735.02fa5e80@mail.telecommunity.com>
References: <414A088B.7040601@xhaus.com>
	<5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
	<5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040916184337.02ec9680@mail.telecommunity.com>

At 06:37 PM 9/16/04 -0400, Phillip J. Eby wrote:
>Not entirely, actually.  For my approach to really work, the middleware 
>would have to be guaranteed to return something from read(), as long as 
>the parent's read() returns something.  Otherwise, the resumption would 
>block, unless the middleware were much smarter.  I've got to think about 
>it some more, because right now I'm still not happy with the specifics of 
>any of the proposals for pausing and resuming output.

Aha!  There's the problem.  The 'read()' protocol is what's wrong.  If 
'wsgi.input' were an *iterator* instead of a file-like object, it would be 
fairly straightforward for async servers to implement "would block" reads 
as yielding empty strings.  And, servers could actually support streaming 
input via chunked encoding, because they could just yield blocks once 
they've arrived.

The downside to making 'wsgi.input' an iterator is that you lose control 
over how much data to read at a time: the upstream server or middleware 
determines how much data you get.  But, it's quite possible to make a 
buffering, file-like wrapper over such an iterator, if that's what you 
really need, and your code is synchronous.  (This will slightly increase 
the coding burden for interfacing applications and frameworks that expect 
to have a readable stream for CGI input.)  For asynchronous code, you're 
just going to invoke some sort of callback with each block, and it's the 
callback's job to deal with it.
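
A rough sketch of such a buffering, file-like wrapper, assuming 
'wsgi.input' were an iterator of blocks. The class name IteratorInput 
is made up for illustration, and the code uses Python 3 bytes:

```python
class IteratorInput:
    """File-like read() on top of an iterator of byte blocks."""

    def __init__(self, blocks):
        self._blocks = iter(blocks)
        self._buffer = b""
        self._exhausted = False

    def _fill(self, size):
        # Pull blocks until the buffer holds `size` bytes or the
        # iterator ends.  Empty blocks are simply consumed, so this
        # wrapper effectively blocks and only suits synchronous code.
        while not self._exhausted and (size < 0 or len(self._buffer) < size):
            try:
                self._buffer += next(self._blocks)
            except StopIteration:
                self._exhausted = True

    def read(self, size=-1):
        # size < 0 means "read everything that is left".
        self._fill(size)
        if size < 0:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


stream = IteratorInput([b"abc", b"", b"defg", b"h"])
first = stream.read(4)   # spans two upstream blocks
rest = stream.read()     # drains whatever remains
```

The caller gets back control over read sizes, at the cost of one extra 
copy per block; readline() and friends could be layered on the same 
buffer.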

What does everybody think?  If combined with a "pause iterating me until 
there's input data available" extension API, this would let the input 
stream be non-blocking, and solve the chunked-encoding input issue all in 
one change to the protocol.  Or am I missing something here?

From py-web-sig at xhaus.com  Fri Sep 17 01:04:36 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Fri Sep 17 00:59:19 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <5.1.1.6.0.20040916180735.02fa5e80@mail.telecommunity.com>
References: <5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
	<5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com>
	<5.1.1.6.0.20040916180735.02fa5e80@mail.telecommunity.com>
Message-ID: <414A1C04.9010306@xhaus.com>

[Alan Kennedy]
>> When I described your approach as "pulling data up the stack", I saw a 
>> bigger difference between the two approaches. I'm thinking now that 
>> there is little difference between our proposals, except that in mine 
>> it's the bottom component that gets notified of the input by the 
>> server, and in yours it's the top component. Though I suppose having 
>> the top component pulling input from an iterator chain mirrors nicely 
>> the situation where the server pulls output from an iterator chain.

[Phillip J. Eby]
> Actually, I'm saying you pull data *down* the stack.  The bottom-most 
> application iterator calls 'read()' on an input stream provided by a 
> parent middleware component, which then calls read on a higher-level 
> component, and so on.

Hmm. That only makes sense to me if your stacks grow downwards :-)

In my mental picture, stacks grow upwards. The server is level ground, 
and each middleware component is placed on top of the other, with the 
"most wrapped" component at the top.

So to me what your description above says is that the component closest 
to the server is the one that gets to see the input last, after all the 
more wrapped components, with the most wrapped component getting first 
dibs on the input. Which doesn't make sense to me.

Perhaps your stacks grow downwards?

Anyway, I *think* we're talking about the same thing.

Which leads onto the next question: Why not insist on an iterable for 
the input stream as well as the output stream? It appears to me that 
there should be symmetry between the output write()/iterable split and 
the input read()/iterable split.

Regards,

Alan.
From pje at telecommunity.com  Fri Sep 17 01:08:42 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Sep 17 01:07:41 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <414A1C04.9010306@xhaus.com>
References: <5.1.1.6.0.20040916180735.02fa5e80@mail.telecommunity.com>
	<5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
	<5.1.1.6.0.20040916155713.03963460@mail.telecommunity.com>
	<5.1.1.6.0.20040916180735.02fa5e80@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040916190734.02f8aec0@mail.telecommunity.com>

At 12:04 AM 9/17/04 +0100, Alan Kennedy wrote:
>Which leads onto the next question: Why not insist on an iterable for the 
>input stream as well as the output stream? It appears to me that there 
>should be symmetry between the output write()/iterable split and the input 
>read()/iterable split.

Looks like you had the same "aha" as I just did a few minutes ago, so I'll 
take your comment as a +1 on that approach.  :)

From floydophone at gmail.com  Fri Sep 17 01:16:26 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Fri Sep 17 01:16:32 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
Message-ID: <6654eac4040916161612849362@mail.gmail.com>

Alan, that design looks okay. A bit complex, but it works well once
you sit down to look at it.

It would be nice if applications that didn't need a separate thread
didn't use one up, so performance-oriented programmers (like the
Twisted/Nevow guys) won't be able to have that excuse. Perhaps
start_response() could have a "threaded" boolean optional argument
that defaults to true which decides whether or not the iterable will
be called in a separate thread. This, of course, requires that the
application callable itself doesn't have any blocking code.

Does this requirement overcomplicate things?
From pje at telecommunity.com  Fri Sep 17 01:57:59 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Sep 17 01:56:59 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <6654eac4040916161612849362@mail.gmail.com>
Message-ID: <5.1.1.6.0.20040916192001.025e1570@mail.telecommunity.com>

At 07:16 PM 9/16/04 -0400, Peter Hunt wrote:
>Alan, that design looks okay. A bit complex, but it works well once
>you sit down to look at it.
>
>It would be nice if applications that didn't need a separate thread
>didn't use one up, so performance-oriented programmers (like the
>Twisted/Nevow guys) won't be able to have that excuse. Perhaps
>start_response() could have a "threaded" boolean optional argument
>that defaults to true which decides whether or not the iterable will
>be called in a separate thread. This, of course, requires that the
>application callable itself doesn't have any blocking code.
>
>Does this requirement overcomplicate things?

Yes.  The vast majority of existing web applications are synchronous, and 
so are a significant number of Python web server environments that would 
run WSGI applications.  Therefore the WSGI "common case" is to have 
synchronous behavior, and WSGI is most efficient with either a synchronous 
server/gateway, or a "half-async" server/gateway (i.e., one that runs 
application code in a thread pool, separate from the main I/O thread.)

The few applications that can behave in a non-blocking fashion, can and 
should use the iterable interface to provide their output, producing empty 
strings when they are not yet ready to produce output.  (Plus, when such 
applications are run in a synchronous server or gateway, they might as well 
behave synchronously, since they will actually incur more overhead by 
trying to be asynchronous!)
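
As an illustration only, here is roughly what such an application 
might look like. The names slow_app and run are hypothetical, the 
"server" is a toy loop rather than a real gateway, and the code uses 
Python 3 bytes:

```python
import threading


def slow_app(environ, start_response):
    # Hypothetical application: the real work happens in another thread,
    # and the iterable yields empty strings until the result is ready.
    result = {}
    done = threading.Event()

    def work():
        result["body"] = b"finished"
        done.set()

    threading.Thread(target=work).start()
    start_response("200 OK", [("Content-Type", "text/plain")])

    def body():
        while not done.is_set():
            yield b""      # not ready yet; the server may call next() again
        yield result["body"]

    return body()


def run(app):
    # A toy synchronous "server": it keeps iterating and drops empty
    # blocks, so the pausing buys nothing in this kind of server.
    seen = []

    def start_response(status, headers, exc_info=None):
        seen.append(status)

    output = [block for block in app({}, start_response) if block]
    return seen[0], b"".join(output)


status, output = run(slow_app)
```

Under the toy loop the iterable is just polled until the worker is 
done, which is the extra overhead described above; only a server that 
can do something else between next() calls benefits.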

The only scenario that isn't served by this approach is a single-threaded, 
asynchronous server with no threading capability.  However, such a server 
*cannot* be WSGI-compatible and still serve multiple requests, and there is 
no way around that without forcing *every* application to be asynchronous, 
which just isn't an acceptable tradeoff.  The idea of having a flag 
(whether passed to start_response, or introspected on the application 
object, etc.) doesn't help the fact that the server still has to be able to 
*have* multiple threads in such a case.

Note, by the way, that the need for a second thread is caused by having a 
possible difference between the synchrony model of a server and an 
application.  That is, if both are synchronous or both are asynchronous, no 
threading is required.  However, a server is not limited to running just 
*one* application, so in the general case, a given server has to be able to 
handle both.

However, since the common case is for apps to be synchronous, then the 
common case for an asynchronous server is that it must be threaded, and the 
common case for a synchronous server is that it need not be 
threaded.  Thus, logically, the case of an asynchronous application is the 
"odd one out", in the sense that it is the only one that ever forces 
additional threading, beyond what was inherently required for that server 
model.

In other words, an async server has to have threading in the common case, 
and a synchronous application doesn't.  So, an async app in an async server 
doesn't *add* any threading requirement: the async server already has to 
have an I/O thread and at least one application thread.  And a synchronous 
app doesn't add any additional threading requirements to either kind of 
server, for the same reason.  Only an asynchronous application in a 
synchronous server forces any extra overhead beyond the effective default 
required threading configuration.  Thus, it makes sense (to me, anyway), 
in that case, to put the burden on the asynchronous application to manage 
communication with its extra thread, if any, or to have it adapt to local 
circumstances and behave synchronously (since that's more efficient in that 
case).

But in the end, all of this comes down to a basically simple idea: I think 
that in WSGI, synchronous applications should be simple, and asynchronous 
applications possible, because that will best support the goals of the PEP.

From floydophone at gmail.com  Fri Sep 17 02:37:22 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Fri Sep 17 02:37:29 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <5.1.1.6.0.20040916192001.025e1570@mail.telecommunity.com>
References: <6654eac4040916161612849362@mail.gmail.com>
	<5.1.1.6.0.20040916192001.025e1570@mail.telecommunity.com>
Message-ID: <6654eac4040916173766fa4cf1@mail.gmail.com>

Yes, but an async app running in an async server in a thread is
overkill, don't you think? We don't need to spawn an extra thread to
run it. I'm not talking about "possible", I'm talking about "optimal".


From dp at ulaluma.com  Fri Sep 17 02:39:36 2004
From: dp at ulaluma.com (Donovan Preston)
Date: Fri Sep 17 02:40:12 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
References: <5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
Message-ID: <0CD0CE45-0842-11D9-AF84-000A95864FC4@ulaluma.com>


On Sep 16, 2004, at 1:41 PM, Phillip J. Eby wrote:

>     resume = environ['wsgi.pause_output']()
>
> Where 'resume' is then a callback function that can be invoked to 
> resume iteration.  This keeps it to a single extension key, helps 
> ensure the correct sequence of actions, and makes it easier to 
> implement in some cases, while not making other cases any harder.

Well, I guess I sparked some discussion here. Great! I am +1 on the 
above construct, calling pause_output and yielding an empty string. I'm 
glad this technique came up because I hadn't paid enough attention to 
the environ dict and how it could be used to do something like this.

I think with servers providing a pause_output callable like this, 
asynchronous applications will be possible and the isolation between 
the layers can be preserved. I am going to try writing some code using 
this construct and provide further feedback after I do.

dp

From pje at telecommunity.com  Fri Sep 17 02:58:37 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Sep 17 02:58:00 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <6654eac4040916173766fa4cf1@mail.gmail.com>
References: <5.1.1.6.0.20040916192001.025e1570@mail.telecommunity.com>
	<6654eac4040916161612849362@mail.gmail.com>
	<5.1.1.6.0.20040916192001.025e1570@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040916204439.026362f0@mail.telecommunity.com>

At 08:37 PM 9/16/04 -0400, Peter Hunt wrote:
>Yes, but an async app running in an async server in a thread is
>overkill, don't you think? We don't need to spawn an extra thread to
>run it. I'm not talking about "possible", I'm talking about "optimal".

Nothing in the spec stops an async server from providing a configuration 
option to say, "this app+middleware combination is completely non-blocking, 
so don't bother running it in a separate thread".  I've just been speaking 
about the general case, and what the server is required to do to support 
the general case of "an arbitrary WSGI application", with no additional 
information.

In the same way, nothing in the spec stops servers from providing 
per-application configuration options for any number of extended behaviors; 
WSGI is a starting point for server capabilities, not an ending point.

Still, I will admit that I tend to speak of things almost as if WSGI were 
an ending point, because I just assume we're talking about what the spec 
should or should not *require* or *forbid*.  When a use case doesn't need 
any "musts" or "must nots" added (like your use case above), I tend not to 
focus on it directly, because it seems obvious to me that anybody can add 
it on if they like, as a server-specific extension.

So, this may lead sometimes to people getting the impression WSGI doesn't 
allow a use case that in fact it does; it's just that the use case should 
be implemented using an optional extension, rather than being considered a 
common case and made into a requirement.  If I tried to enumerate every 
possible optional extension to WSGI, I'd go mad sooner than you can say 
"Content-Transfer-Encoding".  :)

From pje at telecommunity.com  Fri Sep 17 03:02:31 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Sep 17 03:01:31 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <0CD0CE45-0842-11D9-AF84-000A95864FC4@ulaluma.com>
References: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040916205856.02637d40@mail.telecommunity.com>

At 08:39 PM 9/16/04 -0400, Donovan Preston wrote:

>On Sep 16, 2004, at 1:41 PM, Phillip J. Eby wrote:
>
>>     resume = environ['wsgi.pause_output']()
>>
>>Where 'resume' is then a callback function that can be invoked to resume 
>>iteration.  This keeps it to a single extension key, helps ensure the 
>>correct sequence of actions, and makes it easier to implement in some 
>>cases, while not making other cases any harder.
>
>Well, I guess I sparked some discussion here. Great! I am +1 on the above 
>construct, calling pause_output and yielding an empty string. I'm glad 
>this technique came up because I hadn't paid enough attention to the 
>environ dict and how it could be used to do something like this.
>
>I think with servers providing a pause_output callable like this, 
>asynchronous applications will be possible and the isolation between the 
>layers can be preserved. I am going to try writing some code using this 
>construct and provide further feedback after I do.

Keep in mind that this is proposed as an optional construct, so if the 
server doesn't provide it, the application iterable will either need to be 
okay being next()-ed repeatedly, or else "go synchronous" and either do the 
work in-thread or block on a queue from the I/O thread.

And, until I get some feedback on the other part of this (making 
'wsgi.input' an iterator too, and having a way to "pause until input"), I'm 
not ready to add this to the PEP as 'wsgi.pause_output'.  But again, 
nothing stops a server from providing e.g. a 'twisted.pause_output' 
extension API, with whatever semantics you'd like it to have.
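
A sketch of an application that copes with both situations. It assumes 
the semantics proposed earlier in this thread (calling the extension 
returns a 'resume' callback, and the application then yields an empty 
string); none of this is in the PEP, the toy server run_with_pause is 
hypothetical, and the code uses Python 3 bytes:

```python
import queue
import threading


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    results = queue.Queue()
    pause = environ.get("wsgi.pause_output")  # proposed key, may be absent

    def body():
        if pause is not None:
            resume = pause()

            def work():
                results.put(b"async result")
                resume()      # ask the server to resume iteration

            threading.Thread(target=work).start()
            yield b""         # pause output until resume() is called
            yield results.get()
        else:
            # No extension: "go synchronous" and block on the queue.
            threading.Thread(
                target=lambda: results.put(b"sync result")).start()
            yield results.get()

    return body()


def run_with_pause(application):
    # Toy server offering the extension: 'resume' just sets an event
    # that the iteration loop waits on after seeing an empty block.
    resumed = threading.Event()
    environ = {"wsgi.pause_output": lambda: resumed.set}
    chunks = []
    for block in application(environ, lambda status, headers: None):
        if not block:
            resumed.wait()
        else:
            chunks.append(block)
    return b"".join(chunks)


without_extension = b"".join(app({}, lambda status, headers: None))
with_extension = run_with_pause(app)
```

A real asynchronous server would return to its event loop instead of 
waiting on an event, but the application side would look the same.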

From floydophone at gmail.com  Fri Sep 17 03:41:14 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Fri Sep 17 03:41:20 2004
Subject: [Web-SIG] Updated WSGIHTTPServer.py
Message-ID: <6654eac404091618414494b1bb@mail.gmail.com>

I've updated WSGIHTTPServer.py and wsgicgi.py to reflect the latest
PEP posted on python.org.

http://st0rm.hopto.org/wsgi/
From pje at telecommunity.com  Thu Sep 23 03:01:38 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 23 03:00:32 2004
Subject: [Twisted-web] Re: [Web-SIG] WSGI woes
In-Reply-To: <0CD0CE45-0842-11D9-AF84-000A95864FC4@ulaluma.com>
References: <5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040915190031.0215e7d0@mail.telecommunity.com>
	<5.1.1.6.0.20040916012218.02145cf0@mail.telecommunity.com>
	<5.1.1.6.0.20040916111612.0215fec0@mail.telecommunity.com>
	<5.1.1.6.0.20040916131647.0213d820@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040922205847.024ffde0@mail.telecommunity.com>

At 08:39 PM 9/16/04 -0400, Donovan Preston wrote:

>On Sep 16, 2004, at 1:41 PM, Phillip J. Eby wrote:
>
>>     resume = environ['wsgi.pause_output']()
>>
>>Where 'resume' is then a callback function that can be invoked to resume 
>>iteration.  This keeps it to a single extension key, helps ensure the 
>>correct sequence of actions, and makes it easier to implement in some 
>>cases, while not making other cases any harder.
>
>Well, I guess I sparked some discussion here. Great! I am +1 on the above 
>construct, calling pause_output and yielding an empty string. I'm glad 
>this technique came up because I hadn't paid enough attention to the 
>environ dict and how it could be used to do something like this.
>
>I think with servers providing a pause_output callable like this, 
>asynchronous applications will be possible and the isolation between the 
>layers can be preserved. I am going to try writing some code using this 
>construct and provide further feedback after I do.

So...  how'd it work out?  :)

From pje at telecommunity.com  Thu Sep 23 03:56:36 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 23 03:55:31 2004
Subject: [Web-SIG] A more Twisted approach to async apps in WSGI
Message-ID: <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com>

Hi all.  I've been away for a few days due to loss of e-mail service when 
my dedicated server lost a hard drive.  Unfortunately my ISP didn't support 
the OS version any more, so I had to rebuild everything for the new OS version.

Anyway, on to the topic of my post.  Should 'wsgi.input' become an 
iterator?  Or should we develop a different API for asynchronous applications?

On the positive side of the iterator approach, it could make it easier for 
asynchronous applications to pause waiting for input, and it could in 
principle support "chunked" transfer encoding of the input stream.

However, since we last discussed this, I did some Googling on CGI and 
chunked encoding.  By far and away, the most popular links regarding 
chunked encoding and CGI are all about bugs in IIS and Apache leading to 
various vulnerabilities when chunked encoding is used.  :(

Once you get past those items (e.g. by adding "-IIS -vulnerability" to your 
search), you then find *our* discussion here on the Web-SIG!  Finally, 
digging further, I found some 1998 discussion from the IPP (Internet 
Printing Protocol!) mailing list about which HTTP/1.1 servers support 
chunked encoding for CGI and which don't.

Anyway, the long and short of it is that CGI and chunked encoding are quite 
simply incompatible, which means that relying on its availability would be 
nonportable in a WSGI application anyway.

That leaves the asynchronous use case, but the benefit is rather strained 
at that point.  Many frameworks reuse the 'cgi' module's 'FieldStorage' 
class in order to parse browser input, and the 'cgi' module's 
implementation requires an object with a 'readline()' method.  That means 
that if we switch from an input stream to an iterator, a lot of people are 
going to be trying to make sensible wrappers to convert the iterator back 
to an input stream, and that's just getting ridiculous, especially since in 
many cases the server or gateway has a file-like object to start with.

So, I'm thinking we should shift the burden to an async-specific API.  But, 
in this case, "burden" means that we get to give asynchronous apps an API 
much more suited to their use cases.

Suppose that we did something similar to 'wsgi.file_wrapper'?  That is, 
suppose we had an optional extension that a server could provide, to wrap 
specialized application object(s) in a fashion that then provides backward 
compatibility to the spec?

That is, suppose we had a 'wsgi.async_wrapper', used like this:

     if 'wsgi.async_wrapper' in environ:
         controller=environ['wsgi.async_wrapper'](environ)
         # do stuff with controller, like register its
         # methods as callbacks
         return controller

The idea is that this would create an iterator that the server/gateway 
could recognize as "special", similar to the file-wrapper trick.  But, the 
object returned would provide an extra API for use by the asynchronous 
application, maybe something like:

     put(data) -- queue data for retrieval when the controller is iterated over

     finish() -- mark the iterator finished, so it raises StopIteration

     on_get(length,callback) -- call 'callback(data)' when 'length' bytes
         are available on 'wsgi.input' (but return immediately from the
         'on_get()' call)

While this API is an optional extension, it seems it would be closer to 
what some async fans wanted, and less of a kludge.  It won't do away with 
the possibility that middleware might block waiting for input, of course, 
but when no middleware is present or the middleware isn't transforming the 
input stream, it should work out quite well.

In any case, the implementation of the methods and the iterator interface 
are pretty straightforward, either for synchronous or asynchronous servers.

What do y'all think?  I'd especially like feedback from Twisted folk, as to 
whether this looks anything like the right kind of API for async apps.  (I 
expect it will need some tweaking and tuning.)

But if this is the overall right approach, I'd like to drop the current 
proposals to make 'wsgi.input' an iterator and add optional 
'pause'/'resume' APIs, since they were rather kludgy compared to giving 
async apps their own mini-API for nonblocking I/O.

Comments?  Questions?

From pje at telecommunity.com  Thu Sep 23 04:41:49 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 23 04:40:44 2004
Subject: [Web-SIG] Updated WSGIHTTPServer.py
Message-ID: <5.1.1.6.0.20040922222550.02105820@mail.telecommunity.com>

 >I've updated WSGIHTTPServer.py and wsgicgi.py to reflect the latest
 >PEP posted on python.org.
 >
 >http://st0rm.hopto.org/wsgi/

FYI, there's an error in your WSGIHTTPServer implementation: it sends a 
'Status: XXX etc' header to the client, but the correct format for HTTP is 
just the "XXX etc" part.  Looks like you might've copied that part from the 
PEP's CGI example.  This error is probably being masked by the fact that 
you're also sending the status to the client when start_response is 
initially called, rather than delaying until the first write operation or 
non-empty yielded string.  Also, 'start_response' doesn't actually re-raise 
'exc_info' as it should; it only prints the exception to stderr.

You should also not use 'map()' to wrap the application result 
iterator.  It's not illegal, but it's ill-advised since an application is 
allowed to produce an unlimited number of empty strings in its output, 
resulting in unbounded growth of the list that could use up arbitrarily 
large amounts of memory.
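
For illustration: in the Python 2 of the time, map() built a complete list
before returning (in Python 3 it is lazy, so the concern no longer
applies there). A plain loop avoids the problem in either version. The
function name send_body is hypothetical, and 'write' stands for whatever
callable the server uses to transmit a block.

```python
def send_body(result, write):
    """Pass each non-empty block to write() as soon as it is produced."""
    try:
        for block in result:
            if block:          # skip empty strings without storing them
                write(block)
    finally:
        # PEP 333 asks the server to call close() on the result if present.
        if hasattr(result, "close"):
            result.close()
```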

Finally, while this is not a violation of the spec in any way, I notice 
that your approach to loading application scripts will recompile and reload 
them on every hit.  I don't know if this was intentional or not.

Oh, and one last thing...  you're checking for 'HTTPS=on' in the 
environment, but that's not where it would be found, because your code is 
the only code that could set it.  I don't know if the stdlib HTTP server 
supports HTTPS, but if it does, you should check the appropriate attribute 
or method instead.  Otherwise, it suffices to always set 'wsgi.url_scheme' 
to "http".

From wilk-ml at flibuste.net  Thu Sep 23 10:33:33 2004
From: wilk-ml at flibuste.net (William Dode)
Date: Thu Sep 23 10:33:34 2004
Subject: [Web-SIG] Re: [Twisted-web] A more Twisted approach to async apps
	in WSGI
In-Reply-To: <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com>
	(Phillip J. Eby's message of "Wed, 22 Sep 2004 21:56:36 -0400")
References: <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com>
Message-ID: <87wtyliduq.fsf@blakie.riol>

"Phillip J. Eby" <pje@telecommunity.com> writes:

> Hi all.  I've been away for a few days due to loss of e-mail service
> when my dedicated server lost a hard drive.  Unfortunately my ISP
> didn't support the OS version any more, so I had to rebuild everything
> for the new OS version.
>
> Anyway, on to the topic of my post.  Should 'wsgi.input' become an
> iterator?  Or should we develop a different API for asynchronous
> applications?
>
> On the positive side of the iterator approach, it could make it easier
> for asynchronous applications to pause waiting for input, and it could
> in principle support "chunked" transfer encoding of the input stream.
>
> However, since we last discussed this, I did some Googling on CGI and
> chunked encoding.  By far and away, the most popular links regarding
> chunked encoding and CGI, are all about bugs in IIS and Apache leading
> to various vulnerabilities when chunked encoding is used.  :(
>
> Once you get past those items (e.g. by adding "-IIS -vulnerability" to
> your search), you then find *our* discussion here on the Web-SIG!
> Finally, digging further, I found some 1998 discussion from the IPP
> (Internet Printing Protocol!) mailing list about what HTTP/1.1 servers
> support chunked encoding for CGI and which don't.
>
> Anyway, the long and short of it is that CGI and chunked encoding are
> quite simply incompatible, which means that relying on its
> availability would be nonportable in a WSGI application anyway.

I don't understand the problem with an iterator under CGI. A CGI script is
by definition multi-process. If one script blocks, a new one will be run,
and the first client will simply wait. If none block, it makes no
difference whether an iterator is used.

It will be up to the server to decide whether it can use chunked encoding
or not. If the script blocks and doesn't use chunked encoding, it will not
be possible to run the script under CGI anyway. I know people who use
chunked encoding in CGI; they know what they are doing and it works fine,
and I'm sure they will use an iterator.

I don't see the difference between

[sleep...]
[sleep...]
[sleep...]
return data

and 

[sleep...]
yield
[sleep...]
yield
[sleep...]
yield

for a CGI script, if it's not possible to avoid sleeping.

-- 
William Dodé - http://flibuste.net
From pje at telecommunity.com  Thu Sep 23 15:04:28 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Sep 23 15:03:29 2004
Subject: [Web-SIG] Re: [Twisted-web] A more Twisted approach to
	async apps in WSGI
In-Reply-To: <87wtyliduq.fsf@blakie.riol>
References: <5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com>
	<5.1.1.6.0.20040922204838.024f61c0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040923090205.02f25bd0@mail.telecommunity.com>

At 10:33 AM 9/23/04 +0200, William Dode wrote:

>I don't see the difference between
>
>[sleep...]
>[sleep...]
>[sleep...]
>return data
>
>and
>
>[sleep...]
>yield
>[sleep...]
>yield
>[sleep...]
>yield
>
>for a CGI script, if it's not possible to avoid sleeping.

As previously discussed, the existence of an asynchronous API only matters 
for asynchronous servers and gateways.

From floydophone at gmail.com  Thu Sep 23 21:36:04 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Thu Sep 23 21:36:19 2004
Subject: [Web-SIG] Updated WSGIHTTPServer.py
In-Reply-To: <5.1.1.6.0.20040922222550.02105820@mail.telecommunity.com>
References: <5.1.1.6.0.20040922222550.02105820@mail.telecommunity.com>
Message-ID: <6654eac404092312365ff04728@mail.gmail.com>

Thanks for taking a look. I very very quickly upgraded it by ripping
out a lot of the spec's code, and my example app ran OK, so I put it
up.

I'll make those fixes soon.

Also, I'm pretty sure that execfile _will_ reload application scripts,
but I may be wrong.

From floydophone at gmail.com  Sun Sep 26 16:29:37 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Sun Sep 26 16:29:40 2004
Subject: [Web-SIG] Updated WSGIHTTPServer.py
In-Reply-To: <6654eac404092312365ff04728@mail.gmail.com>
References: <5.1.1.6.0.20040922222550.02105820@mail.telecommunity.com>
	<6654eac404092312365ff04728@mail.gmail.com>
Message-ID: <6654eac404092607293c0b3e1e@mail.gmail.com>

I uploaded the fixed WSGIHTTPServer.py. I'm going to rework it pretty
substantially pretty soon (probably implemented using Medusa or
Twisted) and streamline it. It's pretty rough as it is right now, but
it works.


From floydophone at gmail.com  Sun Sep 26 17:10:49 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Sun Sep 26 17:10:51 2004
Subject: [Web-SIG] Updated WSGIHTTPServer.py
In-Reply-To: <6654eac404092607293c0b3e1e@mail.gmail.com>
References: <5.1.1.6.0.20040922222550.02105820@mail.telecommunity.com>
	<6654eac404092312365ff04728@mail.gmail.com>
	<6654eac404092607293c0b3e1e@mail.gmail.com>
Message-ID: <6654eac4040926081030a7ada6@mail.gmail.com>

In addition, I fixed an embarrassing bug in which it deleted
querystrings. I'm going to improve on it a lot as time goes on: moving
away from using execfile and dealing with headers in a cleaner
fashion.

I also uploaded my testhttpserver.py script, which contains three
simple test scripts for it. It depends on my new middleware.py module,
something which may turn into a sort of WSGI middleware library. Maybe
we should collaborate on a "standard extensions" type of library?

By the way, to avoid embarrassing bugs such as mine, and since the
spec is finally nearing completion, we should write some unit tests to
ensure compatibility across WSGI implementations.

From floydophone at gmail.com  Sun Sep 26 17:31:15 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Sun Sep 26 17:31:17 2004
Subject: [Web-SIG] PEP suggestions
Message-ID: <6654eac40409260831b7e05ec@mail.gmail.com>

I've been reading through the PEP, and I've been having trouble
following the code in some parts. Eventually, I got it, but I really
think that run_with_cgi() could use some heavy commenting. Perhaps the
other code samples could, too.
From pje at telecommunity.com  Mon Sep 27 04:24:48 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Sep 27 04:23:34 2004
Subject: [Web-SIG] PEP suggestions
In-Reply-To: <6654eac40409260831b7e05ec@mail.gmail.com>
Message-ID: <5.1.1.6.0.20040926222434.03880b90@mail.telecommunity.com>

At 11:31 AM 9/26/04 -0400, Peter Hunt wrote:
>I've been reading through the PEP, and I've been having trouble
>following the code in some parts. Eventually, I got it, but I really
>think that run_with_cgi() could use some heavy commenting. Perhaps the
>other code samples could, too.

Feel free to send diffs.  ;)

From paul.boddie at ementor.no  Mon Sep 27 13:13:12 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Mon Sep 27 13:13:28 2004
Subject: [Web-SIG] WebStack 0.7
Message-ID: <0F4BD34E02639E428B4654DCBAB4502D0B1BAB@100NOOSLMSG004.common.alpharoot.net>

Hello,

Just a quick note to say that WebStack 0.7 has been released. More
information here:

http://www.python.org/pypi?%3Aaction=search&name=WebStack

Compared to previous releases, this one is a lot more strict and
specific about various things such as character encodings, request
parameters, authentication, cookies and so on, but additional
functionality has also been introduced: for example, Zope 2.x products
can now be written using the WebStack API.

Have fun,

Paul
From mnot at mnot.net  Tue Sep 28 20:02:13 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Tue Sep 28 20:02:18 2004
Subject: [Web-SIG] HTTP 1.1 trailers
Message-ID: <863F06BA-1178-11D9-88DC-000A95BD86C0@mnot.net>

I just realised that WSGI doesn't allow applications to send headers as 
trailers (RFC2616, 3.6.1 Chunked Transfer Coding). I think that's OK, 
as pretty much nobody uses them, and it would require a pretty radical 
change in WSGI's design to support them, but I think the PEP should 
mention it.

Cheers,

--
Mark Nottingham     http://www.mnot.net/

From foom at fuhm.net  Tue Sep 28 23:01:02 2004
From: foom at fuhm.net (James Y Knight)
Date: Tue Sep 28 23:01:06 2004
Subject: [Web-SIG] HTTP 1.1 trailers
In-Reply-To: <863F06BA-1178-11D9-88DC-000A95BD86C0@mnot.net>
References: <863F06BA-1178-11D9-88DC-000A95BD86C0@mnot.net>
Message-ID: <81AAD541-1191-11D9-9E53-000A95A50FB2@fuhm.net>


On Sep 28, 2004, at 2:02 PM, Mark Nottingham wrote:

> I just realised that WSGI doesn't allow applications to send headers 
> as trailers (RFC2616, 3.6.1 Chunked Transfer Coding). I think that's 
> OK, as pretty much nobody uses them, and it would require a pretty 
> radical change in WSGI's design to support them, but I think the PEP 
> should mention it.

Nah, it's pretty easy for a webserver to add this feature as a WSGI 
extension, and for a client to do:
   if 'mycoolwebserver.set_trailers' in environ:
     environ['mycoolwebserver.set_trailers'](
         [('Content-MD5', 'blahblah')])

Since it's easy to add as an implementation specific enhancement, and 
since trailers are very close to completely useless, I don't think it 
really needs to be in the core standard.

James

From mnot at mnot.net  Tue Sep 28 23:06:09 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Tue Sep 28 23:06:13 2004
Subject: [Web-SIG] HTTP 1.1 trailers
In-Reply-To: <81AAD541-1191-11D9-9E53-000A95A50FB2@fuhm.net>
References: <863F06BA-1178-11D9-88DC-000A95BD86C0@mnot.net>
	<81AAD541-1191-11D9-9E53-000A95A50FB2@fuhm.net>
Message-ID: <387227E4-1192-11D9-88DC-000A95BD86C0@mnot.net>

/me hits head; good point.

Cheers,



--
Mark Nottingham     http://www.mnot.net/

From pje at telecommunity.com  Wed Sep 29 00:51:33 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 29 00:51:46 2004
Subject: [Web-SIG] HTTP 1.1 trailers
In-Reply-To: <81AAD541-1191-11D9-9E53-000A95A50FB2@fuhm.net>
References: <863F06BA-1178-11D9-88DC-000A95BD86C0@mnot.net>
	<863F06BA-1178-11D9-88DC-000A95BD86C0@mnot.net>
Message-ID: <5.1.1.6.0.20040928184536.02a884d0@mail.telecommunity.com>

At 05:01 PM 9/28/04 -0400, James Y Knight wrote:

>On Sep 28, 2004, at 2:02 PM, Mark Nottingham wrote:
>
>>I just realised that WSGI doesn't allow applications to send headers as 
>>trailers (RFC2616, 3.6.1 Chunked Transfer Coding). I think that's OK, as 
>>pretty much nobody uses them, and it would require a pretty radical 
>>change in WSGI's design to support them, but I think the PEP should mention it.
>
>Nah, it's pretty easy for a webserver to add this feature as a WSGI 
>extension, and for a client to do:
>   if 'mycoolwebserver.set_trailers' in environ:
>     environ['mycoolwebserver.set_trailers']([('Content-MD5', 'blahblah')])

It's actually a bit more complex than that, since it needs to follow the 
procedures for "safe" extensions, from paragraph 4 of:

    http://www.python.org/peps/pep-0333.html#server-extension-apis

Keep in mind that an intervening piece of middleware might want to munge 
some headers, and if it doesn't support the trailer extension, stuff can 
break.  Essentially, the set_trailers extension would need to take 
start_response as a parameter so it can ensure that middleware hasn't 
replaced it.

Anyway, this definitely falls into the "diminishing returns" bucket.
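
A sketch of the identity check described above, assuming a server that
builds its callables per request. 'mycoolwebserver.set_trailers' is the
made-up extension name from James's example, and make_request_api is a
hypothetical helper; nothing here is part of the PEP.

```python
def make_request_api(sent_trailers):
    """Build start_response plus a hypothetical, server-specific
    'mycoolwebserver.set_trailers' extension for one request."""

    def start_response(status, response_headers, exc_info=None):
        return lambda data: None    # write() stub; real output omitted

    def set_trailers(caller_start_response, trailers):
        # If middleware wrapped start_response, the application holds the
        # wrapper rather than our original, so the identity check fails
        # and the extension declines instead of bypassing the middleware.
        if caller_start_response is not start_response:
            return False
        sent_trailers.extend(trailers)
        return True

    return start_response, set_trailers
```

An application would then pass along the start_response it was given and
fall back to sending no trailers if the call returns False.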

(By the way, James, did you see my proposal for "A more Twisted approach to 
async apps in WSGI"?  Do you think it's better than the previous "pause 
iteration" proposal, or worse?  I'd really like to get a WSGI async API 
nailed down soon so we can look into finalizing the PEP.)

From mnot at mnot.net  Wed Sep 29 02:24:59 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Wed Sep 29 02:25:19 2004
Subject: [Web-SIG] PEP 333 (19-Sep-04) Feedback
Message-ID: <FF6DFAAF-11AD-11D9-88DC-000A95BD86C0@mnot.net>

Overall, this PEP looks really good; these comments are mostly nits and 
editorial points to make it more precise, clear, etc.

* In "Specification Details," the start_response callable has 
illustrative arguments of "status" and "headers." It would be *very* 
helpful if the latter were called "response_headers," for clarity.

* The same section later states "The application object must return an 
iterable yielding strings." Return when? We're cautioned that the 
write() callable should not be used; how is the iterable returned, 
then?

* Later, "The server or gateway must not modify supplied strings in any 
way..." This effectively rules out the server/gateway implementing 
transfer-encodings, range requests, delta encoding, automatic content 
encoding, etc. Suggest dropping this paragraph; it doesn't really add 
any value, as servers that are malicious or incorrect in this respect 
won't really be stopped by it anyway.

* In "environ Variables," it is specified that "In general, a server or 
gateway should attempt to provide as many other CGI variables as are 
applicable, including e.g. the nonstandard SSL variables such as 
HTTPS=on , if an SSL connection is in effect." This sentence hedges in 
four different ways; "In general," "should," "attempt," "as many... as 
are applicable." Besides the redundancy, I'm concerned about the 
inclusion of nonstandard variables; how will people know which ones to 
include? I'd suggest listing those that aren't in the CGI standard, so 
there's an even playing field.

* Later in the same section, a construct called a 'stream' is defined. 
It would be good to directly relate this to a 'file-like object,' for 
the benefit of readers familiar with the terms used in the 
documentation of Python's standard library.

* The same section defines a number of environment variables with 
Boolean values (e.g., wsgi.multithread). When these definitions say 
"This value should be true if..." does it mean that they should be a 
Python types.BooleanType, or that it should evaluate to true (e.g., if 
wsgi.multithread: ...)?

* In 'Input and Error Streams', item 4 in the numbered list of notes to 
the table says 'Since the errors stream may not be rewound, a 
container..." This is the first instance of the term 'container'; could 
an existing term be used?

* In "The start_response() Callable", it says "The status argument is 
an HTTP "status" string like "200 OK" or "404 Not Found." This should 
reference the definition of status strings in the specification; 
suggest "The status argument is a string consisting of a Status-Code 
and a Reason-Phrase, in that order and separated by a single space, 
with no surrounding whitespace or other characters. See RFC2616, 
Section 6.1.1 for more information."

* In the next paragraph, "Each header_name must be a valid HTTP header 
name." For the same reasons as above, suggest "Each header_name must be 
a HTTP header field-name, as defined in RFC2616 Section 4.2."

* In the next paragraph, "If the application omits a needed header, the 
server or gateway should add it." Who determines whether it's needed? 
Suggest "If the application omits a header required by HTTP or other 
relevant specifications in effect, the server or gateway must add it." 
(note must, not should)

* The next paragraph is confusingly worded; I'd suggest "The server or 
gateway must not actually transmit the HTTP headers until the first 
write call, or until after the first iteration of the application 
return value that yields a non-empty string...."

* "Buffering and Streaming," is, again, confusing about when an 
iterator is supposed to be returned.

* Finally, "Other HTTP Features" states that "In a sense, a server 
should consider itself to be like an HTTP 'proxy server'..." This isn't 
a good analogy; the function it performs is much closer to an HTTP 
gateway; see the terminology section of RFC2616.

Cheers,

--
Mark Nottingham     http://www.mnot.net/

From ianb at colorstudy.com  Wed Sep 29 06:47:55 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Sep 29 06:47:58 2004
Subject: [Web-SIG] WSGI tests
Message-ID: <415A3E7B.4020706@colorstudy.com>

I've written some code for testing WSGI applications and servers.  As 
before, it's at svn://colorstudy.com/trunk/WSGI , or 
http://colorstudy.com/cgi-bin/viewcvs.cgi/trunk/WSGI/

The test so far has three parts.  There's a simple "echo" application; 
it actually does several things depending on what variables you give it.  
There's a "lint" middleware.  It checks for both server and 
application compliance with WSGI.  Then there's a test that fetches 
pages (via urllib) and interprets the response (tests/echotest.py).

The idea is that these can be recombined in some ways.  The echo 
application will probably be expanded to do more things, and to better 
exercise WSGI; e.g., calling start_response twice, using write and 
iterators at the same time, etc.  It could also be expanded to perform 
illegal operations, e.g., call write inside the iterator, to see what 
happens in these cases.

Another option would be some middleware that takes the output of any 
application, and plays around with it to exercise all of WSGI.

Either way, the echo application could be implemented under different 
frameworks, and once it's implemented you could run these other tests 
against your framework.

Then there's the lint middleware.  This doesn't modify the request in 
any way (though it does wrap start_response and other objects).  It just 
checks various things; right now it mostly checks that required 
environmental variables are there and that everything is of the right 
type.  It doesn't test any of the more subtle aspects of WSGI, or test 
any failure cases.  It doesn't test the exc_info stuff either; I haven't 
kept up, and I only partly understand the motivation there.

Then there's the system/functional test (echotest).  Right now it's just 
a bunch of asserts, but I'll refactor it for unittest soon.  The idea is 
that in addition to doing some tests directly against echo, this also 
exercises portions that lint or other middleware is implicitly testing.

Anyway, that's what I got now.  Not a ton of code (despite this long 
email).  Suggestions welcome.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From pje at telecommunity.com  Wed Sep 29 06:57:46 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 29 06:58:05 2004
Subject: [Web-SIG] PEP 333 (19-Sep-04) Feedback
In-Reply-To: <FF6DFAAF-11AD-11D9-88DC-000A95BD86C0@mnot.net>
Message-ID: <5.1.1.6.0.20040929004017.02813a60@mail.telecommunity.com>

At 05:24 PM 9/28/04 -0700, Mark Nottingham wrote:
>Overall, this PEP looks really good; these comments are mostly nits and 
>editorial points to make it more precise, clear, etc.
>
>* In "Specification Details," the start_response callable has illustrative 
>arguments of "status" and "headers." It would be *very* helpful if the 
>latter were called "response_headers," for clarity.

Will do.


>* The same section later states "The application object must return an 
>iterable yielding strings." Return when?

When it's called, of course.  I'll change that to, "When called, the 
application object must..."


>  We're cautioned that the write() callable should not be used; how is the 
> iterable returned, then?

Huh?


>* Later, "The server or gateway must not modify supplied strings in any 
>way..." This effectively rules out the server/gateway implementing 
>transfer-encodings, range requests, delta encoding, automatic content 
>encoding, etc. Suggest dropping this paragraph; it doesn't really add any 
>value, as servers that are malicious or incorrect in this respect won't 
>really be stopped by it anyway.

I'll take out the "modify supplied strings in any way" part, but I think it's 
important to point out that the strings are binary byte sequences.  I'll 
consider some alternatives here.


>* In "environ Variables," it is specified that "In general, a server or 
>gateway should attempt to provide as many other CGI variables as are 
>applicable, including e.g. the nonstandard SSL variables such as HTTPS=on 
>, if an SSL connection is in effect." This sentence hedges in four 
>different ways; "In general," "should," "attempt," "as many... as are 
>applicable." Besides the redundancy, I'm concerned about the inclusion of 
>nonstandard variables; how will people know which ones to include? I'd 
>suggest listing those that aren't in the CGI standard, so there's an even 
>playing field.

Is there a standard for SSL extensions to CGI?  These are really the only 
"non-standard" variables I actually care about.  I'll tweak the rest of 
this more or less as you suggest.


>* Later in the same section, a construct called a 'stream' is defined. It 
>would be good to directly relate this to a 'file-like object,' for the 
>benefit of readers familiar with the terms used in the documentation of 
>Python's standard library.

Will do.


>* The same section defines a number of environment variables with Boolean 
>values (e.g., wsgi.multithread). When these definitions say "This value 
>should be true if..." does it mean that they should be a Python 
>types.BooleanType, or that it should evaluate to true (e.g., if 
>wsgi.multithread: ...)?

The latter; I thought this was obvious by virtue of the fact that it 
doesn't say ``True`` in typewriter font.  Good Python style (and 
performance) demands that one never perform truth tests by comparing 
directly to ``True`` or ``False``, so in theory it shouldn't matter unless 
you want to be tricky and use the value as an index.

Were you actually confused by this bit, or are you just looking for 
ambiguities?  I'd like to avoid cluttering these definitions further, if 
possible.
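
To make the "evaluates to true" reading concrete, here is a minimal sketch. 
The helper name describe_concurrency and the sample environ dictionaries are 
made up for illustration; only the 'wsgi.multithread' key comes from the PEP.

```python
def describe_concurrency(environ):
    """Report the threading model advertised by the server (illustrative)."""
    # Truth-test the value directly; do not compare it to True.
    if environ.get('wsgi.multithread'):
        return 'multithreaded'
    return 'single-threaded'


# Any true value is acceptable, not only the bool True.
print(describe_concurrency({'wsgi.multithread': 1}))
print(describe_concurrency({'wsgi.multithread': False}))
```

A comparison such as environ['wsgi.multithread'] is True would reject a 
server that supplies some other true value, which is the failure mode the 
plain truth test avoids.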


>* In 'Input and Error Streams', item 4 in the numbered list of notes to 
>the table says 'Since the errors stream may not be rewound, a 
>container..." This is the first instance of the term 'container'; could an 
>existing term be used?

Argh.  Pollution carried through from the original December 2003 
draft...  will fix.


>* In "The start_response() Callable", it says "The status argument is an 
>HTTP "status" string like "200 OK" or "404 Not Found." This should 
>reference the definition of status strings in the specification; suggest 
>"The status argument is a string consisting of a Status-Code and a 
>Reason-Phrase, in that order and separated by a single space, with no 
>surrounding whitespace or other characters. See RFC2616, Section 6.1.1 for 
>more information."

Okay.
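
A rough sketch of how a checker might enforce the suggested wording. The 
name is_valid_status is hypothetical, and the pattern is an assumption on my 
part: it insists on a non-empty Reason-Phrase, which is stricter than 
RFC 2616 itself requires.

```python
import re

# Three digits, exactly one space, then a reason phrase with no leading or
# trailing whitespace and no CR/LF.  (Assumed pattern, not from the PEP.)
_STATUS_RE = re.compile(r'\d{3} \S(?:[^\r\n]*\S)?')


def is_valid_status(status):
    """Return True if status looks like 'Status-Code SP Reason-Phrase'."""
    return _STATUS_RE.fullmatch(status) is not None


print(is_valid_status('200 OK'))
print(is_valid_status('404 Not Found'))
print(is_valid_status(' 200 OK'))
```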


>* In the next paragraph, "Each header_name must be a valid HTTP header 
>name." For the same reasons as above, suggest "Each header_name must be a 
>HTTP header field-name, as defined in RFC2616 Section 4.2."

Okay.


>* In the next paragraph, "If the application omits a needed header, the 
>server or gateway should add it." Who determines whether it's needed? 
>Suggest "If the application omits a header required by HTTP or other relevant
>specifications in effect, the server or gateway must add it." (note must, 
>not should)

Sure.


>* The next paragraph is confusingly worded; I'd suggest "The server or 
>gateway must not actually transmit the HTTP headers until the first write 
>call, or until after the first iteration of the application return value 
>that yields a non-empty string...."

Your phrasing doesn't work either, because 'start_response()' can't wait 
around until those things happen; it has to return immediately.  I'll try 
another phrasing.


>* "Buffering and Streaming," is, again, confusing about when an iterator 
>is supposed to be returned.

An application always returns an iterable when called, as per "The 
application object must return an iterable yielding strings." in 
"Specification Details".

I'll put another note about this under "The Application/Framework Side".

Keep in mind that applications *always always always* MUST return an 
iterable, with absolutely no exceptions ever.  Use of 'write()' does not 
absolve an application from returning an iterable.  (I'll add a note to 
this effect in the section on 'write()'.)
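
As an illustration of that rule, here is a sketch of an application that 
sends its whole body through write() and still returns an iterable. It is 
written in present-day Python 3 syntax with byte strings, which postdates 
this thread; app_using_write and the toy driver run_once are hypothetical 
names, and the driver skips everything a real server would do with headers.

```python
def app_using_write(environ, start_response):
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    write(b'hello\n')
    # Still required: return an iterable, even if it yields nothing.
    return []


def run_once(app):
    """Toy driver: collect everything the application sends."""
    chunks = []

    def start_response(status, response_headers, exc_info=None):
        # The returned callable plays the role of write().
        return chunks.append

    result = app({}, start_response)
    for data in result:
        chunks.append(data)
    return chunks


print(run_once(app_using_write))
```

Returning None here instead of [] would make the for loop in the driver 
fail, which is the same failure a real server would hit.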


>* Finally, "Other HTTP Features" states that "In a sense, a server should 
>consider itself to be like an HTTP 'proxy server'..." This isn't a good 
>analogy; the function it performs is much closer to an HTTP gateway; See 
>the terminology section of RFC2616.

Will do.

From pje at telecommunity.com  Wed Sep 29 07:05:28 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 29 07:05:47 2004
Subject: [Web-SIG] WSGI tests
In-Reply-To: <415A3E7B.4020706@colorstudy.com>
Message-ID: <5.1.1.6.0.20040929010139.020e8af0@mail.telecommunity.com>

At 11:47 PM 9/28/04 -0500, Ian Bicking wrote:
>I've written some code for testing WSGI applications and servers.  As 
>before, it's at svn://colorstudy.com/trunk/WSGI , or 
>http://colorstudy.com/cgi-bin/viewcvs.cgi/trunk/WSGI/
>
>The test so far has three parts.  There's a simple "echo" application; it 
>actually does several things depending on what variables you give it.

FYI, 'echo.application' does not return an iterable, and is therefore not a 
valid application object.  The 'lint' application also has a path that 
returns None.

The part of the spec that allowed applications to return None instead of an 
iterable has been gone from the spec for weeks; I mentioned its removal in 
one of my regular "recent changes to the spec" posts here.

Applications *must* always return an iterable.

From pje at telecommunity.com  Wed Sep 29 07:21:11 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 29 07:21:29 2004
Subject: [Web-SIG] WSGI tests
In-Reply-To: <415A3E7B.4020706@colorstudy.com>
Message-ID: <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>

At 11:47 PM 9/28/04 -0500, Ian Bicking wrote:

>Then there's the lint middleware.  This doesn't modify the request in any 
>way (though it does wrap start_response and other objects).

The wrapper is broken: 'exc_info = args[3]' should be 'exc_info = args[2]'.


>   It just checks various things; right now it mostly checks that required 
> environmental variables are there and that everything is of the right type.

Some of the variables you're checking for are not actually required any 
more; see

     http://www.python.org/peps/pep-0333.html#environ-variables

for details.

Also, your header checks are requiring non-duplicated headers, but 
duplicate header names are in fact allowed, per discussion on the 
list.  But, this isn't explicitly stated in the spec, so I should fix that.

I'm also not positive that a Content-Type header is absolutely required, 
e.g. for redirects.  I guess I should dig up the HTTP spec on this point.


>   It doesn't test any of the more subtle aspects of WSGI, or test any 
> failure cases.

Apart from the fact that it doesn't always return an iterable, the lint app 
is WSGI compliant, but "overprotective", in that it requires things not 
required by the spec.

Other than those nits, it's a pretty nice piece of middleware and I'll 
probably use it to help in writing a WSGI "reference library".


>   It doesn't test the exc_info stuff either; I haven't kept up, and I 
> only partly understand the motivation there.

exc_info should be a three-element tuple containing a type, an instance of 
the type, and a traceback object.  If start_response() is called more than 
once, it's a fatal error not to include exc_info (because the only time 
it's valid to call start_response() a second time is if an error occurred 
while you were writing or yielding output).  If exc_info is supplied and 
headers have already been sent to the server, the server *must* raise an 
error, and *should* raise the supplied exc_info triplet.  So, some of these 
things can be tested by your 'lint' program.

See also:

     http://www.python.org/peps/pep-0333.html#the-start-response-callable

from paragraph 7 on.
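
For the server side of those rules, a rough sketch follows. It assumes 
Python 3 syntax (with_traceback did not exist in 2004), and ResponseState is 
a made-up class for illustration, not part of the spec or of any reference 
library.

```python
import sys


class ResponseState:
    """Toy per-request state on the server side (illustrative only)."""

    def __init__(self):
        self.status = None
        self.response_headers = None
        self.headers_sent = False

    def start_response(self, status, response_headers, exc_info=None):
        if exc_info is not None:
            try:
                if self.headers_sent:
                    # Too late to change the response: re-raise the
                    # application's original exception.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # avoid keeping the traceback alive
        elif self.status is not None:
            # A second call is only valid when reporting an error.
            raise AssertionError(
                'start_response() called again without exc_info')
        self.status = status
        self.response_headers = list(response_headers)
        return self.write

    def write(self, data):
        # A real server would transmit headers and data here.
        self.headers_sent = True


state = ResponseState()
write = state.start_response('200 OK', [('Content-Type', 'text/plain')])
try:
    raise ValueError('application failed')
except ValueError:
    # Headers not sent yet, so the error response replaces the old one.
    state.start_response('500 Internal Server Error',
                         [('Content-Type', 'text/plain')], sys.exc_info())
print(state.status)
```

A lint-style middleware could wrap start_response in much the same way and 
flag the second-call-without-exc_info case itself.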

From ianb at colorstudy.com  Wed Sep 29 09:19:28 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Sep 29 09:19:33 2004
Subject: [Web-SIG] WSGI tests
In-Reply-To: <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
References: <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
Message-ID: <415A6200.4000802@colorstudy.com>

Phillip J. Eby wrote:
> At 11:47 PM 9/28/04 -0500, Ian Bicking wrote:
> 
>> Then there's the lint middleware.  This doesn't modify the request in 
>> any way (though it does wrap start_response and other objects).
> 
> 
> The wrapper is broken: 'exc_info = args[3]' should be 'exc_info = args[2]'.

Fixed.

> 
>>   It just checks various things; right now it mostly checks that 
>> required environmental variables are there and that everything is of 
>> the right type.
> 
> 
> Some of the variables you're checking for are not actually required any 
> more; see
> 
>     http://www.python.org/peps/pep-0333.html#environ-variables
> 
> for details.

The only one I was mistakenly requiring seems to be QUERY_STRING; from 
my reading, all these are required:

'REQUEST_METHOD', 'SCRIPT_NAME', 'PATH_INFO', 'SERVER_NAME', 'SERVER_PORT'

Well, maybe SCRIPT_NAME isn't required.

> Also, your header checks are requiring non-duplicated headers, but 
> duplicate header names are in fact allowed, per discussion on the list.  
> But, this isn't explicitly stated in the spec, so I should fix that.
> 
> I'm also not positive that a Content-Type header is absolutely required, 
> e.g. for redirects.  I guess I should dig up the HTTP spec on this point.

I believe it is required for any response that has a body, but it's true 
that's not all responses.  There's some 2xx responses that have no body.  
I've taken out the requirement, but noted that it should be in there 
somewhere.  I'm okay if this embodies some requirements of HTTP 
in addition to specifically WSGI requirements.

>>   It doesn't test any of the more subtle aspects of WSGI, or test any 
>> failure cases.
> 
> 
> Apart from the fact that it doesn't always return an iterable, the lint 
> app is WSGI compliant, but "overprotective", in that it requires things 
> not required by the spec.
> 
> Other than those nits, it's a pretty nice piece of middleware and I'll 
> probably use it to help in writing a WSGI "reference library".
> 
> 
>>   It doesn't test the exc_info stuff either; I haven't kept up, and I 
>> only partly understand the motivation there.
> 
> 
> exc_info should be a three-element tuple containing a type, an instance 
> of the type, and a traceback object.  If start_response() is called more 
> than once, it's a fatal error not to include exc_info (because the only 
> time it's valid to call start_response() a second time is if an error 
> occurred while you were writing or yielding output).  If exc_info is 
> supplied and headers have already been sent to the server, the server 
> *must* raise an error, and *should* raise the supplied exc_info 
> triplet.  So, some of these things can be tested by your 'lint' program.

I suppose I could trigger these conditions in echo, and then test that 
they are handled properly in lint.  I'll have to think about what 
exactly "properly" is first.

> See also:
> 
>     http://www.python.org/peps/pep-0333.html#the-start-response-callable
> 
> from paragraph 7 on.

I read that, and didn't feel entirely clear on the intention.  An 
example in that section would probably be helpful.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From pje at telecommunity.com  Wed Sep 29 17:07:10 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 29 17:07:36 2004
Subject: [Web-SIG] WSGI tests
In-Reply-To: <415A6200.4000802@colorstudy.com>
References: <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
	<5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com>

At 02:19 AM 9/29/04 -0500, Ian Bicking wrote:

>The only one I was mistakenly requiring seems to be QUERY_STRING; from my 
>reading, all these are required:
>
>'REQUEST_METHOD', 'SCRIPT_NAME', 'PATH_INFO', 'SERVER_NAME', 'SERVER_PORT'
>
>Well, maybe SCRIPT_NAME isn't required.

Or PATH_INFO - if the request is addressed directly to the application, and 
there's no trailing '/', it can be empty, and is therefore allowed to be 
missing, as in CGI.


>I suppose I could trigger these conditions in echo, and then test that 
>they are handled properly in lint.  I'll have to think about what exactly 
>"properly" is first.
>
>>See also:
>>     http://www.python.org/peps/pep-0333.html#the-start-response-callable
>>from paragraph 7 on.
>
>I read that, and didn't feel entirely clear on the intention.  An example 
>in that section would probably be helpful.

I'll see what I can do.

By the way, I found another issue with lint: IteratorWrapper doesn't close 
the original iterable if it had a close() method.

From ianb at colorstudy.com  Wed Sep 29 18:21:52 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Sep 29 18:22:48 2004
Subject: [Web-SIG] WSGI tests
In-Reply-To: <5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com>
References: <5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
	<5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
	<5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com>
Message-ID: <415AE120.10609@colorstudy.com>

Phillip J. Eby wrote:
> At 02:19 AM 9/29/04 -0500, Ian Bicking wrote:
> 
>> The only one I was mistakenly requiring seems to be QUERY_STRING; from 
>> my reading, all these are required:
>>
>> 'REQUEST_METHOD', 'SCRIPT_NAME', 'PATH_INFO', 'SERVER_NAME', 
>> 'SERVER_PORT'
>>
>> Well, maybe SCRIPT_NAME isn't required.
> 
> 
> Or PATH_INFO - if the request is addressed directly to the application, 
> and there's no trailing '/', it can be empty, and is therefore allowed 
> to be missing, as in CGI.

OK, fixed.

> By the way, I found another issue with lint: IteratorWrapper doesn't 
> close the original iterable if it had a close() method.

Fixed as well.

Also, I added back in the content-type check, unless there's a response 
code of 204 No Content; I think that's the only response code where 
there shouldn't be a content-type.  I'd rather be a little overly 
restrictive.  It's a useful check, because most frameworks have default 
content-types, and WSGI does not.  And some browsers (specifically IE) 
try to fix broken content-types.  And some servers add default 
content-types, e.g., Apache's DefaultType.  So it's a bug that might be 
missed.
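
The check described above might look roughly like this. This is a sketch, 
not the actual code in lint; check_content_type is a hypothetical name, and 
the function assumes the status string starts with a numeric code.

```python
def check_content_type(status, response_headers):
    """Require a Content-Type header unless the status is 204 No Content."""
    code = int(status.split(None, 1)[0])
    if code == 204:
        return
    # Header names are case-insensitive, so compare in lower case.
    names = [name.lower() for name, value in response_headers]
    assert 'content-type' in names, (
        'No Content-Type header found in headers (%r)' % (response_headers,))


check_content_type('200 OK', [('Content-Type', 'text/html')])
check_content_type('204 No Content', [])
```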

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From mnot at mnot.net  Wed Sep 29 18:34:42 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Wed Sep 29 18:34:45 2004
Subject: [Web-SIG] PEP 333 (19-Sep-04) Feedback
In-Reply-To: <5.1.1.6.0.20040929004017.02813a60@mail.telecommunity.com>
References: <5.1.1.6.0.20040929004017.02813a60@mail.telecommunity.com>
Message-ID: <7748360D-1235-11D9-88DC-000A95BD86C0@mnot.net>

Thanks for the quick response. Answers inline below.

On Sep 28, 2004, at 9:57 PM, Phillip J. Eby wrote:
>> * The same section later states "The application object must return 
>> an iterable yielding strings." Return when?
>
> When it's called, of course.  I'll change that to, "When called, the 
> application object must..."
>
>
>>  We're cautioned that the write() callable should not be used; how is 
>> the iterable returned, then?
>
> Huh?

I found the flow of calls confusing in this section; I'll think on how 
to improve it and make a concrete suggestion if I come up with 
something.


>> * In "environ Variables," it is specified that "In general, a server 
>> or gateway should attempt to provide as many other CGI variables as 
>> are applicable, including e.g. the nonstandard SSL variables such as 
>> HTTPS=on , if an SSL connection is in effect." This sentence hedges 
>> in four different ways; "In general," "should," "attempt," "as 
>> many... as are applicable." Besides the redundancy, I'm concerned 
>> about the inclusion of nonstandard variables; how will people know 
>> which ones to include? I'd suggest listing those that aren't in the 
>> CGI standard, so there's an even playing field.
>
> Is there a standard for SSL extensions to CGI?  These are really the 
> only "non-standard" variables I actually care about.  I'll tweak the 
> rest of this more or less as you suggest.

Not to my knowledge; maybe just document that one and don't mention 
others.


>> * The same section defines a number of environment variables with 
>> Boolean values (e.g., wsgi.multithread). When these definitions say 
>> "This value should be true if..." does it mean that they should be a 
>> Python types.BooleanType, or that it should evaluate to true (e.g., 
>> if wgsi.multithread: ...)?
>
> The latter; I thought this was obvious by virtue of the fact that it 
> doesn't say ``True`` in typewriter font.  Good Python style (and 
> performance) demands that one never perform truth tests by comparing 
> directly to ``True`` or ``False``, so in theory it shouldn't matter 
> unless you want to be tricky and use the value as an index.
>
> Were you actually confused by this bit, or are you just looking for 
> ambiguities?  I'd like to avoid cluttering these definitions further, 
> if possible.

Looking for ambiguities. Couldn't you fix this by saying "The value 
should evaluate to true if..."?


Cheers,

--
Mark Nottingham     http://www.mnot.net/

From pje at telecommunity.com  Wed Sep 29 18:36:16 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 29 18:36:44 2004
Subject: [Web-SIG] WSGI tests
In-Reply-To: <415AE120.10609@colorstudy.com>
References: <5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com>
	<5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
	<5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
	<5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040929122710.0210e5f0@mail.telecommunity.com>

At 11:21 AM 9/29/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>At 02:19 AM 9/29/04 -0500, Ian Bicking wrote:
>>
>>>The only one I was mistakenly requiring seems to be QUERY_STRING; from 
>>>my reading, all these are required:
>>>
>>>'REQUEST_METHOD', 'SCRIPT_NAME', 'PATH_INFO', 'SERVER_NAME', 'SERVER_PORT'
>>>
>>>Well, maybe SCRIPT_NAME isn't required.
>>
>>Or PATH_INFO - if the request is addressed directly to the application, 
>>and there's no trailing '/', it can be empty, and is therefore allowed to 
>>be missing, as in CGI.
>
>OK, fixed.

Actually, it just occurred to me that there *is* a legitimate test you can 
do for SCRIPT_NAME and PATH_INFO: at least *one* of them must be present 
and non-blank, because if you're at the site root, SCRIPT_NAME is empty and 
PATH_INFO has to be '/'.  (Or the other way around, the CGI spec isn't 
clear on this, but Apache CGI puts the '/' in PATH_INFO.)  Anyway, it's 
never valid to have both empty or missing, so you can:

     assert environ.get('SCRIPT_NAME') or environ.get('PATH_INFO')

Also, if present and non-empty, both of these variables must *begin* with a 
'/', so it's more like:

     script_name = environ.get('SCRIPT_NAME','')
     path_info   = environ.get('PATH_INFO','')
     assert not script_name or script_name.startswith('/')
     assert not path_info   or path_info.startswith('/')
     assert script_name or path_info


>>By the way, I found another issue with lint: IteratorWrapper doesn't 
>>close the original iterable if it had a close() method.
>
>Fixed as well.

Actually, no.  Lint's iterator close() is still broken.  You have to use 
close() on the *iterable*, not on iter(iterable).  The two may be different 
objects, since an iterable may return a separate iterator object.
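
A sketch of what that means for whoever consumes the application's return 
value. The names run_app_iter and Body are made up for illustration, and the 
byte strings assume present-day Python 3.

```python
def run_app_iter(app_iter, send):
    """Send every chunk, then close the *iterable* if it supports close()."""
    try:
        for chunk in app_iter:  # the for loop calls iter(app_iter) itself
            send(chunk)
    finally:
        # close() is looked up on the object the application returned,
        # not on the iterator obtained from it.
        close = getattr(app_iter, 'close', None)
        if close is not None:
            close()


class Body:
    """Iterable whose iterator is a separate object without close()."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.closed = False

    def __iter__(self):
        return iter(self.chunks)

    def close(self):
        self.closed = True


sent = []
body = Body([b'a', b'b'])
run_app_iter(body, sent.append)
print(sent, body.closed)
```

Had the loop kept only iter(body) and looked for close() there, it would 
have found a plain list iterator and the Body would never be closed.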

Also, pycgiwrapper returns None from __call__, when it should return an 
iterable.  A simple way to fix that would be to just 'return [body]' after 
calling start_response.

I'm pretty much coming to the conclusion that WSGI is no longer "simple", 
alas.  For it to actually be usable, there's going to have to be a 
reference library, as well as tests.  I'm going to keep pecking away at 
your lint program, and eventually your other test facilities as well, so 
that I'll have something to test the reference library with.  :)

From ianb at colorstudy.com  Wed Sep 29 19:23:54 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Sep 29 19:24:44 2004
Subject: [Web-SIG] WSGI tests
In-Reply-To: <5.1.1.6.0.20040929122710.0210e5f0@mail.telecommunity.com>
References: <5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com>
	<5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
	<5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
	<5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com>
	<5.1.1.6.0.20040929122710.0210e5f0@mail.telecommunity.com>
Message-ID: <415AEFAA.8070405@colorstudy.com>

Phillip J. Eby wrote:
> Actually, it just occurred to me that there *is* a legitimate test you 
> can do for SCRIPT_NAME and PATH_INFO: at least *one* of them must be 
> present and non-blank, because if you're at the site root, SCRIPT_NAME 
> is empty and PATH_INFO has to be '/'.  (Or the other way around, the CGI 
> spec isn't clear on this, but Apache CGI puts the '/' in PATH_INFO.)  

OK... I guess the root of a domain is an odd case, because I can't 
imagine what the difference between SCRIPT_NAME="/", PATH_INFO="" or 
SCRIPT_NAME="", PATH_INFO="/" would mean.

On further thought, I think it doesn't make sense for SCRIPT_NAME to be 
"/".  Because PATH_INFO must always start with a "/", SCRIPT_NAME must 
be "" if there's any path (unless we get double /'s when reconstructing 
the URL, which wouldn't be good).  So I think I'm going to make the test 
include SCRIPT_NAME != "/".  The general case would say that SCRIPT_NAME 
should not end with a /, but I don't feel 100% confident that that's 
correct.

> Anyway, it's never valid to have both empty or missing, so you can:
> 
>     assert environ.get('SCRIPT_NAME') or environ.get('PATH_INFO')
> 
> Also, if present and non-empty, both of these variables must *begin* 
> with a '/', so it's more like:
> 
>     script_name = environ.get('SCRIPT_NAME','')
>     path_info   = environ.get('PATH_INFO','')
>     assert not script_name or script_name.startswith('/')
>     assert not path_info   or path_info.startswith('/')
>     assert script_name or path_info

Yes, the '/' tests were already in there.

>>> By the way, I found another issue with lint: IteratorWrapper doesn't 
>>> close the original iterable if it had a close() method.
>>
>>
>> Fixed as well.
> 
> 
> Actually, no.  Lint's iterator close() is still broken.  You have to use 
> close() on the *iterable*, not on iter(iterable).  The two may be 
> different objects, since an iterable may return a separate iterator object.

This was something I felt a little ambiguous about.  I assume the server 
always must iterate over iter(app_iter), it can't iterate over app_iter 
directly.  When using a "for" loop there's not much distinction, but if 
you access the .next() methods directly there would be.  Anyway, I'm a 
little fuzzy on when __iter__ gets called implicitly.  I was surprised that 
it seemed to get called twice when iterating with a simple for loop, and 
I had to add IteratorWrapper.__iter__.

> Also, pycgiwrapper returns None from __call__, when it should return an 
> iterable.  A simple way to fix that would be to just 'return [body]' 
> after calling start_response.

I've added a check in lint specifically for None or False for the 
iterator; it would still fail implicitly before, but this way the error 
should be better.  I haven't tested pycgiwrapper yet, or some of the 
other code I wrote before, so there might be other bugs in there (e.g., 
unnecessary use of write(), or returning None).

> I'm pretty much coming to the conclusion that WSGI is no longer 
> "simple", alas.  For it to actually be usable, there's going to have to 
> be a reference library, as well as tests.  I'm going to keep pecking 
> away at your lint program, and eventually your other test facilities as 
> well, so that I'll have something to test the reference library with.  :)

The basic mechanics are still reasonably simple, but there's a lot of 
smaller things to consider.  So I don't think WSGI has become that much 
more complicated, we've just come to appreciate complexities that were 
there all along.

Also, should we be putting all of this code in a single repository?

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From pje at telecommunity.com  Wed Sep 29 19:38:28 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Sep 29 19:38:27 2004
Subject: [Web-SIG] WSGI tests
In-Reply-To: <415AEFAA.8070405@colorstudy.com>
References: <5.1.1.6.0.20040929122710.0210e5f0@mail.telecommunity.com>
	<5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com>
	<5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
	<5.1.1.6.0.20040929010716.0266c010@mail.telecommunity.com>
	<5.1.1.6.0.20040929110218.0449aa90@mail.telecommunity.com>
	<5.1.1.6.0.20040929122710.0210e5f0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040929133005.020e6d40@mail.telecommunity.com>

At 12:23 PM 9/29/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>Actually, it just occurred to me that there *is* a legitimate test you 
>>can do for SCRIPT_NAME and PATH_INFO: at least *one* of them must be 
>>present and non-blank, because if you're at the site root, SCRIPT_NAME is 
>>empty and PATH_INFO has to be '/'.  (Or the other way around, the CGI 
>>spec isn't clear on this, but Apache CGI puts the '/' in PATH_INFO.)
>
>OK... I guess the root of a domain is an odd case, because I can't imagine 
>what the difference between SCRIPT_NAME="/", PATH_INFO="" or 
>SCRIPT_NAME="", PATH_INFO="/" would mean.
>
>On further thought, I think it doesn't make sense for SCRIPT_NAME to be 
>"/".  Because PATH_INFO must always start with a "/", SCRIPT_NAME must be 
>"" if there's any path (unless we get double /'s when reconstructing the 
>URL, which wouldn't be good).  So I think I'm going to make the test 
>include SCRIPT_NAME != "/".  The general case would say that SCRIPT_NAME 
>should not end with a /, but I don't feel 100% confident that that's correct.

Actually, you're right: SCRIPT_NAME should not end with a '/', because it 
would have to be part of PATH_INFO in that case.
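
Putting the path rules from this thread together, the checks could be 
collected along these lines. This is only a sketch; check_paths is a 
hypothetical name, not the function in lint.

```python
def check_paths(environ):
    """Assert the SCRIPT_NAME / PATH_INFO rules discussed in this thread."""
    script_name = environ.get('SCRIPT_NAME', '')
    path_info = environ.get('PATH_INFO', '')
    # If present and non-empty, each must begin with '/'.
    assert not script_name or script_name.startswith('/')
    assert not path_info or path_info.startswith('/')
    # At least one of them must be non-empty.
    assert script_name or path_info
    # A trailing '/' belongs to PATH_INFO, so this also rules out '/'.
    assert not script_name.endswith('/')


check_paths({'SCRIPT_NAME': '', 'PATH_INFO': '/'})
check_paths({'SCRIPT_NAME': '/app', 'PATH_INFO': ''})
```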


>>>>By the way, I found another issue with lint: IteratorWrapper doesn't 
>>>>close the original iterable if it had a close() method.
>>>
>>>
>>>Fixed as well.
>>
>>Actually, no.  Lint's iterator close() is still broken.  You have to use 
>>close() on the *iterable*, not on iter(iterable).  The two may be 
>>different objects, since an iterable may return a separate iterator object.
>
>This was something I felt a little ambiguous about.  I assume the server 
>always must iterate over iter(app_iter), it can't iterate over app_iter 
>directly.

Not precisely true; see below.


>   When using a "for" loop there's not much distinction, but if you access 
> the .next() methods directly there would be.  Anyway, I'm a little fuzzy 
> when __iter__ gets called implicitly.  I was surprised that it seemed to 
> get called twice when iterating with a simple for loop, and I had to add 
> IteratorWrapper.__iter__.

PEP 234 describes the iterator protocol, but here's a short summary:

* An "iterable" has an __iter__ method (tp_iter slot at the C level)

* An "iterator" has an __iter__ method *and* a next method (tp_iter_next slot)

'for' loops work on "iterables", so they call __iter__.  Typically, an 
iterator's __iter__ returns self, so this is idempotent if you're iterating 
over an iterator.

WSGI apps must return an *iterable*.  An iterator is of course also an 
iterable.
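
As an illustration of the close() point raised earlier, a server-side loop
might look roughly like this.  Body and send_body are hypothetical names
for this sketch, not lint's or any server's actual code:

```python
class Body:
    """Hypothetical app return value: an iterable with a close() method."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.closed = False

    def __iter__(self):
        # Returns a separate iterator object (a generator); closing that
        # generator would not set this Body's `closed` flag.
        return (chunk for chunk in self.chunks)

    def close(self):
        self.closed = True


def send_body(app_iter, write):
    """Write every chunk, then close the original iterable if it can be closed."""
    try:
        for chunk in app_iter:  # the for loop calls iter(app_iter) itself
            write(chunk)
    finally:
        # close() is looked up on the object the app returned,
        # not on iter(app_iter).
        close = getattr(app_iter, "close", None)
        if close is not None:
            close()


body = Body([b"Hello, ", b"world"])
sent = []
send_body(body, sent.append)
```

A plain list works with the same loop; it simply has no close() to call.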


>>I'm pretty much coming to the conclusion that WSGI is no longer "simple", 
>>alas.  For it to actually be usable, there's going to have to be a 
>>reference library, as well as tests.  I'm going to keep pecking away at 
>>your lint program, and eventually your other test facilities as well, so 
>>that I'll have something to test the reference library with.  :)
>
>The basic mechanics are still reasonably simple, but there's a lot of 
>smaller things to consider.  So I don't think WSGI has become that much 
>more complicated, we've just come to appreciate complexities that were 
>there all along.
>
>Also, should we be putting all of this code in a single repository?

Eventually, we should probably use the Python CVS sandbox.  For now, we 
don't really have any duplication taking place AFAICT.  Once I have 
something resembling a coherent reference library, I'll put it there, anyway.

From ianb at colorstudy.com  Wed Sep 29 23:48:49 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Sep 29 23:49:37 2004
Subject: [Web-SIG] WSGI Webware/WebKit
Message-ID: <415B2DC1.70007@colorstudy.com>

I just committed some code to the repository 
(svn://colorstudy.com/trunk/WSGI/) that implements WebKit on top of WSGI.
It's not complete, but much of the core is there.  There's no 
configuration, no AppServer or Application object, no session, and the 
path (URL introspection) methods are missing.  The path methods in 
Webware are a mess, which is why I left them out.

AppServer and Application objects don't really apply in this context.  I 
hope to create dummy objects for those few places where they are 
exposed.  Configuration probably will be implemented in a different 
layer, and all the configuration will change, since it's a different 
environment you are configuring.  Session will probably be in a 
different layer as well, maybe with a wrapper to implement the WebKit 
interface around a session that may not have that interface.  The path 
methods will just wait.

Also, there's no URL resolution.  I'm just using dispatch.py for now, 
which is a naive way of dispatching.  But I plan to keep dispatching in 
a separate layer -- this way different frameworks can live side-by-side.

Instances of WSGI.WSGIWebKit.wkservlet.Page are WSGI applications; 
basically Page.__call__ does the work.  Most of the rest is copied from 
WebKit with some cleanup; there are some small portions that were changed, 
like HTTPResponse.__init__, write, commit, and deliver, and 
HTTPRequest.__init__.  The changes were fairly easy to do.
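
For readers who haven't seen the pattern, this is roughly what "instances
are WSGI applications" means.  The class below is a hypothetical stand-in
written for this illustration; it is not the code in wkservlet, and the
render() method is an invented name:

```python
class Page:
    """Hypothetical stand-in: instances are callable as WSGI applications."""

    def __call__(self, environ, start_response):
        # Build the body first so Content-Length can be set correctly.
        body = self.render(environ).encode("utf-8")
        headers = [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]  # an iterable of byte strings

    def render(self, environ):
        # Invented hook for the example; a real servlet does far more.
        return "You requested %s\n" % environ.get("PATH_INFO", "/")
```

Any WSGI server can be handed Page() directly, for example the standard
library's wsgiref.simple_server.make_server('', 8000, Page()).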

I also changed the tests to be unittests.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org