From mo.babaei at gmail.com  Thu Dec  1 08:48:12 2005
From: mo.babaei at gmail.com (mohammad babaei)
Date: Thu, 1 Dec 2005 11:18:12 +0330
Subject: [Web-SIG] Database Module in a Web Application
Message-ID: <5bf3a41f0511302348t4b84c5a4g8ac66a4ff7644adf@mail.gmail.com>

Hi,
I'm going to write my first web application in Python.
Is it a good idea to write a database module that handles the connection to
the database and executes queries?


Regards
M.B

From tsoehnli at gmu.edu  Thu Dec  1 18:08:53 2005
From: tsoehnli at gmu.edu (Timothy Soehnlin)
Date: Thu, 01 Dec 2005 12:08:53 -0500
Subject: [Web-SIG] Sessions and Headers
Message-ID: <200512011208.53930.tsoehnli@gmu.edu>

Hello All,

	Okay, let's get down to business.  I am wondering if anyone knows of a 
framework-independent Session library.  I am looking to bring a Session 
library into my framework, but everything I have found so far seems to be 
unnecessarily integrated with the frameworks.  And before I get all gung ho 
and go and write my own Session libraries, I was wondering if anyone knows of 
a library that I could use, and save myself some time.  

	On another note, I am also wanting to integrate multiple server 
environments, and specifically with this question, mod_python.  Now I have my 
framework working with mod_python, but I have recently created a standard 
request object that all the different server environments plug into by 
initializing the object with an environment dictionary, a file to read the 
user data from (for posts and whatnot), and then a write function that gives 
direct control over returning the request output to the user.  In mod_python 
the headers are automagically submitted when the function write is invoked 
the first time.  I need this to not be.  I need to have total control over 
the headers, as my standard Request Object handles header manipulation and 
submission.  
	
	Thank you for your time and consideration.	

					Sincerely,
						Timothy Soehnlin
-- 
I would rather be known as a Christian
	and despised, than to be overlooked,
		and thought of as one of the world.

From ben at groovie.org  Thu Dec  1 18:04:25 2005
From: ben at groovie.org (Ben Bangert)
Date: Thu, 1 Dec 2005 09:04:25 -0800
Subject: [Web-SIG] Sessions and Headers
In-Reply-To: <200512011208.53930.tsoehnli@gmu.edu>
References: <200512011208.53930.tsoehnli@gmu.edu>
Message-ID: <078AD67D-7233-467A-A9EF-5425407B7058@groovie.org>

On Dec 1, 2005, at 9:08 AM, Timothy Soehnlin wrote:

> 	Okay, lets get down to business.  I am wondering if anyone knows of a
> framework-independent Session library.  I am looking to bring a Session
> library into my framework, but everything I have found so far seems to be
> unnecessarily integrated with the frameworks.  And before I get all gung ho
> and go and write my own Session libraries, I was wondering if anyone knows of
> a library that I could use, and save myself some time.

Many frameworks' session systems can be used completely independently
of the framework. Myghty's has been used in various scenarios, partly
because it works without a problem in mod_python, WSGI, etc. and has a
consistent interface across all of the environments.

Ian Bicking wrote a WSGI session middleware module that handles  
sessions completely independently of any framework, though I'm not  
sure offhand how that'd work with mod_python.

I won't be surprised to see other framework authors offer advice on  
how to use their respective session object outside of their  
framework, as they're typically modular enough to function in this  
manner. Most of them provide a dict-style interface, some use  
attributes, etc. In the end, I think you'll have enough choices where  
you can sift it out and find the one that works best for you.

Cheers,
Ben

From fumanchu at amor.org  Thu Dec  1 19:20:47 2005
From: fumanchu at amor.org (Robert Brewer)
Date: Thu, 1 Dec 2005 10:20:47 -0800
Subject: [Web-SIG] Sessions and Headers
Message-ID: <A77618B80CDD2543B705C82B7665D9F902F79A91@ex9.hostedexchange.local>

Timothy Soehnlin wrote:
> On another note, I am also wanting to integrate 
> multiple server environments, and specifically
> with this question, mod_python.  Now I have my
> framework working with mod_python but I have
> recently created a standard request object that
> all the different server environments plug into
> by initializing the object with an environment
> dictionary, a file to read the user data from
> (for posts and whatnot), and then a write
> function that gives direct control to returning
> the request output to the user.

Congratulations, you just reinvented WSGI. ;)
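
For comparison, here is a minimal sketch of how the three pieces described
above (environment dictionary, input file, write function) appear in WSGI.
It assumes Python 3, where body chunks are bytes, and uses only the standard
library. The names echo_app and call_app are invented for illustration and
are not part of any framework.

```python
import io
from wsgiref.util import setup_testing_defaults


def echo_app(environ, start_response):
    """Hypothetical app: echoes the request body back to the client."""
    # 1. The environment dictionary carries the request metadata.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # 2. The file-like object holding the user data (e.g. a POST body).
    body = environ["wsgi.input"].read(length)
    # 3. start_response() sets status and headers before any output;
    #    it also returns the legacy write() callable (unused here).
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(app, body=b""):
    """Hypothetical helper: call a WSGI app roughly as a server would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["REQUEST_METHOD"] = "POST"
    environ["CONTENT_LENGTH"] = str(len(body))
    environ["wsgi.input"] = io.BytesIO(body)
    captured = {}
    written = []

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return written.append  # stands in for the write() callable

    chunks = list(app(environ, start_response))
    return captured["status"], captured["headers"], b"".join(written + chunks)
```

Calling call_app(echo_app, b"name=value") returns the status, the header
list, and the echoed body, which is all a server environment has to supply.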


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From ianb at colorstudy.com  Thu Dec  1 21:14:44 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 01 Dec 2005 14:14:44 -0600
Subject: [Web-SIG] Sessions and Headers
In-Reply-To: <078AD67D-7233-467A-A9EF-5425407B7058@groovie.org>
References: <200512011208.53930.tsoehnli@gmu.edu>
	<078AD67D-7233-467A-A9EF-5425407B7058@groovie.org>
Message-ID: <438F59B4.7000900@colorstudy.com>

Ben Bangert wrote:
> Ian Bicking wrote a WSGI session middleware module that handles  
> sessions completely independently of any framework, though I'm not  
> sure offhand how that'd work with mod_python.

It's nothing to write home about.  Flup has a somewhat better session, 
and an object that is clearly usable outside WSGI; but it only has a 
couple actual stores (e.g., no database), and some room for 
improvements, so it isn't terribly notable either.

There was some talk about this on this list a while ago, but it never 
really went anywhere.  I proposed an interface, but since I lacked any 
actual intention to implement it, it didn't go anywhere either.  But it 
still exists, of course: 
http://svn.colorstudy.com/home/ianb/scarecrow_session_interface.py -- it 
might be useful to an implementor.

In an actually-extracted form, I don't know of any session library 
for Python.  In an extractable form, I'm sure many frameworks have 
something.  An extracted session library would be welcome.  I'm 
personally getting by with a session that is much lamer than the one my 
proposed interface would imply, which is probably fine since I only put 
non-critical data in it anyway.  So a simpler session library would be 
cool too.  I think it should leave out things like configuration, but 
there's still useful functionality to be done.

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From colin at owlfish.com  Sun Dec  4 15:46:14 2005
From: colin at owlfish.com (Colin Stewart)
Date: Sun, 04 Dec 2005 14:46:14 +0000
Subject: [Web-SIG] ANN: WSGIUtils 0.7
Message-ID: <1133707574.3157.8.camel@roll>

Hi,

I've released WSGIUtils 0.7.  This is a minor update, but with one
notable fix.  Here's what's changed:

New features:
 - Added minimal support for SetupTools.

Bug fixes:
 - Changed "error.timeout" to "socket.timeout".
 - Changed package name from "WSGI Utils" to "WSGIUtils" for greater
compatibility with other tools.

The package can be found at http://www.owlfish.com/software/wsgiutils/

WSGIUtils is a package of standalone utility libraries that ease the
development of simple WSGI programs.  The package is divided into two
main components which can be used individually or in combination:

      * wsgiServer is a multi-threaded WSGI web server based on
        SimpleHTTPServer.
      * wsgiAdaptor is a simple WSGI application that provides basic
        authentication, signed cookies and persistent sessions.



From tsoehnli at gmu.edu  Tue Dec  6 18:38:21 2005
From: tsoehnli at gmu.edu (Timothy Soehnlin)
Date: Tue, 06 Dec 2005 12:38:21 -0500
Subject: [Web-SIG] Sessions and Headers
Message-ID: <200512061238.21746.tsoehnli@gmu.edu>

Hello All,
	
	In a previous post I wrote about Sessions and Headers.  The Sessions topic 
was addressed but the Headers point was never focused on. I was wondering 
about controlling headers in mod_python. In mod_python the headers are 
automagically submitted when the function write is invoked the first time.  I 
need this to not be.  I need to have total control over the headers, and 
control when and if they are sent to the client.  I was wondering if there 
are any settings, examples, etc that any of you all would know about.
	
	Thank you for your time and consideration.	

					Sincerely,
						Timothy Soehnlin
-- 
I would rather be known as a Christian
	and despised, than to be overlooked,
		and thought of as one of the world.

From chris.arndt at web.de  Tue Dec  6 20:41:25 2005
From: chris.arndt at web.de (Christopher Arndt)
Date: Tue, 06 Dec 2005 19:41:25 +0000
Subject: [Web-SIG] cgipython 2.4.x binary for FreeBSD 4.7?
Message-ID: <4395E965.5040507@web.de>

Hi,

Does anybody have, or can anybody build or point me to, a binary of cgipython
2.4.x (preferably 2.4.2) (http://www.egenix.com/files/python/mxCGIPython.html)
for FreeBSD 4.7?

I am trying to install a decent Python version at a web host (Verio) which
apparently runs FreeBSD (and has only Python 1.5.2). The output of 'uname -a' says:

FreeBSD mydomain.com 4.7-RELEASE-p22 FreeBSD 4.7-RELEASE-p22 #5: Tue May 3
13:36:49 MDT 2005 root at somemachine:/usr/home/somepath i386

I've tried the binaries for Python 2.3.5 provided by Oleg Broytmann for FreeBSD
4.9 and they basically work, but they lack the '_random' module on which cgi.py
relies.*

The 2.4.x versions he has are only for FreeBSD 5.4 and did not work for me.

Testing all this is very difficult, because the error.log does not show stderr
from CGI scripts :-(

Alternatively, are there any other single-file/cgi-ready Python distros I could
try?

Chris

* indirectly through tempfile.py and random.py

From fumanchu at amor.org  Tue Dec  6 21:51:20 2005
From: fumanchu at amor.org (Robert Brewer)
Date: Tue, 6 Dec 2005 12:51:20 -0800
Subject: [Web-SIG] Sessions and Headers
Message-ID: <A77618B80CDD2543B705C82B7665D9F903141F07@ex9.hostedexchange.local>

Timothy Soehnlin wrote:
> In mod_python the headers are automagically submitted when
> the function write is invoked the first time.  I need this
> to not be.

You can do that either informally, by not calling req.write in your own
code until you've built the complete response entity, or strictly, by
wrapping the request object so that the write method (and flush) spools
output until you're done. I *think* you are implying more constraints
than that, but until you expand on them, they're hard to address. ;)
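
A minimal sketch of the "strict" option, assuming only that the request
object has write() and flush() methods. SpoolingRequest, its finish() method,
and the FakeRequest stand-in are hypothetical names used for illustration;
FakeRequest exists only so the sketch runs without a web server.

```python
class SpoolingRequest:
    """Hypothetical wrapper: buffers output until finish() is called."""

    def __init__(self, req):
        self._req = req
        self._chunks = []

    def write(self, data):
        # Spool instead of passing through, so nothing is sent yet.
        self._chunks.append(data)

    def flush(self):
        # Deliberately a no-op while spooling.
        pass

    def finish(self):
        # One real write; the caller has set all headers by now.
        self._req.write("".join(self._chunks))
        self._chunks = []

    def __getattr__(self, name):
        # Everything else is delegated to the wrapped request.
        return getattr(self._req, name)


class FakeRequest:
    """Stand-in for a server request object, for demonstration only."""

    def __init__(self):
        self.headers_out = {}
        self.sent = []

    def write(self, data):
        # Record the headers as they were at the moment of each real write.
        self.sent.append((dict(self.headers_out), data))
```

Application code keeps calling write() as before; the framework calls
finish() once header manipulation is complete.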


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From grahamd at dscpl.com.au  Tue Dec  6 22:54:03 2005
From: grahamd at dscpl.com.au (Graham Dumpleton)
Date: Tue, 6 Dec 2005 16:54:03 -0500
Subject: [Web-SIG] Sessions and Headers
Message-ID: <1133906043.9681@dscpl.user.openhosting.com>

Timothy Soehnlin wrote ..
> Hello All,
> 	
> 	In a previous post I wrote about Sessions and Headers.  The Sessions topic
> was addressed but the Headers point was never focused on. I was wondering
> about controlling headers in mod_python. In mod_python the headers are
> automagically submitted when the function write is invoked the first time.  I
> need this to not be.  I need to have total control over the headers, and
> control when and if they are sent to the client.  I was wondering if there
> are any settings, examples, etc that any of you all would know about.

Don't call req.write() incrementally; instead accumulate the response as
a list of strings or in a StringIO instance. Then at the point that you
finally want to send content, i.e., after you have set your headers,
call req.write() once with the accumulated content.
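
A minimal sketch of that pattern, written for Python 3 with io.StringIO. The
handler body and the StubRequest class are invented for illustration. The
attribute names content_type and headers_out are assumed to match the
mod_python request object; a real handler would also return a status
constant, which is omitted here so the sketch runs without mod_python.

```python
import io


def handler(req):
    """Hypothetical handler: build the whole body, then write it once."""
    out = io.StringIO()
    out.write("<h1>Report</h1>\n")
    for n in range(3):
        out.write("<p>row %d</p>\n" % n)
    body = out.getvalue()

    # Headers are decided only after the body is complete.
    # (len(body) equals the byte count only because the body is ASCII.)
    req.content_type = "text/html"
    req.headers_out["Content-Length"] = str(len(body))

    # The single write at the very end.
    req.write(body)


class StubRequest:
    """Stand-in request object so the sketch runs outside a server."""

    def __init__(self):
        self.content_type = None
        self.headers_out = {}
        self.writes = []

    def write(self, data):
        self.writes.append(data)
```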

Note that there is a separate mod_python mailing list; you would be
better off using that if you want to get a response, since the list
you are posting to is not specifically about mod_python. See the
mod_python web site for how to get onto the mod_python mailing list.

Graham

From jim at zope.com  Thu Dec 15 19:58:49 2005
From: jim at zope.com (Jim Fulton)
Date: Thu, 15 Dec 2005 13:58:49 -0500
Subject: [Web-SIG] Is the size argument to the input-stream read method
	optional?
Message-ID: <43A1BCE9.8020403@zope.com>


The PEP is unclear on this and should be clarified, IMO.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From jim at zope.com  Thu Dec 15 19:47:30 2005
From: jim at zope.com (Jim Fulton)
Date: Thu, 15 Dec 2005 13:47:30 -0500
Subject: [Web-SIG] Thread-management middleware components?
Message-ID: <43A1BA42.8090406@zope.com>


Has anyone written any thread-management middleware components for WSGI?
Many web applications need to run application code in separate threads.
Often, the number of threads needs to be limited, either by throttling
the rate of thread creation, or by dispatching requests to a thread pool.
This is a capability that could be provided by a server, however, it seems
that it might be functionality better provided at an intermediate layer to
make it more pluggable.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From jim at zope.com  Thu Dec 15 21:01:44 2005
From: jim at zope.com (Jim Fulton)
Date: Thu, 15 Dec 2005 15:01:44 -0500
Subject: [Web-SIG] When must applications call the WSGI start_response
	callable.
Message-ID: <43A1CBA8.2020706@zope.com>

I'm a bit unclear about the timing of the start_response call.
I think this is because the PEP is unclear, but perhaps I missed
something.

It doesn't appear that the PEP says when the start_response callable
must be called.  It gives several examples. In most, the callback is
called when the application is called, but in one example, the
callback is called in the __iter__ of the result of calling the
application.

Here's what I think the PEP should say (something like):

"The start_response callback must be:

- called when the application is called,

- called when the result iterator is computed, or

- called asynchronously, typically from an application
   thread.

Normally an application will call the start_response callable when the
application is called or when the result iterator is constructed, as
shown in the first 2 examples. An application, or more commonly, a
middleware component that provides its own thread management might
delay starting the response.  A server should not begin iterating
over the result until the start_response callable has been called."

Why do I want this?  It appears that this would be needed to enable
middleware components that manage application threads.  I can imagine
though that there aren't any existing servers that handle what I've
suggested correctly.

I do think it would be straightforward for servers to handle this
correctly, especially for asynchronous servers like Twisted
and asyncore-based servers.  Perhaps this could be an optional feature
of the servers.  Servers supporting this feature would be prepared to
delay response output until start_response is called.  Servers unable
to do this would generate errors if start_response hasn't been called
by the time the result iterator has been constructed.

In any case, I think the PEP needs to specify more clearly when
start_response can be called.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From ianb at colorstudy.com  Thu Dec 15 21:11:21 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 15 Dec 2005 14:11:21 -0600
Subject: [Web-SIG] When must applications call the WSGI start_response
 callable.
In-Reply-To: <43A1CBA8.2020706@zope.com>
References: <43A1CBA8.2020706@zope.com>
Message-ID: <43A1CDE9.1000108@colorstudy.com>

Jim Fulton wrote:
> I'm a bit unclear about the timing of the start_response call.
> I think this is because the PEP is unclear, but perhaps I missed
> something.
> 
> It doesn't appear that the PEP says when the start_response callable
> must be called.  It gives several examples. In most, the callback is
> called when the application is called, but in one example, the
> callback is called in the __iter__ of the result of calling the
> application.
> 
> Here's what I think the PEP should say (something like):
> 
> "The start_response callback must be:
> 
> - called when the application is called,
> 
> - called when the result iterator is computed, or
> 
> - called asynchronously, typically from an application
>    thread.
> 
> Normally an application will call the start_response callable when the
> application is called or when the result iterator is constructed, as
> shown in the first 2 examples. An application, or more commonly, a
> middleware component that provides its own thread management might
> delay starting the response.  A server should not begin iterating
> over the result until the start_response callable has been called."

My impression is that it is the application's responsibility to call 
start_response before the first item is returned from the iterator, and 
it is an error if it does not.

However, in paste.lint 
(http://svn.pythonpaste.org/Paste/trunk/paste/lint.py) I check that 
start_response is called before the application returns the iterator. 
So I guess, at least where I've been inserting paste.lint, that I 
haven't encountered other examples in practice.  But then most of the 
places I've used it, I wrote the application, and so I've never felt 
compelled to use a different order.

If that's not correct, I'd like to update paste.lint.

> Why do I want this?  It appears that this would be needed to enable
> middleware components that manage application threads.  I can imagine
> though that there aren't any existing servers that handle what I've
> suggested correctly.
> 
> I do think it would be straightforward for servers to handle this
> correctly, especially for asynchronous servers like Twisted
> and asyncore-based servers.  Perhaps this could be an optional feature
> of the servers.  Servers supporting this feature would be prepared to
> delay response output until start_response is called.  Servers unable
> to do this would generate errors if start_response hasn't been called
> by the time the result iterator has been constructed.

I suppose this wouldn't be particularly bad for threaded or multiprocess 
servers either -- they use a thread/process until the request is 
completed regardless of what happens.  I can see how it could be used to 
greater effect in an asynchronous server.  However, I'd rather it not be 
optional, as most WSGI apps won't do this, and so servers won't get good 
testing on this or may just not implement it, and then some apps and 
some servers won't be compatible.

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From ianb at colorstudy.com  Thu Dec 15 21:22:30 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 15 Dec 2005 14:22:30 -0600
Subject: [Web-SIG] Thread-management middleware components?
In-Reply-To: <43A1BA42.8090406@zope.com>
References: <43A1BA42.8090406@zope.com>
Message-ID: <43A1D086.1060704@colorstudy.com>

Jim Fulton wrote:
> Has anyone written any thread-management middleware components for WSGI?
> Many web applications need to run application code in separate threads.
> Often, the number of threads needs to be limited, either by throttling
> the rate of thread creation, or by dispatching requests to a thread pool.
> This is a capability that could be provided by a server, however, it seems
> that it might be functionality better provided at an intermediate layer to
> make it more pluggable.

Right now all threading, and concurrency generally, is handled by the 
server.  Since it *has* to be handled by the server, I'm not sure what 
the advantage would be in duplicating that functionality.  Well, 
strictly speaking you could have a server with wsgi.multithread and 
wsgi.multiprocess both being false, and the server presumably being 
asynchronous, but I think that's challenging to fit into the WSGI spec 
-- there was some discussion some time ago that dwindled off, and I 
don't think there was ever any resolution on handling asynchronous 
servers/apps in WSGI.
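
As a small illustration, an application can inspect the two flags the server
puts in the environ. The keys wsgi.multithread and wsgi.multiprocess are the
ones defined by the PEP; the function name and the wording of the returned
strings are invented for this sketch.

```python
def describe_concurrency(environ):
    """Summarize the concurrency model a WSGI server advertises."""
    threaded = bool(environ.get("wsgi.multithread"))
    multiprocess = bool(environ.get("wsgi.multiprocess"))
    if threaded and multiprocess:
        return "threads in several processes"
    if threaded:
        return "threads in one process"
    if multiprocess:
        return "one thread in each of several processes"
    # Both false: nothing else runs this application object concurrently,
    # which is what an asynchronous server would presumably report.
    return "neither flag set (for example, an asynchronous server)"
```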

I don't see a need for a lot of interchangeable thread pools, a handful 
at most should do.

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From ianb at colorstudy.com  Thu Dec 15 20:59:25 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 15 Dec 2005 13:59:25 -0600
Subject: [Web-SIG] Is the size argument to the input-stream read method
 optional?
In-Reply-To: <43A1BCE9.8020403@zope.com>
References: <43A1BCE9.8020403@zope.com>
Message-ID: <43A1CB1D.7000900@colorstudy.com>

Jim Fulton wrote:
> The PEP is unclear on this and should be clarified, IMO.

My experience with implementations is that many servers do not require 
the read size argument (they don't raise a TypeError), but they block 
without it, or if you read past CONTENT_LENGTH.  So it should probably 
be required in the spec, since it's required in practice.
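
In practice this means always passing an explicit size derived from
CONTENT_LENGTH. A minimal sketch, assuming Python 3 (bytes input); the
helper name read_request_body is invented for illustration.

```python
import io


def read_request_body(environ):
    """Read exactly CONTENT_LENGTH bytes from wsgi.input, never more."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0  # treat a malformed header as "no body"
    if length <= 0:
        return b""
    # Always pass the size: a bare read() may block on some servers.
    return environ["wsgi.input"].read(length)
```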

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From foom at fuhm.net  Thu Dec 15 21:29:00 2005
From: foom at fuhm.net (James Y Knight)
Date: Thu, 15 Dec 2005 15:29:00 -0500
Subject: [Web-SIG] When must applications call the WSGI start_response
	callable.
In-Reply-To: <43A1CBA8.2020706@zope.com>
References: <43A1CBA8.2020706@zope.com>
Message-ID: <1EFFD451-F82C-4973-B2AE-9311B1500A08@fuhm.net>

On Dec 15, 2005, at 3:01 PM, Jim Fulton wrote:
> Normally an application will call the start_response callable when the
> application is called or when the result iterator is constructed, as
> shown in the first 2 examples. An application, or more commonly, a
> middleware component that provides its own thread management might
> delay starting the response.  A server should not begin iterating
> over the result until the start_response callable has been called."

But it's my understanding that this is valid:

     def test_calledStartResponseLate(self):
         def application(environ, start_response):
             # response headers must be a list of (name, value) tuples
             start_response("200 OK", [])
             yield "Foo"

start_response is called _inside_ the first iteration of the result.  
So the server has to iterate at least once, even if start_response  
was not called...

I was led to believe this was a valid thing to do from the following  
wording:
> (Note: the application must invoke the start_response() callable  
> before the iterable yields its first body string, so that the  
> server can send the headers before any body content. However, this  
> invocation may be performed by the iterable's first iteration, so  
> servers must not assume that start_response() has been called  
> before they begin iterating over the iterable.)
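
Seen from the server side, the quoted wording implies a loop like the
following sketch: the check for start_response() happens when the first body
chunk arrives, not when the application returns. It assumes Python 3 (bytes
chunks); run_once, late_app and bad_app are invented names, and a real server
would send headers and data to a socket instead of collecting them.

```python
def run_once(app, environ):
    """Hypothetical mini-server; returns (called_early, status, body)."""
    state = {}

    def start_response(status, headers, exc_info=None):
        state["status"] = status
        state["headers"] = headers
        return lambda data: None  # legacy write() callable, unused here

    result = app(environ, start_response)
    # For a generator-based app nothing has run yet at this point.
    called_early = "status" in state
    body = []
    try:
        for chunk in result:
            # Only now must start_response() have been called.
            if "status" not in state:
                raise AssertionError(
                    "start_response() not called before first body chunk")
            body.append(chunk)
    finally:
        if hasattr(result, "close"):
            result.close()
    return called_early, state.get("status"), b"".join(body)


def late_app(environ, start_response):
    # Generator app: this line runs during the first iteration.
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"Foo"


def bad_app(environ, start_response):
    # Violates the rule: yields a chunk without calling start_response().
    yield b"oops"
```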

James

From jim at zope.com  Thu Dec 15 21:55:19 2005
From: jim at zope.com (Jim Fulton)
Date: Thu, 15 Dec 2005 15:55:19 -0500
Subject: [Web-SIG] Thread-management middleware components?
In-Reply-To: <43A1D086.1060704@colorstudy.com>
References: <43A1BA42.8090406@zope.com> <43A1D086.1060704@colorstudy.com>
Message-ID: <43A1D837.8060404@zope.com>

Ian Bicking wrote:
> Jim Fulton wrote:
> 
>> Has anyone written any thread-management middleware components for WSGI?
>> Many web applications need to run application code in separate threads.
>> Often, the number of threads needs to be limited, either by throttling
>> the rate of thread creation, or by dispatching requests to a thread pool.
>> This is a capability that could be provided by a server, however, it seems
>> that it might be functionality better provided at an intermediate layer to
>> make it more pluggable.
> 
> 
> Right now all threading and generally concurrency is handled by the 
> server.  Since it *has* to be handled by the server,

Why does it have to be handled by the server?

> I'm not sure what
> the advantage would be to duplicating that functionality? 

The advantage is that it gives people deploying an application
more control.  We've recently switched to using WSGI for
HTTP in Zope.  Our default "out of the box" server of choice
is Twisted; however, the current thread-management strategy used by
Twisted's WSGI server doesn't meet our needs.  I could try to get
Twisted to change its strategy, and I probably will, but it would
be more flexible to be able to plug something in.


> Well,
> strictly speaking you could have a server with wsgi.multithread and 
> wsgi.multiprocess both being false, and the server presumably being 
> asynchronous, but I think that's challenging to fit into the WSGI spec 
> -- there was some discussion some time ago that dwindled off, and I 
> don't think there was ever any resolution on handling asynchronous 
> servers/apps in WSGI.

We have long experience with combining an asynchronous network server
with a threaded application server. Asynchronous network servers can
handle I/O with lots of network clients very efficiently, but only
if an application doesn't block.  Real applications often take
significant time to compute results.  A thread-management facility that
bridges asynchronous servers with threaded applications can work very well.

It's possible that my need is specific to using asynchronous servers,
but I consider working well with asynchronous servers to be a pretty
important requirement.

> I don't see a need for a lot of interchangeable thread pools, a handful 
> at most should do.

I'm not sure what you mean by this.

On the one hand, I'd like to be free to choose my own thread-management
strategy.  On the other hand, if there are multiple asynchronous servers,
I don't see why they should each have to maintain their own thread-management
subsystems if one can be shared among the different servers.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From jim at zope.com  Thu Dec 15 21:59:04 2005
From: jim at zope.com (Jim Fulton)
Date: Thu, 15 Dec 2005 15:59:04 -0500
Subject: [Web-SIG] When must applications call the WSGI start_response
 callable.
In-Reply-To: <1EFFD451-F82C-4973-B2AE-9311B1500A08@fuhm.net>
References: <43A1CBA8.2020706@zope.com>
	<1EFFD451-F82C-4973-B2AE-9311B1500A08@fuhm.net>
Message-ID: <43A1D918.1010803@zope.com>

James Y Knight wrote:
> On Dec 15, 2005, at 3:01 PM, Jim Fulton wrote:
> 
>> Normally an application will call the start_response callable when the
>> application is called or when the result iterator is constructed, as
>> shown in the first 2 examples. An application, or more commonly, a
>> middleware component that provides its own thread management might
>> delay starting the response.  A server should not begin iterating
>> over the result until the start_response callable has been called."
> 
> 
> But it's my understanding that this is valid:
> 
>     def test_calledStartResponseLate(self):
>         def application(environ, start_response):
>             # response headers must be a list of (name, value) tuples
>             start_response("200 OK", [])
>             yield "Foo"
> 
> start_response is called _inside_ the first iteration of the result.  So 
> the server has to iterate at least once, even if start_response  was not 
> called...
> 
> I was led to believe this was a valid thing to do from the following  
> wording:
> 
>> (Note: the application must invoke the start_response() callable  
>> before the iterable yields its first body string, so that the  server 
>> can send the headers before any body content. However, this  
>> invocation may be performed by the iterable's first iteration, so  
>> servers must not assume that start_response() has been called  before 
>> they begin iterating over the iterable.)

Aargh, I didn't see that, despite looking for it.  I said I may have missed
it.

Hm.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From pje at telecommunity.com  Thu Dec 15 21:35:56 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu, 15 Dec 2005 15:35:56 -0500
Subject: [Web-SIG] When must applications call the WSGI start_response
 callable.
In-Reply-To: <43A1CBA8.2020706@zope.com>
Message-ID: <5.1.1.6.0.20051215151106.01e176f0@mail.telecommunity.com>

At 03:01 PM 12/15/2005 -0500, Jim Fulton wrote:
>I'm a bit unclear about the timing of the start_response call.
>I think this is because the PEP is unclear, but perhaps I missed
>something.
>
>It doesn't appear that the PEP says when the start_response callable
>must be called.  It gives several examples. In most, the callback is
>called when the application is called, but in one example, the
>callback is called in the __iter__ of the result of calling the
>application.

Hm.  I thought there was something there saying that it had to be called by 
the time the first value is yielded by the iterable, but it's not 
explicit.  The example *server* in the PEP, however, raises an 
AssertionError if you violate this rule.


>Here's what I think the PEP should say (something like):
>
>"The start_response callback must be:
>
>- called when the application is called,
>
>- called when the result iterator is computed, or
>
>- called asynchronously, typically from an application
>    thread.

-1 on enabling asynchrony here; it would enormously complicate the design 
of servers.  WSGI is a purely synchronous protocol.  Any asynchrony within 
an application must be masked from the server.


>Normally an application will call the start_response callable when the
>application is called or when the result iterator is constructed, as
>shown in the first 2 examples. An application, or more commonly, a
>middleware component that provides its own thread management might
>delay starting the response.  A server should not begin iterating
>over the result until the start_response callable has been called."

This would completely break the existing design.  Note in particular that 
some applications do not call start_response until they're in their first 
iterator next() call; notably any generator-based WSGI apps will do this.


>Why do I want this?  It appears that this would be needed to enable
>middleware components that manage application threads.

No, it's not needed.  Such middleware would simply have to return iterators 
that communicate with the other threads (e.g. via a queue).  These 
iterators would simply have to block until output is available.
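
A minimal sketch of such middleware, assuming Python 3. It starts one worker
thread per request rather than using a pool, does not support the legacy
write() callable, and the names threaded and relay_start are invented for
illustration. The point is only that the server sees an ordinary blocking
iterator and that start_response() is invoked from the server's own thread.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the response


def threaded(app):
    """Hypothetical middleware: runs `app` in a worker thread."""

    def middleware(environ, start_response):
        q = queue.Queue()

        def worker():
            def relay_start(status, headers, exc_info=None):
                # Never touch the server from this thread; just relay.
                q.put(("start", (status, headers)))

            try:
                result = app(environ, relay_start)
                try:
                    for chunk in result:
                        q.put(("data", chunk))
                finally:
                    if hasattr(result, "close"):
                        result.close()
            except Exception as exc:
                q.put(("error", exc))
            finally:
                q.put((_DONE, None))

        threading.Thread(target=worker, daemon=True).start()

        while True:
            kind, payload = q.get()  # blocks until output is available
            if kind is _DONE:
                return
            if kind == "error":
                raise payload
            if kind == "start":
                # Called in the server's thread, before any body chunk.
                start_response(*payload)
            else:
                yield payload

    return middleware
```

Because the middleware is itself a generator, start_response() is reached
during the first iteration, which the PEP permits.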


>   I can imagine
>though that there aren't any existing servers that handle what I've
>suggested correctly.

There probably aren't *any*, actually.


>I do think it would be straightforward for servers to handle this
>correctly, especially for asynchronous servers like Twisted
>and asyncore-based servers.  Perhaps this could be an optional feature
>of the servers.  Servers supporting this feature would be prepared to
>delay response output until start_response is called.  Servers unable
>to do this would generate errors if start_response hasn't been called
>by the time the result iterator has been constructed.

About a year ago, there was some discussion of designing such an optional 
"async server" API extension to allow basically the same sort of thing; the 
only part of the idea that was incorporated is that an iterator is allowed 
to yield empty strings to suggest to an async server that it should do 
other things for a while before trying to get another block from the iterator.
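As an illustration of the empty-string convention, here is a small sketch, assuming Python 3. `slow_app` and `compute` are invented names, and the background thread is only a stand-in for slow work. Note that a purely synchronous server would simply spin in the loop.

```python
import threading


def slow_app(environ, start_response):
    result = {}
    done = threading.Event()

    def compute():
        # Stand-in for slow work done off the server's thread.
        result["body"] = b"42\n"
        done.set()

    threading.Thread(target=compute, daemon=True).start()
    start_response("200 OK", [("Content-Type", "text/plain")])
    while not done.is_set():
        # Nothing to send yet; an async server may go and do other work
        # before asking for the next block.
        yield b""
    yield result["body"]
```

Empty strings contribute nothing to the body, so the client sees the same response however many of them were yielded.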

The main thing that kept the async API from gelling was that there was 
nobody with adequate use cases to motivate the definition.  Perhaps that 
has changed now.


>In any case, I think the PEP needs to specify more clearly when
>start_response can be called.

It's tempting at this point to allow start_response() to occur at any time 
until the first non-empty string is yielded, rather than the first 
string.  This would make your thread-management middleware possible, but 
unfortunately would require a protocol version change, from 1.0 to 
1.1.  Servers in the field (especially those based on the wsgiref.handlers 
module) currently require start_response() to be called before the first 
string, so your middleware couldn't rely on this feature unless it was 
either optional or a "1.1" feature.

On the other hand, it would probably make more sense to define a server 
extension like 'wsgi_async.delayed_start'.  If present, this would be a 
special value you could  return to indicate that you'll actually respond 
later.  So the threading middleware might look like:

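     def threader_mw(environ, start_response):
         # `threadqueue` and `app` are placeholder names for the middleware's
         # work queue and the wrapped application.
         if 'wsgi_async.delayed_start' in environ:
             # add environ+start_response to threadqueue
             threadqueue.put((environ, start_response))
             return environ['wsgi_async.delayed_start']
         else:
             # run request synchronously
             return app(environ, start_response)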

The threads would then have to use write() to send data.

Anyway, this would allow async servers to let apps handle their own thread 
pooling, although in the general case I think it's a lousy idea.  An async 
server like Twisted already has a thread pooling facility, and 
application-specific pools would just duplicate that and waste 
resources.  Meanwhile, this hypothetical threading middleware seems like 
useless overhead for synchronous servers.


From pje at telecommunity.com  Thu Dec 15 22:03:08 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu, 15 Dec 2005 16:03:08 -0500
Subject: [Web-SIG] When must applications call the WSGI start_response
 callable.
In-Reply-To: <1EFFD451-F82C-4973-B2AE-9311B1500A08@fuhm.net>
References: <43A1CBA8.2020706@zope.com>
 <43A1CBA8.2020706@zope.com>
Message-ID: <5.1.1.6.0.20051215160226.030ba288@mail.telecommunity.com>

At 03:29 PM 12/15/2005 -0500, James Y Knight wrote:
>I was led to believe this was a valid thing to do from the following
>wording:
> > (Note: the application must invoke the start_response() callable
> > before the iterable yields its first body string, so that the
> > server can send the headers before any body content. However, this
> > invocation may be performed by the iterable's first iteration, so
> > servers must not assume that start_response() has been called
> > before they begin iterating over the iterable.)

Aha!  I knew it was in there somewhere.  :)


From jim at zope.com  Thu Dec 15 22:02:59 2005
From: jim at zope.com (Jim Fulton)
Date: Thu, 15 Dec 2005 16:02:59 -0500
Subject: [Web-SIG] When must applications call the WSGI start_response
 callable.
In-Reply-To: <43A1CDE9.1000108@colorstudy.com>
References: <43A1CBA8.2020706@zope.com> <43A1CDE9.1000108@colorstudy.com>
Message-ID: <43A1DA03.4030502@zope.com>

Ian Bicking wrote:
> Jim Fulton wrote:
...
>> Why do I want this?  It appears that this would be needed to enable
>> middleware components that manage application threads.  I can imagine
>> though that there aren't any existing servers that handle what I've
>> suggested correctly.
>>
>> I do think it would be straightforward for servers to handle this
>> correctly, especially for asynchronous servers like Twisted
>> and asyncore-based servers.  Perhaps this could be an optional feature
>> of the servers.  Servers supporting this feature would be prepared to
>> delay response output until start_response is called.  Servers unable
>> to do this would generate errors if start_response hasn't been called
>> by the time the result iterator has been constructed.
> 
> 
> I suppose this wouldn't be particularly bad for threaded or multiprocess 
> servers either -- they use a thread/process until the request is 
> completed regardless of what happens. 

Except that it makes the implementation a bit more complex.

> I can see how it could be used to
> greater effect in an asynchronous server.  However, I'd rather it not be 
> optional, as most WSGI apps won't do this, and so servers won't get good 
> testing on this or may just not implement it, and then some apps and 
> some servers won't be compatible.

I mostly agree, except that I think this feature may only be useful for
asynchronous servers.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From ianb at colorstudy.com  Thu Dec 15 22:10:51 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 15 Dec 2005 15:10:51 -0600
Subject: [Web-SIG] Thread-management middleware components?
In-Reply-To: <43A1D837.8060404@zope.com>
References: <43A1BA42.8090406@zope.com> <43A1D086.1060704@colorstudy.com>
	<43A1D837.8060404@zope.com>
Message-ID: <43A1DBDB.2030805@colorstudy.com>

Jim Fulton wrote:
>> Right now all threading and generally concurrency is handled by the 
>> server.  Since it *has* to be handled by the server,
> 
> 
> Why does it have to be handled by the server?

Because most WSGI apps are blocking, so unless you want the server to be 
non-concurrent it has to handle this.  Of course you could design a 
non-concurrent WSGI server that *had* to be used with some threading 
middleware.  WSGI doesn't seem like a good fit for that, though.

>> I'm not sure what
>> the advantage would be to duplicating that functionality? 
> 
> 
> The advantage is that it gives people deploying an application
> more control.  We've recently switched to using WSGI for
> HTTP in Zope.  Our default "out of the box" server of choice
> is Twisted, however, the current thread-management strategy used by
> Twisted's WSGI server doesn't meet our needs.  I could try to get
> Twisted to change its strategy, and I probably will, but it would
> be more flexible to be able to plug something in.

I think in this particular case -- barring direct changes to Twisted -- 
it would make more sense to build on Twisted's non-WSGI asynchronous 
application support, and build a threadpool that calls WSGI from there.

>> Well,
>> strictly speaking you could have a server with wsgi.threaded and 
>> wsgi.multiprocess both being false, and the server presumably being 
>> asynchronous, but I think that's challenging to fit into the WSGI spec 
>> -- there was some discussion some time ago that dwindled off, and I 
>> don't think there was ever any resolution on handling asynchronous 
>> servers/apps in WSGI.
> 
> 
> We have long experience with combining an asynchronous network server
> with a threaded application server. Asynchronous network servers can
> handle I/O with lots of network clients very efficiently, but only
> if an application doesn't block.  Real applications often take
> significant time to compute results.  A thread-management facility that
> bridges asynchronous servers with threaded applications can work very well.
> 
> It's possible that my need is specific to using asynchronous servers,
> but I consider working well with asynchronous servers to be a pretty
> important requirement.

I think the server has to be synchronous by the time it calls a WSGI 
app.  There's nothing saying that the WSGI support in Twisted is the 
WSGI support you have to use.

My impression is that it is hard to standardize anything async-related 
because they use slightly different conventions on how to do async 
(e.g., Deferred vs. ad hoc callbacks).  So... whatever standardization 
there is to be done there is probably below WSGI.

>> I don't see a need for a lot of interchangeable thread pools, a 
>> handful at most should do.
> 
> 
> I'm not sure what you mean by this.
> 
> On the one hand, I'd like to be free to choose my own thread-management
> strategy.  On the other hand, if there are multiple asynchronous servers,
> I don't see why they should each have to maintain their own 
> thread-management
> subsystems if one can be shared among the different servers.

Sure, but if there's only, say, 4 viable strategies and 3 serious async 
servers (are there even that many of either?) then it's easier just to 
figure out how to plug each strategy in on a case-by-case basis, and 
discuss the concrete issues with the server developers.

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From jim at zope.com  Thu Dec 15 22:39:07 2005
From: jim at zope.com (Jim Fulton)
Date: Thu, 15 Dec 2005 16:39:07 -0500
Subject: [Web-SIG] Thread-management middleware components?
In-Reply-To: <43A1DBDB.2030805@colorstudy.com>
References: <43A1BA42.8090406@zope.com> <43A1D086.1060704@colorstudy.com>
	<43A1D837.8060404@zope.com> <43A1DBDB.2030805@colorstudy.com>
Message-ID: <43A1E27B.4020306@zope.com>

Ian Bicking wrote:
> Jim Fulton wrote:
> 
>>> Right now all threading and generally concurrency is handled by the 
>>> server.  Since it *has* to be handled by the server,
>>
>>
>>
>> Why does it have to be handled by the server?
> 
> 
> Because most WSGI apps are blocking, so unless you want the server to be 
> non-concurrent it has to handle this.  Of course you could design a 
> non-concurrent WSGI server that *had* to be used with some threading 
> middleware. 

Actually, I suggest a WSGI server that *can* be used with
threading middleware.

> WSGI doesn't seem like a good fit for that, though.

...

> I think in this particular case -- barring direct changes to Twisted -- 
> it would make more sense to build on Twisted's non-WSGI asynchronous 
> application support, and build a threadpool that calls WSGI from there.

I don't want to maintain a non-WSGI interface and I don't want to
maintain my own WSGI Twisted interface.

...

> I think the server has to be synchronous by the time it calls a WSGI 
> app.  There's nothing saying that the WSGI support in Twisted is the 
> WSGI support you have to use.

No, but I have good reasons for wanting to use it.

> My impression is that it is hard to standardize anything async-related 
> because they use slightly different conventions on how to do async 
> (e.g., Deferred vs. ad hoc callbacks).  So... whatever standardization 
> there is to be done there is probably below WSGI.

...

> Sure, but if there's only, say, 4 viable strategies and 3 serious async 
> servers (are there even that many of either?) then it's easier just to 
> figure out how to plug each strategy in on a case-by-case basis, and 
> discuss the concrete issues with the server developers.

That's what I'll do if I have to.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From foom at fuhm.net  Sat Dec 17 22:50:05 2005
From: foom at fuhm.net (James Y Knight)
Date: Sat, 17 Dec 2005 16:50:05 -0500
Subject: [Web-SIG] WSGI thread affinity/interleaving
Message-ID: <4085084D-D9F8-4A45-8C22-D34C287519AE@fuhm.net>

So this came up when I was writing the twisted WSGI support, but at  
that point I just took the most conservative view and forgot to  
revisit the issue.

1)
Take the following application:
import thread

def simple_wsgi_app(environ, start_response):
     start_response("200 OK", [])  # status plus (empty) header list
     yield str(thread.get_ident())
     yield str(thread.get_ident())

Is there any guarantee that both times the iterator's .next() is  
called, they will be on the same thread?

2)
d = {}
def simple_wsgi_app(environ, start_response):
     d[thread.get_ident()] = 0
     start_response("200 OK", [])  # status plus (empty) header list
     yield "Start"
     assert d[thread.get_ident()] == 0
     d[thread.get_ident()] += 1
     yield "Done"

Is there any guarantee that this will work? That is, is it possible  
that at the first "yield", another application will be allowed to  
run, in the same thread?

(Of course you'd probably actually want to use threading.local, not a  
dict of thread.get_ident, but, same idea.)

In Twisted, from the first entry to an application's code, until it's  
finished, it runs on the single thread, with nothing else running on  
that thread. This means that any app which is paused from generating  
too much output data for the client will be holding up a thread. When  
the app returns an iterator, Twisted could be running other requests  
on that thread while waiting for the client to read some data, if  
thread affinity is not required.

James

From pje at telecommunity.com  Sat Dec 17 23:25:21 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 17 Dec 2005 17:25:21 -0500
Subject: [Web-SIG] WSGI thread affinity/interleaving
In-Reply-To: <4085084D-D9F8-4A45-8C22-D34C287519AE@fuhm.net>
Message-ID: <5.1.1.6.0.20051217171840.01e1ab80@mail.telecommunity.com>

At 04:50 PM 12/17/2005 -0500, James Y Knight wrote:
>So this came up when I was writing the twisted WSGI support, but at
>that point I just took the most conservative view and forgot to
>revisit the issue.
>
>1)
>Take the following application:
>def simple_wsgi_app(environ, start_response):
>      start_response("200 OK", [])
>      yield str(thread.get_ident())
>      yield str(thread.get_ident())
>
>Is there any guarantee that both times the iterator's .next() is
>called, they will be on the same thread?

I thought I included something in the spec to the effect that there's no 
guarantee that each next() will be called in the same thread.  But it might 
just have been discussed and not actually edited into the spec.


>2)
>d = {}
>def simple_wsgi_app(environ, start_response):
>      d[thread.get_ident()] = 0
>      start_response("200 OK", [])
>      yield "Start"
>      assert d[thread.get_ident()] == 0
>      d[thread.get_ident()] += 1
>      yield "Done"
>
>Is there any guarantee that this will work? That is, is it possible
>that at the first "yield", another application will be allowed to
>run, in the same thread?
>
>(Of course you'd probably actually want to use threading.local, not a
>dict of thread.get_ident, but, same idea.)

Yeah, that was the thing, I don't think we wanted to guarantee thread 
affinity across yields, either in the sense of restricting a thread for one 
app *or* an app to one thread.

This does mean that iterator-based apps can't rely on thread-local 
variables.  I've recently written a "Contextual" library that actually 
makes it easy for the task controller to manage this, by swapping a 
thread's context in and out when you switch between tasks, but of course it 
won't work for anything that doesn't use Contextual variables.  I 
originally proposed Contextual for the stdlib in a pre-PEP, but Guido waved 
it off on the basis that PEPs 342 and 343 aren't field-deployed yet and the 
usefulness is unproven.  WSGI, however, would be an example of a case where 
contextual task-locals are needed even with today's Python, sans PEPs 342 
and 343.
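The thread-local problem is easy to reproduce. A minimal sketch, assuming Python 3; `app` and `next_in_new_thread` are invented names, and the helper simply forces each next() call onto a fresh thread, as a server without thread affinity might do.

```python
import threading

_local = threading.local()


def app(environ, start_response):
    _local.user = "alice"  # stored for whichever thread runs this step
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"start"
    # If the server resumes the iterator on another thread, the attribute
    # set above is not visible here.
    yield getattr(_local, "user", "<missing>").encode()


def next_in_new_thread(iterator):
    """Advance `iterator` one step on a brand-new thread."""
    out = []
    t = threading.Thread(target=lambda: out.append(next(iterator)))
    t.start()
    t.join()
    return out[0]
```

Calling `next_in_new_thread` twice on the same iterator yields `b"start"` and then `b"<missing>"`, because the second step runs on a thread that never set `_local.user`.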


From foom at fuhm.net  Sun Dec 18 19:27:20 2005
From: foom at fuhm.net (James Y Knight)
Date: Sun, 18 Dec 2005 13:27:20 -0500
Subject: [Web-SIG] WSGI thread affinity/interleaving
In-Reply-To: <5.1.1.6.0.20051217171840.01e1ab80@mail.telecommunity.com>
References: <5.1.1.6.0.20051217171840.01e1ab80@mail.telecommunity.com>
Message-ID: <C43AF2EA-B969-40EC-94BE-EB41201C129F@fuhm.net>

On Dec 17, 2005, at 5:25 PM, Phillip J. Eby wrote:
> Yeah, that was the thing, I don't think we wanted to guarantee  
> thread affinity across yields, either in the sense of restricting a  
> thread for one app *or* an app to one thread.
>
> This does mean that iterator-based apps can't rely on thread-local  
> variables.  I've recently written a "Contextual" library that  
> actually makes it easy for the task controller to manage this, by  
> swapping a thread's context in and out when you switch between  
> tasks, but of course it won't work for anything that doesn't use  
> Contextual variables.  I originally proposed Contextual for the  
> stdlib in a pre-PEP, but Guido waved it off on the basis that PEPs  
> 342 and 343 aren't field-deployed yet and the usefulness is  
> unproven.  WSGI, however, would be an example of a case where  
> contextual task-locals are needed even with today's Python, sans  
> PEPs 342 and 343.

I'm worried about database access. Most DBAPI adapters have  
threadsafety level 2: "Threads may share the module and  
connections.". So with those, at least, it should be fine to move a  
connection between threads, since "share OK" implies "move OK".  
However, no documentation I've found has said anything separately  
about whether it's safe to _move_ a cursor between threads. It seems  
likely to me that it would not be safe, at least in some database  
adapters. And if it's not safe, that means a WSGI result iterator  
cannot use any DBAPI cursor functionality which seems a drag.

Does anybody have practical experience with the safety of moving a  
DBAPI cursor between threads?

James


From ianb at colorstudy.com  Sun Dec 18 20:33:05 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 18 Dec 2005 13:33:05 -0600
Subject: [Web-SIG] WSGI thread affinity/interleaving
In-Reply-To: <C43AF2EA-B969-40EC-94BE-EB41201C129F@fuhm.net>
References: <5.1.1.6.0.20051217171840.01e1ab80@mail.telecommunity.com>
	<C43AF2EA-B969-40EC-94BE-EB41201C129F@fuhm.net>
Message-ID: <43A5B971.1010408@colorstudy.com>

James Y Knight wrote:
> I'm worried about database access. Most DBAPI adapters have  
> threadsafety level 2: "Threads may share the module and  
> connections.". So with those, at least, it should be fine to move a  
> connection between threads, since "share OK" implies "move OK".  
> However, no documentation I've found has said anything separately  
> about whether it's safe to _move_ a cursor between threads. It seems  
> likely to me that it would not be safe, at least in some database  
> adapters. And if it's not safe, that means a WSGI result iterator  
> cannot use any DBAPI cursor functionality which seems a drag.
> 
> Does anybody have practical experience with the safety of moving a  
> DBAPI cursor between threads?

I haven't done that, but SQLite (2?) notably doesn't allow you to move a 
connection between threads.  I'm not actually sure what problems it 
causes if you do move them -- it may simply be an overzealous warning.

CCing DB-SIG -- people there might know more details.

-- 
Ian Bicking  |  ianb at colorstudy.com  |  http://blog.ianbicking.org

From p.f.moore at gmail.com  Sun Dec 18 22:33:12 2005
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 18 Dec 2005 21:33:12 +0000
Subject: [Web-SIG] WSGI thread affinity/interleaving
In-Reply-To: <43A5B971.1010408@colorstudy.com>
References: <5.1.1.6.0.20051217171840.01e1ab80@mail.telecommunity.com>
	<C43AF2EA-B969-40EC-94BE-EB41201C129F@fuhm.net>
	<43A5B971.1010408@colorstudy.com>
Message-ID: <79990c6b0512181333h7445b21ch4153f127b74ca556@mail.gmail.com>

On 12/18/05, Ian Bicking <ianb at colorstudy.com> wrote:
> James Y Knight wrote:
> > Does anybody have practical experience with the safety of moving a
> > DBAPI cursor between threads?
>
> I haven't done that, but SQLite (2?) notably doesn't allow you to move a
> connection between threads.  I'm not actually sure what problems it
> causes if you do move them -- it may simply be an overzealous warning.
>
> CCing DB-SIG -- people there might know more details.

I can confirm that cx_Oracle does not like cursors being shared
between threads. I even recall crashes (but can't verify this - once I
checked and found I shouldn't be doing this, I stopped - the problem
was intermittent, as is the nature of thread bugs :-().

Paul.

From gh at ghaering.de  Sun Dec 18 23:18:23 2005
From: gh at ghaering.de (=?ISO-8859-1?Q?Gerhard_H=E4ring?=)
Date: Sun, 18 Dec 2005 23:18:23 +0100
Subject: [Web-SIG] WSGI thread affinity/interleaving
In-Reply-To: <43A5B971.1010408@colorstudy.com>
References: <5.1.1.6.0.20051217171840.01e1ab80@mail.telecommunity.com>	<C43AF2EA-B969-40EC-94BE-EB41201C129F@fuhm.net>
	<43A5B971.1010408@colorstudy.com>
Message-ID: <43A5E02F.4090206@ghaering.de>

Ian Bicking wrote:
> James Y Knight wrote:
>> I'm worried about database access. Most DBAPI adapters have  
>> threadsafety level 2: "Threads may share the module and  
>> connections.". So with those, at least, it should be fine to move a  
>> connection between threads, since "share OK" implies "move OK".  
>> However, no documentation I've found has said anything separately  
>> about whether it's safe to _move_ a cursor between threads. It seems  
>> likely to me that it would not be safe, at least in some database  
>> adapters. And if it's not safe, that means a WSGI result iterator  
>> cannot use any DBAPI cursor functionality which seems a drag.
>>
>> Does anybody have practical experience with the safety of moving a  
>> DBAPI cursor between threads?
> 
> I haven't done that, but SQLite (2?) notably doesn't allow you to move a 
> connection between threads.  I'm not actually sure what problems it 
> causes if you do move them -- it may simply be an overzealous warning.

It's the same for SQLite 3. The problem is, as far as I understand, that 
POSIX file locks don't work reliably when they're accessed from multiple 
threads. That's why the SQLite *docs* always said that you cannot share 
a SQLite database handle between threads.

And pysqlite as well as apsw both fire exceptions if you try to do so. 
In recent SQLite 3.x versions, SQLite itself would detect this and 
return an error on *nix too FWIW.

pysqlite does have an option to turn the check off, for people who want 
to shoot themselves in the foot. Fortunately for them, they nowadays get 
an error-message from SQLite on non-Windows systems anyway ;-)
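The same check can be observed with the standard-library sqlite3 module, which descends from pysqlite. A minimal sketch, assuming Python 3; `use_from_other_thread` is an invented helper, and `check_same_thread` is the connect() option that turns the check off.

```python
import sqlite3
import threading


def use_from_other_thread(conn):
    """Run a trivial query on `conn` from a new thread; return the outcome."""
    outcome = []

    def work():
        try:
            outcome.append(conn.execute("SELECT 1").fetchone())
        except sqlite3.ProgrammingError as exc:
            # Raised when the connection was created in a different thread
            # and the same-thread check is enabled.
            outcome.append(exc)

    t = threading.Thread(target=work)
    t.start()
    t.join()
    return outcome[0]


checked = sqlite3.connect(":memory:")  # check_same_thread defaults to True
unchecked = sqlite3.connect(":memory:", check_same_thread=False)
```

With `checked` the helper returns a `ProgrammingError`; with `unchecked` the query runs, and keeping access serialized becomes the caller's responsibility.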

-- Gerhard

From foom at fuhm.net  Mon Dec 19 01:34:56 2005
From: foom at fuhm.net (James Y Knight)
Date: Sun, 18 Dec 2005 19:34:56 -0500
Subject: [Web-SIG] [DB-SIG]  WSGI thread affinity/interleaving
In-Reply-To: <43A5F757.2030906@egenix.com>
References: <5.1.1.6.0.20051217171840.01e1ab80@mail.telecommunity.com>	<C43AF2EA-B969-40EC-94BE-EB41201C129F@fuhm.net>
	<43A5B971.1010408@colorstudy.com> <43A5F757.2030906@egenix.com>
Message-ID: <6B850331-4947-4824-84A3-2C04BC32BEA8@fuhm.net>

On Dec 18, 2005, at 6:57 PM, M.-A. Lemburg wrote:

> Ian Bicking wrote:
>
>> James Y Knight wrote:
>>
>>> I'm worried about database access. Most DBAPI adapters have
>>> threadsafety level 2: "Threads may share the module and
>>> connections.". So with those, at least, it should be fine to move a
>>> connection between threads, since "share OK" implies "move OK".
>>>
>
> What exactly do you mean with "move" ? Sharing a
> connection refers to multiple threads creating cursors
> on this connection.

I'm asking about moving a cursor, that is, accessing it sequentially  
first from one thread, then later from another thread. This is  
potentially asking less than sharing, that is, accessing it  
simultaneously from two threads.

For example, a simple class without any locking, that only modifies  
itself, would generally be movable between threads, but not sharable.  
Adding a mutex would make it both.
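To illustrate the distinction, a minimal sketch assuming Python 3; `Counter`, `LockedCounter` and `hammer` are invented names. The first class is movable (use it from one thread, then later from another) but not sharable; the second adds a mutex and is both.

```python
import threading


class Counter:
    """No locking: fine to hand from one thread to another, not to share."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1  # read-modify-write, not guaranteed atomic


class LockedCounter(Counter):
    """Same class with a mutex: safe to move and to share."""

    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            super().increment()


def hammer(counter, threads=4, per_thread=10000):
    """Increment `counter` from several threads at once; return the total."""

    def work():
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return counter.value
```

`hammer(LockedCounter())` always returns 40000. With a plain `Counter` the total is not guaranteed, whereas using it from one thread after another (never simultaneously) is fine.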

James


From foom at fuhm.net  Mon Dec 19 19:48:03 2005
From: foom at fuhm.net (James Y Knight)
Date: Mon, 19 Dec 2005 13:48:03 -0500
Subject: [Web-SIG] WSGI thread affinity/interleaving
In-Reply-To: <43A5B971.1010408@colorstudy.com>
References: <5.1.1.6.0.20051217171840.01e1ab80@mail.telecommunity.com>
	<C43AF2EA-B969-40EC-94BE-EB41201C129F@fuhm.net>
	<43A5B971.1010408@colorstudy.com>
Message-ID: <BB89D35C-E6E1-40F2-896B-A31FFDF7AED4@fuhm.net>


On Dec 18, 2005, at 2:33 PM, Ian Bicking wrote:

> James Y Knight wrote:
>
>> I'm worried about database access. Most DBAPI adapters have   
>> threadsafety level 2: "Threads may share the module and   
>> connections.". So with those, at least, it should be fine to move  
>> a  connection between threads, since "share OK" implies "move  
>> OK".  However, no documentation I've found has said anything  
>> separately  about whether it's safe to _move_ a cursor between  
>> threads. It seems  likely to me that it would not be safe, at  
>> least in some database  adapters. And if it's not safe, that means  
>> a WSGI result iterator  cannot use any DBAPI cursor functionality  
>> which seems a drag.
>> Does anybody have practical experience with the safety of moving  
>> a  DBAPI cursor between threads?
>>
>
> I haven't done that, but SQLite (2?) notably doesn't allow you to  
> move a connection between threads.  I'm not actually sure what  
> problems it causes if you do move them -- it may simply be an  
> overzealous warning.
>
> CCing DB-SIG -- people there might know more details.

Okay, so I think the overall recommendation from DB-SIG is "don't do  
that". I'm not sure where that leaves the WSGI discussion now? "Don't  
use databases from a result iterator", I guess (unless threadsafety  
== 3)? But does anybody else's WSGI server implementation move apps  
between threads? I don't especially want to make Twisted's be unique  
in this way even if it is technically allowed, as I can only see it  
causing problems when people's apps *do* try to use databases from  
result iterators and *do* work everywhere else...

James

From fumanchu at amor.org  Mon Dec 19 20:59:28 2005
From: fumanchu at amor.org (Robert Brewer)
Date: Mon, 19 Dec 2005 11:59:28 -0800
Subject: [Web-SIG] WSGI thread affinity/interleaving
Message-ID: <6949EC6CD39F97498A57E0FA55295B2153CB79@ex9.hostedexchange.local>

James Y Knight wrote:
> >> I'm worried about database access. Most DBAPI adapters have   
> >> threadsafety level 2: "Threads may share the module and   
> >> connections.". So with those, at least, it should be fine to move  
> >> a  connection between threads, since "share OK" implies "move  
> >> OK".  However, no documentation I've found has said anything  
> >> separately  about whether it's safe to _move_ a cursor between  
> >> threads. It seems  likely to me that it would not be safe, at  
> >> least in some database  adapters. And if it's not safe, 
> that means  
> >> a WSGI result iterator  cannot use any DBAPI cursor functionality  
> >> which seems a drag.
> 
> Okay, so I think the overall recommendation from DB-SIG is "don't do  
> that". I'm not sure where that leaves the WSGI discussion 
> now? "Don't  
> use databases from a result iterator", I guess (unless threadsafety  
> == 3)? But does anybody else's WSGI server implementation move apps  
> between threads? I don't especially want to make Twisted's be unique  
> in this way even if it is technically allowed, as I can only see it  
> causing problems when people's apps *do* try to use databases from  
> result iterators and *do* work everywhere else...

I have to admit that none of the apps, servers, or gateways I've worked
on have allowed for thread-moving or -sharing. I'm pretty well convinced
that CherryPy, for example, won't be able to support that anytime
soon--thread isolation is too well baked in.

Couldn't someone write a piece of WSGI middleware that takes requests
from an async server and dispatches them to a pool of Queues? The
consumer side of the Queue would then call the WSGI app with the same
thread each time for a given request, but the async-server side would be
free to create new requests and fetch results from different threads.
Sort of an async-to-threaded bridge. I would think, even if you chose
not to build that into your WSGI wrapper, that it would be generic
enough to be quite useful for any async server + threaded app. I'll
refrain from any predictions about performance, however... ;)
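A rough sketch of such a bridge, assuming Python 3. `Worker`, `PinnedIterator` and `pinned` are invented names. Each worker thread owns one queue, and every step of a given request is posted to the same worker. As a simplification, start_response() is also invoked on the worker thread here; a real async server would probably want that call marshalled back to its own thread.

```python
import itertools
import queue
import threading


class Worker(threading.Thread):
    """One thread with an inbox; every job posted to it runs on this thread."""

    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self.start()

    def run(self):
        while True:
            func, args, reply = self.inbox.get()
            try:
                reply.put((True, func(*args)))
            except Exception as exc:  # includes StopIteration from next()
                reply.put((False, exc))

    def call(self, func, *args):
        """Run func(*args) on the worker thread and wait for the result."""
        reply = queue.Queue(maxsize=1)
        self.inbox.put((func, args, reply))
        ok, value = reply.get()
        if ok:
            return value
        raise value


class PinnedIterator:
    """Iterator whose every next() is executed on one fixed worker."""

    def __init__(self, worker, iterable):
        self._worker = worker
        self._iterator = worker.call(iter, iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return self._worker.call(next, self._iterator)


def pinned(app, workers):
    """Hypothetical middleware: bind each request to one worker thread."""
    pick = itertools.cycle(workers)

    def middleware(environ, start_response):
        worker = next(pick)
        result = worker.call(app, environ, start_response)
        return PinnedIterator(worker, result)

    return middleware
```

The server side may call next() on the returned iterator from any thread; the application code only ever runs on the worker chosen for that request.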


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From pje at telecommunity.com  Mon Dec 19 21:34:32 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 19 Dec 2005 15:34:32 -0500
Subject: [Web-SIG] WSGI thread affinity/interleaving
In-Reply-To: <6949EC6CD39F97498A57E0FA55295B2153CB79@ex9.hostedexchange. local>
Message-ID: <5.1.1.6.0.20051219152139.034a22c8@mail.telecommunity.com>

At 11:59 AM 12/19/2005 -0800, Robert Brewer wrote:
>Couldn't someone write a piece of WSGI middleware that takes requests
>from an async server and dispatches them to a pool of Queues? The
>consumer side of the Queue would then call the WSGI app with the same
>thread each time for a given request, but the async-server side would be
>free to create new requests and fetch results from different threads.
>Sort of an async-to-threaded bridge. I would think, even if you chose
>not to build that into your WSGI wrapper, that it would be generic
>enough to be quite useful for any async server + threaded app. I'll
>refrain from any predictions about performance, however... ;)

This was Jim Fulton's suggestion, and it's beginning to make more 
sense.  :)  Unfortunately I don't think there's a reasonable way to 
integrate it with the host server's threadpool (e.g. the Twisted threadpool).

We should keep an eye, however, on the fact that the vast majority of WSGI 
apps' requests can and should be handled in a single synchronous 
iteration.  Multiple iterations are primarily useful for large files, and 
streaming/push applications.  These are the *only* reason the spec allows 
multiple writes or iterations.   Applications are supposed to do their own 
buffering in all other cases, to minimize the number of blocks shuffled up 
and down the middleware chain.

That being the case, the simplest way to ensure thread affinity in Twisted 
is to just farm out the entire processing of a given request to a 
reactor.callInThread().  The only applications for which this is not 
suitable will be large files and streaming/push, which will tie up threads 
that they probably shouldn't.  To handle those use cases, a customized 
threadpool mechanism would be needed, wherein each thread would have an 
event loop going over the currently active iterators and adding new ones 
from a master request queue whenever the thread-local queue dropped below a 
threshold.
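The simple strategy (one thread handles the whole request) can be sketched as follows, assuming Python 3. The standard-library `ThreadPoolExecutor` is used here purely as a stand-in for Twisted's reactor.callInThread(); `run_whole_request` and `handle` are invented names, and the write() callable is not supported.

```python
from concurrent.futures import ThreadPoolExecutor


def run_whole_request(app, environ):
    """Call the app and drain its iterable, all in the calling thread."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    result = app(environ, start_response)
    try:
        body = b"".join(result)  # every next() happens on this thread
    finally:
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], body


pool = ThreadPoolExecutor(max_workers=4)


def handle(app, environ):
    # One pool thread does everything for this request, so the app sees a
    # single thread from its first call to its last yield.
    return pool.submit(run_whole_request, app, environ)
```

Because the body is buffered in full, this suits ordinary requests but not the large-file and streaming/push cases, which would occupy a pool thread for their whole duration.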


From renesd at gmail.com  Mon Dec 19 22:45:11 2005
From: renesd at gmail.com (Rene Dudfield)
Date: Tue, 20 Dec 2005 08:45:11 +1100
Subject: [Web-SIG] WSGI thread affinity/interleaving
In-Reply-To: <5.1.1.6.0.20051219152139.034a22c8@mail.telecommunity.com>
References: <5.1.1.6.0.20051219152139.034a22c8@mail.telecommunity.com>
Message-ID: <64ddb72c0512191345l7f5295f6o141dc3aee0931560@mail.gmail.com>

Large files should just return a file.  So that the file descriptor is
available for the most efficient sending.

So you could use sendfile(2), or have another helper process send the file.
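WSGI already has an optional hook for this, `wsgi.file_wrapper`. A minimal sketch, assuming Python 3; `make_file_app` and `read_blocks` are invented names, and whether sendfile(2) is actually used depends entirely on the server.

```python
def make_file_app(path, block_size=8192):
    """Build a WSGI app that serves the file at `path`."""

    def read_blocks(fileobj):
        # Fallback for servers that do not provide wsgi.file_wrapper.
        try:
            while True:
                block = fileobj.read(block_size)
                if not block:
                    break
                yield block
        finally:
            fileobj.close()

    def app(environ, start_response):
        fileobj = open(path, "rb")
        start_response("200 OK", [("Content-Type", "application/octet-stream")])
        wrapper = environ.get("wsgi.file_wrapper")
        if wrapper is not None:
            # The server may recognise its own wrapper and hand the file
            # descriptor to sendfile(2) or a similar mechanism.
            return wrapper(fileobj, block_size)
        return read_blocks(fileobj)

    return app
```

Either way the app returns a single iterable, so the choice of transmission mechanism stays with the server.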


On 12/20/05, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 11:59 AM 12/19/2005 -0800, Robert Brewer wrote:
> >Couldn't someone write a piece of WSGI middleware that takes requests
> >from an async server and dispatches them to a pool of Queues? The
> >consumer side of the Queue would then call the WSGI app with the same
> >thread each time for a given request, but the async-server side would be
> >free to create new requests and fetch results from different threads.
> >Sort of an async-to-threaded bridge. I would think, even if you chose
> >not to build that into your WSGI wrapper, that it would be generic
> >enough to be quite useful for any async server + threaded app. I'll
> >refrain from any predictions about performance, however... ;)
>
> This was Jim Fulton's suggestion, and it's beginning to make more
> sense.  :)  Unfortunately I don't think there's a reasonable way to
> integrate it with the host server's threadpool (e.g. the Twisted threadpool).
>
> We should keep an eye, however, on the fact that the vast majority of WSGI
> apps' requests can and should be handled in a single synchronous
> iteration.  Multiple iterations are primarily useful for large files, and
> streaming/push applications.  These are the *only* reason the spec allows
> multiple writes or iterations.   Applications are supposed to do their own
> buffering in all other cases, to minimize the number of blocks shuffled up
> and down the middleware chain.
>
> That being the case, the simplest way to ensure thread affinity in Twisted
> is to just farm out the entire processing of a given request to a
> reactor.callInThread().  The only applications for which this is not
> suitable will be large files and streaming/push, which will tie up threads
> that they probably shouldn't.  To handle those use cases, a customized
> threadpool mechanism would be needed, wherein each thread would have an
> event loop going over the currently active iterators and adding new ones
> from a master request queue whenever the thread-local queue dropped below a
> threshold.

From foom at fuhm.net  Mon Dec 19 23:22:09 2005
From: foom at fuhm.net (James Y Knight)
Date: Mon, 19 Dec 2005 17:22:09 -0500
Subject: [Web-SIG] WSGI thread affinity/interleaving
In-Reply-To: <5.1.1.6.0.20051219152139.034a22c8@mail.telecommunity.com>
References: <5.1.1.6.0.20051219152139.034a22c8@mail.telecommunity.com>
Message-ID: <31A611AA-5C46-4516-AF89-9FCDF054FFCE@fuhm.net>

On Dec 19, 2005, at 3:34 PM, Phillip J. Eby wrote:
> We should keep an eye, however, on the fact that the vast majority  
> of WSGI apps' requests can and should be handled in a single  
> synchronous iteration.  Multiple iterations are primarily useful  
> for large files, and streaming/push applications.  These are the  
> *only* reason the spec allows multiple writes or iterations.    
> Applications are supposed to do their own buffering in all other  
> cases, to minimize the number of blocks shuffled up and down the  
> middleware chain.
>
> That being the case, the simplest way to ensure thread affinity in  
> Twisted is to just farm out the entire processing of a given  
> request to a reactor.callInThread().

Yes, this is how it works currently. I was pondering relaxing that,  
if the spec allowed. I'm now pretty much convinced that WSGI servers  
_should not_ move applications among threads between yields of the  
result iterator, and thus, will be leaving the twisted code that  
handles this alone. Even though the requirement is not stated in the  
spec, it seems to be a practical requirement.

> The only applications for which this is not suitable will be large  
> files and streaming/push, which will tie up threads that they  
> probably shouldn't.

Large files are already supported by wsgi.file_wrapper, at least if  
you're not fiddling with the file as it goes through. That leaves  
streaming/push, which I'm not sure is a big enough use case to  
actually care about. At least IMO, if you want efficient streaming  
support without using up a bunch of threads, use twisted's APIs  
directly rather than some yet-to-be-invented WSGI extension.
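
A minimal sketch of the large-file case, assuming an application that
uses 'wsgi.file_wrapper' when the server offers it and otherwise falls
back to plain block iteration.  The name send_file_app, the in-memory
payload, and the 8192-byte block size are illustrative assumptions, not
part of the spec:

```python
import io


def send_file_app(environ, start_response):
    # Hypothetical payload; a real application would open a file on disk.
    filelike = io.BytesIO(b"x" * 20000)
    start_response("200 OK", [("Content-Type", "application/octet-stream")])

    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        # The server is free to replace this with sendfile() or similar.
        return wrapper(filelike, 8192)

    # Fallback: read fixed-size blocks until the file is exhausted.
    return iter(lambda: filelike.read(8192), b"")
```

Either branch returns a valid WSGI iterable, so middleware sees no
difference between the two.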

James


From fumanchu at amor.org  Mon Dec 19 23:23:46 2005
From: fumanchu at amor.org (Robert Brewer)
Date: Mon, 19 Dec 2005 14:23:46 -0800
Subject: [Web-SIG] WSGI thread affinity/interleaving
Message-ID: <6949EC6CD39F97498A57E0FA55295B2153CB7D@ex9.hostedexchange.local>

Rene Dudfield wrote:
> Large files should just return a file.  So that the file descriptor is
> available for the most efficient sending.
> 
> So you could use sendfile(2), or another helper process send the file.

Large *files*, perhaps, but using HTTP for static files is so 2001 ;).
The "streaming/push" requirement is more important to me. Just this
morning, one of my users ran a large report (which I thought had been
set up to stream its output, but isn't doing that now). He specifically
asked that it not wait to be completely-formed before rendering:

    The GSR used to build immediately on the screen when choosing
    "Current Trips". Now the entire GSR builds "off screen"...then
    pops on the screen when it is completely built. This makes for
    a lot of waiting. Can we get the GSR to build immediately on
    the screen again?
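
A rough sketch of what "building on the screen" looks like at the WSGI
level, assuming the application returns a generator so the server can
send each block as it is produced.  The name report_app and the row
contents are made up for illustration:

```python
def report_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])

    def rows():
        # Each yield hands one block to the server, which may transmit it
        # before the next row has been computed.
        yield b"<table>\n"
        for n in range(3):
            yield b"<tr><td>row %d</td></tr>\n" % n
        yield b"</table>\n"

    return rows()
```

Whether the browser actually renders rows incrementally still depends
on buffering in the server and in any middleware along the way.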


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From pje at telecommunity.com  Mon Dec 19 23:25:05 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 19 Dec 2005 17:25:05 -0500
Subject: [Web-SIG] WSGI thread affinity/interleaving
In-Reply-To: <64ddb72c0512191345l7f5295f6o141dc3aee0931560@mail.gmail.com>
References: <5.1.1.6.0.20051219152139.034a22c8@mail.telecommunity.com>
	<5.1.1.6.0.20051219152139.034a22c8@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20051219172250.0209c488@mail.telecommunity.com>

At 08:45 AM 12/20/2005 +1100, Rene Dudfield wrote:
>Large files should just return a file.  So that the file descriptor is
>available for the most efficient sending.
>
>So you could use sendfile(2), or another helper process send the file.

This isn't an option for e.g. files stored in a database (including ZODB), 
or generated on the fly, although I suppose you could use a temporary 
file.  Even with a temporary file, however, it doesn't address streaming/push.


From jim at zope.com  Wed Dec 21 16:20:44 2005
From: jim at zope.com (Jim Fulton)
Date: Wed, 21 Dec 2005 10:20:44 -0500
Subject: [Web-SIG] Is the size argument to the input-stream read method
 optional?
In-Reply-To: <43A1CB1D.7000900@colorstudy.com>
References: <43A1BCE9.8020403@zope.com> <43A1CB1D.7000900@colorstudy.com>
Message-ID: <43A972CC.9090204@zope.com>

Ian Bicking wrote:
> Jim Fulton wrote:
> 
>> The PEP is unclear on this and should be clarified, IMO.
> 
> 
> My experience in using implementations is many servers do not require 
> the read size argument (they don't give a TypeError), but they block 
> without it, or if you read past CONTENT_LENGTH.  So it should probably 
> be required in the spec, since it's required in practice.

Does this constitute a decision?  Can somebody update the PEP?
I am able and willing to if requested to. :)

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From jim at zope.com  Wed Dec 21 17:25:05 2005
From: jim at zope.com (Jim Fulton)
Date: Wed, 21 Dec 2005 11:25:05 -0500
Subject: [Web-SIG] Questions/suggestions on 'wsgi.file_wrapper'
Message-ID: <43A981E1.4090609@zope.com>


Here are some questions and suggestions on the 'wsgi.file_wrapper'
part of the WSGI API:

1. Does this need to be optional?  It seems that it would be
    easy for any server to provide this, and it would be nice for
    applications to be able to rely on it.

2. If the file-like object passed has a close method, wouldn't
    it be acceptable for the iterator returned by wsgi.file_wrapper
    to close it when iteration is done?

    I would slightly prefer:

    "It may have a close() method, and if so, the iterable returned by
    wsgi.file_wrapper must have a close() method that invokes the original
    file-like object's close() method, or the iterable must close the file
    when the file-like object's read method returns no data."

    I prefer this because it allows a simple generator implementation of
    a default wsgi.file_wrapper.

3. The server should be allowed to use the file wrapper in a different
    thread than the one used to run the application. This should be noted.
    Applications should not return file-like objects that rely on running
    in the same thread.  This too should be noted.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From pje at telecommunity.com  Wed Dec 21 18:29:13 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed, 21 Dec 2005 12:29:13 -0500
Subject: [Web-SIG] Is the size argument to the input-stream read method
 optional?
In-Reply-To: <43A972CC.9090204@zope.com>
References: <43A1CB1D.7000900@colorstudy.com> <43A1BCE9.8020403@zope.com>
	<43A1CB1D.7000900@colorstudy.com>
Message-ID: <5.1.1.6.0.20051221122030.03cf9858@mail.telecommunity.com>

At 10:20 AM 12/21/2005 -0500, Jim Fulton wrote:
>Ian Bicking wrote:
> > Jim Fulton wrote:
> >
> >> The PEP is unclear on this and should be clarified, IMO.
> >
> >
> > My experience in using implementations is many servers do not require
> > the read size argument (they don't give a TypeError), but they block
> > without it, or if you read past CONTENT_LENGTH.  So it should probably
> > be required in the spec, since it's required in practice.
>
>Does this constitute a decision?  Can somebody update the PEP?

I thought the PEP was actually pretty clear on this already.  It says that 
the application should not attempt to read more data than is specified by 
CONTENT_LENGTH - which means that you can't omit the read() argument and 
avoid that.  An application that omits the argument is therefore off-spec, 
and a server is thus well within its rights to reject this.  As far as I 
know, there is also no circumstance under which a previously-working 
application (using CGI or some similar protocol) would be able to use 
read() without an argument and work correctly with any non-ancient version 
of HTTP.

I'm happy to entertain suggestions for language that would make this more 
obvious.  How about just adding """The "size" argument is required and must 
be a positive integer.""" to the existing note 1?
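
A minimal sketch of an application-side read that follows this rule,
assuming the body is wanted as a single byte string.  The helper name
read_body and the 8192-byte block size are assumptions for
illustration:

```python
import io


def read_body(environ):
    """Read the request body without asking for more than CONTENT_LENGTH."""
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0

    stream = environ["wsgi.input"]
    chunks = []
    while remaining > 0:
        # Always pass an explicit, positive size to read().
        chunk = stream.read(min(remaining, 8192))
        if not chunk:
            break  # the client sent fewer bytes than it promised
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Usage: bytes past CONTENT_LENGTH are never requested.
body = read_body({"CONTENT_LENGTH": "5", "wsgi.input": io.BytesIO(b"helloEXTRA")})
```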


From jim at zope.com  Wed Dec 21 18:38:26 2005
From: jim at zope.com (Jim Fulton)
Date: Wed, 21 Dec 2005 12:38:26 -0500
Subject: [Web-SIG] Is the size argument to the input-stream read method
 optional?
In-Reply-To: <5.1.1.6.0.20051221122030.03cf9858@mail.telecommunity.com>
References: <43A1CB1D.7000900@colorstudy.com> <43A1BCE9.8020403@zope.com>
	<43A1CB1D.7000900@colorstudy.com>
	<5.1.1.6.0.20051221122030.03cf9858@mail.telecommunity.com>
Message-ID: <43A99312.1060502@zope.com>

Phillip J. Eby wrote:
> At 10:20 AM 12/21/2005 -0500, Jim Fulton wrote:
> 
>> Ian Bicking wrote:
>> > Jim Fulton wrote:
>> >
>> >> The PEP is unclear on this and should be clarified, IMO.
>> >
>> >
>> > My experience in using implementations is many servers do not require
>> > the read size argument (they don't give a TypeError), but they block
>> > without it, or if you read past CONTENT_LENGTH.  So it should probably
>> > be required in the spec, since it's required in practice.
>>
>> Does this constitute a decision?  Can somebody update the PEP?
> 
> 
> I thought the PEP was actually pretty clear on this already.  It says 
> that the application should not attempt to read more data than is 
> specified by CONTENT_LENGTH - which means that you can't omit the read() 
> argument and avoid that.  An application that omits the argument is 
> therefore off-spec, and a server is thus well within its rights to 
> reject this.  As far as I know, there is also no circumstance under 
> which a previously-working application (using CGI or some similar 
> protocol) would be able to use read() without an argument and work 
> correctly with any non-ancient version of HTTP.

In Zope and twisted's wsgi server implementation, the input read method
treats the character at position content length (counting from 1) as the
last character in the file.  So read without argument reads the remaining
characters up to the content length.  This isn't inconsistent with the
current language.
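
The behaviour described here can be sketched roughly as follows.  This
is not the actual Zope or Twisted code; the class name LimitedInput and
its attributes are hypothetical, and only read() is shown:

```python
import io


class LimitedInput:
    """Hypothetical input wrapper that treats CONTENT_LENGTH as end-of-file."""

    def __init__(self, stream, content_length):
        self.stream = stream
        self.remaining = content_length

    def read(self, size=-1):
        # With no size (or too large a size), read only what is left of the body.
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        data = self.stream.read(size)
        self.remaining -= len(data)
        return data


# Usage: the underlying stream holds more bytes than the declared body.
limited = LimitedInput(io.BytesIO(b"abcdefNEXT"), 6)
first = limited.read(2)
rest = limited.read()
```

With a wrapper like this, an argument-less read() stops at the content
length instead of blocking on the socket.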

> I'm happy to entertain suggestions for language that would make this 
> more obvious.  How about just adding """The "size" argument is required 
> and must be a positive integer.""" to the existing note 1?

I think this is an improvement. +1.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From pje at telecommunity.com  Wed Dec 21 18:41:39 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed, 21 Dec 2005 12:41:39 -0500
Subject: [Web-SIG] Questions/suggestions on 'wsgi.file_wrapper'
In-Reply-To: <43A981E1.4090609@zope.com>
Message-ID: <5.1.1.6.0.20051221122927.0278a5b8@mail.telecommunity.com>

At 11:25 AM 12/21/2005 -0500, Jim Fulton wrote:

>Here are some questions and suggestions on the 'wsgi.file_wrapper'
>part of the WSGI API:
>
>1. Does this need to be optional?  It seems that it would be
>     easy for any server to provide this, and it would be nice for
>     applications to be able to rely on it.

It's intentionally optional because its presence signifies that the server 
can do things *better* than the application, if and only if the object is a 
"real" operating system file or other "special" object.  The only reason 
the spec requires only a "file-like" object rather than an object with a 
valid "fileno()" method, is because somebody wanted to support Jython 
objects wrapping Java sio(?) objects, for a Java equivalent of sendfile().


>2. If the file-like object passed has a close method, wouldn't
>     it be acceptable for the iterator returned by wsgi.file_wrapper
>     to close it when iteration is done?
>
>     I would slightly prefer:
>
>     "It may have a close() method, and if so, the iterable returned by
>     wsgi.file_wrapper must have a close() method that invokes the original
>     file-like object's close() method, or the iterable must close the file
>     when the file-like object's read method returns no data."
>
>     I prefer this because it allows a simple generator implementation of
>     a default wsgi.file_wrapper.

I'm sorry, I don't understand what you're asking for here.  I think maybe 
you have a misunderstanding about why the spec is arranged the way it is 
here.  It is intended to ensure that any middleware between the server and 
the application will be able to treat the wrapper as a valid WSGI 
application return value.  The server is allowed to strip off the wrapper, 
if that's in fact what it receives.  But the wrapper has to be a 100% valid 
WSGI return value, or middleware will get confused.  The server must also 
only do special handling *if* it receives the wrapper as a return value; it 
can't assume that just because you called file_wrapper() that it is going 
to use that handler.

If I understand your suggestion correctly, you're asking to change that in 
a way that disallows early closing, and I don't think that should be 
allowed.  If the file has a close(), any middleware involved needs to be 
allowed to call it.


>3. The server should be allowed to use the file wrapper in a different
>     thread than the one used to run the application. This should be noted.
>     Applications should not return file-like objects that rely on running
>     in the same thread.  This too should be noted.

This seems reasonable to me.  For the actual use cases file_wrapper was 
intended to support (sendfile() and the Java equivalent thereof) this 
should be no problem at all.
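
On point 2, a sketch of a fallback wrapper that keeps close() available
to middleware at any time, rather than closing only when the data runs
out.  It is modelled loosely on the example class in the PEP; the
details here are an illustrative assumption, not spec text:

```python
import io


class FileWrapper:
    """Iterate over a file-like object in blocks; close() may be called early."""

    def __init__(self, filelike, blksize=8192):
        self.filelike = filelike
        self.blksize = blksize
        # Expose close() only if the wrapped object has one.
        if hasattr(filelike, "close"):
            self.close = filelike.close

    def __iter__(self):
        return self

    def __next__(self):
        data = self.filelike.read(self.blksize)
        if data:
            return data
        raise StopIteration


# Usage: middleware may stop iterating and close before the data is exhausted.
source = io.BytesIO(b"abcdefgh")
wrapper = FileWrapper(source, 4)
first_block = next(wrapper)
wrapper.close()
```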


From jim at zope.com  Wed Dec 21 19:06:50 2005
From: jim at zope.com (Jim Fulton)
Date: Wed, 21 Dec 2005 13:06:50 -0500
Subject: [Web-SIG] Questions/suggestions on 'wsgi.file_wrapper'
In-Reply-To: <5.1.1.6.0.20051221122927.0278a5b8@mail.telecommunity.com>
References: <5.1.1.6.0.20051221122927.0278a5b8@mail.telecommunity.com>
Message-ID: <43A999BA.9040207@zope.com>

Phillip J. Eby wrote:
> At 11:25 AM 12/21/2005 -0500, Jim Fulton wrote:
> 
>> Here are some questions and suggestions on the 'wsgi.file_wrapper'
>> part of the WSGI API:
>>
>> 1. Does this need to be optional?  It seems that it would be
>>     easy for any server to provide this, and it would be nice for
>>     applications to be able to rely on it.
> 
> 
> It's intentionally optional because its presence signifies that the 
> server can do things *better* than the application, if and only if the 
> object is a "real" operating system file or other "special" object.  The 
> only reason the spec requires only a "file-like" object rather than an 
> object with a valid "fileno()" method, is because somebody wanted to 
> support Jython objects wrapping Java sio(?) objects, for a Java 
> equivalent of sendfile().

I guess I'm puzzled how the server can fail to do at least as well
as the application.  Can you think of a case where an application wants to
output a file and can do better than a simple fallback iterator provided
by the server?

> 
>> 2. If the file-like object passed has a close method, wouldn't
...

> If I understand your suggestion correctly, you're asking to change that 
> in a way that disallows early closing, and I don't think that should be 
> allowed. 

Ah! I see. Good point. OK, I withdraw my suggestion.

...

>> 3. The server should be allowed to use the file wrapper in a different
>>     thread than the one used to run the application. This should be noted.
>>     Applications should not return file-like objects that rely on running
>>     in the same thread.  This too should be noted.
> 
> 
> This seems reasonable to me.  For the actual use cases file_wrapper was 
> intended to support (sendfile() and the Java equivalent thereof) this 
> should be no problem at all.

Cool.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From pje at telecommunity.com  Wed Dec 21 19:31:05 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed, 21 Dec 2005 13:31:05 -0500
Subject: [Web-SIG] Questions/suggestions on 'wsgi.file_wrapper'
In-Reply-To: <43A999BA.9040207@zope.com>
References: <5.1.1.6.0.20051221122927.0278a5b8@mail.telecommunity.com>
	<5.1.1.6.0.20051221122927.0278a5b8@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20051221132221.02102060@mail.telecommunity.com>

At 01:06 PM 12/21/2005 -0500, Jim Fulton wrote:
>Phillip J. Eby wrote:
>>At 11:25 AM 12/21/2005 -0500, Jim Fulton wrote:
>>
>>>Here are some questions and suggestions on the 'wsgi.file_wrapper'
>>>part of the WSGI API:
>>>
>>>1. Does this need to be optional?  It seems that it would be
>>>     easy for any server to provide this, and it would be nice for
>>>     applications to be able to rely on it.
>>
>>It's intentionally optional because its presence signifies that the 
>>server can do things *better* than the application, if and only if the 
>>object is a "real" operating system file or other "special" object.  The 
>>only reason the spec requires only a "file-like" object rather than an 
>>object with a valid "fileno()" method, is because somebody wanted to 
>>support Jython objects wrapping Java sio(?) objects, for a Java 
>>equivalent of sendfile().
>
>I guess I'm puzzled how the server can fail to do at least as well
>as the application.  Can you think of a case where an application wants to
>output a file and can do better than a simple fallback iterator provided
>by the server?

Again, file_wrapper was created as an optional hack to allow sendfile() and 
java.nio.FileChannel to work.  It's a little late to go back and make it 
required unless we want to start trying to make a WSGI 1.1 spec.

At this point, it's optional because it was optional and everybody's gone 
and implemented servers that either do or don't comply with the existing 
spec.  We're not really in a position to change the spec without a new 
spec.  About a year ago the SIG consensus was basically, "it's done; 
anything from here on out has to be either a clarification of something 
already decided, or addition of new optional features (like an async API)".

Once that was done, people have been making implementations left and right, 
so it's not fair to go back and make them retroactively noncompliant for 
not implementing an explicitly optional feature.


From jim at zope.com  Wed Dec 21 19:49:48 2005
From: jim at zope.com (Jim Fulton)
Date: Wed, 21 Dec 2005 13:49:48 -0500
Subject: [Web-SIG] Questions/suggestions on 'wsgi.file_wrapper'
In-Reply-To: <5.1.1.6.0.20051221132221.02102060@mail.telecommunity.com>
References: <5.1.1.6.0.20051221122927.0278a5b8@mail.telecommunity.com>
	<5.1.1.6.0.20051221122927.0278a5b8@mail.telecommunity.com>
	<5.1.1.6.0.20051221132221.02102060@mail.telecommunity.com>
Message-ID: <43A9A3CC.4090801@zope.com>

Phillip J. Eby wrote:
> At 01:06 PM 12/21/2005 -0500, Jim Fulton wrote:
> 
>> Phillip J. Eby wrote:
>>
>>> At 11:25 AM 12/21/2005 -0500, Jim Fulton wrote:
>>>
>>>> Here are some questions and suggestions on the 'wsgi.file_wrapper'
>>>> part of the WSGI API:
>>>>
>>>> 1. Does this need to be optional?  It seems that it would be
>>>>     easy for any server to provide this, and it would be nice for
>>>>     applications to be able to rely on it.
>>>
>>>
>>> It's intentionally optional because its presence signifies that the 
>>> server can do things *better* than the application, if and only if 
>>> the object is a "real" operating system file or other "special" 
>>> object.  The only reason the spec requires only a "file-like" object 
>>> rather than an object with a valid "fileno()" method, is because 
>>> somebody wanted to support Jython objects wrapping Java sio(?) 
>>> objects, for a Java equivalent of sendfile().
>>
>>
>> I guess I'm puzzled how the server can fail to do at least as well
>> as the application.  Can you think of a case where an application
>> wants to output a file and can do better than a simple fallback
>> iterator provided by the server?
> 
> 
> Again, file_wrapper was created as an optional hack to allow sendfile() 
> and java.nio.FileChannel to work.  It's a little late to go back and 
> make it required unless we want to start trying to make a WSGI 1.1 spec.
> 
> At this point, it's optional because it was optional and everybody's 
> gone and implemented servers that either do or don't comply with the 
> existing spec.  We're not really in a position to change the spec 
> without a new spec.  About a year ago the SIG consensus was basically, 
> "it's done; anything from here on out has to be either a clarification 
> of something already decided, or addition of new optional features (like 
> an async API)".
> 
> Once that was done, people have been making implementations left and 
> right, so it's not fair to go back and make them retroactively 
> noncompliant for not implementing an explicitly optional feature.

That's a fair point.  I suggest it is something to consider in a
later rev of the PEP, but I don't think it alone would justify
a later rev.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From jim at zope.com  Wed Dec 21 20:17:07 2005
From: jim at zope.com (Jim Fulton)
Date: Wed, 21 Dec 2005 14:17:07 -0500
Subject: [Web-SIG] Should system environment variables appear in a WSGI
	environ?
Message-ID: <43A9AA33.1090001@zope.com>


The PEP describes CGI and WSGI ("wsgi.") environment variables that must
and should be included. It also describes a mechanism for the server to
add server-specific environment variables.  It doesn't explicitly say
that the server should not include other environment variables, such as
process environment variables.  It does say that all additional variables
it provides should be documented, which could be construed to mean that
it shouldn't add additional variables. :)

Would it be reasonable to say that a server should not include process
environment variables?

Zope currently exposes most of the environment it's given and I don't
want to expose process environment variables.  I'm wondering
if I need to cleanse the environment I'm given.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From ianb at colorstudy.com  Wed Dec 21 20:22:02 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed, 21 Dec 2005 13:22:02 -0600
Subject: [Web-SIG] WSGI: QUERY_STRING and cgi stdlib module
Message-ID: <43A9AB5A.30803@colorstudy.com>

I thought I'd note that in testing I noticed that if QUERY_STRING is 
missing the cgi module falls back on sys.argv, which is awful.  WSGI 
says QUERY_STRING is optional, but if you pass the WSGI environment to 
cgi.FieldStorage you get this bug.  Should QUERY_STRING just be 
required?  It's almost always set anyways.
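
Until that is settled, one workaround is to fill in the key before
handing the environment to form-parsing code.  A minimal sketch; the
helper name with_query_string is hypothetical:

```python
def with_query_string(environ):
    """Return a copy of environ in which QUERY_STRING is always present."""
    fixed = dict(environ)
    # An empty string means "no query", so nothing falls back to sys.argv.
    fixed.setdefault("QUERY_STRING", "")
    return fixed


# Usage: the original environ is left untouched.
original = {"REQUEST_METHOD": "GET"}
safe = with_query_string(original)
```

The copy can then be passed as the environ argument of
cgi.FieldStorage.  The sketch does not import cgi itself, since that
module was removed from the standard library in Python 3.13.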

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From pje at telecommunity.com  Wed Dec 21 20:40:12 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed, 21 Dec 2005 14:40:12 -0500
Subject: [Web-SIG] Should system environment variables appear in a WSGI
 environ?
In-Reply-To: <43A9AA33.1090001@zope.com>
Message-ID: <5.1.1.6.0.20051221143530.02098510@mail.telecommunity.com>

At 02:17 PM 12/21/2005 -0500, Jim Fulton wrote:
>The PEP describes CGI and WSGI ("wsgi.") environment variables that must
>and should be included. It also describes a mechanism for the server to
>add server-specific environment variables.  It doesn't explicitly say
>that the server should not include other environment variables, such as
>process environment variables.  It does say that all additional variables
>it provides should be documented, which could be construed to mean that
>it shouldn't add additional variables. :)

The intent was to say that if you provide additional CGI-like variables 
(like HTTPS=on and SSL_PROTOCOL), you should document them.


>Would it be reasonable to say that a server should not include process
>environment variables?

No; the spec explicitly says the server can, and strongly implies they 
should as a way to allow configuration of applications that expect to use 
their environment as configuration.  See:

http://www.python.org/peps/pep-0333.html#application-configuration

"""Servers and gateways should support this by allowing an application's 
deployer to specify name-value pairs to be placed in environ. In the 
simplest case, this support can consist merely of copying all operating 
system-supplied environment variables from os.environ into the environ."""

So, if you cleanse the process environment, you should provide an 
alternative way for application deployers to put name-value pairs into the 
environ.
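
A sketch of both options, assuming a server that builds a base environ
once and copies it for each request.  The function name
make_base_environ and its parameters are made up for illustration:

```python
import os


def make_base_environ(deployer_settings=None, include_process_env=True):
    """Build the part of environ that is the same for every request."""
    environ = {}
    if include_process_env:
        # Simplest case from the PEP: copy the process environment.
        environ.update(os.environ)
    # Alternative (or addition): name-value pairs chosen by the deployer.
    environ.update(deployer_settings or {})
    return environ


# Usage: a cleansed environment that still carries deployer configuration.
base = make_base_environ({"myapp.dsn": "example"}, include_process_env=False)
```

A server that cleanses the process environment would use the second
form and document how deployers supply the pairs.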


From mso at oz.net  Thu Dec 22 04:32:28 2005
From: mso at oz.net (Mike Orr)
Date: Wed, 21 Dec 2005 19:32:28 -0800
Subject: [Web-SIG] Questions/suggestions on 'wsgi.file_wrapper'
In-Reply-To: <5.1.1.6.0.20051221122927.0278a5b8@mail.telecommunity.com>
References: <5.1.1.6.0.20051221122927.0278a5b8@mail.telecommunity.com>
Message-ID: <43AA1E4C.6050601@oz.net>

Phillip J. Eby wrote:

>At 11:25 AM 12/21/2005 -0500, Jim Fulton wrote:
>
>  
>
>>Here are some questions and suggestions on the 'wsgi.file_wrapper'
>>part of the WSGI API:
>>
>>1. Does this need to be optional?  It seems that it would be
>>    easy for any server to provide this, and it would be nice for
>>    applications to be able to rely on it.
>>    
>>
>
>It's intentionally optional because its presence signifies that the server 
>can do things *better* than the application, if and only if the object is a 
>"real" operating system file or other "special" object.  The only reason 
>the spec requires only a "file-like" object rather than an object with a 
>valid "fileno()" method, is because somebody wanted to support Jython 
>objects wrapping Java sio(?) objects, for a Java equivalent of sendfile().
>  
>

Allowing a file-like object like StringIO also allows the environment to 
be pickled and sent to another process.  This lets a Python web server 
talk directly to a Python application server using WSGI, rather than 
having to kludge through SCGI and then repackage it to WSGI.  I don't 
know of any web servers that do this yet but it would be a shame to lose 
the capability.

If we require a file object, the environment becomes non-pickleable 
because you can't serialize an open file.  SCGI uses passfd, which 
somehow works, but not on Windows.  If we require .fileno(), one could 
have an object that quickly writes the content to a file and passes that 
fileno, but I don't see what that gains.

-- Mike Orr

From pje at telecommunity.com  Thu Dec 22 04:53:10 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed, 21 Dec 2005 22:53:10 -0500
Subject: [Web-SIG] Questions/suggestions on 'wsgi.file_wrapper'
In-Reply-To: <43AA1E4C.6050601@oz.net>
References: <5.1.1.6.0.20051221122927.0278a5b8@mail.telecommunity.com>
	<5.1.1.6.0.20051221122927.0278a5b8@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20051221225120.020ed5b8@mail.telecommunity.com>

At 07:32 PM 12/21/2005 -0800, Mike Orr wrote:
>Phillip J. Eby wrote:
>
> >At 11:25 AM 12/21/2005 -0500, Jim Fulton wrote:
> >
> >
> >
> >>Here are some questions and suggestions on the 'wsgi.file_wrapper'
> >>part of the WSGI API:
> >>
> >>1. Does this need to be optional?  It seems that it would be
> >>    easy for any server to provide this, and it would be nice for
> >>    applications to be able to rely on it.
> >>
> >>
> >
> >It's intentionally optional because its presence signifies that the server
> >can do things *better* than the application, if and only if the object is a
> >"real" operating system file or other "special" object.  The only reason
> >the spec requires only a "file-like" object rather than an object with a
> >valid "fileno()" method, is because somebody wanted to support Jython
> >objects wrapping Java sio(?) objects, for a Java equivalent of sendfile().
> >
> >
>
>Allowing a file-like object like StringIO also allows the environment to
>be pickled and sent to another process.  This lets a Python web server
>talk directly to a Python application server using WSGI, rather than
>having to kludge through SCGI and then repackage it to WSGI.  I don't
>know of any web servers that do this yet but it would be a shame to lose
>the capability.
>
>If we require a file object, the environment becomes non-pickleable
>because you can't serialize an open file.  SCGI uses passfd, which
>somehow works, but not on Windows.  If we require .fileno(), one could
>have an object that quickly writes the content to a file and passes that
>fileno, but I don't see what that gains.

I think perhaps you've confused the 'file_wrapper' API with the file-like 
objects in the environment.  The discussion above is about 'file_wrapper' 
objects *returned* by the application, not the input/stderr objects in the 
environment.


From kai.keliikuli at gmail.com  Thu Dec 22 18:24:59 2005
From: kai.keliikuli at gmail.com (kai)
Date: Thu, 22 Dec 2005 12:24:59 -0500
Subject: [Web-SIG] transaction  progress with cgi.FieldStorage
Message-ID: <43AAE16B.9040006@gmail.com>

Hi All,
this is my first post on this list. I am working on a way to monitor the 
progress of reading a file upload from wsgi.input.  I can currently 
monitor the overall transfer and when individual files of a multiple 
file upload are completed. The ultimate goal of this is to be able to 
display a progress meter when someone is uploading a file.

To do this I subclassed cgi.FieldStorage but when I finished I had 
modified most of the non-trivial methods just to hook in something to 
monitor the transfer progress, oops.

Has anyone else found FieldStorage insufficient for certain tasks?
Is there a general need for a more flexible FieldStorage replacement?
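
One alternative to subclassing, sketched under assumptions: wrap
wsgi.input in an object that counts bytes as they are read, and hand
that wrapper to the parser.  The class name ProgressInput and the
callback signature are hypothetical, and this reports overall transfer
only, not per-file completion:

```python
import io


class ProgressInput:
    """Hypothetical wsgi.input wrapper that reports how many bytes were read."""

    def __init__(self, stream, total, callback):
        self.stream = stream
        self.total = total
        self.seen = 0
        self.callback = callback

    def _report(self, data):
        self.seen += len(data)
        self.callback(self.seen, self.total)
        return data

    def read(self, size=-1):
        return self._report(self.stream.read(size))

    def readline(self, size=-1):
        return self._report(self.stream.readline(size))


# Usage: record (bytes_seen, total) after every read.
events = []
source = ProgressInput(io.BytesIO(b"line1\nrest"), 10,
                       lambda seen, total: events.append((seen, total)))
line = source.readline()
tail = source.read(4)
```

Because the parser only sees read() and readline(), its own methods
would not need to be overridden for this part.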


kai keliikuli

From tsoehnli at gmu.edu  Sat Dec 24 18:53:16 2005
From: tsoehnli at gmu.edu (tsoehnli@gmu.edu)
Date: Sat, 24 Dec 2005 12:53:16 -0500
Subject: [Web-SIG] cgi.fieldstorage
In-Reply-To: <mailman.6.1135335603.15778.web-sig@python.org>
References: <mailman.6.1135335603.15778.web-sig@python.org>
Message-ID: <f620dc797ce38.43ad44bc@gmu.edu>

I found the cgi library to be too bulky for CGI, actually. Its load time
was enough to double the processing time of my scripts.  I changed that,
though, by hand-recoding most of it, removing certain things, like
regular expressions, from the process and replacing them with urllib's
fast quote and unquote.  Right now, processing file uploads is quite
simple, though I have even seen some more modularized versions.  If you
would like a copy of mine, I would be more than happy to email it to
you, but yeah, the standard cgi lib is not all that great, and its
performance is weak.

From cce at clarkevans.com  Sun Dec 25 04:45:34 2005
From: cce at clarkevans.com (Clark C. Evans)
Date: Sat, 24 Dec 2005 22:45:34 -0500
Subject: [Web-SIG] Why is response_headers a list instead of a dict?
Message-ID: <20051225034534.GA88508@prometheusresearch.com>

Why is response_headers a list instead of a dict?

>From RFC 2616 Section 4.2:

    The order in which header fields with differing field names are
    received is not significant. However, it is "good practice" to send
    general-header fields first, followed by request-header or response-
    header fields, and ending with the entity-header fields.

    Multiple message-header fields with the same field-name MAY be
    present in a message if and only if the entire field-value for that
    header field is defined as a comma-separated list [i.e., #(values)].
    It MUST be possible to combine the multiple header fields into one
    "field-name: field-value" pair, without changing the semantics of
    the message, by appending each subsequent field-value to the first,
    each separated by a comma. 

In other words: (a) order does not matter, (b) it is reasonable to
restrict a header field to a single (header_name, header_value) pair.
Indeed, according to the specification, a HTTP Proxy could re-arrange
headers and condense N headers of the same type by simply concatenating
their values with a comma.

I'm asking this because it is quite painful (and very much an unnecessary 
pain) to work with headers in complex WSGI-based middleware applications.

Kind Regards,

Clark

From foom at fuhm.net  Sun Dec 25 05:48:39 2005
From: foom at fuhm.net (James Y Knight)
Date: Sat, 24 Dec 2005 23:48:39 -0500
Subject: [Web-SIG] Why is response_headers a list instead of a dict?
In-Reply-To: <20051225034534.GA88508@prometheusresearch.com>
References: <20051225034534.GA88508@prometheusresearch.com>
Message-ID: <3BFED61A-97DE-48EB-BC10-152861D988B5@fuhm.net>


On Dec 24, 2005, at 10:45 PM, Clark C. Evans wrote:

> Why is response_headers a list instead of a dict?
>
> [ RFC quote ]
>
> In other words: (a) order does not matter

True, order between headers does not matter.

> (b) it is reasonable to
> restrict a header field to a single (header_name, header_value) pair.

Yes, the RFC says that, and I certainly wish it were true, but it's  
simply not. The RFC lies. The primary example is the Set-Cookie  
header, which by _definition_ cannot be combined, as it uses an  
unquoted date which includes a comma. Also, multiple WWW-Authenticate  
headers should be okay to combine, but I've heard rumors of UAs being  
confused by that format.

WSGI could have spec'd a dictionary of lists of strings, rather than  
a list of strings, but it did not. You can transform the result into  
that if you like...
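
For illustration, a minimal sketch of that transformation. The helper
names headers_to_dict and dict_to_headers are made up here, and
lower-casing the keys is an assumption (HTTP field names are
case-insensitive); none of this is part of WSGI itself.

```python
def headers_to_dict(response_headers):
    """Group a WSGI list of (name, value) tuples into {name: [values]}.

    Keys are lower-cased so that 'Set-Cookie' and 'set-cookie' end up
    in the same bucket; the order of values per name is preserved.
    """
    grouped = {}
    for name, value in response_headers:
        grouped.setdefault(name.lower(), []).append(value)
    return grouped


def dict_to_headers(grouped):
    """Flatten {name: [values]} back into a WSGI header list."""
    return [(name, value)
            for name, values in grouped.items()
            for value in values]


response_headers = [
    ("Content-Type", "text/plain"),
    ("Set-Cookie", "a=1"),
    ("Set-Cookie", "b=2"),
]
grouped = headers_to_dict(response_headers)
```

Note that the round trip loses the original capitalization of the
names and the relative order of headers with different names, which
matches the point above that order between headers does not matter.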

James

From pje at telecommunity.com  Sun Dec 25 06:13:09 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 25 Dec 2005 00:13:09 -0500
Subject: [Web-SIG] Why is response_headers a list instead of a dict?
In-Reply-To: <20051225034534.GA88508@prometheusresearch.com>
Message-ID: <5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>

At 10:45 PM 12/24/2005 -0500, Clark C. Evans wrote:
>Why is response_headers a list instead of a dict?

The short answer is because of "Set-Cookie:" headers, and quoting issues 
with the 'expires' parameter.  The slightly longer answer is that it gives 
the application more control of the response, which may be important to 
work around bugs in browsers, caches, and proxies currently deployed in the 
field.  :(
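
To make the quoting issue concrete, here is a small sketch. The cookie
values are invented for the example; the date layout follows the old
Netscape-style 'expires' format, which contains an unquoted comma.

```python
# Two Set-Cookie values as an application might emit them (made-up data).
cookies = [
    "a=1; expires=Sat, 31-Dec-2005 23:59:59 GMT",
    "b=2; expires=Sun, 01-Jan-2006 23:59:59 GMT",
]

# What RFC 2616 section 4.2 would allow for ordinary list-valued headers:
# fold the values into one header, separated by commas.
combined = ", ".join(cookies)

# A recipient that splits the folded value on commas cannot tell the
# separator apart from the comma inside each date.
naive_parts = [part.strip() for part in combined.split(",")]
```

Here naive_parts has four elements rather than two, so the original
cookies cannot be recovered by a simple split. Keeping the headers as a
list of separate (name, value) tuples avoids the problem entirely.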


From cce at clarkevans.com  Sun Dec 25 19:04:29 2005
From: cce at clarkevans.com (Clark C. Evans)
Date: Sun, 25 Dec 2005 13:04:29 -0500
Subject: [Web-SIG] Why is response_headers a list instead of a dict?
In-Reply-To: <5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
References: <20051225034534.GA88508@prometheusresearch.com>
	<5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
Message-ID: <20051225180429.GA93279@prometheusresearch.com>

I'm going to play the devil's advocate here; although I really love
WSGI -- I think this particular decision is a wart and will greatly
hinder adoption. 

On Sun, Dec 25, 2005 at 12:13:09AM -0500, Phillip J. Eby wrote:
| At 10:45 PM 12/24/2005 -0500, Clark C. Evans wrote:
| >Why is response_headers a list instead of a dict?
| 
| The short answer is because of "Set-Cookie:" headers, and quoting issues 
| with the 'expires' parameter.  The slightly longer answer is that it 
| gives the application more control of the response, which may be 
| important to work around bugs in browsers, caches, and proxies currently 
| deployed in the field.  :(

You are, of course, referring to the horribly old Netscape Specification
for Set-Cookie, http://wp.netscape.com/newsref/std/cookie_spec.html.
I'd like to note that RFC 2109 (1997), and RFC 2965 (2000) have no such
problems.  Just about every major browser out there supports max-age
parameter instead of "Expires".  Doing a quick "unofficial" survey of 
major websites, 'max-age' usage (RFC 2109) is the most common usage,
as it is far easier for server implementations to specify an age in 
seconds rather than compute a GMT timestamp. 

More control over the response is fine; but really, this should be 
in the domain of web-server software -- which will have much more eyes
on it and has a greater chance of being correct and handling variants
among browsers.  For example, Twisted or the Zope community have a much
better chance of making WSGI work in practice if they are given the
freedom to re-arrange the Headers (splitting or joining as appropriate)
to match browsers which commonly visit their site.

In this particular case, you've taken control from the writers of the
web-server software (who have much greater chance of getting it right)
and given it to framework/application writers -- which have a much 
larger chance of not reading the specifications correctly or not having
enough deployment experience to cover browser quirks.

On Sat, Dec 24, 2005 at 11:48:39PM -0500, James Y Knight wrote:
| On Dec 24, 2005, at 10:45 PM, Clark C. Evans wrote:
| >Why is response_headers a list instead of a dict?
| >
| >[ RFC quote ]
| >
| >In other words: (a) order does not matter
| 
| True, order between headers does not matter.

Yes, however, the HTTP/1.1 specification explicitly suggests that
general headers come first, then request/response headers, followed by
entity headers.  It also recommends that headers take a "common form"
when sent by servers (that is, in Camel-Dash-Case, except ones like ETag
or WWW-Authenticate). I think that server platforms should be able to
implement these suggestions so that applications/frameworks don't have
to be bothered with such details.

| >(b) it is reasonable to
| >restrict a header field to a single (header_name, header_value) pair.
| 
| Yes, the RFC says that, and I certainly wish it were true, but it's  
| simply not. The RFC lies. The primary example is the Set-Cookie  
| header, which by _definition_ cannot be combined, as it uses an  
| unquoted date which includes a comma.

This seems to be the only use-case for the decision.  If it is that
important; make it an exception.  A small bit of code for 'Set-Cookie',
if it is even necessary (I contend that it isn't), is an acceptable
price to pay for simpler WSGI applications.

| Also, multiple WWW-Authenticate  
| headers should be okay to combine, but I've heard rumors of UAs being  
| confused by that format.

First, there is _nothing_ preventing a Server (such as Zope or Twisted)
handling this case by splitting out comma-separated WWW-Authenticate or
Set-Cookie (RFC 2109, or even the _broken_ netscape spec with a very
small amount of code) into multiple lines.

Second, is the combination of needing _multiple_ WWW-Authenticate headers
on that particular User-Agent a real-life use case?

Frankly -- this is programming for Edge Cases; it is a 1% issue and your
average Framework/Application developer won't do it, or if they do do
it, it will most likely be done incorrectly.  It's not like the servers
we have to run WSGI apps are closed-source, non-responsive.  The Twisted
and Zope team (among others) are very quick at making things work.

| WSGI could have spec'd a dictionary of lists of strings, rather than  
| a list of strings, but it did not. You can transform the result into  
| that if you like...

Well, I agree it should have been a dictionary (with lower-case keys).
I don't think that a list would have been helpful; 90% of the time
you're dealing with something that isn't a list.  And when it is a list,
appending ",mystuff" to the list isn't that hard.

Kind Regards,

Clark

From pje at telecommunity.com  Sun Dec 25 20:13:00 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 25 Dec 2005 14:13:00 -0500
Subject: [Web-SIG] Why is response_headers a list instead of a dict?
In-Reply-To: <20051225180429.GA93279@prometheusresearch.com>
References: <5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
	<20051225034534.GA88508@prometheusresearch.com>
	<5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20051225135446.0220b2b8@mail.telecommunity.com>

At 01:04 PM 12/25/2005 -0500, Clark C. Evans wrote:
>More control over the response is fine; but really, this should be
>in the domain of web-server software -- which will have much more eyes
>on it and has a greater chance of being correct and handling variants
>among browsers.  For example, Twisted or the Zope community have a much
>better chance of making WSGI work in practice if they are given the
>freedom to re-arrange the Headers (splitting or joining as appropriate)
>to match browsers which commonly visit their site.
>
>In this particular case, you've taken control from the writers of the
>web-server software (who have much greater chance of getting it right)
>and given it to framework/application writers -- which have a much
>larger chance of not reading the specifications correctly or not having
>enough deployment experience to cover browser quirks.

WSGI puts this particular power in the application writer's hands, because 
then *they* can fix a problem.  If it's in the server author's hands, the 
application writer can be screwed, whether the server is open source or not.


>I think that server platforms should be able to
>implement these suggestions so that applications/frameworks don't have
>to be bothered with such details.

WSGI is not designed - and is definitely not intended! - to encourage 
writing new web frameworks.


>This seems to be the only use-case for the decision.  If it is that
>important; make it an exception.  A small bit of code for 'Set-Cookie',
>if it is even necessary (I contend that it isn't), is an acceptable
>price to pay for simpler WSGI applications.

No, I'm sorry, but it's not.  Read the PEP again, which explains why having 
a nicer API for the application side was never a goal - in fact, it was an 
explicit *anti*-goal.  Having it be ugly and primitive was both necessary 
and intentional.

Ironically, headers are the one use case where I felt we could make an 
exception to the "crude is better" principle, but was argued down by 
others.  I had originally proposed using an email.Message object to manage 
headers, since it had all the needed functionality (including the necessary 
ordering control), but others argued that it's easy enough for a framework 
to do that itself, and that in any case email.Message had too many 
distracting non-HTTP-header methods.


>Frankly -- this is programming for Edge Cases; it is a 1% issue and your
>average Framework/Application developer won't do it, or if they do do
>it, it will most likely be done incorrectly.  It's not like the servers
>we have to run WSGI apps are closed-source, non-responsive.  The Twisted
>and Zope team (among others) are very quick at making things work.

FYI, If I understand correctly, Jim Fulton has stated that Zope isn't going 
to *have* a server in the future, if they can avoid it.

In any case, the point is moot; this isn't a compatible change to the spec, 
so it would have to wait for a WSGI 2.0.

Note that in any case, every framework, application, or middleware is free 
to invent its own solution for managing headers - and most already had one 
before WSGI came into being.  As written, the WSGI spec allows those 
existing applications and frameworks to produce the same output that they 
used to.  Backward compatibility with field-deployed software was a key 
criterion for WSGI design decisions.  Moving from a non-WSGI interface to 
WSGI should not alter an application's output unnecessarily.

If you want a friendly API for WSGI header management, please see the 
wsgiref.headers.Headers class, which offers a dictionary-like interface to 
manipulate a WSGI header list.
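
A short sketch of what that looks like, using the wsgiref.headers module
as it ships in today's standard library (the header values are just
examples):

```python
from wsgiref.headers import Headers

# The plain list of tuples that WSGI requires.
response_headers = [("Content-Type", "text/plain")]

# Headers wraps the list in place; it does not copy it.
headers = Headers(response_headers)

# Dictionary-style assignment replaces any existing header of that name
# (the lookup is case-insensitive).
headers["Content-Type"] = "text/html; charset=utf-8"

# add_header() appends without replacing, which is what repeated
# headers such as Set-Cookie need.
headers.add_header("Set-Cookie", "a=1")
headers.add_header("Set-Cookie", "b=2")

all_cookies = headers.get_all("Set-Cookie")
```

Because the wrapper mutates the underlying list, response_headers can
still be handed to start_response() unchanged as an ordinary list.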


From cce at clarkevans.com  Sun Dec 25 20:21:59 2005
From: cce at clarkevans.com (Clark C. Evans)
Date: Sun, 25 Dec 2005 14:21:59 -0500
Subject: [Web-SIG] Why is response_headers a list instead of a dict?
In-Reply-To: <5.1.1.6.0.20051225135446.0220b2b8@mail.telecommunity.com>
References: <5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
	<20051225034534.GA88508@prometheusresearch.com>
	<5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
	<5.1.1.6.0.20051225135446.0220b2b8@mail.telecommunity.com>
Message-ID: <20051225192159.GA93982@prometheusresearch.com>

Thank you for taking time to respond Phillip.

On Sun, Dec 25, 2005 at 02:13:00PM -0500, Phillip J. Eby wrote:
| WSGI puts this particular power in the application writer's hands, 
| because then *they* can fix a problem.  If it's in the server author's 
| hands, the application writer can be screwed, whether the server is open 
| source or not.
|
| Having it be ugly and primitive was both necessary and intentional.

Ok.

| In any case, the point is moot; this isn't a compatible change to the 
| spec, so it would have to wait for a WSGI 2.0.

Right; it's quite a large change.  Also, my sample set was limited to
mostly sites that didn't use 'long-lasting' cookies.  It seems that
Microsoft's SDK still uses 'expires' in their Set-Cookie header [1],
despite almost 8 years of it being explicitly removed from the RFC.

| If you want a friendly API for WSGI header management, please see the 
| wsgiref.headers.Headers class, which offers a dictionary-like interface 
| to manipulate a WSGI header list.

I'll have a look at it; thanks.

Best,

Clark

[1] http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wininet/wininet/http_cookies.asp

From cce at clarkevans.com  Sun Dec 25 20:51:23 2005
From: cce at clarkevans.com (Clark C. Evans)
Date: Sun, 25 Dec 2005 14:51:23 -0500
Subject: [Web-SIG] Why is response_headers a list instead of a dict?
In-Reply-To: <5.1.1.6.0.20051225135446.0220b2b8@mail.telecommunity.com>
References: <5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
	<20051225034534.GA88508@prometheusresearch.com>
	<5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
	<5.1.1.6.0.20051225135446.0220b2b8@mail.telecommunity.com>
Message-ID: <20051225195123.GA95491@prometheusresearch.com>

On Sun, Dec 25, 2005 at 02:13:00PM -0500, Phillip J. Eby wrote:
| In any case, the point is moot; this isn't a compatible change to the 
| spec, so it would have to wait for a WSGI 2.0.

In paragraph #3 of the "start_response()" definition, it states that
type(response_headers) is ListType.  I'm wondering if you'd be willing
to modify this to isinstance(response_headers, list)?

A similar assertion is not made about `environ` parameters, only that 
it is a 'dictionary'.  Could a server or middleware provide a special
environment handler object (as long as isinstance(environ, dict))?

The idea is that these two objects could be customized to provide
low-level RFC support and helper methods; but yet still be 'list of
tuples' and 'dictionary' as required by the WSGI specification.

For example:

  (a) the specialized `environ` could provide attributes which 
      get common HTTP_HEADERs; or raise an error if they do not
      exist -- this would prevent spelling mistakes.

  (b) the specialized `headers` could override the list[selector]
      to take a string argument, doing a lookup and replacement; 
      it could also do HTTP Header checking, etc.

Of course, the goal of these objects would be to present the _normal_
dict and list interfaces so that intermediate WSGI applications that
didn't know about the specialization would remain unaffected.

With Python 2.2's __new__ method, this could be done transparently
at each level, where the intermediate object "adorns" the underlying
native representation.

   def my_start_response(status, response_headers):
       response_headers = ResponseHeaders(response_headers)
       response_headers['My-Header'] = 'some-value'
       response_headers.set_content_disposition(filename="bing",inline=True)
       ...

The ResponseHeaders class in this case would derive from 'list', and be a
valid WSGI list-of-tuples; for those that know it is a ResponseHeaders
however, they can use the goodness and type-checking provided.  The
implementation of ResponseHeaders() constructor is simple; if the object
is already a ResponseHeaders, it returns self -- otherwise, it
constructs the wrapper as needed.
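
A rough sketch of such a constructor, to make the idea concrete. The
class body below is an assumption for illustration, not an existing
implementation; in particular the string-indexed assignment and its
case-insensitive replacement rule are invented here. Note that an
instance satisfies isinstance(x, list) but not type(x) is ListType,
which is why the wording change asked about above would be needed.

```python
class ResponseHeaders(list):
    """Hypothetical list subclass that is still a list of (name, value)."""

    def __new__(cls, headers=()):
        # Already wrapped: hand back the same object instead of copying.
        if isinstance(headers, cls):
            return headers
        return super().__new__(cls)

    def __init__(self, headers=()):
        # Python calls __init__ again on the object returned by __new__.
        # Without this guard, list.__init__ would clear the existing
        # instance before re-reading it, silently emptying it.
        if headers is self:
            return
        super().__init__(headers)

    def __setitem__(self, key, value):
        if isinstance(key, str):
            # String key: drop any header with that name, then append.
            lowered = key.lower()
            kept = [(n, v) for (n, v) in self if n.lower() != lowered]
            kept.append((key, value))
            super().__setitem__(slice(None), kept)
        else:
            # Integer or slice key: normal list behaviour.
            super().__setitem__(key, value)


response_headers = ResponseHeaders([("Content-Type", "text/plain")])
response_headers["My-Header"] = "some-value"
```

One caveat with this sketch: wrapping a plain list copies it, so changes
made through the wrapper are not seen by code that still holds the
original list.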

Kind Regards,

Clark

From pje at telecommunity.com  Mon Dec 26 01:26:58 2005
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 25 Dec 2005 19:26:58 -0500
Subject: [Web-SIG] Why is response_headers a list instead of a dict?
In-Reply-To: <20051225195123.GA95491@prometheusresearch.com>
References: <5.1.1.6.0.20051225135446.0220b2b8@mail.telecommunity.com>
	<5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
	<20051225034534.GA88508@prometheusresearch.com>
	<5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
	<5.1.1.6.0.20051225135446.0220b2b8@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20051225190848.0220b018@mail.telecommunity.com>

At 02:51 PM 12/25/2005 -0500, Clark C. Evans wrote:
>On Sun, Dec 25, 2005 at 02:13:00PM -0500, Phillip J. Eby wrote:
>| In any case, the point is moot; this isn't a compatible change to the
>| spec, so it would have to wait for a WSGI 2.0.
>
>In paragraph #3 of the "start_response()" definition, it states that
>type(response_headers) is ListType.  I'm wondering if you'd be willing
>to modify this to isinstance(response_headers, list)?

No.  :)  See below.


>A similar assertion is not made about `environ` parameters, only that
>it is a 'dictionary'.

 From http://www.python.org/peps/pep-0333.html#specification-details :

"""This object must be a builtin Python dictionary (not a subclass, 
UserDict or other dictionary emulation),..."""


>   Could a server or middleware provide a special
>environment handler object (as long as isinstance(environ, dict))?

No; this is explicitly forbidden.  See also Q&A item #1, under:

http://www.python.org/peps/pep-0333.html#questions-and-answers

A different argument applies to the headers list, but it's even worse in 
the headers case.  There is essentially zero probability that a server is 
going to be able to make use of any auxiliary methods of a headers object, 
and it would be crazy for the server to try and introspect to see which of 
the dozens of possible header extensions *might* exist.

The simple solution for code which wants a higher-level interface to either 
environ or headers is to wrap the raw data structures in its own 
enhancements - such as a request and response object.  This is what maybe 
99% of existing applications and frameworks do, so there was no sense in 
duplicating this in WSGI.  Meanwhile, optional features and flexibility are 
things to be *avoided* in a low-level protocol like this, if at all possible.
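
As an illustration of that wrapping approach, here is a minimal sketch
of a request object layered over the raw environ. The Request class and
its attribute names are hypothetical, not part of WSGI or of any
particular framework; the environ keys it reads are the standard
CGI-style ones.

```python
class Request:
    """Hypothetical convenience wrapper; environ itself stays a plain dict."""

    def __init__(self, environ):
        if type(environ) is not dict:
            raise TypeError("environ must be a builtin dict")
        self.environ = environ

    @property
    def method(self):
        return self.environ["REQUEST_METHOD"]

    @property
    def path(self):
        # SCRIPT_NAME and PATH_INFO may each be empty or missing.
        return (self.environ.get("SCRIPT_NAME", "")
                + self.environ.get("PATH_INFO", ""))

    @property
    def content_length(self):
        # CONTENT_LENGTH may be absent or an empty string.
        return int(self.environ.get("CONTENT_LENGTH") or 0)


request = Request({
    "REQUEST_METHOD": "POST",
    "SCRIPT_NAME": "/app",
    "PATH_INFO": "/upload",
    "CONTENT_LENGTH": "42",
})
```

The server and any middleware never see the Request object; they keep
passing the untouched dictionary along, and each layer is free to build
its own wrapper on top of it.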


From foom at fuhm.net  Mon Dec 26 17:59:16 2005
From: foom at fuhm.net (James Y Knight)
Date: Mon, 26 Dec 2005 11:59:16 -0500
Subject: [Web-SIG] Why is response_headers a list instead of a dict?
In-Reply-To: <20051225192159.GA93982@prometheusresearch.com>
References: <5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
	<20051225034534.GA88508@prometheusresearch.com>
	<5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
	<5.1.1.6.0.20051225135446.0220b2b8@mail.telecommunity.com>
	<20051225192159.GA93982@prometheusresearch.com>
Message-ID: <77DD120D-CC63-4FFA-BC68-01EF57F162EB@fuhm.net>


On Dec 25, 2005, at 2:21 PM, Clark C. Evans wrote:
> It seems that
> Microsoft's SDK still uses 'expires' in their Set-Cookie header [1],
> despite almost 8 years of it being explicitly removed from the RFC.

Sorry, but, despite the RFC writers best efforts, the newer RFCs are  
almost universally ignored by servers, frameworks, and browsers. When  
I looked a few months ago, even mozilla did not support the new  
cookie RFC. I think Opera is the only browser that does. Netscape  
cookies are unfortunately still the de facto standard.

James

From cce at clarkevans.com  Tue Dec 27 21:38:21 2005
From: cce at clarkevans.com (Clark C. Evans)
Date: Tue, 27 Dec 2005 15:38:21 -0500
Subject: [Web-SIG] Why is response_headers a list instead of a dict?
In-Reply-To: <5.1.1.6.0.20051225190848.0220b018@mail.telecommunity.com>
References: <5.1.1.6.0.20051225135446.0220b2b8@mail.telecommunity.com>
	<5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
	<20051225034534.GA88508@prometheusresearch.com>
	<5.1.1.6.0.20051224233526.03d0cf60@mail.telecommunity.com>
	<5.1.1.6.0.20051225135446.0220b2b8@mail.telecommunity.com>
	<5.1.1.6.0.20051225190848.0220b018@mail.telecommunity.com>
Message-ID: <20051227203821.GA28430@prometheusresearch.com>

Phillip,

Thank you for humoring the discussion (I realize it was covered in the
PEP).  I've since found a solution which covers my requirements of
making header access easier in ``environ`` and ``response_headers`` yet
keeping to the spirit of WSGI (but I'll let you be the final judge).  It
involves turning header "definitions" into objects:

  http://svn.w4py.org/Paste/trunk/paste/httpheaders.py
  http://svn.w4py.org/Paste/trunk/tests/test_httpheaders.py

Anyway, the final result is actually much better than I expected, it is 
far more modular/extendable than the extensions/wrappers I had started
to implement earlier.  So, I must thank you for sticking to your policy;
despite my complaints earlier, it seems to be a very wise choice.

Kind Regards,

Clark

P.S. The work above is usable; but incomplete in a few minor ways.  It
will soon be getting concrete (rather than generic) implementations for
the more complicated HTTP headers that I work with: Content-Disposition,
Cache-Control, Set-Cookie, etc.  Suggestions, of course, are very
welcome.  At this time the module has no dependencies, and so far the
rest of Paste does not depend upon it, however, if Ian agrees, much of
paste could be re-configured to use this module (especially the
fileapp.py module which is one of the motivators).

From foom at fuhm.net  Wed Dec 28 16:34:51 2005
From: foom at fuhm.net (James Y Knight)
Date: Wed, 28 Dec 2005 10:34:51 -0500
Subject: [Web-SIG] Is the size argument to the input-stream read method
	optional?
In-Reply-To: <43A972CC.9090204@zope.com>
References: <43A1BCE9.8020403@zope.com> <43A1CB1D.7000900@colorstudy.com>
	<43A972CC.9090204@zope.com>
Message-ID: <74365E33-DC78-461D-A880-6B4580548C22@fuhm.net>


On Dec 21, 2005, at 10:20 AM, Jim Fulton wrote:

> Ian Bicking wrote:
>
>> Jim Fulton wrote:
>>
>>
>>> The PEP is unclear on this and should be clarified, IMO.
>>>
>>
>>
>> My experience in using implementations is many servers do not require
>> the read size argument (they don't give a TypeError), but they block
>> without it, or if you read past CONTENT_LENGTH.  So it should  
>> probably
>> be required in the spec, since it's required in practice.
>>
>
> Does this constitute a decision?  Can somebody update the PEP?
> I am able and willing to if requested to. :)

Surely that's a bug in the server, not the spec? Indeterminate length  
uploads (with transfer-encoding chunked) are allowed by HTTP, after  
all. The CGI spec explicitly rejects such requests, but WSGI doesn't  
seem to.

James


From ianb at colorstudy.com  Wed Dec 28 19:14:25 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed, 28 Dec 2005 12:14:25 -0600
Subject: [Web-SIG] Is the size argument to the input-stream read method
 optional?
In-Reply-To: <74365E33-DC78-461D-A880-6B4580548C22@fuhm.net>
References: <43A1BCE9.8020403@zope.com> <43A1CB1D.7000900@colorstudy.com>
	<43A972CC.9090204@zope.com>
	<74365E33-DC78-461D-A880-6B4580548C22@fuhm.net>
Message-ID: <43B2D601.30007@colorstudy.com>

James Y Knight wrote:
>>>> The PEP is unclear on this and should be clarified, IMO.
>>>>
>>>
>>>
>>> My experience in using implementations is many servers do not require
>>> the read size argument (they don't give a TypeError), but they block
>>> without it, or if you read past CONTENT_LENGTH.  So it should  probably
>>> be required in the spec, since it's required in practice.
>>>
>>
>> Does this constitute a decision?  Can somebody update the PEP?
>> I am able and willing to if requested to. :)
> 
> 
> Surely that's a bug in the server, not the spec? Indeterminate length  
> uploads (with transfer-encoding chunked) are allowed by HTTP, after  
> all. The CGI spec explicitly rejects such requests, but WSGI doesn't  
> seem to.

But while it is possible, if an application uses this then it won't be 
portable, right?  I think chunking has been explicitly excluded from 
WSGI too, as something that should be handled/isolated in the server. 
Not that I really know much about chunking, except that it was discussed 
at one point.

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From foom at fuhm.net  Wed Dec 28 19:36:48 2005
From: foom at fuhm.net (James Y Knight)
Date: Wed, 28 Dec 2005 13:36:48 -0500
Subject: [Web-SIG] Is the size argument to the input-stream read method
	optional?
In-Reply-To: <43B2D601.30007@colorstudy.com>
References: <43A1BCE9.8020403@zope.com> <43A1CB1D.7000900@colorstudy.com>
	<43A972CC.9090204@zope.com>
	<74365E33-DC78-461D-A880-6B4580548C22@fuhm.net>
	<43B2D601.30007@colorstudy.com>
Message-ID: <B854A1F8-58EF-485F-ABD9-E9CB8089C17E@fuhm.net>


On Dec 28, 2005, at 1:14 PM, Ian Bicking wrote:
>> Surely that's a bug in the server, not the spec? Indeterminate  
>> length  uploads (with transfer-encoding chunked) are allowed by  
>> HTTP, after  all. The CGI spec explicitly rejects such requests,  
>> but WSGI doesn't  seem to.
>>
>
> But while it is possible, if an application uses this then it won't  
> be portable, right?  I think chunking has been explicitly excluded  
> from WSGI too, as something that should be handled/isolated in the  
> server. Not that I really know much about chunking, except that it  
> was discussed at one point.

The server handles the unchunking, but the unchunked stream it passes  
to the client has no content-length. The only way to indicate when  
the stream is done is via EOF.

It doesn't seem a good idea to me for the WSGI spec to disallow  
chunked uploads. The reason it's disallowed in the CGI spec is that  
it was added to HTTP after CGI was defined. There's no similar excuse  
for WSGI.

However, I see that in the spec, indeterminate length uploads have  
already been disallowed implicitly, by not requiring the server to  
return EOF from reads at the end of the stream:
"The server is not required to read past the client's specified  
Content-Length, and is allowed to simulate an end-of-file condition  
if the application attempts to read past that point. The application  
SHOULD NOT attempt to read more data than is specified by the  
CONTENT_LENGTH variable."

If the client cannot depend on an EOF at the end of the stream, it  
cannot read a stream without a length. I'd much rather it say  
something like:
"The server MUST NOT read past the end of the request, and MUST  
simulate an end-of-file condition if the application attempts to read  
past that point. Attempting to read from an input stream when no data  
has been provided MUST result in an end-of-file result (the empty  
string)."

but it doesn't. At least the spec does allow the server to implement  
read correctly.
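
For the known-length case, a server can get the behaviour described in
that suggested wording with a thin wrapper around the socket file. This
is only a sketch under stated assumptions: the LimitedInput name is
invented, and a chunked upload would need the server to track the end
of the last chunk instead of a byte count.

```python
import io


class LimitedInput:
    """Hypothetical server-side wrapper that simulates EOF after 'length' bytes."""

    def __init__(self, stream, length):
        self._stream = stream
        self._remaining = length

    def read(self, size=-1):
        # A missing or oversized 'size' is clamped to what is left of the
        # request body, so the read never blocks on the next request.
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        if size == 0:
            return b""  # simulated end-of-file
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data


# The bytes after the first five stand in for a following request
# on the same connection.
body = LimitedInput(io.BytesIO(b"helloNEXT-REQUEST"), 5)
first = body.read(3)
rest = body.read()
after_eof = body.read(10)
```

With such a wrapper in place, an application can call read() with no
size argument, or read in a loop until it gets an empty string, without
consulting CONTENT_LENGTH itself.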

James


From cce at clarkevans.com  Thu Dec 29 16:44:08 2005
From: cce at clarkevans.com (Clark C. Evans)
Date: Thu, 29 Dec 2005 10:44:08 -0500
Subject: [Web-SIG] WSGI and Content-Type
Message-ID: <20051229154408.GA20693@prometheusresearch.com>

I'm puzzled why CONTENT_TYPE/CONTENT_LENGTH is listed as an ``environ``
CGI variable when it seems the corresponding
HTTP_CONTENT_TYPE/HTTP_CONTENT_LENGTH would work.  Is there a reason for
this redundancy?  Which one should I use?  If they differ, which one is
correct? 

Kind Regards,

Clark

From ianb at colorstudy.com  Thu Dec 29 17:27:14 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 29 Dec 2005 10:27:14 -0600
Subject: [Web-SIG] WSGI and Content-Type
In-Reply-To: <20051229154408.GA20693@prometheusresearch.com>
References: <20051229154408.GA20693@prometheusresearch.com>
Message-ID: <43B40E62.90209@colorstudy.com>

Clark C. Evans wrote:
> I'm puzzled why CONTENT_TYPE/CONTENT_LENGTH is listed as an ``environ``
> CGI variable when it seems the corresponding
> HTTP_CONTENT_TYPE/HTTP_CONTENT_LENGTH would work.  Is there a reason for
> this redundancy?  Which one should I use?  If they differ, which one is
> correct? 

Probably HTTP_CONTENT_TYPE and HTTP_CONTENT_LENGTH shouldn't be in 
there, and they should be ignored if they are in there.  CGI translates 
all headers by adding HTTP_, except for these two particular headers. 
WSGI is just following CGI on this one.
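
The naming rule can be written down in a few lines. A small sketch; the
function name environ_key is made up for illustration:

```python
def environ_key(header_name):
    """Map an HTTP request header name to its CGI/WSGI environ key."""
    key = header_name.upper().replace("-", "_")
    # The two exceptions inherited from CGI: no HTTP_ prefix.
    if key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return key
    return "HTTP_" + key


environ = {"CONTENT_TYPE": "text/plain", "HTTP_USER_AGENT": "example/1.0"}
content_type = environ.get(environ_key("Content-Type"))
user_agent = environ.get(environ_key("User-Agent"))
```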

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From ianb at colorstudy.com  Thu Dec 29 17:38:27 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 29 Dec 2005 10:38:27 -0600
Subject: [Web-SIG] transaction  progress with cgi.FieldStorage
In-Reply-To: <43AAE16B.9040006@gmail.com>
References: <43AAE16B.9040006@gmail.com>
Message-ID: <43B41103.1040308@colorstudy.com>

kai wrote:
> Hi All,
> this is my first post on this list. I am working on a way to monitor the 
> progress of reading a file upload from wsgi.input.  I can currently 
> monitor the overall transfer and when individual files of a multiple 
> file upload are completed. The ultimate goal of this is to be able to 
> display a progress meter when someone is uploading a file.
> 
> To do this I subclassed cgi.FieldStorage but when I finished I had 
> modified most of the non-trivial methods just to hook in something to 
> monitor the transfer progress, oops.
> 
> Has anyone else found FieldStorage insufficient for certain tasks?
> Is there a general need for a more flexible FieldStorage replacement?

Incidentally, one way I've considered implementing this is to simply 
write the entire request body to a file, and parse it later, probably in 
the context of whatever framework I'm using (but typical web frameworks 
don't actually deal well with tracking an upload, hence a custom WSGI 
application).
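
Roughly, that could look like the following. This is an untested-in-
production sketch rather than code from Paste: the function name, the
callback signature, and the chunk size are all assumptions, and it
relies on the client having sent a CONTENT_LENGTH.

```python
import io
import tempfile


def spool_request_body(environ, on_progress, chunk_size=8192):
    """Copy the request body to a temporary file, reporting progress.

    'on_progress' is a hypothetical callback taking (bytes_done, total).
    Reads never ask for more than CONTENT_LENGTH bytes in total.
    """
    total = int(environ.get("CONTENT_LENGTH") or 0)
    remaining = total
    spool = tempfile.TemporaryFile()
    while remaining > 0:
        chunk = environ["wsgi.input"].read(min(chunk_size, remaining))
        if not chunk:
            break  # client went away before sending everything
        spool.write(chunk)
        remaining -= len(chunk)
        on_progress(total - remaining, total)
    spool.seek(0)  # rewind so a parser can read it later
    return spool


# Stand-in environ; a real one comes from the WSGI server.
updates = []
environ = {
    "CONTENT_LENGTH": "10",
    "wsgi.input": io.BytesIO(b"0123456789"),
}
spooled = spool_request_body(
    environ, lambda done, total: updates.append((done, total)), chunk_size=4)
```

The spooled file can then be handed to whatever form parser the
framework uses, while a separate request polls the recorded progress.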

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From ianb at colorstudy.com  Thu Dec 29 17:51:52 2005
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 29 Dec 2005 10:51:52 -0600
Subject: [Web-SIG] WSGI and Content-Type
In-Reply-To: <20051229154408.GA20693@prometheusresearch.com>
References: <20051229154408.GA20693@prometheusresearch.com>
Message-ID: <43B41428.3080100@colorstudy.com>

Clark C. Evans wrote:
> I'm puzzled why CONTENT_TYPE/CONTENT_LENGTH is listed as an ``environ``
> CGI variable when it seems the corresponding
> HTTP_CONTENT_TYPE/HTTP_CONTENT_LENGTH would work.  Is there a reason for
> this redundancy?  Which one should I use?  If they differ, which one is
> correct? 

Incidentally, I've added a check for QUERY_STRING (missing QUERY_STRING 
causes buggy cgi module behavior, per my previous email) and 
HTTP_CONTENT_TYPE/LENGTH to paste.lint (it's an error now, but maybe it 
should be a warning).  I'd encourage people to use it to check server 
and application behavior (it just watches things go by, it doesn't 
affect the request).
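
For readers who want to see the shape of such a check, here is a small stand-alone sketch. It is not the paste.lint code; the names LintError and check_environ_middleware are invented for this example, and it covers only the two checks mentioned above:

```python
class LintError(Exception):
    """Raised when the environ breaks one of the checks below."""


def check_environ_middleware(app):
    """Wrap a WSGI app and inspect environ before passing it through."""
    def wrapper(environ, start_response):
        problems = []
        if 'QUERY_STRING' not in environ:
            problems.append('QUERY_STRING is missing')
        for key in ('HTTP_CONTENT_TYPE', 'HTTP_CONTENT_LENGTH'):
            if key in environ:
                # key[5:] strips the "HTTP_" prefix.
                problems.append('%s should not be set; use %s' % (key, key[5:]))
        if problems:
            raise LintError('; '.join(problems))
        # The request itself is passed along untouched.
        return app(environ, start_response)
    return wrapper


def hello_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello\n']


checked = check_environ_middleware(hello_app)
```

Turning the error into a warning would only mean replacing the raise with a call to warnings.warn.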

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

From cce at clarkevans.com  Fri Dec 30 00:31:26 2005
From: cce at clarkevans.com (Clark C. Evans)
Date: Thu, 29 Dec 2005 18:31:26 -0500
Subject: [Web-SIG] transaction  progress with cgi.FieldStorage
In-Reply-To: <43AAE16B.9040006@gmail.com>
References: <43AAE16B.9040006@gmail.com>
Message-ID: <20051229233126.GA24311@prometheusresearch.com>

On Thu, Dec 22, 2005 at 12:24:59PM -0500, kai wrote:
| this is my first post on this list. I am working on a way to monitor the 
| progress of reading a file upload from wsgi.input.  I can currently 
| monitor the overall transfer and when individual files of a multiple 
| file upload are completed. The ultimate goal of this is to be able to 
| display a progress meter when someone is uploading a file.

You could do this in a few stages:

  #1 Use an async XMLHttpRequest on the client side to POST
     the file to your file upload servlet; in the URL for the
     post use a unique identifier, say MY-ID

  #2 Override make_file /w your own that monitors how much
     of the file's content has been sent; store that in a 
     global mapping using MY-ID as the key

  #3 Create a monitor URL on your server that reads the mapping
     and returns an hour glass or something /w a refresh page

  #4 When you send your application request; open up an iframe
     /w refresh setting to that monitor URL (using MY-ID)
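
Steps #2 and #3 could look roughly like the sketch below. The names ProgressFile, upload_progress and monitor are made up for illustration. A cgi.FieldStorage subclass would return a ProgressFile from its make_file() override; the sketch leaves FieldStorage out and simulates the writes by hand:

```python
import tempfile
import threading

# Global mapping from upload id to bytes received so far (step #2).
upload_progress = {}
progress_lock = threading.Lock()


class ProgressFile(object):
    """File-like wrapper that records how many bytes have been written."""

    def __init__(self, upload_id):
        self.upload_id = upload_id
        self._file = tempfile.TemporaryFile()
        with progress_lock:
            upload_progress[upload_id] = 0

    def write(self, data):
        self._file.write(data)
        with progress_lock:
            upload_progress[self.upload_id] += len(data)

    def __getattr__(self, name):
        # Delegate read(), seek(), close(), ... to the real file.
        return getattr(self._file, name)


def monitor(upload_id):
    """What the monitor URL of step #3 would report for an upload id."""
    with progress_lock:
        return upload_progress.get(upload_id)


# Simulate the parser writing two pieces of an uploaded file.
f = ProgressFile('MY-ID')
f.write(b'a' * 1000)
f.write(b'b' * 500)
print(monitor('MY-ID'))
```

A global dictionary only works when the upload and the monitor request are served by the same process.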

Although, you've probably already done something similar...

| To do this I subclassed cgi.FieldStorage but when I finished I had 
| modified most of the non-trivial methods just to hook in something to 
| monitor the transfer progress, oops.
| 
| Has anyone else found FieldStorage insufficient for certain tasks?
| Is there a general need for a more flexible FieldStorage replacement?

I've found make_file sufficient for all of my needs (so far).

Best,

Clark

From janssen at parc.com  Fri Dec 30 20:37:29 2005
From: janssen at parc.com (Bill Janssen)
Date: Fri, 30 Dec 2005 11:37:29 PST
Subject: [Web-SIG] WSGI for Medusa?
Message-ID: <05Dec30.113733pst."58633"@synergy1.parc.xerox.com>

If no one has done a WSGI implementation for Medusa, I think I'll take
a shot at it this weekend...

Bill

From kai.keliikuli at gmail.com  Sat Dec 31 02:56:16 2005
From: kai.keliikuli at gmail.com (kai)
Date: Fri, 30 Dec 2005 20:56:16 -0500
Subject: [Web-SIG] transaction  progress with cgi.FieldStorage
In-Reply-To: <43B41103.1040308@colorstudy.com>
References: <43AAE16B.9040006@gmail.com> <43B41103.1040308@colorstudy.com>
Message-ID: <43B5E540.8030709@gmail.com>


> Incidentally, one way I've considered implementing this is to simply 
> write the entire request body to a file, and parse it later, probably in 
> the context of whatever framework I'm using (but typical web frameworks 
> don't actually deal well with tracking an upload, hence a custom WSGI 
> application).

I put aside my rewrite of FieldStorage and went this route. I'm working
on this using lighttpd and the flup WSGI implementation. When I do an
upload, though, I'm seeing a delay before I start getting progress
output; it seems like all the data reaches the server and only then is
environ['wsgi.input'] available. I'm observing this with nothing more
than a print statement in the loop I use to read in data. When I upload
a 10 MB file, it sits for about 2.5 minutes and then bursts out all the
progress output at once in under a second. I need to investigate more;
it may very well be me doing something silly.

An aside on cgi.FieldStorage itself: it reads data using readline 
instead of reading in blocks of limited size. I think this means that
a file with very long lines (20 MB, 100 MB, ...) could cause excessive 
memory consumption.
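
To illustrate the difference, here is a sketch of reading with a size cap. It assumes the stream's readline() accepts a size argument; ordinary file objects do, but PEP 333 does not require wsgi.input to support it. The name iter_lines_bounded is hypothetical:

```python
import io


def iter_lines_bounded(stream, max_chunk=1 << 16):
    """Yield pieces of at most max_chunk bytes, ending at a newline
    whenever one occurs within the limit."""
    while True:
        piece = stream.readline(max_chunk)
        if not piece:
            break
        yield piece


# One very long line followed by a short one.
data = b'A' * 200000 + b'\nshort line\n'
pieces = list(iter_lines_bounded(io.BytesIO(data)))
print(len(pieces), max(len(p) for p in pieces))
```

With a plain readline() the first piece would be the whole 200001-byte line; with the cap no single piece exceeds max_chunk.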

Kai

From chad at zetaweb.com  Sat Dec 31 06:21:29 2005
From: chad at zetaweb.com (Chad Whitacre)
Date: Sat, 31 Dec 2005 00:21:29 -0500
Subject: [Web-SIG] transaction  progress with cgi.FieldStorage
In-Reply-To: <43B5E540.8030709@gmail.com>
References: <43AAE16B.9040006@gmail.com> <43B41103.1040308@colorstudy.com>
	<43B5E540.8030709@gmail.com>
Message-ID: <43B61559.3050304@zetaweb.com>

> I need to investigate more may very well be me
> doing something silly.

Are your prints buffered? sys.stdout.flush()


chad


From chrism at plope.com  Sat Dec 31 06:50:48 2005
From: chrism at plope.com (Chris McDonough)
Date: Sat, 31 Dec 2005 00:50:48 -0500
Subject: [Web-SIG] transaction  progress with cgi.FieldStorage
In-Reply-To: <43B5E540.8030709@gmail.com>
References: <43AAE16B.9040006@gmail.com> <43B41103.1040308@colorstudy.com>
	<43B5E540.8030709@gmail.com>
Message-ID: <DE77A4BD-CB11-45AB-BFC2-C35EBAACB3E0@plope.com>

> An aside on cgi.FieldStorage itself. It reads data using readline
> instead of reading in blocks of limited size. doing this I think means
> a file with very long lines, 20MB, 100MB, ... could cause excessive
> memory consumption.

This was reported and solved a long time ago (but not yet fixed in  
any Python distro):

https://sourceforge.net/tracker/?func=detail&aid=1112549&group_id=5470&atid=105470


From mal at egenix.com  Mon Dec 19 00:57:13 2005
From: mal at egenix.com (M.-A. Lemburg)
Date: Sun, 18 Dec 2005 23:57:13 -0000
Subject: [Web-SIG] [DB-SIG]  WSGI thread affinity/interleaving
In-Reply-To: <43A5B971.1010408@colorstudy.com>
References: <5.1.1.6.0.20051217171840.01e1ab80@mail.telecommunity.com>	<C43AF2EA-B969-40EC-94BE-EB41201C129F@fuhm.net>
	<43A5B971.1010408@colorstudy.com>
Message-ID: <43A5F757.2030906@egenix.com>

Ian Bicking wrote:
> James Y Knight wrote:
>> I'm worried about database access. Most DBAPI adapters have  
>> threadsafety level 2: "Threads may share the module and  
>> connections.". So with those, at least, it should be fine to move a  
>> connection between threads, since "share OK" implies "move OK".  

What exactly do you mean by "move"? Sharing a
connection refers to multiple threads creating cursors
on this connection.

>> However, no documentation I've found has said anything separately  
>> about whether it's safe to _move_ a cursor between threads. It seems  
>> likely to me that it would not be safe, at least in some database  
>> adapters.

Adapters with threadsafety level 3 would allow sharing a cursor,
meaning that you can call cursor.execute() from different
threads.

Given that you usually already have to be careful with
sharing connections, sharing cursors is rather unlikely
to work in a general setting.
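
As a small illustration, the level a driver declares can be read from the module-level threadsafety attribute that the DB-API defines. The sketch below uses the standard library's sqlite3 module only because it is always available; the mapping and the helper name describe_threadsafety are written for this example:

```python
import sqlite3

# Meaning of the DB-API "threadsafety" module attribute.
THREADSAFETY = {
    0: 'threads may not share the module',
    1: 'threads may share the module, but not connections',
    2: 'threads may share the module and connections',
    3: 'threads may share the module, connections and cursors',
}


def describe_threadsafety(module):
    """Return the DB-API meaning of module.threadsafety."""
    return THREADSAFETY[module.threadsafety]


print(sqlite3.threadsafety, '->', describe_threadsafety(sqlite3))
```

The attribute only states what the driver claims; it says nothing about handing an object from one thread to another, which is the question in this thread.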

>> And if it's not safe, that means a WSGI result iterator  
>> cannot use any DBAPI cursor functionality which seems a drag.
>>
>> Does anybody have practical experience with the safety of moving a  
>> DBAPI cursor between threads?
> 
> I haven't done that, but SQLite (2?) notably doesn't allow you to move a 
> connection between threads.  I'm not actually sure what problems it 
> causes if you do move them -- it may simply be an overzealous warning.
> 
> CCing DB-SIG -- people there might know more details.

Sharing cursors is possible with some database drivers and
can be used to e.g. pool cursors with prepared commands.

mxODBC does support this if the ODBC driver is thread-safe
(which it should be if it adheres to the ODBC standard).
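
Regarding the SQLite remark quoted above, the following sketch shows the behaviour with the sqlite3 module from the Python standard library, which by default refuses to use a connection outside the thread that created it. This shows the stdlib module with its default check_same_thread setting; no claim is made about what older pysqlite versions did:

```python
import sqlite3
import threading

conn = sqlite3.connect(':memory:')   # created in the main thread
errors = []


def use_from_other_thread():
    try:
        conn.execute('SELECT 1')
    except sqlite3.ProgrammingError as exc:
        # Raised because the connection belongs to another thread.
        errors.append(exc)


worker = threading.Thread(target=use_from_other_thread)
worker.start()
worker.join()
print(errors)
```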

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 19 2005)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

From mal at egenix.com  Mon Dec 19 15:17:51 2005
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 19 Dec 2005 14:17:51 -0000
Subject: [Web-SIG] [DB-SIG]  WSGI thread affinity/interleaving
In-Reply-To: <6B850331-4947-4824-84A3-2C04BC32BEA8@fuhm.net>
References: <5.1.1.6.0.20051217171840.01e1ab80@mail.telecommunity.com>	<C43AF2EA-B969-40EC-94BE-EB41201C129F@fuhm.net>	<43A5B971.1010408@colorstudy.com>
	<43A5F757.2030906@egenix.com>
	<6B850331-4947-4824-84A3-2C04BC32BEA8@fuhm.net>
Message-ID: <43A6C10D.5000503@egenix.com>

James Y Knight wrote:
> On Dec 18, 2005, at 6:57 PM, M.-A. Lemburg wrote:
> 
>> Ian Bicking wrote:
>>
>>> James Y Knight wrote:
>>>
>>>> I'm worried about database access. Most DBAPI adapters have
>>>> threadsafety level 2: "Threads may share the module and
>>>> connections.". So with those, at least, it should be fine to move a
>>>> connection between threads, since "share OK" implies "move OK".
>>>>
>> What exactly do you mean with "move" ? Sharing a
>> connection refers to multiple threads creating cursors
>> on this connection.
> 
> I'm asking about moving a cursor, that is, accessing it sequentially  
> first from one thread, then later from another thread. This is  
> potentially asking less than sharing, that is, accessing it  
> simultaneously from two threads.
> 
> For example, a simple class without any locking, that only modifies  
> itself, would generally be movable between threads, but not sharable.  
> Adding a mutex would make it both.

Ok. In that sense, I think "moving" is not really possible
with database connections or cursors: these always rely on
external resources and these may be relying on having the
same thread context around when being called.

Why would you want to "move" cursors or connections around ?

-- 
Marc-Andre Lemburg
eGenix.com

From mal at egenix.com  Sat Dec 31 13:13:15 2005
From: mal at egenix.com (M.-A. Lemburg)
Date: Sat, 31 Dec 2005 12:13:15 -0000
Subject: [Web-SIG] [DB-SIG]  WSGI thread affinity/interleaving
In-Reply-To: <ca471dc20512302110m54caf74cq559555d36259dd35@mail.gmail.com>
References: <5.1.1.6.0.20051217171840.01e1ab80@mail.telecommunity.com>	
	<C43AF2EA-B969-40EC-94BE-EB41201C129F@fuhm.net>	
	<43A5B971.1010408@colorstudy.com> <43A5F757.2030906@egenix.com>	
	<6B850331-4947-4824-84A3-2C04BC32BEA8@fuhm.net>	
	<43A6C10D.5000503@egenix.com>
	<ca471dc20512302110m54caf74cq559555d36259dd35@mail.gmail.com>
Message-ID: <43B675D9.4090708@egenix.com>

Guido van Rossum wrote:
> On 12/19/05, M.-A. Lemburg <mal at egenix.com> wrote:
>> Ok. In that sense, I think "moving" is not really possible
>> with database connections or cursors: these always rely on
>> external resources and these may be relying on having the
>> same thread context around when being called.
>>
>> Why would you want to "move" cursors or connections around ?
> 
> A typical connection (or cursor) caching implementation used from a
> multi-threaded program might easily do this: a resource is created in
> one thread, used for a while, then given back to the cache; when
> another thread requests a resource, it gets one from the cache that
> might have been used previously in a different thread. Keeping a cache
> per thread is a bit cumbersome and reduces the efficacy of the cache
> (if a thread goes away all the resources cached on its behalf must be
> closed instead of being made available to other threads).
>
> I'm not sure I understand what resources a typical DB client library
> might have that are associated with a thread instead of with a
> connection or cursor -- IOW I don't understand why you think moving
> resources between threads would be a problem, as long as only one
> thread "owns" them at any time. IOW if I maintain my own locking, why
> would I still be limited in sharing connections/cursors between
> threads? What am I missing?
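
The caching scheme described in the quoted text might be sketched as follows. ConnectionPool is a made-up name, and sqlite3 with check_same_thread=False is used only so the example runs anywhere. Whether handing a connection from one thread to another is actually safe depends on the driver and its client library, which is exactly the open question here:

```python
import queue
import sqlite3
import threading


class ConnectionPool(object):
    """Shared cache: a connection has one owner at a time, but the
    owner may be a different thread on each checkout."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            # check_same_thread=False lets sqlite3 accept other threads.
            self._idle.put(sqlite3.connect(':memory:', check_same_thread=False))

    def acquire(self):
        return self._idle.get()      # blocks until a connection is free

    def release(self, conn):
        self._idle.put(conn)


def worker(pool, results):
    conn = pool.acquire()
    try:
        results.append(conn.execute('SELECT 1').fetchone()[0])
    finally:
        pool.release(conn)


pool = ConnectionPool(2)
results = []
threads = [threading.Thread(target=worker, args=(pool, results)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```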

All this would be easily possible if the Python cursor object
had full control over the external resources in use.

However, most Python database cursor objects rely on external
libraries and therefore do not have control over where state
is stored.

If you use a resource from a different thread than the one
where it was created, this can cause situations where part
of the state is missing, or worse, a different state is
used.

Many database libraries do their own caching at various
levels (network connection, logical connections, cursors).
Not all of them are fully thread-safe, and it's hard to find
out which ones are.

To be on the safe side, you should only use connections from
the thread they were created in.

It's still worthwhile to cache the connections (and even
cursors on that connection):

* connecting to a database can take anything from microseconds
  to several seconds;

* preparing statements for execution on a cursor also takes
  time (and in most cases costs a network roundtrip), so
  caching already prepared cursors also makes sense for
  commonly used statements.

The latter is especially useful with bound parameters since
the database will usually only have to prepare the statement
once and can then take any number of parameter sets to
execute the statement with.
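
A short sketch of bound parameters, again with the stdlib sqlite3 module. The table and values are invented for the example. The "?" placeholder is sqlite3's parameter style; other drivers use other styles, and whether the statement is really prepared only once depends on the driver:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE hits (path TEXT, count INTEGER)')

# One statement text, many parameter sets.
insert = 'INSERT INTO hits (path, count) VALUES (?, ?)'
rows = [('/', 10), ('/about', 3), ('/contact', 1)]
conn.executemany(insert, rows)

# Parameters are also bound on queries instead of being pasted into the SQL.
total = conn.execute(
    'SELECT SUM(count) FROM hits WHERE count >= ?', (3,)
).fetchone()[0]
print(total)
```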

-- 
Marc-Andre Lemburg
eGenix.com
