From evdo.hsdpa at gmail.com  Thu Oct  5 01:29:10 2006
From: evdo.hsdpa at gmail.com (Robert Kim Wireless Internet Advisor)
Date: Wed, 4 Oct 2006 16:29:10 -0700
Subject: [Web-SIG] ruby rails / python dev needed for small webapp
Message-ID: <1ec620e90610041629p1d28fe6cx1c939b355abc11c5@mail.gmail.com>

Anybody got time to build out a suuuper simple webapp?

-- 
Robert Q Kim, Internet Advisor Provider
http://wireless-internet-access-provider.com
http://evdo-coverage.com
2611 S. Pacific Coast Highway 101
Suite 203
Cardiff by the Sea, CA 92007
206 984 0880

From michael.kerrin at openapp.biz  Thu Oct  5 12:07:43 2006
From: michael.kerrin at openapp.biz (Michael Kerrin)
Date: Thu, 5 Oct 2006 11:07:43 +0100
Subject: [Web-SIG] WSGI, cgi.FieldStorage incompatibility
In-Reply-To: <ca471dc20609291231o60293553w6622187903ba784e@mail.gmail.com>
References: <451D1D22.5090607@openapp.biz>
	<ca471dc20609291231o60293553w6622187903ba784e@mail.gmail.com>
Message-ID: <200610051107.43507.michael.kerrin@openapp.biz>

Hi,

On Friday 29 September 2006 20:31, Guido van Rossum wrote:
> On 9/29/06, Michael Kerrin <michael.kerrin at openapp.biz> wrote:
> >   But the current implementation of cgi.FieldStorage in the 2.4.4 branch
> > and on Python 2.5 does call readline with the size argument. It has
> > started to do this in response to the Python bug #1112549 -
> > cgi.FieldStorage memory usage can spike in line-oriented ops. See
> > http://sourceforge.net/tracker/index.php?func=detail&aid=1112549&group_id=5470&atid=105470
> >
> >   Since it is reasonable for a WSGI application to use cgi.FieldStorage,
> > I am wondering whether cgi.FieldStorage or the WSGI specification needs
> > to be changed in order to solve this incompatibility.
> >
> >   Originally I thought it was cgi.FieldStorage that needed to be changed,
> > and hence tried to fix it by wrapping the input stream so that the
> > readline method always uses the read method on the input stream. While
> > this seems to work for me, it introduces a level of complexity in the
> > cgi.py file, and possibly some other bugs, that makes me think that
> > adding the size argument for readline into the WSGI specification isn't
> > such a bad idea after all.
>
> Since that change to cgi.py was a security fix I would strongly
> recommend not removing it and changing the WSGI spec instead.

I wasn't recommending removing that fix; instead I was trying to get around 
both problems by using the read method on the input stream instead of the 
readline method, since there are no problems passing the size argument to the 
read method.
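A minimal sketch of the kind of wrapper described in the quoted text, written for a modern Python 3 bytes stream rather than as the 2006 cgi.py patch: it serves readline(size) purely from read() calls on the underlying input, and it never reads past CONTENT_LENGTH (WSGI applications must not read beyond it). The class name ReadlineViaRead and its constructor arguments are hypothetical.

```python
import io


class ReadlineViaRead(object):
    """Wrap a WSGI input stream so readline(size) only ever calls read().

    Hypothetical helper; content_length should come from CONTENT_LENGTH.
    """

    def __init__(self, stream, content_length, chunk_size=8192):
        self.stream = stream
        self.remaining = content_length  # bytes not yet read from stream
        self.chunk_size = chunk_size
        self.buffer = b''

    def _fill(self, n):
        # Pull at most n more bytes into the buffer, never past CONTENT_LENGTH.
        n = min(n, self.remaining)
        if n <= 0:
            return False
        chunk = self.stream.read(n)
        if not chunk:
            self.remaining = 0
            return False
        self.remaining -= len(chunk)
        self.buffer += chunk
        return True

    def read(self, size=-1):
        if size is None or size < 0:
            while self._fill(self.chunk_size):
                pass
            data, self.buffer = self.buffer, b''
            return data
        while len(self.buffer) < size and self._fill(size - len(self.buffer)):
            pass
        data, self.buffer = self.buffer[:size], self.buffer[size:]
        return data

    def readline(self, size=-1):
        limit = size if size is not None and size >= 0 else None
        while True:
            newline = self.buffer.find(b'\n')
            if newline >= 0:
                end = newline + 1
                break
            if limit is not None and len(self.buffer) >= limit:
                end = limit
                break
            if not self._fill(self.chunk_size):
                end = len(self.buffer)  # EOF: return whatever is left
                break
        if limit is not None:
            end = min(end, limit)  # the size argument caps the line length
        line, self.buffer = self.buffer[:end], self.buffer[end:]
        return line


# Usage: the trailing b'EXTRA' lies beyond CONTENT_LENGTH and is never read.
wrapped = ReadlineViaRead(io.BytesIO(b'abc\ndef\nghiEXTRA'), 11)
```

Because the size limit is enforced on the buffer, a hostile request with one enormous "line" only ever costs chunk_size bytes per read call, which is the memory concern behind bug #1112549.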

I think the best thing to do for now is to open a bug report on sourceforge.

Thanks
Michael


-- 
Michael Kerrin

55 Fitzwilliam Sq.,
Dublin 2.

Tel: 087 688 3894

From janssen at parc.com  Wed Oct 18 00:48:12 2006
From: janssen at parc.com (Bill Janssen)
Date: Tue, 17 Oct 2006 15:48:12 PDT
Subject: [Web-SIG] WSGI -- usable for other protocols?
Message-ID: <06Oct17.154816pdt."58648"@synergy1.parc.xerox.com>

I've been working on a Python IMAP server that uses PyLucene for
indexing.  It's mainly IMAP, but also speaks a bit of HTTP for an
administrative interface.  Does it make any sense to wrap it with
WSGI?  That is, does WSGI make sense for other protocols than HTTP
(specifically IMAP)?

And, what WSGI-supporting environments will also support PyLucene? (The
limiting factor is that the GCJ runtime has to be linked in, and all
threads must be GCJ threads.)

Bill

From ianb at colorstudy.com  Wed Oct 18 01:25:46 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 17 Oct 2006 18:25:46 -0500
Subject: [Web-SIG] WSGI -- usable for other protocols?
In-Reply-To: <06Oct17.154816pdt."58648"@synergy1.parc.xerox.com>
References: <06Oct17.154816pdt."58648"@synergy1.parc.xerox.com>
Message-ID: <4535667A.8010603@colorstudy.com>

Bill Janssen wrote:
> I've been working on a Python IMAP server that uses PyLucene for
> indexing.  It's mainly IMAP, but also speaks a bit of HTTP for an
> administrative interface.  Does it make any sense to wrap it with
> WSGI?  That is, does WSGI make sense for other protocols than HTTP
> (specifically IMAP)?

I would probably wrap it in WSGI, because that would please me.  For 
something like IMAP, FTP, etc., you'd have to have some persistent 
server that holds the connection open, then turns certain commands into 
requests.  I've been thinking about doing this for dbus 
(http://www.freedesktop.org/wiki/Software/dbus)
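A rough sketch of that idea: a persistent connection handler (the socket loop itself is omitted) that turns each IMAP command line into one WSGI-style request. Every detail of the mapping is an assumption for illustration: the command becomes REQUEST_METHOD, its arguments become PATH_INFO, and the IMAP tag travels in a made-up imap.tag key. Nothing in WSGI or IMAP defines this, and real IMAP commands (literals, quoted strings, untagged multi-line responses) would need a proper parser.

```python
import io
import sys


def command_to_environ(tag, command, args):
    """Build a WSGI environ for one IMAP command (hypothetical mapping)."""
    return {
        'REQUEST_METHOD': command.upper(),   # e.g. 'SELECT'
        'SCRIPT_NAME': '',
        'PATH_INFO': '/' + '/'.join(args),   # e.g. '/INBOX'
        'QUERY_STRING': '',
        'SERVER_NAME': 'localhost',
        'SERVER_PORT': '143',
        'SERVER_PROTOCOL': 'IMAP4rev1',
        'imap.tag': tag,                     # made-up extension key
        'wsgi.version': (1, 0),
        'wsgi.url_scheme': 'imap',
        'wsgi.input': io.BytesIO(b''),
        'wsgi.errors': sys.stderr,
        'wsgi.multithread': False,
        'wsgi.multiprocess': False,
        'wsgi.run_once': False,
    }


def dispatch_line(app, line):
    """Called by the connection handler once per command line received."""
    tag, command, *args = line.split()
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers
        return lambda data: None  # write() callable; unused in this sketch

    result = app(command_to_environ(tag, command, args), start_response)
    try:
        body = b''.join(result)
    finally:
        if hasattr(result, 'close'):  # WSGI requires calling close() if present
            result.close()
    return captured['status'], body


def demo_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    reply = '%s OK %s completed' % (environ['imap.tag'], environ['REQUEST_METHOD'])
    return [reply.encode('ascii')]
```

For example, dispatch_line(demo_app, 'a001 select INBOX') returns ('200 OK', b'a001 OK SELECT completed'); the connection handler would then write that body back to the client.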

But I dunno... are there some WSGI libraries you'd like to leverage in 
your IMAP server?  Do you want to maintain an IMAP server with a parallel 
HTTP interface?  Anyway, I don't think it would be particularly hard to do.

> And, what WSGI-supporting environments will also support PyLucene? (The
> limiting factor is that the GCJ runtime has to be linked in, and all
> threads must be GCJ threads.)

Yikes, not sure about that.  Can the normal threads communicate via some 
queue to gcj threads?  Otherwise the WSGI server is where the threads 
are handled, so it would require tweaking some server for that (none use 
gcj threads currently).
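One way to read the queue suggestion, sketched with the standard library only: ordinary server threads hand work to a single dedicated thread through a queue and block until it replies, so every index call runs on that one thread. The thread_factory parameter is a hypothetical hook; whether PyLucene's GCJ runtime needs a special thread class, and what it is called, is an assumption left to the PyLucene documentation, so the sketch defaults to threading.Thread.

```python
import queue
import threading


class SingleThreadExecutor(object):
    """Run every submitted call on one dedicated worker thread."""

    def __init__(self, thread_factory=threading.Thread):
        # thread_factory is a hypothetical hook for a runtime-specific thread class
        self._requests = queue.Queue()
        self._thread = thread_factory(target=self._run)
        self._thread.daemon = True
        self._thread.start()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            func, args, reply = item
            try:
                reply.put((True, func(*args)))
            except Exception as exc:  # hand the error back to the caller
                reply.put((False, exc))

    def call(self, func, *args):
        """Block the calling thread until the worker has run func(*args)."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((func, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def shutdown(self):
        self._requests.put(None)
        self._thread.join()


executor = SingleThreadExecutor()
total = executor.call(lambda a, b: a + b, 2, 3)
worker_id = executor.call(threading.get_ident)
executor.shutdown()
```

concurrent.futures.ThreadPoolExecutor(max_workers=1) gives the same one-thread behaviour, but it offers no way to choose the thread class, which is the point of the hook here.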

You'd also need a WSGI server that handled IMAP and persistent 
connections.  So maybe another server is called for, or an adaptation of 
an existing multi-protocol server.

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From janssen at parc.com  Wed Oct 18 02:03:05 2006
From: janssen at parc.com (Bill Janssen)
Date: Tue, 17 Oct 2006 17:03:05 PDT
Subject: [Web-SIG] WSGI -- usable for other protocols?
In-Reply-To: Your message of "Tue, 17 Oct 2006 16:25:46 PDT."
	<4535667A.8010603@colorstudy.com> 
Message-ID: <06Oct17.170311pdt."58648"@synergy1.parc.xerox.com>

> You'd also need a WSGI server that handled IMAP and persistent 
> connections.  So maybe another server is called for, or an adaptation of 
> an existing multi-protocol server.

That's my tentative conclusion.  The WSGI handling doesn't really
match the IMAP connection requests very well.  I figured I'd adapt
Medusa for this, again; set up an HTTP handler and an IMAP handler.
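Medusa is built on asyncore, which has since been removed from the standard library (Python 3.12), so here is the same one-process, two-handler layout sketched with asyncio instead. This is a swapped-in technique, not Medusa's API: a persistent IMAP listener that greets the client and answers tagged commands, next to a one-shot HTTP listener for the admin page. Ports and reply strings are placeholders, and the IMAP handler only understands LOGOUT.

```python
import asyncio


async def handle_imap(reader, writer):
    # Persistent connection: greet once, then answer tagged commands in a loop.
    writer.write(b'* OK IMAP4rev1 service ready\r\n')
    await writer.drain()
    while True:
        line = await reader.readline()
        if not line:
            break  # client went away
        tag, _, rest = line.decode('utf-8', 'replace').strip().partition(' ')
        command = rest.split(' ', 1)[0].upper()
        if command == 'LOGOUT':
            writer.write(b'* BYE logging out\r\n')
            writer.write(('%s OK LOGOUT completed\r\n' % tag).encode('utf-8'))
            await writer.drain()
            break
        writer.write(('%s BAD command not implemented\r\n' % tag).encode('utf-8'))
        await writer.drain()
    writer.close()


async def handle_http(reader, writer):
    # One request per connection: skip the request line and headers, send a page.
    while (await reader.readline()) not in (b'\r\n', b'\n', b''):
        pass
    body = b'admin interface\n'
    writer.write(b'HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n'
                 b'Content-Length: %d\r\n\r\n' % len(body) + body)
    await writer.drain()
    writer.close()


async def serve(host='127.0.0.1', imap_port=1143, http_port=8080):
    # Both listeners share one event loop, as Medusa's handlers would.
    imap = await asyncio.start_server(handle_imap, host, imap_port)
    http = await asyncio.start_server(handle_http, host, http_port)
    async with imap, http:
        await asyncio.gather(imap.serve_forever(), http.serve_forever())


class _RecordingWriter(object):
    """Minimal stand-in for asyncio.StreamWriter, for trying a handler offline."""

    def __init__(self):
        self.data = b''

    def write(self, data):
        self.data += data

    async def drain(self):
        pass

    def close(self):
        pass


async def try_imap(script):
    reader = asyncio.StreamReader()
    reader.feed_data(script)
    reader.feed_eof()
    writer = _RecordingWriter()
    await handle_imap(reader, writer)
    return writer.data


# asyncio.run(serve()) would run both listeners until interrupted.
transcript = asyncio.run(try_imap(b'a1 NOOP\r\na2 LOGOUT\r\n'))
```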

But I thought I'd check the wisdom of the crowd, first.

Bill

From luke.arno at gmail.com  Wed Oct 18 02:42:36 2006
From: luke.arno at gmail.com (Luke Arno)
Date: Tue, 17 Oct 2006 20:42:36 -0400
Subject: [Web-SIG] WSGI Components Mailing List
Message-ID: <d79e89ce0610171742w7f1906ffub7fe395eede1eabb@mail.gmail.com>

I set up a mailing list for WSGI component users
and developers. I have had a few emails asking
questions and looking for help. I thought it would
be good to have a list with that scope.

Homepage:

http://groups.google.com/group/wsgi-components  	

Group email:

wsgi-components at googlegroups.com 	

Description:

WSGI is transforming Python Web development. It
is now easy to snap together best-of-breed
components to build applications or even roll your
own frameworks (or "application profiles"). This list
is for users and developers of WSGI components.

Cheers,
- Luke

From exarkun at divmod.com  Wed Oct 18 02:50:13 2006
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Tue, 17 Oct 2006 20:50:13 -0400
Subject: [Web-SIG] WSGI -- usable for other protocols?
In-Reply-To: <06Oct17.154816pdt."58648"@synergy1.parc.xerox.com>
Message-ID: <20061018005013.26151.2100991412.divmod.quotient.5363@ohm>

On Tue, 17 Oct 2006 15:48:12 PDT, Bill Janssen <janssen at parc.com> wrote:
>I've been working on a Python IMAP server that uses PyLucene for
>indexing.  It's mainly IMAP, but also speaks a bit of HTTP for an
>administrative interface.  Does it make any sense to wrap it with
>WSGI?  That is, does WSGI make sense for other protocols than HTTP
>(specifically IMAP)?
>
>And, what WSGI-supporting environments will also support PyLucene? (The
>limiting factor is that the GCJ runtime has to be linked in, and all
>threads must be GCJ threads.)

Not really a response to your question, but might I suggest you contribute
to a project which sounds roughly equivalent to the one you're describing?

http://divmod.org/trac/wiki/DivmodQuotient

Jean-Paul


From titus at caltech.edu  Wed Oct 18 02:46:35 2006
From: titus at caltech.edu (Titus Brown)
Date: Tue, 17 Oct 2006 17:46:35 -0700
Subject: [Web-SIG] WSGI Components Mailing List
In-Reply-To: <d79e89ce0610171742w7f1906ffub7fe395eede1eabb@mail.gmail.com>
References: <d79e89ce0610171742w7f1906ffub7fe395eede1eabb@mail.gmail.com>
Message-ID: <20061018004635.GI30517@caltech.edu>

On Tue, Oct 17, 2006 at 08:42:36PM -0400, Luke Arno wrote:
-> I set up a mailing list for WSGI component users
-> and developers. I have had a few emails asking
-> questions and looking for help. I thought it would
-> be good to have a list with that scope.

What's wrong with keeping WSGI discussions on the web-sig list?  Is it
off-topic?

--titus

From janssen at parc.com  Wed Oct 18 03:19:10 2006
From: janssen at parc.com (Bill Janssen)
Date: Tue, 17 Oct 2006 18:19:10 PDT
Subject: [Web-SIG] WSGI -- usable for other protocols?
In-Reply-To: Your message of "Tue, 17 Oct 2006 17:50:13 PDT."
	<20061018005013.26151.2100991412.divmod.quotient.5363@ohm> 
Message-ID: <06Oct17.181917pdt."58648"@synergy1.parc.xerox.com>

> might I suggest you contribute
> to a project which sounds roughly equivalent to the one you're describing?
> 
> http://divmod.org/trac/wiki/DivmodQuotient

Just for fun, I grepped the sources for IMAP.  No hits.

Seems like I'd spend more time understanding the framework system
you're using than it would take me to write it from scratch.  An IMAP
server isn't hard.  And I don't think the project is all that equivalent.

Does Twisted support the use of PyLucene?

I basically want an IMAP server that supports the MH mail storage
format, uses Lucene for indexing and search, and has the ability to do
auto-filtering on a per-user basis with either MH procmail scripts or
a Python script that uses a particular API.  I don't need an SMTP
server, I don't need a Web interface to mail.

If DivmodQuotient is anywhere close to that, I'll take a longer look.

Bill

From exarkun at divmod.com  Wed Oct 18 03:39:12 2006
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Tue, 17 Oct 2006 21:39:12 -0400
Subject: [Web-SIG] WSGI -- usable for other protocols?
In-Reply-To: <06Oct17.181917pdt."58648"@synergy1.parc.xerox.com>
Message-ID: <20061018013912.26151.1203616830.divmod.quotient.5405@ohm>

On Tue, 17 Oct 2006 18:19:10 PDT, Bill Janssen <janssen at parc.com> wrote:
>> might I suggest you contribute
>> to a project which sounds roughly equivalent to the one you're describing?
>>
>> http://divmod.org/trac/wiki/DivmodQuotient
>
>Just for fun, I grepped the sources for IMAP.  No hits.

Quite so.  We're currently most of the way through an (unfortunate) rewrite
to fix some database-related problems.  IMAP4 hasn't been high on the
port-list, so there's no IMAP4 code in the new codebase yet.  However,
Twisted's IMAP4 protocol implementation was developed for this project,
and IMAP4 is on our mind as we implement things, so adding it isn't going
to be obstructed by anything in Quotient (I would say "easy" but nothing
related to IMAP4 is easy).

>
>Seems like I'd spend more time understanding the framework system
>you're using than it would take me to write it from scratch.

Ahhh, I doubt it.  This isn't to say you wouldn't spend a while
understanding the framework, but writing it from scratch would take
longer.

>An IMAP server isn't hard.

Having spent fair chunks of the last several years implementing various
IMAP4 servers, I must disagree. :)  Unless you're happy with a
semi-protocol spec, semi-broken server that doesn't scale to a decent
number of messages, it's quite a haul.

>And I don't think the project is all that equivalent.

Strictly speaking, an IMAP4 server will be a subset of Quotient, and IMAP4
is by no means the main focus of Quotient, so maybe equivalent wasn't the
right word.

>
>Does Twisted support the use of PyLucene?

Quotient's using PyLucene for fulltext indexing already.  So... yes :)

>
>I basically want an IMAP server that supports the MH mail storage
>format, uses Lucene for indexing and search, and has the ability to do
>auto-filtering on a per-user basis with either MH procmail scripts or
>a Python script that uses a particular API.  I don't need an SMTP
>server, I don't need a Web interface to mail.

It's possible you'd be happier basing the IMAP4 server on Twisted's
protocol support, rather than starting from Quotient (although _I'd_ be
happier if you added IMAP4 support to Quotient ;).

Quotient uses a SQLite database for storage of structured data about
messages and a filesystem structure (currently not a great structure,
but it's fixable) for actual message files.

It supports per-user filtering rules (although not procmail based - and
the work done in this area so far is extremely minimal, basically it can
do substring matching on headers - expanding this would be pretty simple
though, Quotient is designed for this kind of thing).
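Per-user header substring rules of the kind described here can be sketched in a few lines with the standard email package. The rule format (header, substring, folder) and the folder names are hypothetical, not Quotient's. This sketch also chooses to match case-insensitively and to stop at the first matching rule.

```python
from email import message_from_string


def choose_folder(rules, raw_message, default='inbox'):
    """Return the folder of the first rule whose substring occurs in its header.

    rules: list of (header_name, substring, folder) tuples for one user.
    """
    message = message_from_string(raw_message)
    for header, substring, folder in rules:
        # get_all() returns every occurrence of the header, or None if absent
        for value in message.get_all(header) or []:
            if substring.lower() in value.lower():
                return folder
    return default


# Hypothetical rules for one user
user_rules = [
    ('List-Id', 'web-sig.python.org', 'lists/web-sig'),
    ('From', '@example.org', 'work'),
]
```

Growing this toward procmail-style filtering would mostly mean richer rule types (regular expressions, body matches, actions other than "file into folder") behind the same loop.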

The SMTP server can be turned off completely, although then you need
another mechanism for adding messages to the system (inotify + directory
would work, but you'll have to write that part).  For users, the web
interface is optional too, but various admin tasks may continue to require
some web interaction.
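A sketch of the "directory" half of that suggestion: a function that hands every file in a drop directory to a delivery callback and then moves it aside. It polls with os.listdir rather than using inotify, which is a swapped-in simplification (inotify is Linux-only and not in the standard library); deliver is a hypothetical callback standing in for whatever adds a message to the store. Calling it from a timer loop, or from an inotify event handler, would complete the mechanism.

```python
import os
import tempfile


def deliver_new_messages(drop_dir, done_dir, deliver):
    """Pass each regular file in drop_dir to deliver(), then move it to done_dir.

    Returns the names that were delivered, in sorted order.
    """
    delivered = []
    for name in sorted(os.listdir(drop_dir)):
        path = os.path.join(drop_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, 'rb') as message_file:
            deliver(message_file.read())
        # If deliver() raised, the file stays put and the next poll retries it
        os.rename(path, os.path.join(done_dir, name))
        delivered.append(name)
    return delivered


with tempfile.TemporaryDirectory() as drop, tempfile.TemporaryDirectory() as done:
    with open(os.path.join(drop, '1'), 'wb') as f:
        f.write(b'Subject: hi\n\nbody\n')
    received = []
    names = deliver_new_messages(drop, done, received.append)
    leftover = os.listdir(drop)
    moved = os.listdir(done)
```

Note that a crash between deliver() and the rename would deliver the message twice on the next poll; a real store would want an idempotent delivery step.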

>
>If DivmodQuotient is anywhere close to that, I'll take a longer look.

It sounds like you might be happier starting from Twisted's IMAP4 code
rather than doing the work in Quotient, unless your requirements are
somewhat more flexible than I have gotten the impression that they are.

I _certainly_ would not recommend doing the protocol implementation from
scratch.  Using Twisted there at least is a complete win.

As for PyLucene in that scenario, there's no _direct_ support for gcj
threads in Twisted.  All of my work with PyLucene in Quotient has been
in the main thread in a child process of the main process (desirable
to avoid segfaulting the main server, mainly).

Note (in case it isn't obvious yet) I'm a developer on both Quotient and
Twisted, and I wrote pretty much all of Twisted's IMAP4 code.  It's possible
I'm slightly biased.  That said, lots of people have told me Twisted's IMAP4
implementation is the best they've worked with, that it saved their thesis,
project, company, life, etc. ;)

Jean-Paul

From janssen at parc.com  Wed Oct 18 05:00:35 2006
From: janssen at parc.com (Bill Janssen)
Date: Tue, 17 Oct 2006 20:00:35 PDT
Subject: [Web-SIG] WSGI -- usable for other protocols?
In-Reply-To: Your message of "Tue, 17 Oct 2006 18:39:12 PDT."
	<20061018013912.26151.1203616830.divmod.quotient.5405@ohm> 
Message-ID: <06Oct17.200037pdt."58648"@synergy1.parc.xerox.com>

Well, I'll definitely check out Twisted's IMAP4 code.  Thanks!

> Quotient uses a SQLite database for storage of structured data about
> messages and a filesystem structure (currently not a great structure,
> but it's fixable) for actual message files.

I was sort of planning on keeping all the message metadata in the Lucene DB.

MH uses a filesystem structure too.  Maybe there's hope.

> It supports per-user filtering rules (although not procmail based - and
> the work done in this area so far is extremely minimal, basically it can
> do substring matching on headers - expanding this would be pretty simple
> though, Quotient is designed for this kind of thing).

This doesn't sound too far from what I intended, actually.

I'd like to keep the Lucene index in memory, and don't particularly
want the overhead of process swaps, so I'd like to be able to use them
together in a single address space.  It sounds like you've worked out
most of the issues with IMAP4, so I'll take a closer look.

Bill

From luke.arno at gmail.com  Wed Oct 18 05:05:21 2006
From: luke.arno at gmail.com (Luke Arno)
Date: Tue, 17 Oct 2006 23:05:21 -0400
Subject: [Web-SIG] WSGI Components Mailing List
In-Reply-To: <20061018004635.GI30517@caltech.edu>
References: <d79e89ce0610171742w7f1906ffub7fe395eede1eabb@mail.gmail.com>
	<20061018004635.GI30517@caltech.edu>
Message-ID: <d79e89ce0610172005i7be9e130xfb97d968338afaee@mail.gmail.com>

On 10/17/06, Titus Brown <titus at caltech.edu> wrote:
> On Tue, Oct 17, 2006 at 08:42:36PM -0400, Luke Arno wrote:
> -> I set up a mailing list for WSGI component users
> -> and developers. I have had a few emails asking
> -> questions and looking for help. I thought it would
> -> be good to have a list with that scope.
>
> What's wrong with keeping WSGI discussions on the web-sig list?  Is it
> off-topic?
>

I am not talking about higher level conversations
regarding WSGI.

The various frameworks have communities where
users can go for help and developers can coordinate
their specific efforts. Maybe this list is the place for it,
but I have a feeling that if I start giving support to
users of various components, it would be a little too
noisy. What do you think?

I am happy to direct these conversations to
wherever folks want. Is this the place, after all?

Thanks.

Cheers,
- Luke

From jim at zope.com  Wed Oct 18 12:55:23 2006
From: jim at zope.com (Jim Fulton)
Date: Wed, 18 Oct 2006 06:55:23 -0400
Subject: [Web-SIG] WSGI Components Mailing List
In-Reply-To: <20061018004635.GI30517@caltech.edu>
References: <d79e89ce0610171742w7f1906ffub7fe395eede1eabb@mail.gmail.com>
	<20061018004635.GI30517@caltech.edu>
Message-ID: <4536081B.80803@zope.com>

Titus Brown wrote:
> On Tue, Oct 17, 2006 at 08:42:36PM -0400, Luke Arno wrote:
> -> I set up a mailing list for WSGI component users
> -> and developers. I have had a few emails asking
> -> questions and looking for help. I thought it would
> -> be good to have a list with that scope.
> 
> What's wrong with keeping WSGI discussions on the web-sig list?  Is it
> off-topic?

I don't think so.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From janssen at parc.com  Wed Oct 18 17:04:08 2006
From: janssen at parc.com (Bill Janssen)
Date: Wed, 18 Oct 2006 08:04:08 PDT
Subject: [Web-SIG] WSGI Components Mailing List
In-Reply-To: Your message of "Tue, 17 Oct 2006 20:05:21 PDT."
	<d79e89ce0610172005i7be9e130xfb97d968338afaee@mail.gmail.com> 
Message-ID: <06Oct18.080417pdt."58648"@synergy1.parc.xerox.com>

> I am happy to direct these conversations to
> wherever folks want. Is this the place, after all?

You bet!  Let's keep things here, till folks complain.

Bill

From sh at defuze.org  Wed Oct 18 22:16:01 2006
From: sh at defuze.org (Sylvain Hellegouarch)
Date: Wed, 18 Oct 2006 21:16:01 +0100
Subject: [Web-SIG] wsgiref bug with HEAD request
Message-ID: <45368B81.8090308@defuze.org>

All,

It seems the default server from wsgiref (from wsgiref.simple_server
import make_server) does not respect Content-Length in the case of a HEAD
request.

Since no body can be returned in a response to a HEAD request, the
content length is set to 0 by the server. In that case, though,
Content-Length has already been set by the application or a middleware.

The wsgiref server disregards the existing value and sets it to 0 either way.

Seems bogus to me or am I missing something here?
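For reference, a sketch of the application-side behaviour the question assumes: a hypothetical middleware (not part of wsgiref) that answers HEAD by running the wrapped application normally, keeping every header it sets, Content-Length included, and discarding the body. With a server that respects existing headers, the client then sees the same Content-Length a GET would have produced, which is what HTTP expects of a HEAD response.

```python
def head_middleware(app):
    """Serve HEAD with the wrapped app's headers but no body (hypothetical)."""
    def wrapper(environ, start_response):
        if environ.get('REQUEST_METHOD', '').upper() != 'HEAD':
            return app(environ, start_response)
        result = app(environ, start_response)
        try:
            for _ in result:  # run the app to completion, discarding output
                pass
        finally:
            if hasattr(result, 'close'):
                result.close()
        return []
    return wrapper


def hello_app(environ, start_response):
    body = b'hello world'
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


seen = {}


def recording_start_response(status, headers, exc_info=None):
    seen['status'] = status
    seen['headers'] = dict(headers)
    return lambda data: None


app = head_middleware(hello_app)
head_body = app({'REQUEST_METHOD': 'HEAD'}, recording_start_response)
```

Draining the body costs as much as serving a GET; an application that checks REQUEST_METHOD itself could skip generating the body entirely.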

- Sylvain

From ianb at colorstudy.com  Sat Oct 21 19:49:06 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat, 21 Oct 2006 12:49:06 -0500
Subject: [Web-SIG] Proposal: wsgi.url_vars
Message-ID: <453A5D92.4000603@colorstudy.com>

I think there's room for some more standards building on WSGI (that 
aren't actually extensions of the WSGI spec itself).

I put a page up on the wsgi.org site for this: 
http://wsgi.org/wsgi/Specifications

And I'm introducing what I think is low-hanging fruit in the 
specification realm: wsgi.url_vars 
http://wsgi.org/wsgi/Specifications/url_vars

The spec is copied below for discussion:


:Title: wsgi.url_vars
:Author: Ian Bicking <ianb at colorstudy.com>
:Discussions-To: Python Web-SIG <web-sig at python.org>
:Status: Draft
:Created: 21-Oct-2006

.. contents::

Abstract
--------

This proposes a new standard environment key 
``environ['wsgi.url_vars']`` to represent the results of more 
complicated URL parsing strategies.

Rationale
---------

WSGI currently specifies the meaning of ``SCRIPT_NAME`` and 
``PATH_INFO``, which allows generic prefix-based dispatchers to be 
created.  These dispatchers can work with any WSGI application that 
respects the meaning of these two variables.  The basic meaning of 
``SCRIPT_NAME`` is *the portion of the path that has been consumed* and 
``PATH_INFO`` is *the portion of the path left to the application*.

Using only these two variables, more complex dispatchers cannot represent the 
information they pull out of the request path.  This specification 
simply defines a place where such dispatchers can put their information: 
``wsgi.url_vars``.

Specification
-------------

This specification defines a new key that can go in the WSGI 
environment, ``wsgi.url_vars``.  This key is optional.

If a dispatcher (like `routes <http://routes.groovie.org/>`_ or 
`selector <http://lukearno.com/projects/selector/>`_) pulls named 
information out of the portion of the request path it parses, it can put 
that information into a dictionary in ``environ['wsgi.url_vars']``.

Portions of the path that have been parsed should still be moved to 
``SCRIPT_NAME`` (and removed from ``PATH_INFO``).

Example
-------

This example is a dispatcher that is given regular expressions and 
matching applications.  It checks each regular expression in turn, and 
when one matches it moves the named groups into ``wsgi.url_vars`` and 
dispatches to the associated application.

::

     import re

     class RegexDispatch(object):
         """Dispatch to the first application whose regex matches PATH_INFO."""

         def __init__(self, patterns):
             # patterns: a list of (compiled_regex, wsgi_application) pairs
             self.patterns = patterns

         def __call__(self, environ, start_response):
             script_name = environ.get('SCRIPT_NAME', '')
             path_info = environ.get('PATH_INFO', '')
             for regex, application in self.patterns:
                 match = regex.match(path_info)
                 if not match:
                     continue
                 extra_path_info = path_info[match.end():]
                 if extra_path_info and not extra_path_info.startswith('/'):
                     # Not a very good match (it stopped mid-segment)
                     continue
                 # Publish the named groups for the dispatched application
                 groups = match.groupdict()
                 environ.setdefault('wsgi.url_vars', {}).update(groups)
                 # Move the consumed part of the path onto SCRIPT_NAME
                 environ['SCRIPT_NAME'] = script_name + path_info[:match.end()]
                 environ['PATH_INFO'] = extra_path_info
                 return application(environ, start_response)
             return self.not_found(environ, start_response)

         def not_found(self, environ, start_response):
             start_response('404 Not Found', [('Content-type', 'text/plain')])
             return [b'Not found']

     # Placeholder applications, for illustration only
     def archive_app(environ, start_response):
         start_response('200 OK', [('Content-type', 'text/plain')])
         return [repr(environ['wsgi.url_vars']).encode('ascii')]

     view_article = archive_app

     dispatch_app = RegexDispatch([
         (re.compile(r'/archive/(?P<year>\d{4})/$'), archive_app),
         (re.compile(r'/archive/(?P<year>\d{4})/(?P<month>\d{2})/$'),
          archive_app),
         (re.compile(r'/archive/(?P<year>\d{4})/(?P<month>\d{2})/'
                     r'(?P<article_id>\d+)$'),
          view_article),
         ])


From p.f.moore at gmail.com  Sat Oct 21 21:39:36 2006
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 21 Oct 2006 20:39:36 +0100
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <453A5D92.4000603@colorstudy.com>
References: <453A5D92.4000603@colorstudy.com>
Message-ID: <79990c6b0610211239x7ea1f949m2a8aeaefa5b26db3@mail.gmail.com>

On 10/21/06, Ian Bicking <ianb at colorstudy.com> wrote:
> Using only these two variables, more complex dispatchers cannot represent the
> information they pull out of the request path.  This specification
> simply defines a place where such dispatchers can put their information:
> ``wsgi.url_vars``.

But what is the point? If the receiving application uses the url_vars
information, it's tied to the particular dispatcher - so why does this
need to be a standard key, rather than just a private convention? If
the receiving application wants to remain compatible with generic
dispatchers, how can it make use of url_vars?

Or, to put it another way, can you provide a realistic example of a
*consumer* of the information?

Paul.

From ianb at colorstudy.com  Sat Oct 21 21:46:26 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat, 21 Oct 2006 14:46:26 -0500
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <79990c6b0610211239x7ea1f949m2a8aeaefa5b26db3@mail.gmail.com>
References: <453A5D92.4000603@colorstudy.com>
	<79990c6b0610211239x7ea1f949m2a8aeaefa5b26db3@mail.gmail.com>
Message-ID: <453A7912.4080608@colorstudy.com>

Paul Moore wrote:
> On 10/21/06, Ian Bicking <ianb at colorstudy.com> wrote:
>> Using only these two variables, more complex dispatchers cannot represent the
>> information they pull out of the request path.  This specification
>> simply defines a place where such dispatchers can put their information:
>> ``wsgi.url_vars``.
> 
> But what is the point? If the receiving application uses the url_vars
> information, it's tied to the particular dispatcher - so why does this
> need to be a standard key, rather than just a private convention? If
> the receiving application wants to remain compatible with generic
> dispatchers, how can it make use of url_vars?

Just like POST and QUERY_STRING variables, the meaning and content of 
the variables is unspecified.  But it's useful that frameworks have a 
common way to parse and pass around the parsed information from those 
data sources.

An application that uses url_vars is tied to *some* dispatcher that puts 
stuff into that location (though of course the application could also 
fall back to QUERY_STRING parsing or whatever).  It's not tied to any 
particular dispatcher.  There are already two dispatchers (selector and 
routes) that put the same kind of information into environment keys 
(just in two separate locations).
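A sketch of the kind of consumer described here: an application that reads wsgi.url_vars when some dispatcher has set it and otherwise falls back to the query string. The helper name get_url_vars and the first-value-wins fallback policy are assumptions for illustration, not part of the proposal.

```python
from urllib.parse import parse_qs


def get_url_vars(environ):
    """Named URL variables from any dispatcher, else from QUERY_STRING."""
    url_vars = environ.get('wsgi.url_vars')
    if url_vars is not None:
        return dict(url_vars)
    # Fallback: keep only the first value of each query-string parameter
    parsed = parse_qs(environ.get('QUERY_STRING', ''))
    return dict((name, values[0]) for name, values in parsed.items())


def archive_view(environ, start_response):
    year = get_url_vars(environ).get('year', 'unknown')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [('Archive for %s' % year).encode('utf-8')]
```

The application never names a dispatcher: /archive/2006/ routed by any conforming dispatcher and /archive?year=2006 with no dispatcher at all both reach the same code.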

> Or, to put it another way, can you provide a realistic example of a
> *consumer* of the information?

Sure: http://bitworking.org/news/wsgicollection

It takes arguments in 'selector.vars', but could take arguments from any 
dispatcher.
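Until dispatchers adopt the key, a small adapter could bridge the gap. This hypothetical middleware copies whatever an existing dispatcher left under its own key (for example selector's 'selector.vars', assumed here to be a dict) into wsgi.url_vars. It has to wrap the application the dispatcher calls, so that the dispatcher has already filled in its key.

```python
def url_vars_from(source_key, app):
    """Wrap app so it also sees environ[source_key] as wsgi.url_vars."""
    def wrapper(environ, start_response):
        source = environ.get(source_key)
        if source:
            environ.setdefault('wsgi.url_vars', {}).update(source)
        return app(environ, start_response)
    return wrapper


def show_vars(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    found = sorted(environ.get('wsgi.url_vars', {}).items())
    return [repr(found).encode('ascii')]


wrapped = url_vars_from('selector.vars', show_vars)
```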


-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From p.f.moore at gmail.com  Sat Oct 21 22:06:39 2006
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 21 Oct 2006 21:06:39 +0100
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <453A7912.4080608@colorstudy.com>
References: <453A5D92.4000603@colorstudy.com>
	<79990c6b0610211239x7ea1f949m2a8aeaefa5b26db3@mail.gmail.com>
	<453A7912.4080608@colorstudy.com>
Message-ID: <79990c6b0610211306p16ee53du21487d9985134138@mail.gmail.com>

On 10/21/06, Ian Bicking <ianb at colorstudy.com> wrote:
> Just like POST and QUERY_STRING variables, the meaning and content of
> the variables is unspecified.  But it's useful that frameworks have a
> common way to parse and pass around the parsed information from those
> data sources.
[...]
> > Or, to put it another way, can you provide a realistic example of a
> > *consumer* of the information?
>
> Sure: http://bitworking.org/news/wsgicollection
>
> It takes arguments in 'selector.vars', but could take arguments from any
> dispatcher.

Ah, I see now. Yes, that sounds like a good proposal (in the abstract
- it's not something I have a need for myself).

Paul.

From luke.arno at gmail.com  Sat Oct 21 22:10:22 2006
From: luke.arno at gmail.com (Luke Arno)
Date: Sat, 21 Oct 2006 16:10:22 -0400
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <453A7912.4080608@colorstudy.com>
References: <453A5D92.4000603@colorstudy.com>
	<79990c6b0610211239x7ea1f949m2a8aeaefa5b26db3@mail.gmail.com>
	<453A7912.4080608@colorstudy.com>
Message-ID: <d79e89ce0610211310o53ebb7abi443f8211699ce65e@mail.gmail.com>

It seems like a good idea to me. I dislike dependencies.

I have been working this way for a while and have been
wondering about the same thing. Being able to use
WSGI to wire up the components of an application (or
framework or "application profile") enables more choice
and flexibility. Relieving a dependency between a specific
dispatcher and that which is dispatched to further serves
the same ends.

- Luke

From ianb at colorstudy.com  Sat Oct 21 22:37:41 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat, 21 Oct 2006 15:37:41 -0500
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <453A7912.4080608@colorstudy.com>
References: <453A5D92.4000603@colorstudy.com>	<79990c6b0610211239x7ea1f949m2a8aeaefa5b26db3@mail.gmail.com>
	<453A7912.4080608@colorstudy.com>
Message-ID: <453A8515.3050806@colorstudy.com>

Ian Bicking wrote:
>> Or, to put it another way, can you provide a realistic example of a
>> *consumer* of the information?
> 
> Sure: http://bitworking.org/news/wsgicollection
> 
> It takes arguments in 'selector.vars', but could take arguments from any 
> dispatcher.

Another consumer came to mind: 
http://pythonpaste.org/class-paste.wsgiwrappers.WSGIRequest.html -- a 
generic wrapper around the WSGI environment, which could provide an 
attribute that would access this particular variable.

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From joe at bitworking.org  Sat Oct 21 23:02:57 2006
From: joe at bitworking.org (Joe Gregorio)
Date: Sat, 21 Oct 2006 17:02:57 -0400
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <453A5D92.4000603@colorstudy.com>
References: <453A5D92.4000603@colorstudy.com>
Message-ID: <3f1451f50610211402m15f6ef2cw7a70396f1052a3a0@mail.gmail.com>

On 10/21/06, Ian Bicking <ianb at colorstudy.com> wrote:
> I think there's room for some more standards building on WSGI (that
> aren't actually extensions of the WSGI spec itself).
>
> I put a page up on the wsgi.org site for this:
> http://wsgi.org/wsgi/Specifications
>
> And I'm introducing what I think is low-hanging fruit in the
> specification realm: wsgi.url_vars
> http://wsgi.org/wsgi/Specifications/url_vars
>
> The spec is copied below for discussion:

+1

I like this, it will make middleware like wsgicollection possible
without tightly binding them to the middleware you use
to parse the URI.

   -joe

-- 
Joe Gregorio        http://bitworking.org

From ianb at colorstudy.com  Sat Oct 21 23:04:39 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat, 21 Oct 2006 16:04:39 -0500
Subject: [Web-SIG] Proposal: Handling POST forms in WSGI
Message-ID: <453A8B67.4070409@colorstudy.com>

I've added another spec to wsgi.org: 
http://wsgi.org/wsgi/Specifications/handling_post_forms

This one is a little more intrusive than wsgi.url_vars, but it addresses 
an outstanding source of problems: contention over wsgi.input.

Text copied:


:Title: Handling POST forms in WSGI
:Author: Ian Bicking <ianb at colorstudy.com>
:Discussions-To: Python Web-SIG <web-sig at python.org>
:Status: Draft
:Created: 21-Oct-2006

.. contents::

Abstract
--------

This suggests a way that WSGI middleware, applications, and frameworks 
can access POST form bodies so that there is less contention for the 
``wsgi.input`` stream.

Rationale
---------

Currently ``environ['wsgi.input']`` points to a stream that represents 
the body of the HTTP request.  Once this stream has been read, it cannot 
necessarily be read again.  It may not have a ``seek`` method (none is 
required by the WSGI specification, and frequently none is provided by 
WSGI servers).

As a result any piece of a system that looks at the request body 
essentially takes ownership of that body, and no one else is able to 
access it.  This is particularly problematic for POST form requests, as 
many framework pieces expect to have access to this.

Specification
-------------

This applies when certain requirements of the WSGI environment are met::

     def is_post_request(environ):
         if environ['REQUEST_METHOD'].upper() != 'POST':
             return False
         # A missing CONTENT_TYPE is treated as an urlencoded form
         content_type = environ.get('CONTENT_TYPE',
             'application/x-www-form-urlencoded')
         return (
           content_type.startswith('application/x-www-form-urlencoded')
           or content_type.startswith('multipart/form-data'))

That is, it must be a POST request, and it must be a form request 
(generally ``application/x-www-form-urlencoded`` or when there are file 
uploads ``multipart/form-data``).

When this happens, the form can be parsed by ``cgi.FieldStorage``.  The 
results of this parsing should be put in ``environ['wsgi.post_form']`` 
in a particular fashion::

     import cgi

     def get_post_form(environ):
         assert is_post_request(environ)
         input = environ['wsgi.input']
         post_form = environ.get('wsgi.post_form')
         if (post_form is not None
             and post_form[0] is input):
             return post_form[2]
         fs = cgi.FieldStorage(fp=input,
                               environ=environ,
                               keep_blank_values=1)
         new_input = InputProcessed()
         post_form = (new_input, input, fs)
         environ['wsgi.post_form'] = post_form
         environ['wsgi.input'] = new_input
         return fs

     class InputProcessed(object):
         def read(self, *args):
             raise EOFError(
                 'The wsgi.input stream has already been consumed')
         readline = readlines = __iter__ = read

This way multiple consumers can parse a POST form, accessing the form 
data in any order (later consumers will get the already-parsed data). 
The replacement ``wsgi.input`` guards against non-conforming access to 
the data, while the value in ``wsgi.post_form`` allows for access to the 
original ``wsgi.input`` in case it may be useful.

By checking for the replacement ``wsgi.input`` when checking if 
``wsgi.post_form`` applies, this does not get in the way of WSGI
middleware that may replace that key.  If the key is replaced, then the 
parsed data is implicitly invalidated.
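To make the caching and invalidation rules concrete, here is a minimal sketch assuming Python 3. It swaps ``cgi.FieldStorage`` for ``urllib.parse.parse_qs`` (the ``cgi`` module was removed in Python 3.13), so it only handles ``application/x-www-form-urlencoded`` bodies; the environ values are made up for the example.

```python
import io
from urllib.parse import parse_qs


class InputProcessed(object):
    """Placeholder for wsgi.input once the body has been parsed."""

    def read(self, *args):
        raise EOFError('The wsgi.input stream has already been consumed')

    readline = readlines = __iter__ = read


def get_post_form(environ):
    """Parse a urlencoded POST body at most once per wsgi.input stream."""
    input = environ['wsgi.input']
    post_form = environ.get('wsgi.post_form')
    # Reuse the cached parse only if wsgi.input is still our placeholder;
    # if middleware swapped in a new stream, the cache is stale.
    if post_form is not None and post_form[0] is input:
        return post_form[2]
    length = int(environ.get('CONTENT_LENGTH') or 0)
    body = input.read(length) if length > 0 else b''
    form = parse_qs(body.decode('latin-1'), keep_blank_values=True)
    new_input = InputProcessed()
    environ['wsgi.post_form'] = (new_input, input, form)
    environ['wsgi.input'] = new_input
    return form


environ = {
    'REQUEST_METHOD': 'POST',
    'CONTENT_TYPE': 'application/x-www-form-urlencoded',
    'CONTENT_LENGTH': '27',
    'wsgi.input': io.BytesIO(b'name=joe&tag=a&tag=b&empty='),
}
first = get_post_form(environ)   # e.g. a framework's request object
second = get_post_form(environ)  # e.g. a second consumer later on

# Middleware that installs a new body implicitly invalidates the cache.
environ['wsgi.input'] = io.BytesIO(b'name=ian')
environ['CONTENT_LENGTH'] = '8'
third = get_post_form(environ)
```

The second call returns the very same object as the first, while the third call re-parses because ``wsgi.input`` no longer matches the placeholder stored in ``wsgi.post_form``.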

Query String data
-----------------

Note that nothing in this specification touches or applies to the query 
string (in ``environ['QUERY_STRING']``).  This is not parsed as part of 
the process, and nothing in this specification applies to GET requests, 
or to the query string which may be present in a POST request.

Open Issues
-----------

1. Is cgi.FieldStorage the best way to store the parsed data?  It's the 
most common way, at least.

2. This doesn't address non-form-submission POST requests.  Most of the 
same issues apply to such requests, except that frameworks tend not to 
touch the request body in that case.  The body may be large, so the 
actual contents of the request body shouldn't go in the environment. 
Perhaps they could go in a temporary file, but this too might be an 
unnecessary indirection in many cases.  Also other kinds of request 
(like PUT) that have a request body are not covered, for largely the 
same reason.  In both these cases, it is much easier to construct a new 
``wsgi.input`` that accesses whatever your internal representation of 
the request body is.

3. Is the tuple of information necessary in ``wsgi.post_form``, or could 
it just be the ``FieldStorage`` instance?

4. Should ``wsgi.input`` be replaced by ``InputProcessed``, or just left 
as is?

From wilk-ml at flibuste.net  Sun Oct 22 09:46:47 2006
From: wilk-ml at flibuste.net (William Dode)
Date: Sun, 22 Oct 2006 07:46:47 +0000 (UTC)
Subject: [Web-SIG] Proposal: wsgi.url_vars
References: <453A5D92.4000603@colorstudy.com>
Message-ID: <ehf7l6$h3b$1@sea.gmane.org>

On 21-10-2006, Ian Bicking wrote:
> I think there's room for some more standards building on WSGI (that 
> aren't actually extensions of the WSGI spec itself).
>
> I put a page up on the wsgi.org site for this: 
> http://wsgi.org/wsgi/Specifications
>
> And I'm introducing what I think is low-hanging fruit in the 
> specification realm: wsgi.url_vars 
> http://wsgi.org/wsgi/Specifications/url_vars
>
> The spec is copied below for discussion:

+1 for this kind of spec, to make applications more independent of
framework pieces.

I hope you or some other WSGI gurus will also make some proposals for
sessions and cookies...

-- 
William Dodé - http://flibuste.net


From pje at telecommunity.com  Sun Oct 22 13:40:07 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 22 Oct 2006 04:40:07 -0700
Subject: [Web-SIG] Proposal: Handling POST forms in WSGI
In-Reply-To: <453A8B67.4070409@colorstudy.com>
References: <453A8B67.4070409@colorstudy.com>
Message-ID: <7.0.1.0.0.20061022042434.020fa938@telecommunity.com>

At 02:04 PM 10/21/2006, Ian Bicking wrote:
>I've added another spec to wsgi.org:
>http://wsgi.org/wsgi/Specifications/handling_post_forms
>
>This one is a little more intrusive than wsgi.url_vars, but it addresses
>an outstanding source of problems: contention over wsgi.input.

-1 on this being middleware.  If middleware wants to read the input, 
it should copy it to a temporary file or StringIO, not remove it.
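A minimal sketch of the non-destructive approach described here, assuming Python 3; ``buffer_input``, ``body_size_middleware``, ``echo_app``, and the ``example.body_size`` key are hypothetical names. The body is copied into a ``tempfile.SpooledTemporaryFile`` (kept in memory until it grows past a threshold) and a rewound copy is put back in ``wsgi.input``.

```python
import io
import tempfile


def buffer_input(environ, max_memory=1024 * 1024, chunk_size=64 * 1024):
    """Copy the request body into a seekable buffer and install it as wsgi.input.

    The buffer stays in memory up to max_memory bytes, then spills to disk.
    Only CONTENT_LENGTH bytes are read, since reading past it may block.
    """
    remaining = int(environ.get('CONTENT_LENGTH') or 0)
    source = environ['wsgi.input']
    buffered = tempfile.SpooledTemporaryFile(max_size=max_memory)
    while remaining > 0:
        chunk = source.read(min(chunk_size, remaining))
        if not chunk:  # client sent fewer bytes than promised
            break
        buffered.write(chunk)
        remaining -= len(chunk)
    buffered.seek(0)
    environ['wsgi.input'] = buffered
    return buffered


def body_size_middleware(app):
    """Hypothetical middleware that inspects the body without consuming it."""
    def wrapper(environ, start_response):
        stream = buffer_input(environ)
        environ['example.body_size'] = len(stream.read())
        stream.seek(0)  # rewind so the wrapped app sees the whole body
        return app(environ, start_response)
    return wrapper


def echo_app(environ, start_response):
    """Toy WSGI app that returns the request body it reads."""
    start_response('200 OK', [('Content-Type', 'application/octet-stream')])
    return [environ['wsgi.input'].read(int(environ['CONTENT_LENGTH']))]


environ = {'CONTENT_LENGTH': '11', 'wsgi.input': io.BytesIO(b'hello world')}
result = body_size_middleware(echo_app)(environ, lambda status, headers: None)
```

The trade-off is one extra copy of the body per request, in exchange for every later consumer still seeing an unread stream.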

The broader principle here is that WSGI extensions should *add* to 
the WSGI specification, not subtract from it.  Code running under 
middleware that does as you have proposed will be unable to use its 
own form processing or support nested applications.  It's therefore 
not composable or further extensible, and I therefore have a hard 
time viewing the proposed middleware as being WSGI compliant.

This is an extremely good example of something that belongs in a 
*library* and should not be done in middleware.  Only end-application 
code that knows no further dispatching will occur is in a position to 
do destructive reading from wsgi.input.  Middleware should be 
non-destructive, and should NOT be used where a library will suffice, 
since they add setup complexity and runtime performance overhead.

The simple, standard way to do something like this would be to have a 
library routine like 'get_form_vars(environ)'.  The routine would 
check for the form vars key, and if not present, then it would 
process the input and cache the information in the environment.  It 
could even have an option to clone the input, in case the routine is 
being used from middleware.
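A sketch of the kind of library routine described here, assuming Python 3. The environ key ``example.form_vars`` and the ``clone`` parameter are illustrative assumptions, and ``urllib.parse.parse_qs`` stands in for a full form parser, so only urlencoded bodies are handled.

```python
import io
from urllib.parse import parse_qs

FORM_VARS_KEY = 'example.form_vars'  # hypothetical environ key


def get_form_vars(environ, clone=False):
    """Return parsed urlencoded POST vars, computing them at most once.

    With clone=True the body is copied into a BytesIO and wsgi.input is
    replaced by that rewound copy, so code further down can still read it.
    """
    cached = environ.get(FORM_VARS_KEY)
    if cached is not None:
        return cached
    length = int(environ.get('CONTENT_LENGTH') or 0)
    body = environ['wsgi.input'].read(length) if length > 0 else b''
    if clone:
        environ['wsgi.input'] = io.BytesIO(body)
    form = parse_qs(body.decode('latin-1'), keep_blank_values=True)
    environ[FORM_VARS_KEY] = form
    return form
```

Middleware would call it with ``clone=True``; an end application that knows no further dispatching will occur could leave the default.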

In general, where adding functionality doesn't require that the 
request or response be modified (as opposed to information simply 
being added to the environ), it should be done using library routines 
like this.  There is no middleware setup or call-through overhead, 
and the calculation of additional environ entries only takes place if 
the information is actually used.  There is also no need to use 
string constants as environ keys except in the routines 
themselves.  This approach should be considered a best practice for 
*any* additions to the environ.
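The same pattern generalizes to any derived value. A sketch assuming Python 3, where ``environ_cached``, ``get_path_segments``, and the ``example.path_segments`` key are hypothetical: the value is computed only the first time it is requested and then read back from the environ, with no middleware involved.

```python
import functools


def environ_cached(key):
    """Decorator: compute func(environ) once per request, caching it in environ."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(environ):
            if key not in environ:
                environ[key] = func(environ)
            return environ[key]
        return wrapper
    return decorator


@environ_cached('example.path_segments')  # hypothetical key
def get_path_segments(environ):
    """Split PATH_INFO into its non-empty segments."""
    return [seg for seg in environ.get('PATH_INFO', '').split('/') if seg]
```

The string constant appears only in the decorator call, which matches the point that environ keys need not leak outside the routines themselves.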


From cmlenz at gmx.de  Sun Oct 22 14:28:24 2006
From: cmlenz at gmx.de (Christopher Lenz)
Date: Sun, 22 Oct 2006 14:28:24 +0200
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <453A5D92.4000603@colorstudy.com>
References: <453A5D92.4000603@colorstudy.com>
Message-ID: <B14835C0-0931-441B-88A4-88F83F132BDE@gmx.de>

Am 21.10.2006 um 19:49 schrieb Ian Bicking:
> If a dispatcher (like `routes <http://routes.groovie.org/>`_ or
> `selector <http://lukearno.com/projects/selector/>`_) pulls named
> information out of the portion of the request path it parses, it can put
> that information into a dictionary in ``environ['wsgi.url_vars']``.

While I think this is a great idea in general, I don't like that this  
is limited to "named information".

In the kind of dispatching I normally use, there's only one or maybe  
two parts of the URL that I want to receive as parameters. I like  
saving the overhead of making those named groups in regexes, and  
instead just use unnamed groups as positional arguments.

So not supporting both positional and named arguments limits the  
usefulness of this specification IMHO. How about making  
'wsgi.url_vars' a tuple of the form "(args, kwargs)" (the first a  
list or tuple, the second a dict)?
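A sketch of how a regex-based dispatcher might produce the proposed ``(args, kwargs)`` pair from a single match, assuming Python's ``re`` module; ``url_args_from_match`` is a hypothetical helper, not part of routes, selector, or Django.

```python
import re


def url_args_from_match(match):
    """Split a regex match into (positional args, keyword args).

    Named groups become keyword arguments; unnamed groups, in order,
    become positional arguments.
    """
    named_indexes = set(match.re.groupindex.values())
    args = tuple(
        match.group(i)
        for i in range(1, match.re.groups + 1)
        if i not in named_indexes
    )
    kwargs = match.groupdict()
    return args, kwargs


pattern = re.compile(r'^/archive/(\d+)/(\d+)/(?P<slug>\w+)/$')
m = pattern.match('/archive/2006/10/hello/')
```

Here ``url_args_from_match(m)`` yields ``(('2006', '10'), {'slug': 'hello'})``, so a pattern mixing both styles loses nothing.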

Cheers,
Chris
--
Christopher Lenz
   cmlenz at gmx.de
   http://www.cmlenz.net/


From ianb at colorstudy.com  Sun Oct 22 20:05:56 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 22 Oct 2006 13:05:56 -0500
Subject: [Web-SIG] Proposal: Handling POST forms in WSGI
In-Reply-To: <7.0.1.0.0.20061022042434.020fa938@telecommunity.com>
References: <453A8B67.4070409@colorstudy.com>
	<7.0.1.0.0.20061022042434.020fa938@telecommunity.com>
Message-ID: <453BB304.1030108@colorstudy.com>

Phillip J. Eby wrote:
> At 02:04 PM 10/21/2006, Ian Bicking wrote:
>> I've added another spec to wsgi.org:
>> http://wsgi.org/wsgi/Specifications/handling_post_forms
>>
>> This one is a little more intrusive than wsgi.url_vars, but it addresses
>> an outstanding source of problems: contention over wsgi.input.
> 
> -1 on this being middleware.  If middleware wants to read the input, it 
> should copy it to a temporary file or StringIO, not remove it.

This isn't middleware, it's a suggestion of a library routine for 
reading POST form submissions.  If multiple consumers use this same 
routine (or generally, the algorithm described) then they won't conflict.

Copying to a StringIO or tempfile is possible, though it introduces a 
couple layers of indirection where it is likely none is needed. 
Potentially wsgi.input could be replaced with something that lazily 
serializes the parsed form back into an unparsed form; perhaps coupled 
with a monkeypatch on cgi that detects this case and also provides a 
shortcut.
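One way to read the lazy-reserialization idea, sketched under assumptions: Python 3, a parsed form held as a dict of lists (the shape ``urllib.parse.parse_qs`` returns), and a hypothetical ``ReserializedInput`` class. Nothing is encoded until a consumer actually reads the stream.

```python
import io
from urllib.parse import urlencode


class ReserializedInput(object):
    """File-like wsgi.input that re-encodes an already-parsed form on demand.

    Only covers application/x-www-form-urlencoded data; a multipart form
    (e.g. with file uploads) would need a different serializer.
    """

    def __init__(self, form):
        self.form = form      # dict of name -> list of values
        self._stream = None   # built lazily on first access

    def _get_stream(self):
        if self._stream is None:
            body = urlencode(self.form, doseq=True).encode('ascii')
            self._stream = io.BytesIO(body)
        return self._stream

    def read(self, *args):
        return self._get_stream().read(*args)

    def readline(self, *args):
        return self._get_stream().readline(*args)

    def readlines(self, *args):
        return self._get_stream().readlines(*args)

    def __iter__(self):
        return iter(self._get_stream())
```

A real replacement would also have to set ``CONTENT_LENGTH`` to the length of the re-encoded body, which forces the encoding early unless that value is computed lazily too.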

> The broader principle here is that WSGI extensions should *add* to the 
> WSGI specification, not subtract from it.  Code running under middleware 
> that does as you have proposed will be unable to use its own form 
> processing or support nested applications.  It's therefore not 
> composable or further extensible, and I therefore have a hard time 
> viewing the proposed middleware as being WSGI compliant.

The status quo is that middleware or framework code that accesses POST 
vars is incompatible with any other middleware, framework code, or
applications that also want to access POST vars.

This does not subtract from WSGI, it enables a pattern that is currently 
problematic.  It really is problematic, in that I've encountered this 
problem (contention over wsgi.input), and sometimes when I would like to 
access the POST vars in middleware I am currently unable to because it 
causes too many problems with code that comes later in the stack, or I 
am unable to because wsgi.input has already been consumed.

> This is an extremely good example of something that belongs in a 
> *library* and should not be done in middleware.  Only end-application 
> code that knows no further dispatching will occur is in a position to do 
> destructive reading from wsgi.input.  Middleware should be 
> non-destructive, and should NOT be used where a library will suffice, 
> since they add setup complexity and runtime performance overhead.

End application code knows no further dispatching will occur, but 
framework code does not know this.  Typically it is a framework that 
parses the POST vars, not an application.

> The simple, standard way to do something like this would be to have a 
> library routine like 'get_form_vars(environ)'.  The routine would check 
> for the form vars key, and if not present, then it would process the 
> input and cache the information in the environment.  It could even have 
> an option to clone the input, in case the routine is being used from 
> middleware.

This is what paste.request.parse_formvars does -- I'm suggesting this 
standard so that all consumers, not just people using Paste, can be 
compatible with each other.

> In general, where adding functionality doesn't require that the request 
> or response be modified (as opposed to information simply being added to 
> the environ), it should be done using library routines like this.  There 
> is no middleware setup or call-through overhead, and the calculation of 
> additional environ entries only takes place if the information is 
> actually used.  There is also no need to use string constants as environ 
> keys except in the routines themselves.  This approach should be 
> considered a best practice for *any* additions to the environ.

Reading from wsgi.input effectively does modify the request.

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From ianb at colorstudy.com  Sun Oct 22 20:07:17 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 22 Oct 2006 13:07:17 -0500
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <ehf7l6$h3b$1@sea.gmane.org>
References: <453A5D92.4000603@colorstudy.com> <ehf7l6$h3b$1@sea.gmane.org>
Message-ID: <453BB355.3010909@colorstudy.com>

William Dode wrote:
> I hope you or some other WSGI gurus will also make some proposals for
> sessions and cookies...

Ben Bangert made a proposal some time ago about session IDs.  Maybe he'd 
like to resurrect that?  I thought it was a good proposal.

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From ianb at colorstudy.com  Sun Oct 22 20:17:17 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 22 Oct 2006 13:17:17 -0500
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <B14835C0-0931-441B-88A4-88F83F132BDE@gmx.de>
References: <453A5D92.4000603@colorstudy.com>
	<B14835C0-0931-441B-88A4-88F83F132BDE@gmx.de>
Message-ID: <453BB5AD.1030300@colorstudy.com>

Christopher Lenz wrote:
> Am 21.10.2006 um 19:49 schrieb Ian Bicking:
>> If a dispatcher (like `routes <http://routes.groovie.org/>`_ or
>> `selector <http://lukearno.com/projects/selector/>`_) pulls named
>> information out of the portion of the request path it parses, it can put
>> that information into a dictionary in ``environ['wsgi.url_vars']``.
> 
> While I think this is a great idea in general, I don't like that this  
> is limited to "named information".
>
> In the kind of dispatching I normally use, there's only one or maybe  
> two parts of the URL that I want to receive as parameters. I like  
> to save the overhead of making those named groups in regexes, and
> instead just use unnamed groups as positional arguments.
> 
> So not supporting both positional and named arguments limits the  
> usefulness of this specification IMHO. How about making  
> 'wsgi.url_vars' a tuple of the form "(args, kwargs)" (the first a  
> list or tuple, the second a dict)?

Hmm... so, a few things occur to me:

1. The dictionary could have integer keys like {1: arg1, 2: arg2}.  This 
is hard to unpack.  Eh, not a good idea I guess.

2. We use (args, kwargs).  Frameworks can probably handle this just 
fine.  Quite a few systems can't produce positional arguments, but that 
probably doesn't matter -- at least when the end result is something like a
Python function call, there's usually a named equivalent to
positional arguments.  When it is exposed directly to the application it 
seems a little more awkward.  For instance, I was thinking I'd add a 
req.url_vars to the request object that's just a proxy to 
environ['wsgi.url_vars'].  But having that return a tuple isn't very 
convenient.  I guess it could be req.url_vars and req.url_args, or 
something like that.  It adds to the complexity some.

3. Anything can go under *some* name, so you could do {'args': 
(positional args)}.  You'd still have to do a bit of unpacking if you 
had both positional and keyword arguments, but it would be fairly 
simple.  We could come up with a convention for this that we document in 
the spec.


I guess I didn't mention it in the spec, but I assumed that the 
dictionary would have only string keys (though I don't know if it 
matters), but the values could be of any type.  E.g., 
/archive/2005/10/01 could create {'date': date(2005, 10, 1)}.
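A small sketch of a dispatcher producing a non-string value, assuming Python 3; the ``ARCHIVE`` pattern and ``archive_url_vars`` function are hypothetical and not taken from routes or selector.

```python
import re
from datetime import date

ARCHIVE = re.compile(r'^/archive/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})$')


def archive_url_vars(path):
    """Return url_vars for an archive path, converting the captures to a date.

    Returns None when the path does not match.
    """
    m = ARCHIVE.match(path)
    if m is None:
        return None
    return {'date': date(int(m.group('year')), int(m.group('month')),
                         int(m.group('day')))}
```

The keys stay strings while the value is a ``datetime.date``, so consumers get a typed value rather than three raw captures.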


-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From joe at bitworking.org  Mon Oct 23 21:34:07 2006
From: joe at bitworking.org (Joe Gregorio)
Date: Mon, 23 Oct 2006 15:34:07 -0400
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <453BB5AD.1030300@colorstudy.com>
References: <453A5D92.4000603@colorstudy.com>
	<B14835C0-0931-441B-88A4-88F83F132BDE@gmx.de>
	<453BB5AD.1030300@colorstudy.com>
Message-ID: <3f1451f50610231234vc315989t72c93e840ab72258@mail.gmail.com>

On 10/22/06, Ian Bicking <ianb at colorstudy.com> wrote:
> Hmm... so, a few things occur to me:
>
> 1. The dictionary could have integer keys like {1: arg1, 2: arg2}.  This
> is hard to unpack.  Eh, not a good idea I guess.

Why not {"1":arg1, "2":arg2, } if all the arguments are positional?

I think supporting both positional and keyword arguments mixed
in the same request is a corner case not worth covering.
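If positional captures were stored under string keys as suggested, a consumer could still recover them in order. A sketch under that assumption (``split_url_vars`` is hypothetical); keys are ordered numerically, so ``"10"`` sorts after ``"2"``, and nothing here detects a missing index.

```python
def split_url_vars(url_vars):
    """Split a url_vars dict into (args, kwargs).

    Keys made only of decimal digits ("1", "2", ...) are treated as
    positional arguments, ordered numerically; every other key is a
    keyword argument.
    """
    positional = sorted(
        ((int(k), v) for k, v in url_vars.items() if k.isdecimal()),
        key=lambda item: item[0],
    )
    args = tuple(v for _, v in positional)
    kwargs = {k: v for k, v in url_vars.items() if not k.isdecimal()}
    return args, kwargs
```

For example, ``{'1': '2006', '2': '06', '10': 'x', 'slug': 'hi'}`` splits into ``('2006', '06', 'x')`` and ``{'slug': 'hi'}``.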

    -joe

-- 
Joe Gregorio        http://bitworking.org

From ianb at colorstudy.com  Tue Oct 24 00:39:42 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 23 Oct 2006 17:39:42 -0500
Subject: [Web-SIG] wsgi.url_vars feedback
In-Reply-To: <B6C7339A-B7C2-4FCB-B444-BB0D272887DB@simonwillison.net>
References: <B6C7339A-B7C2-4FCB-B444-BB0D272887DB@simonwillison.net>
Message-ID: <453D44AE.3030604@colorstudy.com>

Simon Willison wrote:
> I've spotted a potential problem with your wsgi.url_vars specification 
> suggestion.
> 
> http://wsgi.org/wsgi/Specifications/url_vars
> 
> The spec calls for wsgi.url_vars to refer to a dictionary. In Django, we 
> originally required named captures in regular expressions - but 
> eventually realised that for many cases just having positional captures 
> was less work for developers and worked just as well. Here's some code I 
> wrote today:
> 
>     (r'^archive/(\d+)/(\d+)/(\d+)/(\w+)/$', blog.entry),
> 
> def entry(request, year, month, day, slug):
>     # ...
> 
> This form of URL variable extraction does not appear to be covered by 
> your wsgi.url_vars spec. One solution could be to extend the spec to 
> suggest using integer keys to represent this case?
> 
> environ['wsgi.url_vars'] = { 1: '2006', 2: '06', 3: '12', 4: 'slug' }

Christopher Lenz also brought this up.  My inclination is something like:

environ['wsgi.url_vars'] = {'__args__': ('2006', '06', '12', 'slug')}

By using a tuple or list, you can be sure you don't have a sparse list, 
which probably isn't something any system is likely to handle.  The 
double underscores kind of mark __args__ as a special kind of key, so 
it's less likely to overlap with a simple named key.  Removing it from 
the dict or handling it specially is simple; you don't have to look at all the
keys to see if any are ints, you just test "'__args__' in url_vars".
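A sketch of how a consumer might apply the ``'__args__'`` convention proposed here, assuming Python 3; ``call_with_url_vars`` and the simplified ``entry`` view (which, unlike the Django one above, takes no request argument) are hypothetical.

```python
def call_with_url_vars(func, url_vars):
    """Call func using the proposed '__args__' convention.

    url_vars['__args__'] (if present) supplies positional arguments; every
    other key is passed as a keyword argument.
    """
    kwargs = dict(url_vars)  # copy so the environ entry is left untouched
    args = tuple(kwargs.pop('__args__', ()))
    return func(*args, **kwargs)


def entry(year, month, day, slug):
    """Hypothetical view used only for the example."""
    return '%s-%s-%s %s' % (year, month, day, slug)
```

A single membership test on ``'__args__'`` is all the special-casing needed; mixed positional and named captures also work.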

Would this satisfy everyone?

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From luke.arno at gmail.com  Tue Oct 24 04:14:52 2006
From: luke.arno at gmail.com (Luke Arno)
Date: Mon, 23 Oct 2006 22:14:52 -0400
Subject: [Web-SIG] wsgi.url_vars feedback
In-Reply-To: <453D44AE.3030604@colorstudy.com>
References: <B6C7339A-B7C2-4FCB-B444-BB0D272887DB@simonwillison.net>
	<453D44AE.3030604@colorstudy.com>
Message-ID: <d79e89ce0610231914q6c12713egfb8ef864a26d6ead@mail.gmail.com>

On 10/23/06, Ian Bicking <ianb at colorstudy.com> wrote:
> Simon Willison wrote:
> > I've spotted a potential problem with your wsgi.url_vars specification
> > suggestion.
> >
> > http://wsgi.org/wsgi/Specifications/url_vars
> >
> > The spec calls for wsgi.url_vars to refer to a dictionary. In Django, we
> > originally required named captures in regular expressions - but
> > eventually realised that for many cases just having positional captures
> > was less work for developers and worked just as well. Here's some code I
> > wrote today:
> >
> >     (r'^archive/(\d+)/(\d+)/(\d+)/(\w+)/$', blog.entry),
> >
> > def entry(request, year, month, day, slug):
> >     # ...
> >
> > This form of URL variable extraction does not appear to be covered by
> > your wsgi.url_vars spec. One solution could be to extend the spec to
> > suggest using integer keys to represent this case?
> >
> > environ['wsgi.url_vars'] = { 1: '2006', 2: '06', 3: '12', 4: 'slug' }
>
> Christopher Lenz also brought this up.  My inclination is something like:
>
> environ['wsgi.url_vars'] = {'__args__': ('2006', '06', '12', 'slug')}
>
> By using a tuple or list, you can be sure you don't have a sparse list,
> which probably isn't something any system is likely to handle.  The
> double underscores kind of mark __args__ as a special kind of key, so
> it's less likely to overlap with a simple named key.  Removing it from
> the dict or handling it specially is simple; you don't have to look at all the
> keys to see if any are ints, you just test "'__args__' in url_vars".
>
> Would this satisfy everyone?

Since numbers are not legal names for named groups
anyway, why not just put them in the dict? It seems
easier in more cases.

Of course, this is more difficult in the case of handling
an unknown number of positional args, but that case
seems rather far off into the corner, no?

(By the by, I consider named groups a best practice as
there is a tighter dependency between code and URI
with positional args. Sometimes it may not matter but...)

Cheers,
- Luke

From h.then at pythea.nl  Tue Oct 24 14:25:29 2006
From: h.then at pythea.nl (Hans Then)
Date: Tue, 24 Oct 2006 12:25:29 -0000
Subject: [Web-SIG] Proposal: Handling POST forms in WSGI
Message-ID: <20061024104829.032CC1E4006@bag.python.org>

Phillip,

> -1 on this being middleware.  If middleware wants to read the input,
> it should copy it to a temporary file or StringIO, not remove it.

> The simple, standard way to do something like this would be to have a
> library routine like 'get_form_vars(environ)'.  The routine would
> check for the form vars key, and if not present, then it would
> process the input and cache the information in the environment.  It
> could even have an option to clone the input, in case the routine is
> being used from middleware.

I think Ian's point is to standardise on a form key and on the interface of
the form object. Your point is valid that middleware should not
destructively read the wsgi.input variable.

Many web applications will at some point call other web applications. It
seems positively wasteful to have to clone and parse wsgi.input over and
over again. It makes sense to do it once, in middleware, and then stuff it
in a standard place in the wsgi environ.

Would you +1 the proposal if it is added that middleware does not destroy
the wsgi.input variable but clones it?

Regards,

Hans Then


From ianb at colorstudy.com  Tue Oct 24 17:25:53 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 24 Oct 2006 10:25:53 -0500
Subject: [Web-SIG] Proposal: Handling POST forms in WSGI
In-Reply-To: <20061024104829.032CC1E4006@bag.python.org>
References: <20061024104829.032CC1E4006@bag.python.org>
Message-ID: <453E3081.7060205@colorstudy.com>

Hans Then wrote:
> Phillip,
> 
>> -1 on this being middleware.  If middleware wants to read the input,
>> it should copy it to a temporary file or StringIO, not remove it.
> 
>> The simple, standard way to do something like this would be to have a
>> library routine like 'get_form_vars(environ)'.  The routine would
>> check for the form vars key, and if not present, then it would
>> process the input and cache the information in the environment.  It
>> could even have an option to clone the input, in case the routine is
>> being used from middleware.
> 
> I think Ian's point is to standardise on a form key and on the interface of
> the form object. Your point is valid that middleware should not
> destructively read the wsgi.input variable.
> 
> Many web applications will at some point call other web applications. It
> seems positively wasteful to have to clone and parse wsgi.input over and
> over again. It makes sense to do it once, in middleware, and then stuff it
> in a standard place in the wsgi environ.

Ideally I'm not expecting middleware to do this parsing (unless there's 
some good reason for the middleware to want the information).  I'm 
suggesting a way the parsing can be done in a lazy fashion, but that one 
consumer doesn't get exclusive access to it.

I also see this as a kind of prerequisite for supporting multiple 
request objects over the WSGI environment, as each object is going to 
want access to wsgi.input.


-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From pje at telecommunity.com  Tue Oct 24 18:50:04 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 24 Oct 2006 12:50:04 -0400
Subject: [Web-SIG] Proposal: Handling POST forms in WSGI
In-Reply-To: <20061024104829.032CC1E4006@bag.python.org>
Message-ID: <5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com>

At 12:25 PM 10/24/2006 +0000, Hans Then wrote:
>Phillip,
>
> > -1 on this being middleware.  If middleware wants to read the input,
> > it should copy it to a temporary file or StringIO, not remove it.
>
> > The simple, standard way to do something like this would be to have a
> > library routine like 'get_form_vars(environ)'.  The routine would
> > check for the form vars key, and if not present, then it would
> > process the input and cache the information in the environment.  It
> > could even have an option to clone the input, in case the routine is
> > being used from middleware.
>
>I think Ian's point is to standardise on a form key and on the interface of
>the form object. Your point is valid that middleware should not
>destructively read the wsgi.input variable.
>
>Many web applications will at some point call other web applications. It
>seems positively wasteful to have to clone and parse wsgi.input over and
>over again. It makes sense to do it once, in middleware, and then stuff it
>in a standard place in the wsgi environ.

Re-read what I wrote.  If you have a common library routine, the parsing 
(and optional cloning) only happens *once*.  If middleware needs access to 
the data, it can just call the library routine.

This should NOT be implemented as middleware that adds the key; it's 
completely unnecessary.  Middleware is only required for features that 
actually *modify* or *monitor* the request or response, as opposed to 
merely *adding* new request-side data derived from existing environ 
keys.  If you want to improve the WSGI request API, the proper place to do 
so is by using library routines that cache their computations in the 
environ dictionary.

In fact, there isn't even any technical need to "officially" standardize 
the environ keys for these functions.  Just release libraries that have the 
features, so everyone can just install them.  Then we won't all have five 
different libraries, each with its own routine just to do the same 
'get_form_vars()' operation!

Successful routines with sufficiently broad appeal and minimal impact could 
then be targeted for inclusion in later versions of wsgiref (and ultimately 
the stdlib).  This seems to me the cleanest overall way to add API 
"friendliness" to WSGI.

(We could even discuss such things in the form of proposed patches to the 
wsgiref code and documentation, then put them into the current wsgiref 
release.)


>Would you +1 the proposal if it is added that middleware does not destroy
>the wsgi.input variable but clones it?

I didn't -1 the proposal, I -1'd middleware.  And the -1 
stands.  Middleware is absolutely not the place for adding derivative 
environ keys like this.  It's 100% unnecessary, adds complexity, and 
reduces performance in the process.


From ianb at colorstudy.com  Tue Oct 24 18:56:03 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 24 Oct 2006 11:56:03 -0500
Subject: [Web-SIG] Proposal: Handling POST forms in WSGI
In-Reply-To: <5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com>
References: <5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com>
Message-ID: <453E45A3.20700@colorstudy.com>

Phillip J. Eby wrote:
> At 12:25 PM 10/24/2006 +0000, Hans Then wrote:
>> Phillip,
>>
>>> -1 on this being middleware.  If middleware wants to read the input,
>>> it should copy it to a temporary file or StringIO, not remove it.
>>> The simple, standard way to do something like this would be to have a
>>> library routine like 'get_form_vars(environ)'.  The routine would
>>> check for the form vars key, and if not present, then it would
>>> process the input and cache the information in the environment.  It
>>> could even have an option to clone the input, in case the routine is
>>> being used from middleware.
>> I think Ian's point is to standardise on a form key and on the interface of
>> the form object. Your point is valid that middleware should not
>> destructively read the wsgi.input variable.
>>
>> Many web applications will at some point call other web applications. It
>> seems positively wasteful to have to clone and parse wsgi.input over and
>> over again. It makes sense to do it once, in middleware, and then stuff it
>> in a standard place in the wsgi environ.
> 
> Re-read what I wrote.  If you have a common library routine, the parsing 
> (and optional cloning) only happens *once*.  If middleware needs access to 
> the data, it can just call the library routine.
> 
> This should NOT be implemented as middleware that adds the key; it's 
> completely unnecessary.  Middleware is only required for features that 
> actually *modify* or *monitor* the request or response, as opposed to 
> merely *adding* new request-side data derived from existing environ 
> keys.  If you want to improve the WSGI request API, the proper place to do 
> so is by using library routines that cache their computations in the 
> environ dictionary.
> 
> In fact, there isn't even any technical need to "officially" standardize 
> the environ keys for these functions.  Just release libraries that have the 
> features, so everyone can just install them.  Then we won't all have five 
> different libraries, each with its own routine just to do the same 
> 'get_form_vars()' operation!
> 
> Successful routines with sufficiently broad appeal and minimal impact could 
> then be targeted for inclusion in later versions of wsgiref (and ultimately 
> the stdlib).  This seems to me the cleanest overall way to add API 
> "friendliness" to WSGI.
> 
> (We could even discuss such things in the form of proposed patches to the 
> wsgiref code and documentation, then put them into the current wsgiref 
> release.)

That would be a landing place for an implementation of this library code 
that does what the spec implies.  But it relies on the release cycle for 
wsgiref, which is unclear and probably very slow since it is in the stdlib.

I have nothing against this being in wsgiref, I just would like to use 
this convention sooner rather than later.

>> Would you +1 the proposal if it is added that middleware does not destroy
>> the wsgi.input variable but clones it?
> 
> I didn't -1 the proposal, I -1'd middleware.  And the -1 
> stands.  Middleware is absolutely not the place for adding derivative 
> environ keys like this.  It's 100% unnecessary, adds complexity, and 
> reduces performance in the process.

Please respond to my proposal, which as I've clarified does not imply 
any particular middleware.


-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From pje at telecommunity.com  Tue Oct 24 18:56:41 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 24 Oct 2006 12:56:41 -0400
Subject: [Web-SIG] wsgi.url_vars feedback
In-Reply-To: <453D44AE.3030604@colorstudy.com>
References: <B6C7339A-B7C2-4FCB-B444-BB0D272887DB@simonwillison.net>
	<B6C7339A-B7C2-4FCB-B444-BB0D272887DB@simonwillison.net>
Message-ID: <5.1.1.6.0.20061024125257.02d04008@sparrow.telecommunity.com>

At 05:39 PM 10/23/2006 -0500, Ian Bicking wrote:
>By using a tuple or list, you can be sure you don't have a sparse list,
>which probably isn't something any system is likely to handle.  The
>double underscores kind of mark __args__ as a special kind of key, so
>it's less likely to overlap with a simple named key.  Removing it from
>the dict or handling it is special; you don't have to look at all the
>keys to see if any are ints, you just test "'__args__' in url_vars".
>
>Would this satisfy everyone?

Call it "wsgi.url_args", and make it a two-item tuple: *args, **kw.  That's 
far simpler than any of the wacky encodings proposed so far, and can be 
used to invoke a function directly, e.g.:

     apply(f, *environ['wsgi.url_args'])

or, less cleverly (i.e. more readably):

     args, kw = environ['wsgi.url_args']
     f(*args, **kw)
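On Python 3, `apply()` no longer exists, so only the second spelling still runs there. The sketch below assumes a simple regex-based dispatcher. The `make_dispatcher` and `call_with_url_args` names and the `(pattern, app)` route format are hypothetical and not part of any spec. Unnamed groups become the positional half of `wsgi.url_args`, named groups become the keyword half, and a consumer splats the tuple straight into an ordinary function.

```python
import re


def make_dispatcher(routes):
    """Build a WSGI dispatcher that records URL matches in wsgi.url_args.

    `routes` is a list of (regex_string, wsgi_app) pairs (a hypothetical
    routing format). Unnamed regex groups become positional args and named
    groups become keyword args, stored as the two-item tuple (args, kw).
    """
    compiled = [(re.compile(pattern), app) for pattern, app in routes]

    def dispatcher(environ, start_response):
        path = environ.get('PATH_INFO', '')
        for regex, app in compiled:
            match = regex.fullmatch(path)
            if match is None:
                continue
            # Group numbers that belong to named groups go into kw, not args.
            named_positions = set(regex.groupindex.values())
            args = [value for position, value in enumerate(match.groups(), 1)
                    if position not in named_positions]
            kw = match.groupdict()
            # Always a list and a dict, never None, so * and ** both work.
            environ['wsgi.url_args'] = (args, kw)
            return app(environ, start_response)
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'Not Found']

    return dispatcher


def call_with_url_args(func, environ):
    """Invoke a plain function with the URL arguments (Python 3 spelling)."""
    args, kw = environ['wsgi.url_args']
    return func(*args, **kw)
```

Because both halves are always present, the consumer never has to inspect the keys to find out whether positional arguments exist.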


From pje at telecommunity.com  Tue Oct 24 19:52:54 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 24 Oct 2006 13:52:54 -0400
Subject: [Web-SIG] Proposal: Handling POST forms in WSGI
In-Reply-To: <453E45A3.20700@colorstudy.com>
References: <5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com>
	<5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20061024133724.02da8130@sparrow.telecommunity.com>

At 11:56 AM 10/24/2006 -0500, Ian Bicking wrote:
>That would be a landing place for an implementation of this library code 
>that does what the spec implies.  But it relies on the release cycle for 
>wsgiref, which is unclear and probably very slow since it is in the stdlib.

Not really.  wsgiref is distributed standalone from the cheeseshop, so 
newer versions are just an easy_install away.


>I have nothing against this being in wsgiref, I just would like to use 
>this convention sooner rather than later.

Of course; the wsgiref thing was just a suggestion for where canonical 
implementations of these things would live.


>>>Would you +1 the proposal if it is added that middleware does not destroy
>>>the wsgi.input variable but clones it?
>>I didn't -1 the proposal, I -1'd middleware.  And the -1 
>>stands.  Middleware is absolutely not the place for adding derivative 
>>environ keys like this.  It's 100% unnecessary, adds complexity, and 
>>reduces performance in the process.
>
>Please respond to my proposal, which as I've clarified does not imply any 
>particular middleware.

You should clarify that in the proposal itself, explicitly forbidding it 
from being done by middleware unless the middleware is taking 
responsibility for request processing, or the middleware clones the 
environ.  Too many people, upon first encountering WSGI middleware, want to 
use it to add things to the request API, when it isn't necessary.  Notice 
Hans Then's reaction to my -1 on middleware, for example.

Writing correct middleware is already difficult; let's not encourage people
to write more incorrect middleware by increasing the temptation to use 
middleware for trivial API enhancements that would be better done as 
libraries.  (Yes, I know that wasn't your intent, but at least one person 
besides me interpreted it as such.)

As far as the other open issues in the proposal, I don't really care 
much.  My main concern is making sure that the proposal doesn't encourage 
people to start creating middleware whose sole purpose is to add 
unnecessary junk to environ while breaking other applications as a side 
effect.  :)

(I do suggest, however, that a simpler way to assure WSGI compliance when 
removing wsgi.input may be to set the incoming content-length to zero.  An 
application or library that tries to read wsgi.input when the content 
length is zero is itself non-compliant.)
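A minimal sketch of that suggestion, assuming the consuming component reads the body itself. `consume_body_and_zero_length` and `compliant_read` are hypothetical helpers; `compliant_read` models a downstream reader that honours CONTENT_LENGTH as the spec requires.

```python
import io


def consume_body_and_zero_length(environ):
    """Read the request body, then advertise an empty body to later consumers."""
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0
    body = environ['wsgi.input'].read(length) if length > 0 else b''
    # A compliant consumer reads at most CONTENT_LENGTH bytes, so it now
    # sees an empty body instead of blocking on an exhausted stream.
    environ['CONTENT_LENGTH'] = '0'
    return body


def compliant_read(environ):
    """What a spec-following downstream reader would do."""
    length = int(environ.get('CONTENT_LENGTH') or 0)
    return environ['wsgi.input'].read(length) if length > 0 else b''


# Example: the first consumer gets the body; a later one sees nothing.
environ = {'CONTENT_LENGTH': '7', 'wsgi.input': io.BytesIO(b'a=1&b=2')}
first = consume_body_and_zero_length(environ)
later = compliant_read(environ)
```

Note that `later` comes back as `b''`, which is indistinguishable from a POST that genuinely had no body.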


From ianb at colorstudy.com  Tue Oct 24 20:14:36 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 24 Oct 2006 13:14:36 -0500
Subject: [Web-SIG] Proposal: Handling POST forms in WSGI
In-Reply-To: <5.1.1.6.0.20061024133724.02da8130@sparrow.telecommunity.com>
References: <5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com>
	<5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com>
	<5.1.1.6.0.20061024133724.02da8130@sparrow.telecommunity.com>
Message-ID: <453E580C.8090503@colorstudy.com>

Phillip J. Eby wrote:
>>>> Would you +1 the proposal if it is added that middleware does not destroy
>>>> the wsgi.input variable but clones it?
>>> I didn't -1 the proposal, I -1'd middleware.  And the -1 stands.  
>>> Middleware is absolutely not the place for adding derivative environ 
>>> keys like this.  It's 100% unnecessary, adds complexity, and reduces 
>>> performance in the process.
>>
>> Please respond to my proposal, which as I've clarified does not imply 
>> any particular middleware.
> 
> You should clarify that in the proposal itself, explicitly forbidding it 
> from being done by middleware unless the middleware is taking 
> responsibility for request processing, or the middleware clones the 
> environ.  Too many people, upon first encountering WSGI middleware, want 
> to use it to add things to the request API, when it isn't necessary.  
> Notice Hans Then's reaction to my -1 on middleware, for example.

OK, I'll clarify this.  Not that it's *horrible* that someone use this 
library function in middleware, but only if there's some reason specific 
to the middleware that they want to look at the POSTed form.  Middleware 
is a very vague concept, really, as anything that can forward the 
request onto another WSGI app is middleware, but many such things are 
themselves full applications.  (The specific example where this first 
really started bugging me was in paste.evalexception, which really is a 
bit of both application and middleware.)

But I will note that you should not parse the form unless you actually 
want it, not just so that it will show up in a parsed form for a later 
consumer.

> Writing correct middleware is already difficult; let's not encourage 
> people to write more incorrect middleware by increasing the temptation 
> to use middleware for trivial API enhancements that would be better done 
> as libraries.  (Yes, I know that wasn't your intent, but at least one 
> person besides me interpreted it as such.)
> 
> As far as the other open issues in the proposal, I don't really care 
> much.  My main concern is making sure that the proposal doesn't 
> encourage people to start creating middleware whose sole purpose is to 
> add unnecessary junk to environ while breaking other applications as a 
> side effect.  :)
> 
> (I do suggest, however, that a simpler way to assure WSGI compliance 
> when removing wsgi.input may be to set the incoming content-length to 
> zero.  An application or library that tries to read wsgi.input when the 
> content length is zero is itself non-compliant.)

That seems like an inaccurate representation of the request itself, and 
likely to cover up problems.  If you want to look at the POSTed form, 
and you aren't aware of this convention and the environment key, then 
there's an unresolvable error.  So it should just produce an exception; 
if you set CONTENT_LENGTH to 0 then the consumer will just happily 
assume there is no data, which leads to incorrect behavior.

I won't be able to update the spec right away, but when I get a chance I 
will do so.


-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From foom at fuhm.net  Thu Oct 26 10:26:28 2006
From: foom at fuhm.net (James Y Knight)
Date: Thu, 26 Oct 2006 04:26:28 -0400
Subject: [Web-SIG] WSGI, cgi.FieldStorage incompatibility
In-Reply-To: <ca471dc20609291231o60293553w6622187903ba784e@mail.gmail.com>
References: <451D1D22.5090607@openapp.biz>
	<ca471dc20609291231o60293553w6622187903ba784e@mail.gmail.com>
Message-ID: <B9291DC4-C2A8-4F5E-AC13-8227AF6259E2@fuhm.net>


On Sep 29, 2006, at 3:31 PM, Guido van Rossum wrote:

> On 9/29/06, Michael Kerrin <michael.kerrin at openapp.biz> wrote:
>>   But the current implementation of cgi.FieldStorage in the 2.4.4 branch
>> and on Python 2.5 does call readline with the size argument. It has
>> started to do this in response to the Python bug #1112549 -
>> cgi.FieldStorage memory usage can spike in line-oriented ops. See
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1112549&group_id=5470&atid=105470
>>
>>   Since it is reasonable for a WSGI application to use cgi.FieldStorage
>> I am wondering whether cgi.FieldStorage or the WSGI specification needs
>> to be changed in order to solve this incompatibility.
>>
>>   Originally I thought it was cgi.FieldStorage that needs to be changed,
>> and hence tried to fix it by wrapping the input stream so that the
>> readline method always uses the read method on the input stream. While
>> this seems to work for me it introduces a level of complexity in the
>> cgi.py file, and possibly some other bugs, that makes me think that
>> adding the size argument for readline into the WSGI specification isn't
>> such a bad idea after all.
>
> Since that change to cgi.py was a security fix I would strongly
> recommend not to remove it and to change the WSGI spec instead.

Given that this change is now part of Python 2.4.4 and Python 2.5, it
seems to me it is now a de facto requirement that all WSGI server
implementations must support readline with a size argument in order  
to run any interesting software, despite the spec explicitly saying  
that you shouldn't. I suspect simply modifying the spec to follow the  
current reality would be the least bad option.

But this kind of destabilizing breakage really shouldn't be allowed  
to happen again. Once the error was discovered, the cgi.py change  
should have been immediately reverted until either a decision was  
made to change the WSGI spec, or else the change fixed to not break  
WSGI compliant servers. This limbo situation is pretty bad.
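If servers do need to honour `readline(size)`, one way a server could provide it on top of a stream that only offers `read()` is a small buffering wrapper. This is a sketch, not any server's actual code; the `SizedReadlineInput` class name and its `chunk_size` parameter are hypothetical. It also stops at CONTENT_LENGTH, so applications cannot read past the request body.

```python
import io


class SizedReadlineInput:
    """Hypothetical server-side wrapper adding readline(size) to a read()-only stream.

    Reads at most content_length bytes from the underlying stream and buffers
    them, so it never asks the stream for bytes past the request body.
    """

    def __init__(self, stream, content_length, chunk_size=8192):
        self._stream = stream
        self._remaining = content_length  # body bytes not yet pulled from stream
        self._buffer = b''
        self._chunk_size = chunk_size

    def _fill(self):
        """Pull one more chunk into the buffer; return False at end of body."""
        if self._remaining <= 0:
            return False
        data = self._stream.read(min(self._chunk_size, self._remaining))
        if not data:  # client sent fewer bytes than CONTENT_LENGTH
            self._remaining = 0
            return False
        self._remaining -= len(data)
        self._buffer += data
        return True

    def read(self, size=-1):
        if size is None or size < 0:
            while self._fill():
                pass
            data, self._buffer = self._buffer, b''
            return data
        while len(self._buffer) < size and self._fill():
            pass
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

    def readline(self, size=-1):
        limit = None if size is None or size < 0 else size
        while True:
            newline = self._buffer.find(b'\n')
            if newline >= 0:
                end = newline + 1          # include the newline itself
                break
            if limit is not None and len(self._buffer) >= limit:
                end = limit                # enough bytes to satisfy size
                break
            if not self._fill():
                end = len(self._buffer)    # last, unterminated line
                break
        if limit is not None:
            end = min(end, limit)
        line, self._buffer = self._buffer[:end], self._buffer[end:]
        return line


# Example: 18 body bytes, followed by data beyond CONTENT_LENGTH.
wrapped = SizedReadlineInput(io.BytesIO(b'first line\nsecond\nTRAILING'), 18)
```

With the size argument honoured, a caller such as cgi.FieldStorage can cap how much of a very long line it pulls into memory at once, which is the point of the bug #1112549 fix.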

James


From jim at zope.com  Thu Oct 26 13:14:15 2006
From: jim at zope.com (Jim Fulton)
Date: Thu, 26 Oct 2006 07:14:15 -0400
Subject: [Web-SIG] WSGI, cgi.FieldStorage incompatibility
In-Reply-To: <B9291DC4-C2A8-4F5E-AC13-8227AF6259E2@fuhm.net>
References: <451D1D22.5090607@openapp.biz>	<ca471dc20609291231o60293553w6622187903ba784e@mail.gmail.com>
	<B9291DC4-C2A8-4F5E-AC13-8227AF6259E2@fuhm.net>
Message-ID: <45409887.5020609@zope.com>

James Y Knight wrote:
> On Sep 29, 2006, at 3:31 PM, Guido van Rossum wrote:
> 
>> On 9/29/06, Michael Kerrin <michael.kerrin at openapp.biz> wrote:
>>>   But the current implementation of cgi.FieldStorage in the 2.4.4 branch
>>> and on Python 2.5 does call readline with the size argument. It has
>>> started to do this in response to the Python bug #1112549 -
>>> cgi.FieldStorage memory usage can spike in line-oriented ops. See
>>> http://sourceforge.net/tracker/index.php?func=detail&aid=1112549&group_id=5470&atid=105470
>>>
>>>   Since it is reasonable for a WSGI application to use cgi.FieldStorage
>>> I am wondering whether cgi.FieldStorage or the WSGI specification needs
>>> to be changed in order to solve this incompatibility.
>>>
>>>   Originally I thought it was cgi.FieldStorage that needs to be changed,
>>> and hence tried to fix it by wrapping the input stream so that the
>>> readline method always uses the read method on the input stream. While
>>> this seems to work for me it introduces a level of complexity in the
>>> cgi.py file, and possibly some other bugs, that makes me think that
>>> adding the size argument for readline into the WSGI specification isn't
>>> such a bad idea after all.
>> Since that change to cgi.py was a security fix I would strongly
>> recommend not to remove it and to change the WSGI spec instead.
> 
> Given that this change is now part of Python 2.4.4 and Python 2.5, it
> seems to me it is now a de facto requirement that all WSGI server
> implementations must support readline with a size argument in order  
> to run any interesting software, despite the spec explicitly saying  
> that you shouldn't. I suspect simply modifying the spec to follow the  
> current reality would be the least bad option.

Yes and updating the server implementations, of course, where necessary.

> But this kind of destabilizing breakage really shouldn't be allowed  
> to happen again. Once the error was discovered, the cgi.py change  
> should have been immediately reverted until either a decision was  
> made to change the WSGI spec, or else the change fixed to not break  
> WSGI compliant servers. This limbo situation is pretty bad.

Agreed.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From ianb at colorstudy.com  Tue Oct 31 20:12:05 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 31 Oct 2006 13:12:05 -0600
Subject: [Web-SIG] Proposal: Handling POST forms in WSGI
In-Reply-To: <45422555.9020904@doxdesk.com>
References: <453A8B67.4070409@colorstudy.com> <45422555.9020904@doxdesk.com>
Message-ID: <4547A005.1010208@colorstudy.com>

(Copied back to the list)

Andrew Clover wrote:
> Ian Bicking <ianb at colorstudy.com> wrote:
> 
>  > When this happens, the form can be parsed by ``cgi.FieldStorage``.
> 
> Agree with the objections others have posted.
> 
> There are many alternative things one might want to do with the body 
> that don't involve the cgi module (which is old, frequently inconvenient 
> and offers poor performance in some areas). Please leave the decision on 
> what to do with the contents of wsgi.input to the discretion of a 
> higher-level framework/middleware component.

This does not require anyone to use the cgi module.  This addresses what 
you can do when you do use the cgi module (which realistically is what 
everyone does -- I've literally never seen an exception, though I 
imagine there are one or two somewhere).  It needs to be clarified that 
parsing should still be done lazily and deferred as long as possible, 
but when it doesn't get deferred this offers a simple solution for later 
consumers that also use cgi.FieldStorage.
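One way to get the behaviour described here is to parse only when a consumer asks, cache the result in the environ alongside the placeholder that replaces `wsgi.input`, and let later consumers reuse the cache only while that placeholder is still in place. `InputProcessed` is the name used in this thread. The `example.post_form` key, the `get_post_form` function, and the use of `urllib.parse.parse_qs` in place of `cgi.FieldStorage` are assumptions of this sketch (the `cgi` module was removed in Python 3.13), so it handles `application/x-www-form-urlencoded` bodies only.

```python
import io
from urllib.parse import parse_qs

# Hypothetical environ key for this sketch; not a name taken from the spec.
POST_FORM_KEY = 'example.post_form'


class InputProcessed:
    """Placeholder for wsgi.input after the body has been parsed.

    Reading raises immediately, instead of blocking on an exhausted stream.
    """

    def read(self, *args):
        raise EOFError('wsgi.input has already been consumed and parsed')

    readline = readlines = __iter__ = read


def get_post_form(environ):
    """Parse a urlencoded POST body at most once per request."""
    current_input = environ['wsgi.input']
    cached = environ.get(POST_FORM_KEY)
    # Trust the cache only if wsgi.input is still the placeholder we installed;
    # if someone swapped in a new stream, parse that one instead.
    if cached is not None and cached[0] is current_input:
        return cached[1]
    length = int(environ.get('CONTENT_LENGTH') or 0)
    body = current_input.read(length) if length > 0 else b''
    # latin-1 maps every byte, so this decode cannot fail; parse_qs then
    # decodes %-escapes as UTF-8.
    form = parse_qs(body.decode('latin-1'), keep_blank_values=True)
    placeholder = InputProcessed()
    environ['wsgi.input'] = placeholder
    environ[POST_FORM_KEY] = (placeholder, form)
    return form


# Example: the first consumer parses; a later consumer reuses the result.
environ = {'REQUEST_METHOD': 'POST', 'CONTENT_LENGTH': '12',
           'wsgi.input': io.BytesIO(b'name=ian&x=1')}
form = get_post_form(environ)
again = get_post_form(environ)
```

The identity check matters: if some other component installs a fresh `wsgi.input` after the form was cached, the stale cache is ignored and the new body is parsed.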

> I have more sympathy for the idea of keeping a copy of the entire POST 
> request so it can be read again (eg. by having a component that consumes 
> wsgi.input replace it with a StringIO returning the same content). 
> However I don't see *mandating* this as a good move, given that a POST 
> can contain multimegabyte file uploads.

Keeping the POST request feels heavy considering it usually isn't 
needed.  The proposal requires very little overhead.

> How about asking that something that consumes wsgi.input replace it with 
> either:
> 
>   - the original stream seek()ed to 0, if possible;

Possible; depends on the deployment and the middleware involved. 
Requiring seek to work means that code will only work in particular 
deployments.

>   - a new streamlike echoing the post request;

This would be nice, and would allow for smart intermediaries to be 
compatible with dumb consumers, and potentially smart consumers could 
skip reparsing without any overhead.  I don't have any code to do this.
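Andrew's second option, a stream that echoes the POST body, could be sketched roughly as follows. It is hypothetical (`buffer_wsgi_input` and the 1 MiB in-memory threshold are illustrative choices) and uses `tempfile.SpooledTemporaryFile`, so small bodies stay in memory while multimegabyte uploads spill to disk, which speaks to the concern about mandating a full copy.

```python
import io
import tempfile


def buffer_wsgi_input(environ, max_in_memory=1024 * 1024, chunk_size=64 * 1024):
    """Copy the request body into a rewindable stream and install it as wsgi.input.

    Bodies up to max_in_memory bytes stay in RAM; larger ones spill to a
    temporary file.
    """
    try:
        remaining = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        remaining = 0
    source = environ['wsgi.input']
    copy = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    while remaining > 0:
        chunk = source.read(min(chunk_size, remaining))
        if not chunk:  # client sent less than CONTENT_LENGTH
            break
        copy.write(chunk)
        remaining -= len(chunk)
    copy.seek(0)
    environ['wsgi.input'] = copy
    return copy


# Example: an intermediary peeks at the body, then rewinds it.
environ = {'CONTENT_LENGTH': '7', 'wsgi.input': io.BytesIO(b'payload')}
stream = buffer_wsgi_input(environ)
peeked = stream.read()
stream.seek(0)
replayed = environ['wsgi.input'].read()
```

An intermediary that reads from the copy must `seek(0)` before passing the request on; otherwise the next consumer starts at end-of-stream.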

If code to do this emerged, it would be very reasonable to replace
InputProcessed in the spec with this better implementation.

Note, of course, that if you parse with cgi, throw the original body away,
and then recreate the body from the FieldStorage object, it is unlikely
that you can improve on the cgi module in any way.  Only if you get first
crack at wsgi.input can you improve on it.

>   - or a None or dummy stream which will guarantee a quick exception
>     if the stream is re-read later, rather than just mysteriously
>     blocking forever?

This is part of my proposal.


-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From ianb at colorstudy.com  Tue Oct 31 23:17:42 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 31 Oct 2006 16:17:42 -0600
Subject: [Web-SIG] wsgi.url_vars feedback
In-Reply-To: <5.1.1.6.0.20061024125257.02d04008@sparrow.telecommunity.com>
References: <B6C7339A-B7C2-4FCB-B444-BB0D272887DB@simonwillison.net>
	<B6C7339A-B7C2-4FCB-B444-BB0D272887DB@simonwillison.net>
	<5.1.1.6.0.20061024125257.02d04008@sparrow.telecommunity.com>
Message-ID: <4547CB86.5000803@colorstudy.com>

Phillip J. Eby wrote:
> At 05:39 PM 10/23/2006 -0500, Ian Bicking wrote:
>> By using a tuple or list, you can be sure you don't have a sparse list,
>> which probably isn't something any system is likely to handle.  The
>> double underscores kind of mark __args__ as a special kind of key, so
>> it's less likely to overlap with a simple named key.  Removing it from
>> the dict or handling it is special; you don't have to look at all the
>> keys to see if any are ints, you just test "'__args__' in url_vars".
>>
>> Would this satisfy everyone?
> 
> Call it "wsgi.url_args", and make it a two-item tuple: *args, **kw.  
> That's far simpler than any of the wacky encodings proposed so far, and 
> can be used to invoke a function directly, e.g.:
> 
>     apply(f, *environ['wsgi.url_args'])
> 
> or, less cleverly (i.e. more readably):
> 
>     args, kw = environ['wsgi.url_args']
>     f(*args, **kw)

Having thought about it, I think storing a tuple of (args, kwargs) is 
the best way to do this, since it's most explicit.  Consumers can deal 
with args specially, ignore them, or raise an error, as they see fit -- 
there are reasons to do each of these.  Hiding args in kwargs makes this 
choice more implicit, and probably more error prone as a result.

One little question: if a dispatcher can never produce one of the kinds 
of information (which happens for some of them), should they put in an 
empty list/tuple or empty dict, or should they put in None for that 
item?  I'm currently saying they must put in a list/tuple or dict.
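To illustrate the consumer choices listed above (use positional args, ignore them, or raise an error), here is a hypothetical helper, `url_args_for`, with an explicit policy argument. The policy names are illustrative only. It falls back to empty containers when the key is absent, which sidesteps rather than settles the question of whether dispatchers should omit the key.

```python
def url_args_for(environ, positional='raise'):
    """Fetch (args, kw) from wsgi.url_args, applying a policy for positional args.

    positional: 'pass' keeps them, 'ignore' drops them, 'raise' rejects them.
    """
    if positional not in ('pass', 'ignore', 'raise'):
        raise ValueError('unknown policy %r' % (positional,))
    # Empty containers if no dispatcher set the key.
    args, kw = environ.get('wsgi.url_args', ((), {}))
    if args and positional == 'raise':
        raise ValueError('unexpected positional URL arguments: %r' % (args,))
    if positional == 'ignore':
        args = ()
    return tuple(args), dict(kw)
```

Because the positional half is a separate item, each consumer states its choice in one place instead of scanning a dict for integer keys.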

Anyway, I've updated the spec:

http://wsgi.org/wsgi/Specifications/url_vars
http://wsgi.org/wsgi/Specifications/url_vars?action=diff

Is everyone happy with this version?


-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From fumanchu at amor.org  Tue Oct 31 23:35:07 2006
From: fumanchu at amor.org (Robert Brewer)
Date: Tue, 31 Oct 2006 14:35:07 -0800
Subject: [Web-SIG] wsgi.url_vars feedback
Message-ID: <435DF58A933BA74397B42CDEB8145A86064953D9@ex9.hostedexchange.local>

Ian Bicking wrote:
> Having thought about it, I think storing a tuple of
> (args, kwargs) is the best way to do this, since it's
> most explicit.  Consumers can deal with args specially,
> ignore them, or raise an error, as they see fit -- 
> there are reasons to do each of these.  Hiding args
> in kwargs makes this choice more implicit, and probably
> more error prone as a result.
> 
> One little question: if a dispatcher can never produce
> one of the kinds of information (which happens for some
> of them), should they put in an empty list/tuple or
> empty dict, or should they put in None for that item?
> I'm currently saying they must put in a list/tuple or dict.

I would've thought they'd just leave out the entry altogether.


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From pje at telecommunity.com  Tue Oct 31 23:48:01 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 31 Oct 2006 17:48:01 -0500
Subject: [Web-SIG] wsgi.url_vars feedback
In-Reply-To: <4547CB86.5000803@colorstudy.com>
References: <5.1.1.6.0.20061024125257.02d04008@sparrow.telecommunity.com>
	<B6C7339A-B7C2-4FCB-B444-BB0D272887DB@simonwillison.net>
	<B6C7339A-B7C2-4FCB-B444-BB0D272887DB@simonwillison.net>
	<5.1.1.6.0.20061024125257.02d04008@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20061031174422.026b9108@sparrow.telecommunity.com>

At 04:17 PM 10/31/2006 -0600, Ian Bicking wrote:
>One little question: if a dispatcher can never produce one of the kinds of 
>information (which happens for some of them), should they put in an empty 
>list/tuple or empty dict, or should they put in None for that item?  I'm 
>currently saying they must put in a list/tuple or dict.

This is the correct choice, IMO.  I think the spec should be explicit, 
however, that these values should be usable with * and ** (or apply()), as 
that will help clarify the meaning/rationale of the values.
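A sketch of the check a server, test suite, or validator might apply if the spec says the values must work with `*` and `**`: the value is a two-item tuple whose first item is a list or tuple and whose second is a dict with string keys. `check_url_args` is a hypothetical name, and rejecting `None` follows the empty-container rule Ian proposes above.

```python
def check_url_args(value):
    """Reject a wsgi.url_args value that cannot be used as f(*args, **kw)."""
    if not (isinstance(value, tuple) and len(value) == 2):
        raise TypeError('wsgi.url_args must be a two-item tuple (args, kw)')
    args, kw = value
    if not isinstance(args, (list, tuple)):
        raise TypeError('positional part must be a list or tuple, got %s'
                        % type(args).__name__)
    if not isinstance(kw, dict):
        raise TypeError('keyword part must be a dict, got %s'
                        % type(kw).__name__)
    if not all(isinstance(name, str) for name in kw):
        raise TypeError('keyword names must be strings to be usable with **')
    return value
```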


>Anyway, I've updated the spec:
>
>http://wsgi.org/wsgi/Specifications/url_vars
>http://wsgi.org/wsgi/Specifications/url_vars?action=diff
>
>Is everyone happy with this version?

I still think it should be url_args rather than url_vars -- I don't see any 
reason why they could be considered "variables" rather than 
arguments.  :)  But other than that, and the desire to see clarification 
about */** as an intended/supported use case, I give it a +1.