From evdo.hsdpa at gmail.com Thu Oct 5 01:29:10 2006
From: evdo.hsdpa at gmail.com (Robert Kim Wireless Internet Advisor)
Date: Wed, 4 Oct 2006 16:29:10 -0700
Subject: [Web-SIG] ruby rails / python dev needed for small webapp
Message-ID: <1ec620e90610041629p1d28fe6cx1c939b355abc11c5@mail.gmail.com>

any body got time to build out a suuuper simple webapp?

--
Robert Q Kim, Internet Advisor Provider
http://wireless-internet-access-provider.com
http://evdo-coverage.com
2611 S. Pacific Coast Highway 101
Suite 203
Cardiff by the Sea, CA 92007
206 984 0880

From michael.kerrin at openapp.biz Thu Oct 5 12:07:43 2006
From: michael.kerrin at openapp.biz (Michael Kerrin)
Date: Thu, 5 Oct 2006 11:07:43 +0100
Subject: [Web-SIG] WSGI, cgi.FieldStorage incompatibility
In-Reply-To:
References: <451D1D22.5090607@openapp.biz>
Message-ID: <200610051107.43507.michael.kerrin@openapp.biz>

Hi,

On Friday 29 September 2006 20:31, Guido van Rossum wrote:
> On 9/29/06, Michael Kerrin wrote:
> > But the current implementation of cgi.FieldStorage in the 2.4.4 branch
> > and on Python 2.5 does call readline with the size argument. It has
> > started to do this in response to the Python bug #1112549 -
> > cgi.FieldStorage memory usage can spike in line-oriented ops. See
> > http://sourceforge.net/tracker/index.php?func=detail&aid=1112549&group_id=5470&atid=105470
> >
> > Since it is reasonable for a WSGI application to use cgi.FieldStorage
> > I am wondering whether cgi.FieldStorage or the WSGI specification needs
> > to be changed in order to solve this incompatibility.
> >
> > Originally I thought it was cgi.FieldStorage that needs to be changed,
> > and hence tried to fix it by wrapping the input stream so that the
> > readline method always uses the read method on the input stream.
> > While this seems to work for me it introduces a level of complexity
> > in the cgi.py file, and possibly some other bugs, that makes me think
> > that adding the size argument for readline into the WSGI specification
> > isn't such a bad idea after all.
>
> Since that change to cgi.py was a security fix I would strongly
> recommend not to remove it and to change the WSGI spec instead.

I wasn't recommending removing that fix; instead I was trying to get
around both problems by using the read method on the input stream instead
of the readline method, since there are no problems passing the size
argument to the read method.

I think the best thing to do for now is to open a bug report on
sourceforge.

Thanks
Michael

--
Michael Kerrin
55 Fitzwilliam Sq., Dublin 2.
Tel: 087 688 3894

From janssen at parc.com Wed Oct 18 00:48:12 2006
From: janssen at parc.com (Bill Janssen)
Date: Tue, 17 Oct 2006 15:48:12 PDT
Subject: [Web-SIG] WSGI -- usable for other protocols?
Message-ID: <06Oct17.154816pdt."58648"@synergy1.parc.xerox.com>

I've been working on a Python IMAP server that uses PyLucene for
indexing. It's mainly IMAP, but also speaks a bit of HTTP for an
administrative interface. Does it make any sense to wrap it with
WSGI? That is, does WSGI make sense for other protocols than HTTP
(specifically IMAP)?

And, what WSGI-supporting environments will also support PyLucene (the
limiting factor is that the GCJ runtime has to be linked in, and all
threads must be GCJ threads).

Bill

From ianb at colorstudy.com Wed Oct 18 01:25:46 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 17 Oct 2006 18:25:46 -0500
Subject: [Web-SIG] WSGI -- usable for other protocols?
In-Reply-To: <06Oct17.154816pdt."58648"@synergy1.parc.xerox.com>
References: <06Oct17.154816pdt."58648"@synergy1.parc.xerox.com>
Message-ID: <4535667A.8010603@colorstudy.com>

Bill Janssen wrote:
> I've been working on Python IMAP server that uses PyLucene for
> indexing.
> It's mainly IMAP, but also speaks a bit of HTTP for an
> administrative interface. Does it make any sense to wrap it with
> WSGI? That is, does WSGI make sense for other protocols than HTTP
> (specifically IMAP)?

I would probably wrap it in WSGI, because that would please me. For
something like IMAP, FTP, etc., you'd have to have some persistent server
that holds the connection open, then turns certain commands into requests.
I've been thinking about doing this for dbus
(http://www.freedesktop.org/wiki/Software/dbus)

But I dunno... are there some WSGI libraries you'd like to leverage in
your IMAP server? Do you want to maintain an IMAP server with a parallel
HTTP interface? Anyway, I don't think it would be particularly hard to do.

> And, what WSGI-supporting environments will also support PyLucene (the
> limiting factor is that the GCJ runtime has to be linked in, and all
> threads must be GCJ threads).

Yikes, not sure about that. Can the normal threads communicate via some
queue to gcj threads? Otherwise the WSGI server is where the threads are
handled, so it would require tweaking some server for that (none use gcj
threads currently).

You'd also need a WSGI server that handled IMAP and persistent
connections. So maybe another server is called for, or an adaptation of
an existing multi-protocol server.

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From janssen at parc.com Wed Oct 18 02:03:05 2006
From: janssen at parc.com (Bill Janssen)
Date: Tue, 17 Oct 2006 17:03:05 PDT
Subject: [Web-SIG] WSGI -- usable for other protocols?
In-Reply-To: Your message of "Tue, 17 Oct 2006 16:25:46 PDT." <4535667A.8010603@colorstudy.com>
Message-ID: <06Oct17.170311pdt."58648"@synergy1.parc.xerox.com>

> You'd also need a WSGI server that handled IMAP and persistent
> connections. So maybe another server is called for, or an adaptation of
> an existing multi-protocol server.

That's my tentative conclusion.
The WSGI handling doesn't really match the IMAP connection requests very
well. I figured I'd adapt Medusa for this, again; set up an HTTP handler
and an IMAP handler. But I thought I'd check the wisdom of the crowd,
first.

Bill

From luke.arno at gmail.com Wed Oct 18 02:42:36 2006
From: luke.arno at gmail.com (Luke Arno)
Date: Tue, 17 Oct 2006 20:42:36 -0400
Subject: [Web-SIG] WSGI Components Mailing List
Message-ID:

I set up a mailing list for WSGI component users
and developers. I have had a few emails asking
questions and looking for help. I thought it would
be good to have a list with that scope.

Homepage: http://groups.google.com/group/wsgi-components
Group email: wsgi-components at googlegroups.com

Description:
WSGI is transforming Python Web development. It is now easy
to snap together best-of-breed components to build
applications or even roll your own frameworks (or "application
profiles"). This list is for users and developers of WSGI
components.

Cheers,
- Luke

From exarkun at divmod.com Wed Oct 18 02:50:13 2006
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Tue, 17 Oct 2006 20:50:13 -0400
Subject: [Web-SIG] WSGI -- usable for other protocols?
In-Reply-To: <06Oct17.154816pdt."58648"@synergy1.parc.xerox.com>
Message-ID: <20061018005013.26151.2100991412.divmod.quotient.5363@ohm>

On Tue, 17 Oct 2006 15:48:12 PDT, Bill Janssen wrote:
>I've been working on Python IMAP server that uses PyLucene for
>indexing. It's mainly IMAP, but also speaks a bit of HTTP for an
>administrative interface. Does it make any sense to wrap it with
>WSGI? That is, does WSGI make sense for other protocols than HTTP
>(specifically IMAP)?
>
>And, what WSGI-supporting environments will also support PyLucene (the
>limiting factor is that the GCJ runtime has to be linked in, and all
>threads must be GCJ threads).

Not really a response to your question, but might I suggest you contribute
to a project which sounds roughly equivalent to the one you're describing?
http://divmod.org/trac/wiki/DivmodQuotient

Jean-Paul

From titus at caltech.edu Wed Oct 18 02:46:35 2006
From: titus at caltech.edu (Titus Brown)
Date: Tue, 17 Oct 2006 17:46:35 -0700
Subject: [Web-SIG] WSGI Components Mailing List
In-Reply-To:
References:
Message-ID: <20061018004635.GI30517@caltech.edu>

On Tue, Oct 17, 2006 at 08:42:36PM -0400, Luke Arno wrote:
-> I set up a mailing list for WSGI component users
-> and developers. I have had a few emails asking
-> questions and looking for help. I thought it would
-> be good to have a list with that scope.

What's wrong with keeping WSGI discussions on the web-sig list? Is it
off-topic?

--titus

From janssen at parc.com Wed Oct 18 03:19:10 2006
From: janssen at parc.com (Bill Janssen)
Date: Tue, 17 Oct 2006 18:19:10 PDT
Subject: [Web-SIG] WSGI -- usable for other protocols?
In-Reply-To: Your message of "Tue, 17 Oct 2006 17:50:13 PDT." <20061018005013.26151.2100991412.divmod.quotient.5363@ohm>
Message-ID: <06Oct17.181917pdt."58648"@synergy1.parc.xerox.com>

> might I suggest you contribute
> to a project which sounds roughly equivalent to the one you're describing?
>
> http://divmod.org/trac/wiki/DivmodQuotient

Just for fun, I grepped the sources for IMAP. No hits.

Seems like I'd spend more time understanding the framework system
you're using than it would take me to write it from scratch.
An IMAP server isn't hard.

And I don't think the project is all that equivalent.

Does Twisted support the use of PyLucene?

I basically want an IMAP server that supports the MH mail storage
format, uses Lucene for indexing and search, and has the ability to do
auto-filtering on a per-user basis with either MH procmail scripts or
a Python script that uses a particular API. I don't need an SMTP
server, I don't need a Web interface to mail.

If DivmodQuotient is anywhere close to that, I'll take a longer look.
Bill

From exarkun at divmod.com Wed Oct 18 03:39:12 2006
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Tue, 17 Oct 2006 21:39:12 -0400
Subject: [Web-SIG] WSGI -- usable for other protocols?
In-Reply-To: <06Oct17.181917pdt."58648"@synergy1.parc.xerox.com>
Message-ID: <20061018013912.26151.1203616830.divmod.quotient.5405@ohm>

On Tue, 17 Oct 2006 18:19:10 PDT, Bill Janssen wrote:
>> might I suggest you contribute
>> to a project which sounds roughly equivalent to the one you're describing?
>>
>> http://divmod.org/trac/wiki/DivmodQuotient
>
>Just for fun, I grepped the sources for IMAP. No hits.

Quite so. We're currently most of the way through an (unfortunate) rewrite
to fix some database-related problems. IMAP4 hasn't been high on the
port-list, so there's no IMAP4 code in the new codebase yet. However,
Twisted's IMAP4 protocol implementation was developed for this project,
and IMAP4 is on our mind as we implement things, so adding it isn't going
to be obstructed by anything in Quotient (I would say "easy" but nothing
related to IMAP4 is easy).

>
>Seems like I'd spend more time understanding the framework system
>you're using than it would take me to write it from scratch.

Ahhh, I doubt it. This isn't to say you wouldn't spend a while
understanding the framework, but writing it from scratch would take
longer.

>An IMAP server isn't hard.

Having spent fair chunks of the last several years implementing various
IMAP4 servers, I must disagree. :) Unless you're happy with a
semi-protocol spec, semi-broken server that doesn't scale to a decent
number of messages, it's quite a haul.

>And I don't think the project is all that equivalent.

Strictly speaking, an IMAP4 server will be a subset of Quotient, and IMAP4
is by no means the main focus of Quotient, so maybe equivalent wasn't the
right word.

>
>Does Twisted support the use of PyLucene?

Quotient's using PyLucene for fulltext indexing already. So...
yes :)

>
>I basically want an IMAP server that supports the MH mail storage
>format, uses Lucene for indexing and search, and has the ability to do
>auto-filtering on a per-user basis with either MH procmail scripts or
>a Python script that uses a particular API. I don't need an SMTP
>server, I don't need a Web interface to mail.

It's possible you'd be happier basing the IMAP4 server on Twisted's
protocol support, rather than starting from Quotient (although _I'd_ be
happier if you added IMAP4 support to Quotient ;).

Quotient uses a SQLite database for storage of structured data about
messages and a filesystem structure (currently not a great structure,
but it's fixable) for actual message files.

It supports per-user filtering rules (although not procmail based - and
the work done in this area so far is extremely minimal, basically it can
do substring matching on headers - expanding this would be pretty simple
though, Quotient is designed for this kind of thing).

The SMTP server can be turned off completely, although then you need
another mechanism for adding messages to the system (inotify + directory
would work, but you'll have to write that part).

For users, the web interface is optional too, but various admin tasks may
continue to require some web interaction.

>
>If DivmodQuotient is anywhere close to that, I'll take a longer look.

It sounds like you might be happier starting from Twisted's IMAP4 code
rather than doing the work in Quotient, unless your requirements are
somewhat more flexible than I have gotten the impression that they are.

I _certainly_ would not recommend doing the protocol implementation from
scratch. Using Twisted there at least is a complete win.

As for PyLucene in that scenario, there's no _direct_ support for gcj
threads in Twisted. All of my work with PyLucene in Quotient has been in
the main thread in a child process of the main process (desirable to avoid
segfaulting the main server, mainly).

Note (in case it isn't obvious yet) I'm a developer on both Quotient and
Twisted, and I wrote pretty much all of Twisted's IMAP4 code. It's
possible I'm slightly biased. That said, lots of people have told me
Twisted's IMAP4 implementation is the best they've worked with, that it
saved their thesis, project, company, life, etc. ;)

Jean-Paul

From janssen at parc.com Wed Oct 18 05:00:35 2006
From: janssen at parc.com (Bill Janssen)
Date: Tue, 17 Oct 2006 20:00:35 PDT
Subject: [Web-SIG] WSGI -- usable for other protocols?
In-Reply-To: Your message of "Tue, 17 Oct 2006 18:39:12 PDT." <20061018013912.26151.1203616830.divmod.quotient.5405@ohm>
Message-ID: <06Oct17.200037pdt."58648"@synergy1.parc.xerox.com>

Well, I'll definitely check out Twisted's IMAP4 code. Thanks!

> Quotient uses a SQLite database for storage of structured data about
> messages and a filesystem structure (currently not a great structure,
> but it's fixable) for actual message files.

I was sort of planning on keeping all the message metadata in the
Lucene DB. MH uses a filesystem structure too. Maybe there's hope.

> It supports per-user filtering rules (although not procmail based - and
> the work done in this area so far is extremely minimal, basically it can
> do substring matching on headers - expanding this would be pretty simple
> though, Quotient is designed for this kind of thing).

This doesn't sound too far from what I intended, actually.

I'd like to keep the Lucene index in memory, and don't particularly
want the overhead of process swaps, so I'd like to be able to use them
together in a single address space.

It sounds like you've worked out most of the issues with IMAP4, so
I'll take a closer look.

Bill

From luke.arno at gmail.com Wed Oct 18 05:05:21 2006
From: luke.arno at gmail.com (Luke Arno)
Date: Tue, 17 Oct 2006 23:05:21 -0400
Subject: [Web-SIG] WSGI Components Mailing List
In-Reply-To: <20061018004635.GI30517@caltech.edu>
References: <20061018004635.GI30517@caltech.edu>
Message-ID:

On 10/17/06, Titus Brown wrote:
> On Tue, Oct 17, 2006 at 08:42:36PM -0400, Luke Arno wrote:
> -> I set up a mailing list for WSGI component users
> -> and developers. I have had a few emails asking
> -> questions and looking for help. I thought it would
> -> be good to have a list with that scope.
>
> What's wrong with keeping WSGI discussions on the web-sig list? Is it
> off-topic?
>

I am not talking about higher level conversations
regarding WSGI.

The various frameworks have communities where
users can go for help and developers can coordinate
their specific efforts.

Maybe this list is the place for it, but I have a feeling
that if I start giving support to users of various
components, it would be a little too noisy.

What do you think?

I am happy to direct these conversations to
wherever folks want. Is this the place, after all?

Thanks.

Cheers,
- Luke

From jim at zope.com Wed Oct 18 12:55:23 2006
From: jim at zope.com (Jim Fulton)
Date: Wed, 18 Oct 2006 06:55:23 -0400
Subject: [Web-SIG] WSGI Components Mailing List
In-Reply-To: <20061018004635.GI30517@caltech.edu>
References: <20061018004635.GI30517@caltech.edu>
Message-ID: <4536081B.80803@zope.com>

Titus Brown wrote:
> On Tue, Oct 17, 2006 at 08:42:36PM -0400, Luke Arno wrote:
> -> I set up a mailing list for WSGI component users
> -> and developers. I have had a few emails asking
> -> questions and looking for help. I thought it would
> -> be good to have a list with that scope.
>
> What's wrong with keeping WSGI discussions on the web-sig list? Is it
> off-topic?

I don't think so.

Jim

--
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From janssen at parc.com Wed Oct 18 17:04:08 2006
From: janssen at parc.com (Bill Janssen)
Date: Wed, 18 Oct 2006 08:04:08 PDT
Subject: [Web-SIG] WSGI Components Mailing List
In-Reply-To: Your message of "Tue, 17 Oct 2006 20:05:21 PDT."
Message-ID: <06Oct18.080417pdt."58648"@synergy1.parc.xerox.com>

> I am happy to direct these conversations to
> wherever folks want. Is this the place, after all?

You bet! Let's keep things here, till folks complain.

Bill

From sh at defuze.org Wed Oct 18 22:16:01 2006
From: sh at defuze.org (Sylvain Hellegouarch)
Date: Wed, 18 Oct 2006 21:16:01 +0100
Subject: [Web-SIG] wsgiref bug with HEAD request
Message-ID: <45368B81.8090308@defuze.org>

All,

It seems the default server from wsgiref (from wsgiref.simple_server
import make_server) does not respect Content-Length in the case of a HEAD
request.

Since no body can be returned in a response to a HEAD request, the content
length is set to 0 by the server. In that case Content-Length is therefore
set by the application or a middleware. The wsgiref server disregards the
existing value and sets it to 0 either way.

Seems bogus to me or am I missing something here?

- Sylvain

From ianb at colorstudy.com Sat Oct 21 19:49:06 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat, 21 Oct 2006 12:49:06 -0500
Subject: [Web-SIG] Proposal: wsgi.url_vars
Message-ID: <453A5D92.4000603@colorstudy.com>

I think there's room for some more standards building on WSGI (that
aren't actually extensions of the WSGI spec itself).

I put a page up on the wsgi.org site for this:
http://wsgi.org/wsgi/Specifications

And I'm introducing what I think is low-hanging fruit in the
specification realm: wsgi.url_vars
http://wsgi.org/wsgi/Specifications/url_vars

The spec is copied below for discussion:

:Title: wsgi.url_vars
:Author: Ian Bicking
:Discussions-To: Python Web-SIG
:Status: Draft
:Created: 21-Oct-2006

.. contents::

Abstract
--------

This proposes a new standard environment key
``environ['wsgi.url_vars']`` to represent the results of more
complicated URL parsing strategies.

Rationale
---------

WSGI currently specifies the meaning of ``SCRIPT_NAME`` and
``PATH_INFO``, which allows generic prefix-based dispatchers to be
created. These dispatchers can work with any WSGI application that
respects the meaning of these two variables. The basic meaning of
``SCRIPT_NAME`` is *the portion of the path that has been consumed* and
``PATH_INFO`` is *the portion of the path left to the application*.

Using these two variables more complex dispatchers cannot represent the
information they pull out of the request path. This specification
simply defines a place where such dispatchers can put their information:
``wsgi.url_vars``.

Specification
-------------

This specification defines a new key that can go in the WSGI
environment, ``wsgi.url_vars``. This key is optional.

If a dispatcher (like routes or selector) pulls named
information out of the portion of the request path it parses, it can put
that information into a dictionary in ``environ['wsgi.url_vars']``.

Portions of the path that have been parsed should still be moved to
``SCRIPT_NAME`` (and removed from ``PATH_INFO``).

Example
-------

This example is a dispatcher that is given regular expressions and
matching applications. It checks each regular expression in turn, and
when one matches it moves the named groups into ``wsgi.url_vars`` and
dispatches to the associated application.

::

    import re

    class RegexDispatch(object):

        def __init__(self, patterns):
            # patterns: list of (compiled regex, WSGI application) pairs
            self.patterns = patterns

        def __call__(self, environ, start_response):
            script_name = environ.get('SCRIPT_NAME', '')
            path_info = environ.get('PATH_INFO', '')
            for regex, application in self.patterns:
                match = regex.match(path_info)
                if not match:
                    continue
                extra_path_info = path_info[match.end():]
                if extra_path_info and not extra_path_info.startswith('/'):
                    # Not a very good match
                    continue
                groups = match.groupdict()
                environ.setdefault('wsgi.url_vars', {}).update(groups)
                # Move the consumed portion of the path to SCRIPT_NAME
                environ['SCRIPT_NAME'] = script_name + path_info[:match.end()]
                environ['PATH_INFO'] = extra_path_info
                return application(environ, start_response)
            return self.not_found(environ, start_response)

        def not_found(self, environ, start_response):
            start_response('404 Not Found', [('Content-type', 'text/plain')])
            return ['Not found']

    # NOTE: the group names were lost in the archived copy of this message;
    # year, month and article_id are reconstructed and should be treated
    # as assumptions. archive_app and view_article are WSGI applications
    # defined elsewhere.
    dispatch_app = RegexDispatch([
        (re.compile(r'/archive/(?P<year>\d{4})/$'), archive_app),
        (re.compile(r'/archive/(?P<year>\d{4})/(?P<month>\d{2})/$'), archive_app),
        (re.compile(r'/archive/(?P<year>\d{4})/(?P<month>\d{2})/(?P<article_id>\d+)$'),
         view_article),
        ])

From p.f.moore at gmail.com Sat Oct 21 21:39:36 2006
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 21 Oct 2006 20:39:36 +0100
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <453A5D92.4000603@colorstudy.com>
References: <453A5D92.4000603@colorstudy.com>
Message-ID: <79990c6b0610211239x7ea1f949m2a8aeaefa5b26db3@mail.gmail.com>

On 10/21/06, Ian Bicking wrote:
> Using these two variables more complex dispatchers cannot represent the
> information they pull out of the request path. This specification
> simply defines a place where such dispatchers can put their information:
> ``wsgi.url_vars``.

But what is the point? If the receiving application uses the url_vars
information, it's tied to the particular dispatcher - so why does this
need to be a standard key, rather than just a private convention?
If the receiving application wants to remain compatible with generic
dispatchers, how can it make use of url_vars?

Or, to put it another way, can you provide a realistic example of a
*consumer* of the information?

Paul.

From ianb at colorstudy.com Sat Oct 21 21:46:26 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat, 21 Oct 2006 14:46:26 -0500
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <79990c6b0610211239x7ea1f949m2a8aeaefa5b26db3@mail.gmail.com>
References: <453A5D92.4000603@colorstudy.com> <79990c6b0610211239x7ea1f949m2a8aeaefa5b26db3@mail.gmail.com>
Message-ID: <453A7912.4080608@colorstudy.com>

Paul Moore wrote:
> On 10/21/06, Ian Bicking wrote:
>> Using these two variables more complex dispatchers cannot represent the
>> information they pull out of the request path. This specification
>> simply defines a place where such dispatchers can put their information:
>> ``wsgi.url_vars``.
>
> But what is the point? If the receiving application uses the url_vars
> information, it's tied to the particular dispatcher - so why does this
> need to be a standard key, rather than just a private convention? If
> the receiving application wants to remain compatible with generic
> dispatchers, how can it make use of url_vars?

Just like POST and QUERY_STRING variables, the meaning and content of
the variables is unspecified. But it's useful that frameworks have a
common way to parse and pass around the parsed information from those
data sources.

An application that uses url_vars is tied to *some* dispatcher that puts
stuff into that location (though of course the application could also
fall back to QUERY_STRING parsing or whatever). It's not tied to any
particular dispatcher. Already there are two dispatchers (selector and
routes) that put the same kind of information into environment keys
(just in two separate locations).

> Or, to put it another way, can you provide a realistic example of a
> *consumer* of the information?

Sure: http://bitworking.org/news/wsgicollection

It takes arguments in 'selector.vars', but could take arguments from any
dispatcher.

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From p.f.moore at gmail.com Sat Oct 21 22:06:39 2006
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 21 Oct 2006 21:06:39 +0100
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <453A7912.4080608@colorstudy.com>
References: <453A5D92.4000603@colorstudy.com> <79990c6b0610211239x7ea1f949m2a8aeaefa5b26db3@mail.gmail.com> <453A7912.4080608@colorstudy.com>
Message-ID: <79990c6b0610211306p16ee53du21487d9985134138@mail.gmail.com>

On 10/21/06, Ian Bicking wrote:
> Just like POST and QUERY_STRING variables, the meaning and content of
> the variables is unspecified. But it's useful that frameworks have a
> common way to parse and pass around the parsed information from those
> data sources.
[...]
> > Or, to put it another way, can you provide a realistic example of a
> > *consumer* of the information?
>
> Sure: http://bitworking.org/news/wsgicollection
>
> It takes arguments in 'selector.vars', but could take arguments from any
> dispatcher.

Ah, I see now. Yes, that sounds like a good proposal (in the abstract
- it's not something I have a need for myself).

Paul.

From luke.arno at gmail.com Sat Oct 21 22:10:22 2006
From: luke.arno at gmail.com (Luke Arno)
Date: Sat, 21 Oct 2006 16:10:22 -0400
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <453A7912.4080608@colorstudy.com>
References: <453A5D92.4000603@colorstudy.com> <79990c6b0610211239x7ea1f949m2a8aeaefa5b26db3@mail.gmail.com> <453A7912.4080608@colorstudy.com>
Message-ID:

It seems like a good idea to me. I dislike dependencies.

I have been working this way for a while and have been
wondering about the same thing.

Being able to use WSGI to wire up the components of an
application (or framework or "application profile") enables
more choice and flexibility.
Relieving a dependency between a specific dispatcher
and that which is dispatched to further serves the same
ends.

- Luke

From ianb at colorstudy.com Sat Oct 21 22:37:41 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat, 21 Oct 2006 15:37:41 -0500
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <453A7912.4080608@colorstudy.com>
References: <453A5D92.4000603@colorstudy.com> <79990c6b0610211239x7ea1f949m2a8aeaefa5b26db3@mail.gmail.com> <453A7912.4080608@colorstudy.com>
Message-ID: <453A8515.3050806@colorstudy.com>

Ian Bicking wrote:
>> Or, to put it another way, can you provide a realistic example of a
>> *consumer* of the information?
>
> Sure: http://bitworking.org/news/wsgicollection
>
> It takes arguments in 'selector.vars', but could take arguments from any
> dispatcher.

Another consumer came to mind:
http://pythonpaste.org/class-paste.wsgiwrappers.WSGIRequest.html -- a
generic wrapper around the WSGI environment, which could provide an
attribute that would access this particular variable.

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From joe at bitworking.org Sat Oct 21 23:02:57 2006
From: joe at bitworking.org (Joe Gregorio)
Date: Sat, 21 Oct 2006 17:02:57 -0400
Subject: [Web-SIG] Proposal: wsgi.url_vars
In-Reply-To: <453A5D92.4000603@colorstudy.com>
References: <453A5D92.4000603@colorstudy.com>
Message-ID: <3f1451f50610211402m15f6ef2cw7a70396f1052a3a0@mail.gmail.com>

On 10/21/06, Ian Bicking wrote:
> I think there's room for some more standards building on WSGI (that
> aren't actually extensions of the WSGI spec itself).
>
> I put a page up on the wsgi.org site for this:
> http://wsgi.org/wsgi/Specifications
>
> And I'm introducing what I think is low-hanging fruit in the
> specification realm: wsgi.url_vars
> http://wsgi.org/wsgi/Specifications/url_vars
>
> The spec is copied below for discussion:

+1

I like this, it will make middleware like wsgicollection possible
without tightly binding them to the middleware you use to parse the URI.

-joe

--
Joe Gregorio http://bitworking.org

From ianb at colorstudy.com Sat Oct 21 23:04:39 2006
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat, 21 Oct 2006 16:04:39 -0500
Subject: [Web-SIG] Proposal: Handling POST forms in WSGI
Message-ID: <453A8B67.4070409@colorstudy.com>

I've added another spec to wsgi.org:
http://wsgi.org/wsgi/Specifications/handling_post_forms

This one is a little more intrusive than wsgi.url_vars, but it addresses
an outstanding source of problems: contention over wsgi.input.

Text copied:

:Title: Handling POST forms in WSGI
:Author: Ian Bicking
:Discussions-To: Python Web-SIG
:Status: Draft
:Created: 21-Oct-2006

.. contents::

Abstract
--------

This suggests a way that WSGI middleware, applications, and frameworks
can access POST form bodies so that there is less contention for the
``wsgi.input`` stream.

Rationale
---------

Currently ``environ['wsgi.input']`` points to a stream that represents
the body of the HTTP request. Once this stream has been read, it cannot
necessarily be read again. It may not have a ``seek`` method (none is
required by the WSGI specification, and frequently none is provided by
WSGI servers).

As a result any piece of a system that looks at the request body
essentially takes ownership of that body, and no one else is able to
access it. This is particularly problematic for POST form requests, as
many framework pieces expect to have access to this.

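The contention described in the rationale can be reproduced in a few lines. This is a minimal sketch rather than part of the proposal: ``NonSeekableInput`` is a hypothetical stand-in for the kind of body stream a server might provide, and the two ``parse_qs`` calls stand in for two framework pieces that each try to read the form.

```python
import io
from urllib.parse import parse_qs


class NonSeekableInput(object):
    """Hypothetical stand-in for a server's wsgi.input without seek()."""

    def __init__(self, data):
        self._stream = io.BytesIO(data)

    def read(self, size=-1):
        return self._stream.read(size)


environ = {
    'REQUEST_METHOD': 'POST',
    'CONTENT_TYPE': 'application/x-www-form-urlencoded',
    'CONTENT_LENGTH': '11',
    'wsgi.input': NonSeekableInput(b'name=value1'),
}

# The first consumer (say, a piece of middleware) reads the whole body.
first = parse_qs(environ['wsgi.input'].read().decode('ascii'))

# The second consumer (say, the application) finds the stream exhausted
# and has no way to rewind it.
second = parse_qs(environ['wsgi.input'].read().decode('ascii'))
```

Whichever piece reads first gets the form data; every later reader sees an empty body, which is the ownership problem the rationale describes.
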
Specification
-------------

This applies when certain requirements of the WSGI environment are met::

    def is_post_request(environ):
        if environ['REQUEST_METHOD'].upper() != 'POST':
            return False
        content_type = environ.get('CONTENT_TYPE',
                                   'application/x-www-form-urlencoded')
        return (
            content_type.startswith('application/x-www-form-urlencoded')
            or content_type.startswith('multipart/form-data'))

That is, it must be a POST request, and it must be a form request
(generally ``application/x-www-form-urlencoded`` or when there are file
uploads ``multipart/form-data``).

When this happens, the form can be parsed by ``cgi.FieldStorage``. The
results of this parsing should be put in ``environ['wsgi.post_form']``
in a particular fashion::

    import cgi

    def get_post_form(environ):
        assert is_post_request(environ)
        input = environ['wsgi.input']
        post_form = environ.get('wsgi.post_form')
        # Reuse the cached form only if wsgi.input is still the
        # replacement stream installed when the form was parsed.
        if (post_form is not None
            and post_form[0] is input):
            return post_form[2]
        fs = cgi.FieldStorage(fp=input,
                              environ=environ,
                              keep_blank_values=1)
        new_input = InputProcessed()
        # (replacement stream, original stream, parsed form)
        post_form = (new_input, input, fs)
        environ['wsgi.post_form'] = post_form
        environ['wsgi.input'] = new_input
        return fs

    class InputProcessed(object):
        def read(self, *args):
            raise EOFError(
                'The wsgi.input stream has already been consumed')
        readline = readlines = __iter__ = read

This way multiple consumers can parse a POST form, accessing the form
data in any order (later consumers will get the already-parsed data).
The replacement ``wsgi.input`` guards against non-conforming access to
the data, while the value in ``wsgi.post_form`` allows for access to the
original ``wsgi.input`` in case it may be useful.

By checking for the replacement ``wsgi.input`` when checking if
``wsgi.post_form`` applies, this does not get in the way of WSGI
middleware that may replace that key. If the key is replaced, then the
parsed data is implicitly invalidated.

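To show how two consumers would share one parse, here is a minimal sketch of the same caching pattern. It is an illustration under stated assumptions, not the spec's code: it handles only ``application/x-www-form-urlencoded`` bodies, it swaps ``cgi.FieldStorage`` for ``urllib.parse.parse_qs`` so that it also runs where the ``cgi`` module is unavailable, and the names ``get_post_form_sketch`` and ``ConsumedInput`` are hypothetical.

```python
import io
from urllib.parse import parse_qs


class ConsumedInput(object):
    """Hypothetical replacement stream that refuses any further reads."""

    def read(self, *args):
        raise EOFError('The wsgi.input stream has already been consumed')

    readline = readlines = __iter__ = read


def get_post_form_sketch(environ):
    """Parse a urlencoded POST body once and cache it in the environ.

    Assumption: parse_qs stands in for cgi.FieldStorage here.
    """
    current_input = environ['wsgi.input']
    cached = environ.get('wsgi.post_form')
    # The cache is valid only if wsgi.input is still the replacement
    # stream that was installed when the form was parsed.
    if cached is not None and cached[0] is current_input:
        return cached[2]
    length = int(environ.get('CONTENT_LENGTH') or 0)
    body = current_input.read(length).decode('ascii')
    form = parse_qs(body, keep_blank_values=True)
    new_input = ConsumedInput()
    # Same tuple layout as the spec: (replacement, original, parsed form).
    environ['wsgi.post_form'] = (new_input, current_input, form)
    environ['wsgi.input'] = new_input
    return form


environ = {
    'REQUEST_METHOD': 'POST',
    'CONTENT_TYPE': 'application/x-www-form-urlencoded',
    'CONTENT_LENGTH': '7',
    'wsgi.input': io.BytesIO(b'a=1&b=2'),
}

# Two independent consumers ask for the form; only the first one reads.
form_seen_by_middleware = get_post_form_sketch(environ)
form_seen_by_application = get_post_form_sketch(environ)
```

Both calls return the same object, and middleware that later installs a fresh ``wsgi.input`` causes the next call to parse again, which is the implicit invalidation the text describes.
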
Query String data
-----------------

Note that nothing in this specification touches or applies to the query
string (in ``environ['QUERY_STRING']``). This is not parsed as part of
the process, and nothing in this specification applies to GET requests,
or to the query string which may be present in a POST request.

Open Issues
-----------

1. Is cgi.FieldStorage the best way to store the parsed data? It's the
   most common way, at least.

2. This doesn't address non-form-submission POST requests. Most of the
   same issues apply to such requests, except that frameworks tend not
   to touch the request body in that case. The body may be large, so
   the actual contents of the request body shouldn't go in the
   environment. Perhaps they could go in a temporary file, but this too
   might be an unnecessary indirection in many cases. Also other kinds
   of request (like PUT) that have a request body are not covered, for
   largely the same reason. In both these cases, it is much easier to
   construct a new ``wsgi.input`` that accesses whatever your internal
   representation of the request body is.

3. Is the tuple of information necessary in ``wsgi.post_form``, or could
   it just be the ``FieldStorage`` instance?

4. Should ``wsgi.input`` be replaced by ``InputProcessed``, or just left
   as is?

From wilk-ml at flibuste.net Sun Oct 22 09:46:47 2006
From: wilk-ml at flibuste.net (William Dode)
Date: Sun, 22 Oct 2006 07:46:47 +0000 (UTC)
Subject: [Web-SIG] Proposal: wsgi.url_vars
References: <453A5D92.4000603@colorstudy.com>
Message-ID:

On 21-10-2006, Ian Bicking wrote:
> I think there's room for some more standards building on WSGI (that
> aren't actually extensions of the WSGI spec itself).
>
> I put a page up on the wsgi.org site for this:
> http://wsgi.org/wsgi/Specifications
>
> And I'm introducing what I think is low-hanging fruit in the
> specification realm: wsgi.url_vars
> http://wsgi.org/wsgi/Specifications/url_vars
>
> The spec is copied below for discussion:

+1 for this kind of spec, to make applications more independent of
framework pieces.

I hope you or some other WSGI gurus will also make some proposals for
sessions and cookies...

--
William Dode - http://flibuste.net

From pje at telecommunity.com  Sun Oct 22 13:40:07 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 22 Oct 2006 04:40:07 -0700
Subject: [Web-SIG] Proposal: Handling POST forms in WSGI
In-Reply-To: <453A8B67.4070409@colorstudy.com>
References: <453A8B67.4070409@colorstudy.com>
Message-ID: <7.0.1.0.0.20061022042434.020fa938@telecommunity.com>

At 02:04 PM 10/21/2006, Ian Bicking wrote:
>I've added another spec to wsgi.org:
>http://wsgi.org/wsgi/Specifications/handling_post_forms
>
>This one is a little more intrusive than wsgi.url_vars, but it addresses
>an outstanding source of problems: contention over wsgi.input.

-1 on this being middleware.  If middleware wants to read the input, it
should copy it to a temporary file or StringIO, not remove it.

The broader principle here is that WSGI extensions should *add* to the
WSGI specification, not subtract from it.  Code running under middleware
that does as you have proposed will be unable to use its own form
processing or support nested applications.  It's therefore not
composable or further extensible, and I therefore have a hard time
viewing the proposed middleware as being WSGI compliant.

This is an extremely good example of something that belongs in a
*library* and should not be done in middleware.  Only end-application
code that knows no further dispatching will occur is in a position to do
destructive reading from wsgi.input.
Middleware should be non-destructive, and should NOT be used where a library will suffice, since they add setup complexity and runtime performance overhead. The simple, standard way to do something like this would be to have a library routine like 'get_form_vars(environ)'. The routine would check for the form vars key, and if not present, then it would process the input and cache the information in the environment. It could even have an option to clone the input, in case the routine is being used from middleware. In general, where adding functionality doesn't require that the request or response be modified (as opposed to information simply being added to the environ), it should be done using library routines like this. There is no middleware setup or call-through overhead, and the calculation of additional environ entries only takes place if the information is actually used. There is also no need to use string constants as environ keys except in the routines themselves. This approach should be considered a best practice for *any* additions to the environ. From cmlenz at gmx.de Sun Oct 22 14:28:24 2006 From: cmlenz at gmx.de (Christopher Lenz) Date: Sun, 22 Oct 2006 14:28:24 +0200 Subject: [Web-SIG] Proposal: wsgi.url_vars In-Reply-To: <453A5D92.4000603@colorstudy.com> References: <453A5D92.4000603@colorstudy.com> Message-ID: Am 21.10.2006 um 19:49 schrieb Ian Bicking: > If a dispatcher (like `routes `_ or > `selector `_) pulls named > information out of the portion of the request path it parses, it > can put > that information into a dictionary in ``environ['wsgi.url_vars']``. While I think this is a great idea in general, I don't like that this is limited to "named information". In the kind of dispatching I normally use, there's only one or maybe two parts of the URL that I want to receive as parameters. I like saving the overhead of making those named groups in regexes, and instead just use unnamed groups as positional arguments. 
So not supporting both positional and named arguments limits the usefulness of this specification IMHO. How about making 'wsgi.url_vars' a tuple of the form "(args, kwargs)" (the first a list or tuple, the second a dict)? Cheers, Chris -- Christopher Lenz cmlenz at gmx.de http://www.cmlenz.net/ From ianb at colorstudy.com Sun Oct 22 20:05:56 2006 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 22 Oct 2006 13:05:56 -0500 Subject: [Web-SIG] Proposal: Handling POST forms in WSGI In-Reply-To: <7.0.1.0.0.20061022042434.020fa938@telecommunity.com> References: <453A8B67.4070409@colorstudy.com> <7.0.1.0.0.20061022042434.020fa938@telecommunity.com> Message-ID: <453BB304.1030108@colorstudy.com> Phillip J. Eby wrote: > At 02:04 PM 10/21/2006, Ian Bicking wrote: >> I've added another spec to wsgi.org: >> http://wsgi.org/wsgi/Specifications/handling_post_forms >> >> This one is a little more intrusive than wsgi.url_vars, but it addresses >> an outstanding source of problems: contention over wsgi.input. > > -1 on this being middleware. If middleware wants to read the input, it > should copy it to a temporary file or StringIO, not remove it. This isn't middleware, it's a suggestion of a library routine for reading POST form submissions. If multiple consumers use this same routine (or generally, the algorithm described) then they won't conflict. Copying to a StringIO or tempfile is possible, though it introduces a couple layers of indirection where it is likely none is needed. Potentially wsgi.input could be replaced with something that lazily serializes the parsed form back into an unparsed form; perhaps coupled with a monkeypatch on cgi that detects this case and also provides a shortcut. > The broader principle here is that WSGI extensions should *add* to the > WSGI specification, not subtract from it. Code running under middleware > that does as you have proposed will be unable to use its own form > processing or support nested applications. 
It's therefore not > composable or further extensible, and I therefore have a hard time > viewing the proposed middleware as being WSGI compliant. The status quo is that middleware or framework code that accesses POST vars are incompatible with any other middleware, framework code, or applications that also want to access POST vars. This does not subtract from WSGI, it enables a pattern that is currently problematic. It really is problematic, in that I've encountered this problem (contention over wsgi.input), and sometimes when I would like to access the POST vars in middleware I am currently unable to because it causes too many problems with code that comes later in the stack, or I am unable to because wsgi.input has already been consumed. > This is an extremely good example of something that belongs in a > *library* and should not be done in middleware. Only end-application > code that knows no further dispatching will occur is in a position to do > destructive reading from wsgi.input. Middleware should be > non-destructive, and should NOT be used where a library will suffice, > since they add setup complexity and runtime performance overhead. End application code knows no further dispatching will occur, but framework code does not know this. Typically it is a framework that parses the POST vars, not an application. > The simple, standard way to do something like this would be to have a > library routine like 'get_form_vars(environ)'. The routine would check > for the form vars key, and if not present, then it would process the > input and cache the information in the environment. It could even have > an option to clone the input, in case the routine is being used from > middleware. This is what paste.request.parse_formvars does -- I'm suggesting this standard so that all consumers, not just people using Paste, can be compatible with each other. 
> In general, where adding functionality doesn't require that the request > or response be modified (as opposed to information simply being added to > the environ), it should be done using library routines like this. There > is no middleware setup or call-through overhead, and the calculation of > additional environ entries only takes place if the information is > actually used. There is also no need to use string constants as environ > keys except in the routines themselves. This approach should be > considered a best practice for *any* additions to the environ. Reading from wsgi.input effectively does modify the request. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Sun Oct 22 20:07:17 2006 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 22 Oct 2006 13:07:17 -0500 Subject: [Web-SIG] Proposal: wsgi.url_vars In-Reply-To: References: <453A5D92.4000603@colorstudy.com> Message-ID: <453BB355.3010909@colorstudy.com> William Dode wrote: > I hope you or somes others wsgi guru will also make somes proposals for > session and cookies... Ben Bangert made a proposal some time ago about session IDs. Maybe he'd like to resurrect that? I thought it was a good proposal. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Sun Oct 22 20:17:17 2006 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 22 Oct 2006 13:17:17 -0500 Subject: [Web-SIG] Proposal: wsgi.url_vars In-Reply-To: References: <453A5D92.4000603@colorstudy.com> Message-ID: <453BB5AD.1030300@colorstudy.com> Christopher Lenz wrote: > Am 21.10.2006 um 19:49 schrieb Ian Bicking: >> If a dispatcher (like `routes `_ or >> `selector `_) pulls named >> information out of the portion of the request path it parses, it >> can put >> that information into a dictionary in ``environ['wsgi.url_vars']``. > > While I think this is a great idea in general, I don't like that this > is limited to "named information". 
>
> In the kind of dispatching I normally use, there's only one or maybe
> two parts of the URL that I want to receive as parameters. I like
> saving the overhead of making those named groups in regexes, and
> instead just use unnamed groups as positional arguments.
>
> So not supporting both positional and named arguments limits the
> usefulness of this specification IMHO. How about making
> 'wsgi.url_vars' a tuple of the form "(args, kwargs)" (the first a
> list or tuple, the second a dict)?

Hmm... so, a few things occur to me:

1. The dictionary could have integer keys like {1: arg1, 2: arg2}.  This
   is hard to unpack.  Eh, not a good idea I guess.

2. We use (args, kwargs).  Frameworks can probably handle this just
   fine.  Quite a few systems can't produce positional arguments, but
   that probably doesn't matter -- at least if the end result is
   something like a Python function call, then there's usually a named
   equivalent to positional arguments.  When it is exposed directly to
   the application it seems a little more awkward.  For instance, I was
   thinking I'd add a req.url_vars to the request object that's just a
   proxy to environ['wsgi.url_vars'].  But having that return a tuple
   isn't very convenient.  I guess it could be req.url_vars and
   req.url_args, or something like that.  It adds to the complexity
   some.

3. Anything can go under *some* name, so you could do
   {'args': (positional args)}.  You'd still have to do a bit of
   unpacking if you had both positional and keyword arguments, but it
   would be fairly simple.  We could come up with a convention for this
   that we document in the spec.

I guess I didn't mention it in the spec, but I assumed that the
dictionary would have only string keys (though I don't know if it
matters), but the values could be of any type.  E.g.,
/archive/2005/10/01 could create {'date': date(2005, 10, 1)}.
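As a rough illustration of option 3, a regex-based dispatcher could fill ``environ['wsgi.url_vars']`` along these lines. The helper name ``url_vars_from_match`` and the route pattern are hypothetical, and the ``'args'`` key is only the convention floated above, not something the spec defines.

```python
import re


def url_vars_from_match(match):
    """Named groups become keys; unnamed groups go under 'args'."""
    named_indexes = set(match.re.groupindex.values())
    url_vars = dict(match.groupdict())
    # match.groups() lists every group, so skip the named ones here.
    positional = tuple(
        value
        for index, value in enumerate(match.groups(), start=1)
        if index not in named_indexes)
    if positional:
        url_vars['args'] = positional
    return url_vars


# Hypothetical route mixing two positional groups with one named group.
pattern = re.compile(r'^/archive/(\d+)/(\d+)/(?P<slug>\w+)/$')
environ = {'PATH_INFO': '/archive/2006/10/hello/'}

match = pattern.match(environ['PATH_INFO'])
if match is not None:
    environ['wsgi.url_vars'] = url_vars_from_match(match)

# A consumer unpacks the positional part before using the named part.
url_vars = dict(environ['wsgi.url_vars'])
args = url_vars.pop('args', ())
```

The consumer works on a copy, so the entry in ``environ`` stays intact for anything else that reads it.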
-- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org From joe at bitworking.org Mon Oct 23 21:34:07 2006 From: joe at bitworking.org (Joe Gregorio) Date: Mon, 23 Oct 2006 15:34:07 -0400 Subject: [Web-SIG] Proposal: wsgi.url_vars In-Reply-To: <453BB5AD.1030300@colorstudy.com> References: <453A5D92.4000603@colorstudy.com> <453BB5AD.1030300@colorstudy.com> Message-ID: <3f1451f50610231234vc315989t72c93e840ab72258@mail.gmail.com> On 10/22/06, Ian Bicking wrote: > Hmm... so, a few things occur to me: > > 1. The dictionary could have integer keys like {1: arg1, 2: arg2}. This > is hard to unpack. Eh, not a good idea I guess. Why not {"1":arg1, "2":arg2, } if all the arguments are positional? I think supporting both positional and keyword arguments mixed in the same request is a corner case not worth covering. -joe -- Joe Gregorio http://bitworking.org From ianb at colorstudy.com Tue Oct 24 00:39:42 2006 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 23 Oct 2006 17:39:42 -0500 Subject: [Web-SIG] wsgi.url_vars feedback In-Reply-To: References: Message-ID: <453D44AE.3030604@colorstudy.com> Simon Willison wrote: > I've spotted a potential problem with your wsgi.url_vars specification > suggestion. > > http://wsgi.org/wsgi/Specifications/url_vars > > The spec calls for wsgi.url_vars to refer to a dictionary. In Django, we > originally required named captures in regular expressions - but > eventually realised that for many cases just having positional captures > was less work for developers and worked just as well. Here's some code I > wrote today: > > (r'^archive/(\d+)/(\d+)/(\d+)/(\w+)/$', blog.entry), > > def entry(request, year, month, day, slug): > # ... > > This form of URL variable extraction does not appear to be covered by > your wsgi.url_vars spec. One solution could be to extend the spec to > suggest using integer keys to represent this case? 
>
> environ['wsgi.url_vars'] = { 1: '2006', 2: '06', 3: '12', 4: 'slug' }

Christopher Lenz also brought this up.  My inclination is something like:

    environ['wsgi.url_vars'] = {'__args__': ('2006', '06', '12', 'slug')}

By using a tuple or list, you can be sure you don't have a sparse list,
which probably isn't something any system is likely to handle.  The
double underscores kind of mark __args__ as a special kind of key, so
it's less likely to overlap with a simple named key.  Removing it from
the dict or handling it is special; you don't have to look at all the
keys to see if any are ints, you just test "'__args__' in url_vars".

Would this satisfy everyone?

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From luke.arno at gmail.com  Tue Oct 24 04:14:52 2006
From: luke.arno at gmail.com (Luke Arno)
Date: Mon, 23 Oct 2006 22:14:52 -0400
Subject: [Web-SIG] wsgi.url_vars feedback
In-Reply-To: <453D44AE.3030604@colorstudy.com>
References: <453D44AE.3030604@colorstudy.com>
Message-ID:

On 10/23/06, Ian Bicking wrote:
> Simon Willison wrote:
> > I've spotted a potential problem with your wsgi.url_vars specification
> > suggestion.
> >
> > http://wsgi.org/wsgi/Specifications/url_vars
> >
> > The spec calls for wsgi.url_vars to refer to a dictionary. In Django, we
> > originally required named captures in regular expressions - but
> > eventually realised that for many cases just having positional captures
> > was less work for developers and worked just as well. Here's some code I
> > wrote today:
> >
> >   (r'^archive/(\d+)/(\d+)/(\d+)/(\w+)/$', blog.entry),
> >
> > def entry(request, year, month, day, slug):
> >   # ...
> >
> > This form of URL variable extraction does not appear to be covered by
> > your wsgi.url_vars spec. One solution could be to extend the spec to
> > suggest using integer keys to represent this case?
> >
> > environ['wsgi.url_vars'] = { 1: '2006', 2: '06', 3: '12', 4: 'slug' }
>
> Christopher Lenz also brought this up.
My inclination is something like: > > environ['wsgi.url_vars'] = {'__args__': ('2006', '06', '12', 'slug')} > > By using a tuple or list, you can be sure you don't have a sparse list, > which probably isn't something any system is likely to handle. The > double underscores kind of mark __args__ as a special kind of key, so > it's less likely to overlap with a simple named key. Removing it from > the dict or handling it is special; you don't have to look at all the > keys to see if any are ints, you just test "'__args__' in url_vars". > > Would this satisfy everyone? Since numbers are not legal names for named groups anyway, why not just put them in the dict? It seems easier in more cases. Of course, this is more difficult in the case of handling an unknown number of positional args, but that case seems rather far off into the corner, no? (By the by, I consider named groups a best practice as there is a tighter dependency between code and URI with positional args. Sometimes it may not matter but...) Cheers, - Luke From h.then at pythea.nl Tue Oct 24 14:25:29 2006 From: h.then at pythea.nl (Hans Then) Date: Tue, 24 Oct 2006 12:25:29 -0000 Subject: [Web-SIG] Proposal: Handling POST forms in WSGI Message-ID: <20061024104829.032CC1E4006@bag.python.org> Phillip, > -1 on this being middleware. If middleware wants to read the input, > it should copy it to a temporary file or StringIO, not remove it. > The simple, standard way to do something like this would be to have a > library routine like 'get_form_vars(environ)'. The routine would > check for the form vars key, and if not present, then it would > process the input and cache the information in the environment. It > could even have an option to clone the input, in case the routine is > being used from middleware. I think Ian's point is to standardise on a form key and on the interface of the form object. Your point is valid that middleware should not destructively read the wsgi.input variable. 
Many web applications will at some point call other web applications. It seems positively wasteful to have to clone and parse wsgi.input over and over again. It makes sense to do it once, in middleware, and then stuff it in a standard place in the wsgi environ. Would you +1 the proposal if it is added that middleware does not destroy the wsgi.input variable but clones it? Regards, Hans Then From ianb at colorstudy.com Tue Oct 24 17:25:53 2006 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 24 Oct 2006 10:25:53 -0500 Subject: [Web-SIG] Proposal: Handling POST forms in WSGI In-Reply-To: <20061024104829.032CC1E4006@bag.python.org> References: <20061024104829.032CC1E4006@bag.python.org> Message-ID: <453E3081.7060205@colorstudy.com> Hans Then wrote: > Phillip, > >> -1 on this being middleware. If middleware wants to read the input, >> it should copy it to a temporary file or StringIO, not remove it. > >> The simple, standard way to do something like this would be to have a >> library routine like 'get_form_vars(environ)'. The routine would >> check for the form vars key, and if not present, then it would >> process the input and cache the information in the environment. It >> could even have an option to clone the input, in case the routine is >> being used from middleware. > > I think Ian's point is to standardise on a form key and on the interface of > the form object. Your point is valid that middleware should not > destructively read the wsgi.input variable. > > Many web applications will at some point call other web applications. It > seems positively wasteful to have to clone and parse wsgi.input over and > over again. It makes sense to do it once, in middleware, and then stuff it > in a standard place in the wsgi environ. Ideally I'm not expecting middleware to do this parsing (unless there's some good reason for the middleware to want the information). 
I'm suggesting a way the parsing can be done in a lazy fashion, but that one consumer doesn't get exclusive access to it. I also see this as a kind of prerequisite for supporting multiple request objects over the WSGI environment, as each object is going to want access to wsgi.input. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org From pje at telecommunity.com Tue Oct 24 18:50:04 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 24 Oct 2006 12:50:04 -0400 Subject: [Web-SIG] Proposal: Handling POST forms in WSGI In-Reply-To: <20061024104829.032CC1E4006@bag.python.org> Message-ID: <5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com> At 12:25 PM 10/24/2006 +0000, Hans Then wrote: >Phillip, > > > -1 on this being middleware. If middleware wants to read the input, > > it should copy it to a temporary file or StringIO, not remove it. > > > The simple, standard way to do something like this would be to have a > > library routine like 'get_form_vars(environ)'. The routine would > > check for the form vars key, and if not present, then it would > > process the input and cache the information in the environment. It > > could even have an option to clone the input, in case the routine is > > being used from middleware. > >I think Ian's point is to standardise on a form key and on the interface of >the form object. Your point is valid that middleware should not >destructively read the wsgi.input variable. > >Many web applications will at some point call other web applications. It >seems positively wasteful to have to clone and parse wsgi.input over and >over again. It makes sense to do it once, in middleware, and then stuff it >in a standard place in the wsgi environ. Re-read what I wrote. If you have a common library routine, the parsing (and optional cloning) only happens *once*. If middleware needs access to the data, it can just call the library routine. 
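A sketch of the kind of library routine described here, including the optional cloning. The environ key ``example.form_vars`` is made up for the illustration, and ``urllib.parse.parse_qs`` stands in for whatever form parser a real library would use, so only urlencoded bodies are handled.

```python
import io
from urllib.parse import parse_qs

FORM_KEY = 'example.form_vars'  # hypothetical environ key


def get_form_vars(environ, clone=False):
    """Return the parsed form, parsing (and optionally cloning) once."""
    cached = environ.get(FORM_KEY)
    if cached is not None:
        return cached
    length = int(environ.get('CONTENT_LENGTH') or 0)
    body = environ['wsgi.input'].read(length)
    if clone:
        # Leave an equivalent, rewound stream for later consumers.
        environ['wsgi.input'] = io.BytesIO(body)
    form = parse_qs(body.decode('latin-1'), keep_blank_values=True)
    environ[FORM_KEY] = form
    return form


body = b'q=wsgi&page=2'
environ = {
    'CONTENT_LENGTH': str(len(body)),
    'wsgi.input': io.BytesIO(body),
}

# Middleware asks for a clone so the body stays readable downstream.
form_seen_by_middleware = get_form_vars(environ, clone=True)
# The application gets the cached result; nothing is parsed twice.
form_seen_by_app = get_form_vars(environ)
```

No middleware layer is involved: each caller simply invokes the routine, and the result is computed on first use.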
This should NOT be implemented as middleware that adds the key; it's completely unnecessary. Middleware is only required for features that actually *modify* or *monitor* the request or response, as opposed to merely *adding* new request-side data derived from existing environ keys. If you want to improve the WSGI request API, the proper place to do so is by using library routines that cache their computations in the environ dictionary. In fact, there isn't even any technical need to "officially" standardize the environ keys for these functions. Just release libraries that have the features, so everyone can just install them. Then we won't all have five different libraries, each with its own routine just to do the same 'get_form_vars()' operation! Successful routines with sufficiently broad appeal and minimal impact could then be targeted for inclusion in later versions of wsgiref (and ultimately the stdlib). This seems to me the cleanest overall way to add API "friendliness" to WSGI. (We could even discuss such things in the form of proposed patches to the wsgiref code and documentation, then put them into the current wsgiref release.) >Would you +1 the proposal if it is added that middleware does not destroy >the wsgi.input variable but clones it? I didn't -1 the proposal, I -1'd middleware. And the -1 stands. Middleware is absolutely not the place for adding derivative environ keys like this. It's 100% unnecessary, adds complexity, and reduces performance in the process. From ianb at colorstudy.com Tue Oct 24 18:56:03 2006 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 24 Oct 2006 11:56:03 -0500 Subject: [Web-SIG] Proposal: Handling POST forms in WSGI In-Reply-To: <5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com> References: <5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com> Message-ID: <453E45A3.20700@colorstudy.com> Phillip J. 
Eby wrote: > At 12:25 PM 10/24/2006 +0000, Hans Then wrote: >> Phillip, >> >>> -1 on this being middleware. If middleware wants to read the input, >>> it should copy it to a temporary file or StringIO, not remove it. >>> The simple, standard way to do something like this would be to have a >>> library routine like 'get_form_vars(environ)'. The routine would >>> check for the form vars key, and if not present, then it would >>> process the input and cache the information in the environment. It >>> could even have an option to clone the input, in case the routine is >>> being used from middleware. >> I think Ian's point is to standardise on a form key and on the interface of >> the form object. Your point is valid that middleware should not >> destructively read the wsgi.input variable. >> >> Many web applications will at some point call other web applications. It >> seems positively wasteful to have to clone and parse wsgi.input over and >> over again. It makes sense to do it once, in middleware, and then stuff it >> in a standard place in the wsgi environ. > > Re-read what I wrote. If you have a common library routine, the parsing > (and optional cloning) only happens *once*. If middleware needs access to > the data, it can just call the library routine. > > This should NOT be implemented as middleware that adds the key; it's > completely unnecessary. Middleware is only required for features that > actually *modify* or *monitor* the request or response, as opposed to > merely *adding* new request-side data derived from existing environ > keys. If you want to improve the WSGI request API, the proper place to do > so is by using library routines that cache their computations in the > environ dictionary. > > In fact, there isn't even any technical need to "officially" standardize > the environ keys for these functions. Just release libraries that have the > features, so everyone can just install them. 
Then we won't all have five > different libraries, each with its own routine just to do the same > 'get_form_vars()' operation! > > Successful routines with sufficiently broad appeal and minimal impact could > then be targeted for inclusion in later versions of wsgiref (and ultimately > the stdlib). This seems to me the cleanest overall way to add API > "friendliness" to WSGI. > > (We could even discuss such things in the form of proposed patches to the > wsgiref code and documentation, then put them into the current wsgiref > release.) That would be a landing place for an implementation of this library code that does what the spec implies. But it relies on the release cycle for wsgiref, which is unclear and probably very slow since it is in the stdlib. I have nothing against this being in wsgiref, I just would like to use this convention sooner rather than later. >> Would you +1 the proposal if it is added that middleware does not destroy >> the wsgi.input variable but clones it? > > I didn't -1 the proposal, I -1'd middleware. And the -1 > stands. Middleware is absolutely not the place for adding derivative > environ keys like this. It's 100% unnecessary, adds complexity, and > reduces performance in the process. Please respond to my proposal, which as I've clarified does not imply any particular middleware. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org From pje at telecommunity.com Tue Oct 24 18:56:41 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 24 Oct 2006 12:56:41 -0400 Subject: [Web-SIG] wsgi.url_vars feedback In-Reply-To: <453D44AE.3030604@colorstudy.com> References: Message-ID: <5.1.1.6.0.20061024125257.02d04008@sparrow.telecommunity.com> At 05:39 PM 10/23/2006 -0500, Ian Bicking wrote: >By using a tuple or list, you can be sure you don't have a sparse list, >which probably isn't something any system is likely to handle. 
>The double underscores kind of mark __args__ as a special kind of key, so
>it's less likely to overlap with a simple named key.  Removing it from
>the dict or handling it is special; you don't have to look at all the
>keys to see if any are ints, you just test "'__args__' in url_vars".
>
>Would this satisfy everyone?

Call it "wsgi.url_args", and make it a two-item tuple: *args, **kw.
That's far simpler than any of the wacky encodings proposed so far, and
can be used to invoke a function directly, e.g.:

     apply(f, *environ['wsgi.url_args'])

or, less cleverly (i.e. more readably):

     args, kw = environ['wsgi.url_args']
     f(*args, **kw)

From pje at telecommunity.com  Tue Oct 24 19:52:54 2006
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 24 Oct 2006 13:52:54 -0400
Subject: [Web-SIG] Proposal: Handling POST forms in WSGI
In-Reply-To: <453E45A3.20700@colorstudy.com>
References: <5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com>
 <5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20061024133724.02da8130@sparrow.telecommunity.com>

At 11:56 AM 10/24/2006 -0500, Ian Bicking wrote:
>That would be a landing place for an implementation of this library code
>that does what the spec implies.  But it relies on the release cycle for
>wsgiref, which is unclear and probably very slow since it is in the stdlib.

Not really.  wsgiref is distributed standalone from the cheeseshop, so
newer versions are just an easy_install away.

>I have nothing against this being in wsgiref, I just would like to use
>this convention sooner rather than later.

Of course; the wsgiref thing was just a suggestion for where canonical
implementations of these things would live.

>>>Would you +1 the proposal if it is added that middleware does not destroy
>>>the wsgi.input variable but clones it?
>>I didn't -1 the proposal, I -1'd middleware.  And the -1
>>stands.  Middleware is absolutely not the place for adding derivative
>>environ keys like this.
It's 100% unnecessary, adds complexity, and >>reduces performance in the process. > >Please respond to my proposal, which as I've clarified does not imply any >particular middleware. You should clarify that in the proposal itself, explicitly forbidding it from being done by middleware unless the middleware is taking responsibility for request processing, or the middleware clones the environ. Too many people, upon first encountering WSGI middleware, want to use it to add things to the request API, when it isn't necessary. Notice Hans Then's reaction to my -1 on middleware, for example. Writing correct middleware is already difficult, let's not encourage people to write more incorrect middleware by increasing the temptation to use middleware for trivial API enhancements that would be better done as libraries. (Yes, I know that wasn't your intent, but at least one person besides me interpreted it as such.) As far as the other open issues in the proposal, I don't really care much. My main concern is making sure that the proposal doesn't encourage people to start creating middleware whose sole purpose is to add unnecessary junk to environ while breaking other applications as a side effect. :) (I do suggest, however, that a simpler way to assure WSGI compliance when removing wsgi.input may be to set the incoming content-length to zero. An application or library that tries to read wsgi.input when the content length is zero is itself non-compliant.) From ianb at colorstudy.com Tue Oct 24 20:14:36 2006 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 24 Oct 2006 13:14:36 -0500 Subject: [Web-SIG] Proposal: Handling POST forms in WSGI In-Reply-To: <5.1.1.6.0.20061024133724.02da8130@sparrow.telecommunity.com> References: <5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com> <5.1.1.6.0.20061024123232.026adef0@sparrow.telecommunity.com> <5.1.1.6.0.20061024133724.02da8130@sparrow.telecommunity.com> Message-ID: <453E580C.8090503@colorstudy.com> Phillip J. 
Eby wrote: >>>> Would you +1 the proposal if it is added that middleware does not >>>> destroy >>>> the wsgi.input variable but clones it? >>> I didn't -1 the proposal, I -1'd middleware. And the -1 stands. >>> Middleware is absolutely not the place for adding derivative environ >>> keys like this. It's 100% unnecessary, adds complexity, and reduces >>> performance in the process. >> >> Please respond to my proposal, which as I've clarified does not imply >> any particular middleware. > > You should clarify that in the proposal itself, explicitly forbidding it > from being done by middleware unless the middleware is taking > responsibility for request processing, or the middleware clones the > environ. Too many people, upon first encountering WSGI middleware, want > to use it to add things to the request API, when it isn't necessary. > Notice Hans Then's reaction to my -1 on middleware, for example. OK, I'll clarify this. Not that it's *horrible* that someone use this library function in middleware, but only if there's some reason specific to the middleware that they want to look at the POSTed form. Middleware is a very vague concept, really, as anything that can forward the request onto another WSGI app is middleware, but many such things are themselves full applications. (The specific example where this first really started bugging me was in paste.evalexception, which really is a bit of both application and middleware.) But I will note that you should not parse the form unless you actually want it, not just so that it will show up in a parsed form for a later consumer. > Writing correct middleware is already difficult, let's not encourage > people to write more incorrect middleware by increasing the temptation > to use middleware for trivial API enhancements that would be better done > as libraries. (Yes, I know that wasn't your intent, but at least one > person besides me interpreted it as such.) 
> > As far as the other open issues in the proposal, I don't really care > much. My main concern is making sure that the proposal doesn't > encourage people to start creating middleware whose sole purpose is to > add unnecessary junk to environ while breaking other applications as a > side effect. :) > > (I do suggest, however, that a simpler way to assure WSGI compliance > when removing wsgi.input may be to set the incoming content-length to > zero. An application or library that tries to read wsgi.input when the > content length is zero is itself non-compliant.) That seems like an inaccurate representation of the request itself, and likely to cover up problems. If you want to look at the POSTed form, and you aren't aware of this convention and the environment key, then there's an unresolvable error. So it should just produce an exception; if you set CONTENT_LENGTH to 0 then the consumer will just happily assume there is no data, which leads to incorrect behavior. I won't be able to update the spec right away, but when I get a chance I will do so. -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org From foom at fuhm.net Thu Oct 26 10:26:28 2006 From: foom at fuhm.net (James Y Knight) Date: Thu, 26 Oct 2006 04:26:28 -0400 Subject: [Web-SIG] WSGI, cgi.FieldStorage incompatibility In-Reply-To: References: <451D1D22.5090607@openapp.biz> Message-ID: On Sep 29, 2006, at 3:31 PM, Guido van Rossum wrote: > On 9/29/06, Michael Kerrin wrote: >> But the current implementation of cgi.FieldStorage in the 2.4.4 >> branch >> and on Python 2.5 does call readline with the size argument. It has >> started to do this in response to the Python bug #1112549 - >> cgi.FieldStorage memory usage can spike in line-oriented ops. See >> http://sourceforge.net/tracker/index.php? 
>> func=detail&aid=1112549&group_id=5470&atid=105470 >> >> Since it is reasonable for a WSGI application to use >> cgi.FieldStorage >> I am wondering whether cgi.FieldStorage or the WSGI specification >> needs >> to be changed in order to solve this incompatibility. >> >> Originally I thought it was cgi.FieldStorage that needs to be >> changed, >> and hence tried to fix it by wrapping the input stream so that the >> readline method always uses the read method on the input stream. >> While >> this seems to work for me it introduces a level of complexity in the >> cgi.py file, and possibly some other bugs, that makes me think that >> adding the size argument for readline into the WSGI specification >> isn't >> such a bad idea after all. > > Since that change to cgi.py was a security fix I would strongly > recommend not to remove it and to change the WSGI spec instead. Given that this change is now part of Python 2.4.4 and Python 2.5, it seems to me it is now a de facto requirement that all WSGI server implementations must support readline with a size argument in order to run any interesting software, despite the spec explicitly saying that you shouldn't. I suspect simply modifying the spec to follow the current reality would be the least bad option. But this kind of destabilizing breakage really shouldn't be allowed to happen again. Once the error was discovered, the cgi.py change should have been immediately reverted until either a decision was made to change the WSGI spec, or else the change was fixed so as not to break WSGI-compliant servers. This limbo situation is pretty bad.
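
For readers following along, the wrapping approach Michael describes in the quoted text (a readline that is built only on the stream's read method, so that a size argument is always safe) might look roughly like the sketch below. This is not code from the thread or from cgi.py. The class name SizedReadlineInput and its parameters are made up for illustration, it is written for present-day Python 3 with bytes, and it assumes the caller knows the request's content length so that it never reads past the end of the body.

```python
import io


class SizedReadlineInput:
    """Hypothetical wrapper: offers readline(size) using only stream.read()."""

    def __init__(self, stream, content_length, chunk_size=8192):
        self._stream = stream
        self._remaining = content_length  # bytes not yet pulled from stream
        self._chunk_size = chunk_size
        self._buffer = b""

    def _fill(self):
        """Pull one more chunk via read(); return False at end of body."""
        want = min(self._chunk_size, self._remaining)
        if want <= 0:
            return False
        chunk = self._stream.read(want)
        if not chunk:
            self._remaining = 0
            return False
        self._remaining -= len(chunk)
        self._buffer += chunk
        return True

    def read(self, size=-1):
        if size is None or size < 0:
            # Drain everything that is left of the body.
            while self._fill():
                pass
            size = len(self._buffer)
        else:
            while len(self._buffer) < size and self._fill():
                pass
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

    def readline(self, size=-1):
        limit = None if size is None or size < 0 else size
        while True:
            end = self._buffer.find(b"\n") + 1  # 0 when no newline is buffered
            if end and (limit is None or end <= limit):
                break  # a full line fits within the limit
            if limit is not None and len(self._buffer) >= limit:
                end = limit  # cut the line at the size limit
                break
            if not self._fill():
                # Body exhausted: return whatever is left, up to the limit.
                buffered = len(self._buffer)
                end = buffered if limit is None else min(limit, buffered)
                break
        line, self._buffer = self._buffer[:end], self._buffer[end:]
        return line


# Small demonstration with an in-memory body standing in for wsgi.input.
demo_body = b"a=1\nb=2\n"
demo = SizedReadlineInput(io.BytesIO(demo_body), len(demo_body))
print(demo.readline(2), demo.readline())
```

An application that wanted this behaviour regardless of the server could swap such a wrapper into environ['wsgi.input'] before handing the environ to a form parser. The trade-off Michael mentions still applies: the buffering adds complexity and its own opportunities for bugs.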
James From jim at zope.com Thu Oct 26 13:14:15 2006 From: jim at zope.com (Jim Fulton) Date: Thu, 26 Oct 2006 07:14:15 -0400 Subject: [Web-SIG] WSGI, cgi.FieldStorage incompatibility In-Reply-To: References: <451D1D22.5090607@openapp.biz> Message-ID: <45409887.5020609@zope.com> James Y Knight wrote: > On Sep 29, 2006, at 3:31 PM, Guido van Rossum wrote: > >> On 9/29/06, Michael Kerrin wrote: >>> But the current implementation of cgi.FieldStorage in the 2.4.4 >>> branch >>> and on Python 2.5 does call readline with the size argument. It has >>> started to do this in response to the Python bug #1112549 - >>> cgi.FieldStorage memory usage can spike in line-oriented ops. See >>> http://sourceforge.net/tracker/index.php?func=detail&aid=1112549&group_id=5470&atid=105470 >>> >>> Since it is reasonable for a WSGI application to use >>> cgi.FieldStorage >>> I am wondering whether cgi.FieldStorage or the WSGI specification >>> needs >>> to be changed in order to solve this incompatibility. >>> >>> Originally I thought it was cgi.FieldStorage that needs to be >>> changed, >>> and hence tried to fix it by wrapping the input stream so that the >>> readline method always uses the read method on the input stream. >>> While >>> this seems to work for me it introduces a level of complexity in the >>> cgi.py file, and possibly some other bugs, that makes me think that >>> adding the size argument for readline into the WSGI specification >>> isn't >>> such a bad idea after all. >> Since that change to cgi.py was a security fix I would strongly >> recommend not to remove it and to change the WSGI spec instead. > > Given that this change is now part of Python 2.4.4 and Python 2.5, it > seems to me it is now a de facto requirement that all WSGI server > implementations must support readline with a size argument in order > to run any interesting software, despite the spec explicitly saying > that you shouldn't.
I suspect simply modifying the spec to follow the > current reality would be the least bad option. Yes and updating the server implementations, of course, where necessary. > But this kind of destabilizing breakage really shouldn't be allowed > to happen again. Once the error was discovered, the cgi.py change > should have been immediately reverted until either a decision was > made to change the WSGI spec, or else the change fixed to not break > WSGI compliant servers. This limbo situation is pretty bad. Agreed. Jim -- Jim Fulton mailto:jim at zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From ianb at colorstudy.com Tue Oct 31 20:12:05 2006 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 31 Oct 2006 13:12:05 -0600 Subject: [Web-SIG] Proposal: Handling POST forms in WSGI In-Reply-To: <45422555.9020904@doxdesk.com> References: <453A8B67.4070409@colorstudy.com> <45422555.9020904@doxdesk.com> Message-ID: <4547A005.1010208@colorstudy.com> (Copied back to the list) Andrew Clover wrote: > Ian Bicking wrote: > > > When this happens, the form can be parsed by ``cgi.FieldStorage``. > > Agree with the objections others have posted. > > There are many alternative things one might want to do with the body > that don't involve the cgi module (which is old, frequently inconvenient > and offers poor performance in some areas). Please leave the decision on > what to do with the contents of wsgi.input to the discretion of a > higher-level framework/middleware component. This does not require anyone to use the cgi module. This addresses what you can do when you do use the cgi module (which realistically is what everyone does -- I've literally never seen an exception, though I imagine there are one or two somewhere). 
It needs to be clarified that parsing should still be done lazily and deferred as long as possible, but when it doesn't get deferred this offers a simple solution for later consumers that also use cgi.FieldStorage. > I have more sympathy for the idea of keeping a copy of the entire POST > request so it can be read again (e.g. by having a component that consumes > wsgi.input replace it with a StringIO returning the same content). > However I don't see *mandating* this as a good move, given that a POST > can contain multimegabyte file uploads. Keeping the POST request feels heavy considering it usually isn't needed. The proposal requires very little overhead. > How about asking that something that consumes wsgi.input replace it with > either: > > - the original stream seek()ed to 0, if possible; Possible; depends on the deployment and the middleware involved. Requiring seek to work means that code will only work in particular deployments. > - a new streamlike echoing the post request; This would be nice, and would allow for smart intermediaries to be compatible with dumb consumers, and potentially smart consumers could skip reparsing without any overhead. I don't have any code to do this. If code to do this emerged, it would be very reasonable to replace InputProcessed in the spec with this better implementation. Note, of course, that if you parse with cgi, throw the original body away, then recreate the body from the FieldStorage object, it is unlikely that you can improve on the cgi module in any way. Only if you get first crack at wsgi.input can you improve it. > - or a None or dummy stream which will guarantee a quick exception > if the stream is re-read later, rather than just mysteriously > blocking forever?
This is part of my proposal.
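
To make the "dummy stream" option concrete, here is a minimal sketch of a replacement for wsgi.input that fails loudly on any later read instead of blocking or silently returning nothing. It is only an illustration under stated assumptions: the names ConsumedInput and consume_body are hypothetical and are not the object defined in the proposal, the choice of EOFError is an assumption, and the code targets present-day Python 3.

```python
import io


class ConsumedInput:
    """Hypothetical placeholder for a wsgi.input that was already read."""

    def _fail(self, *args, **kwargs):
        raise EOFError("wsgi.input has already been consumed")

    # Every way of reading the stream raises immediately.
    read = readline = readlines = __iter__ = _fail


def consume_body(environ):
    """Read the request body once, then swap in the failing placeholder."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    environ["wsgi.input"] = ConsumedInput()
    return body


# Demonstration with an in-memory body standing in for a real request.
demo_environ = {"CONTENT_LENGTH": "7", "wsgi.input": io.BytesIO(b"a=1&b=2")}
print(consume_body(demo_environ))
try:
    demo_environ["wsgi.input"].read()
except EOFError as exc:
    print("second read failed:", exc)
```

Note that CONTENT_LENGTH is left untouched here, so a later consumer that is unaware of the convention gets an exception rather than concluding that the request carried no data.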
-- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org From ianb at colorstudy.com Tue Oct 31 23:17:42 2006 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 31 Oct 2006 16:17:42 -0600 Subject: [Web-SIG] wsgi.url_vars feedback In-Reply-To: <5.1.1.6.0.20061024125257.02d04008@sparrow.telecommunity.com> References: <5.1.1.6.0.20061024125257.02d04008@sparrow.telecommunity.com> Message-ID: <4547CB86.5000803@colorstudy.com> Phillip J. Eby wrote: > At 05:39 PM 10/23/2006 -0500, Ian Bicking wrote: >> By using a tuple or list, you can be sure you don't have a sparse list, >> which probably isn't something any system is likely to handle. The >> double underscores kind of mark __args__ as a special kind of key, so >> it's less likely to overlap with a simple named key. Removing it from >> the dict or handling it is special; you don't have to look at all the >> keys to see if any are ints, you just test "'__args__' in url_vars". >> >> Would this satisfy everyone? > > Call it "wsgi.url_args", and make it a two-item tuple: *args, **kw. > That's far simpler than any of the wacky encodings proposed so far, and > can be used to invoke a function directly, e.g.: > > apply(f, *environ['wsgi.url_args']) > > or, less cleverly (i.e. more readably): > > args, kw = environ['wsgi.url_args'] > f(*args, **kw) Having thought about it, I think storing a tuple of (args, kwargs) is the best way to do this, since it's most explicit. Consumers can deal with args specially, ignore them, or raise an error, as they see fit -- there are reasons to do each of these. Hiding args in kwargs makes this choice more implicit, and probably more error prone as a result. One little question: if a dispatcher can never produce one of the kinds of information (which happens for some of them), should they put in an empty list/tuple or empty dict, or should they put in None for that item? I'm currently saying they must put in a list/tuple or dict. 
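
A minimal sketch of the (args, kwargs) convention discussed above, for illustration only. The key name "wsgi.url_args" is the one suggested in the quoted message and may not match the spec; set_url_args, call_with_url_args, and show_article are hypothetical helpers. A dispatcher that can only produce one kind of value still stores an empty tuple or an empty dict for the other, never None.

```python
URL_ARGS_KEY = "wsgi.url_args"  # assumption: name taken from the quoted suggestion


def set_url_args(environ, args=(), kwargs=None):
    """Dispatcher side: always store a (tuple, dict) pair, never None."""
    environ[URL_ARGS_KEY] = (tuple(args), dict(kwargs or {}))


def call_with_url_args(func, environ):
    """Consumer side: unpack the pair and pass it on with * and **."""
    args, kwargs = environ[URL_ARGS_KEY]
    return func(*args, **kwargs)


def show_article(year, slug, lang="en"):
    """Hypothetical view function used only for the demonstration."""
    return "%s/%s (%s)" % (year, slug, lang)


# A dispatcher that only yields positional matches.
positional_environ = {}
set_url_args(positional_environ, args=("2006", "wsgi"))
print(call_with_url_args(show_article, positional_environ))

# A dispatcher that only yields named matches.
named_environ = {}
set_url_args(named_environ, kwargs={"year": "2006", "slug": "wsgi", "lang": "de"})
print(call_with_url_args(show_article, named_environ))
```

Because both items are always present and of the expected type, a consumer can unpack and call without first checking for None or for a missing entry.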
Anyway, I've updated the spec: http://wsgi.org/wsgi/Specifications/url_vars http://wsgi.org/wsgi/Specifications/url_vars?action=diff Is everyone happy with this version? -- Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org From fumanchu at amor.org Tue Oct 31 23:35:07 2006 From: fumanchu at amor.org (Robert Brewer) Date: Tue, 31 Oct 2006 14:35:07 -0800 Subject: [Web-SIG] wsgi.url_vars feedback Message-ID: <435DF58A933BA74397B42CDEB8145A86064953D9@ex9.hostedexchange.local> Ian Bicking wrote: > Having thought about it, I think storing a tuple of > (args, kwargs) is the best way to do this, since it's > most explicit. Consumers can deal with args specially, > ignore them, or raise an error, as they see fit -- > there are reasons to do each of these. Hiding args > in kwargs makes this choice more implicit, and probably > more error prone as a result. > > One little question: if a dispatcher can never produce > one of the kinds of information (which happens for some > of them), should they put in an empty list/tuple or > empty dict, or should they put in None for that item? > I'm currently saying they must put in a list/tuple or dict. I would've thought they'd just leave out the entry altogether. Robert Brewer System Architect Amor Ministries fumanchu at amor.org From pje at telecommunity.com Tue Oct 31 23:48:01 2006 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue, 31 Oct 2006 17:48:01 -0500 Subject: [Web-SIG] wsgi.url_vars feedback In-Reply-To: <4547CB86.5000803@colorstudy.com> References: <5.1.1.6.0.20061024125257.02d04008@sparrow.telecommunity.com> <5.1.1.6.0.20061024125257.02d04008@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20061031174422.026b9108@sparrow.telecommunity.com> At 04:17 PM 10/31/2006 -0600, Ian Bicking wrote: >One little question: if a dispatcher can never produce one of the kinds of >information (which happens for some of them), should they put in an empty >list/tuple or empty dict, or should they put in None for that item? I'm >currently saying they must put in a list/tuple or dict. This is the correct choice, IMO. I think the spec should be explicit, however, that these values should be usable with * and ** (or apply()), as that will help clarify the meaning/rationale of the values. >Anyway, I've updated the spec: > >http://wsgi.org/wsgi/Specifications/url_vars >http://wsgi.org/wsgi/Specifications/url_vars?action=diff > >Is everyone happy with this version? I still think it should be url_args rather than url_vars -- I don't see any reason why they could be considered "variables" rather than arguments. :) But other than that, and the desire to see clarification about */** as an intended/supported use case, I give it a +1.