From casey at zope.com Mon Dec 1 15:18:33 2003 From: casey at zope.com (Casey Duncan) Date: Mon Dec 1 15:21:47 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike [was: Re: Python version...] In-Reply-To: References: <3FC4D804.70201@bath.ac.uk> <3FC61716.90909@bath.ac.uk> Message-ID: <20031201151833.4b9004fa.casey@zope.com> On Sun, 30 Nov 2003 21:13:54 +0000 (GMT) John J Lee wrote: > On Sun, 30 Nov 2003, Stuart Langridge wrote: > > John J Lee spoo'd forth: > [...] > > > Is this aimed at the standard library? xml.dom.ext.reader.HtmlLib? > [...] > > Um. What I was looking for was something that could parse HTML > > (including invalid HTML) and give me a DOM tree. I tried Twisted's > > Fine, but what we're talking about here is what should go into Python's > standard library. > > [...] > > I think > > that a DOM parser for HTML is pretty important, even if that parser > > *actually* just does "convert broken HTML to valid XHTML and then feed > > it to minidom" or something similar. Are there any others? > > There are lots of XML DOM implementations for Python (only one HTML DOM > implementation, though: 4DOM -- and that's out of date), including the one > that's already in the standard library. Parsing arbitrary HTML is hard, > though (xml.dom.ext.reader.HtmlLib doesn't even manage to generate an HTML > DOM from arbitrary *correct* HTML, and correct HTML is not often seen in > the wild ;-). tidylib is the only sane way I know of. See below. Hmm, it sounds to me like implementing/updating the HTML parsing built into python is something worth considering if it blocks several other possible initiatives. HTML may be on the way out, but I think we're stuck with it for the foreseeable future.
-Casey (running away ;^) From jjl at pobox.com Mon Dec 1 15:55:47 2003 From: jjl at pobox.com (John J Lee) Date: Mon Dec 1 15:55:58 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: <20031201151833.4b9004fa.casey@zope.com> References: <3FC4D804.70201@bath.ac.uk> <3FC61716.90909@bath.ac.uk> <20031201151833.4b9004fa.casey@zope.com> Message-ID: On Mon, 1 Dec 2003, Casey Duncan wrote: [...] > Hmmm, it sounds to me like implementing/updating the HTML parsing built > into python is something worth considering if it blocks several other > possible initiatives. [...] Problems: 1. no volunteer to write a plain-old-C-API wrapper of tidylib 2. tidylib was still a moving target last time I looked (maybe by the time 2.4 comes out, it will have settled down)... John From manfred.stienstra at dwerg.net Mon Dec 1 17:28:41 2003 From: manfred.stienstra at dwerg.net (Manfred Stienstra) Date: Mon Dec 1 17:29:48 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: References: <3FC4D804.70201@bath.ac.uk> <3FC61716.90909@bath.ac.uk> <20031201151833.4b9004fa.casey@zope.com> Message-ID: <1070317721.2301.7.camel@ack.dwerg.net> On Mon, 2003-12-01 at 21:55, John J Lee wrote: > 1. no volunteer to write a plain-old-C-API wrapper of tidylib Tidylib is written in C. http://tidy.sourceforge.net/libintro.html Manfred From casey at zope.com Mon Dec 1 17:36:27 2003 From: casey at zope.com (Casey Duncan) Date: Mon Dec 1 17:39:46 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: References: <3FC4D804.70201@bath.ac.uk> <3FC61716.90909@bath.ac.uk> <20031201151833.4b9004fa.casey@zope.com> Message-ID: <20031201173627.741b88f9.casey@zope.com> On Mon, 1 Dec 2003 20:55:47 +0000 (GMT) John J Lee wrote: > On Mon, 1 Dec 2003, Casey Duncan wrote: > [...] 
> > Hmmm, it sounds to me like implementing/updating the HTML parsing built > > into python is something worth considering if it blocks several other > > possible initiatives. > [...] > > Problems: > > 1. no volunteer to write a plain-old-C-API wrapper of tidylib I'll look into this, but I'll hold off volunteering until I see how big the API is. I suspect not very. -Casey From casey at zope.com Tue Dec 2 00:11:12 2003 From: casey at zope.com (Casey Duncan) Date: Tue Dec 2 00:12:23 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike References: <3FC4D804.70201@bath.ac.uk><3FC61716.90909@bath.ac.uk> <20031201151833.4b9004fa.casey@zope.com> <20031201173627.741b88f9.casey@zope.com> Message-ID: <001601c3b892$b4a00db0$6401a8c0@khatru> > On Mon, 1 Dec 2003 20:55:47 +0000 (GMT) > John J Lee wrote: [snip] > > Problems: > > > > 1. no volunteer to write a plain-old-C-API wrapper of tidylib > > I'll look into this, but I'll hold off volunteering until I see how big the API is. I suspect not very. After looking at it I'd say it's certainly a non-trivial task to wrap (by hand), depending on what the real needs are. Do we simply want a 1-to-1 (perhaps swigged) wrapper, do we want something pythonic, or what? The latter is obviously more involved and would need much more discussion and vetting, especially given its DOM-ish aspirations. Perhaps the most reasonable approach would be to generate a simple low-level wrapper first and then gradually develop a high-level interface to it, mostly written in Python. That might also insulate us from future API changes to tidy better. 
-Casey From sholden at holdenweb.com Tue Dec 2 07:39:08 2003 From: sholden at holdenweb.com (Steve Holden) Date: Tue Dec 2 07:43:51 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: <001601c3b892$b4a00db0$6401a8c0@khatru> Message-ID: > -----Original Message----- > From: web-sig-bounces+sholden=holdenweb.com@python.org > [mailto:web-sig-bounces+sholden=holdenweb.com@python.org]On Behalf Of > Casey Duncan > Sent: Tuesday, December 02, 2003 12:11 AM > To: Casey Duncan; John J Lee > Cc: web-sig@python.org > Subject: Re: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike > > > > On Mon, 1 Dec 2003 20:55:47 +0000 (GMT) > > John J Lee wrote: > [snip] > > > Problems: > > > > > > 1. no volunteer to write a plain-old-C-API wrapper of tidylib > > > > I'll look into this, but I'll hold off volunteering until I > see how big > the API is. I suspect not very. > > After looking at it I'd say it's certainly a non-trivial task > to wrap (by > hand), depending on what the real needs are. Do we simply > want a 1-to-1 > (perhaps swigged) wrapper, do we want something pythonic, or what? The > latter is obviously more involved and would need much more > discussion and > vetting, especially given its DOM-ish aspirations. > > Perhaps the most reasonable approach would be to generate a > simple low-level > wrapper first and then gradually develop a high-level interface to it, > mostly written in Python. That might also insulate us from future API > changes to tidy better. > I think we also want to consider seriously whether tidy is what we need. Does it really provide a necessary function? And, even if it does, how valuable would that function be? I wasn't impressed with tidy in either of the two attempts I made to use it. Then, of course, there's the question of prior art: http://www.lemburg.com/files/python/mxTidy.html might be worth looking at before you go too much further ... 
regards -- Steve Holden +1 703 278 8281 http://www.holdenweb.com/ Improve the Internet http://vancouver-webpages.com/CacheNow/ Python Web Programming http://pydish.holdenweb.com/pwp/ Interview with GvR August 14, 2003 http://www.onlamp.com/python/ From casey at zope.com Tue Dec 2 09:16:52 2003 From: casey at zope.com (Casey Duncan) Date: Tue Dec 2 09:19:29 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: <20031201173627.741b88f9.casey@zope.com> References: <3FC4D804.70201@bath.ac.uk> <3FC61716.90909@bath.ac.uk> <20031201151833.4b9004fa.casey@zope.com> <20031201173627.741b88f9.casey@zope.com> Message-ID: <20031202091652.18f2daea.casey@zope.com> On Mon, 1 Dec 2003 17:36:27 -0500 Casey Duncan wrote: [snip] > I'll look into this, but I'll hold off volunteering until I see how big the API is. I suspect not very. After looking at it I think it is reasonable to wrap. It looks to be designed with that in mind. Ironically, it seems that mxTidy was an inspiration for tidylib, so wrapping it will bring it full circle. I see the process going in two phases: 1. A low-level wrapper that exposes the C API directly, with only small pythonifications, like proper exception handling, simple type mapping, etc. 2. A high-level OO API specifically designed for use with Python. I volunteer for phase 1. Actually I will do a phase 0 first which will just be a stupid wrapper that exposes the API and nothing else. From there we can discuss what needs to be done to complete phase 1. This looks like a good job for SWIG; does anyone oppose using it? -Casey From aquarius-lists at kryogenix.org Tue Dec 2 09:54:22 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Tue Dec 2 09:52:07 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike References: Message-ID: Steve Holden spoo'd forth: > I think we also want to consider seriously whether tidy is what we need. > Does it really provide a necessary function?
And, even if it does, how > valuable would that function be? I wasn't impressed with tidy in either > of the two attempts I made to use it. I don't see that tidy's ability to tidy HTML per se is useful, but I think that it's very useful in that it can take invalid HTML and convert it to valid XHTML. That way, we can get a DOM tree from invalid HTML, which is very useful... sil -- "Willow hath gat hare off rede And doth geev soopurb heede. Buffy, as written by Geoffrey Chaucer, the dirty mediaeval git." -- Andy Spencer, after Certic From cs1spw at bath.ac.uk Tue Dec 2 11:07:19 2003 From: cs1spw at bath.ac.uk (Simon Willison) Date: Tue Dec 2 11:07:24 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: References: Message-ID: <3FCCB8B7.8070102@bath.ac.uk> Stuart Langridge wrote: > I don't see that tidy's ability to tidy HTML per se is useful, but I > think that it's very useful in that it can take invalid HTML and > convert it to valid XHTML. That way, we can get a DOM tree from invalid > HTML, which is very useful... Is there any way we could get a DOM tree from invalid HTML using pure Python tools? The HTML tools in the Python standard library at the moment are all pure Python. Could we even use the existing sgmllib module (or an extension of it) to create our own DOM tree from invalid HTML? From aquarius-lists at kryogenix.org Tue Dec 2 11:13:22 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Tue Dec 2 11:11:07 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike References: <3FCCB8B7.8070102@bath.ac.uk> Message-ID: Simon Willison spoo'd forth: > Stuart Langridge wrote: >> I don't see that tidy's ability to tidy HTML per se is useful, but I >> think that it's very useful in that it can take invalid HTML and >> convert it to valid XHTML. That way, we can get a DOM tree from invalid >> HTML, which is very useful... 
> > Is there any way we could get a DOM tree from invalid HTML using pure > Python tools? The HTML tools in the Python standard library at the > moment are all pure Python. Could we even use the existing sgmllib > module (or an extension of it) to create our own DOM tree from invalid HTML? Presumably we could (the existing things, like HtmlLib or microdom do it); I was just thinking of not having to implement it if we didn't have to :) I'm not all that hot on sgmllib, either -- parsing invalid HTML strikes me as being pretty hard, since browsers have to try hard to do it. I don't know, however, if the hard thing is *displaying* it right rather than just *parsing* it. Thought: Grail was a browser, so it might have done it? sil -- 2. Make it halfway normal. I don't have any use for laser-beam-shooting pocket combs, or non-existent existents existing within their own existences, or ballpoint pens made out of lettuce. -- CardinalT dictates rules for the raif Silly Game From casey at zope.com Tue Dec 2 11:58:46 2003 From: casey at zope.com (Casey Duncan) Date: Tue Dec 2 12:02:12 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: <3FCCB8B7.8070102@bath.ac.uk> References: <3FCCB8B7.8070102@bath.ac.uk> Message-ID: <20031202115846.52a37f1a.casey@zope.com> On Tue, 02 Dec 2003 10:07:19 -0600 Simon Willison wrote: > Stuart Langridge wrote: > > I don't see that tidy's ability to tidy HTML per se is useful, but I > > think that it's very useful in that it can take invalid HTML and > > convert it to valid XHTML. That way, we can get a DOM tree from invalid > > HTML, which is very useful... > > Is there any way we could get a DOM tree from invalid HTML using pure > Python tools? The HTML tools in the Python standard library at the > moment are all pure Python. Could we even use the existing sgmllib > module (or an extension of it) to create our own DOM tree from invalid HTML? 
According to the docs, tidylib exposes a DOM-like interface for walking the document tree of documents it has parsed. My understanding is that this is designed to work for broken HTML up to valid XHTML. If it works as advertised, it could be a good engine to put behind a nice python api. See: http://tidy.sourceforge.net/docs/api/group__Tree.html The API gets a bit verbose in places (separate functions to test for each tag and attribute type). These look like complements to the generic functions, perhaps to avoid putting too much HTML knowledge directly in the user code. Also, tidylib's memory allocation is hookable, in case we wanted to use Python's malloc/free (not sure whether we need to). -Casey From jjl at pobox.com Tue Dec 2 14:35:36 2003 From: jjl at pobox.com (John J Lee) Date: Tue Dec 2 14:35:44 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: <3FCCB8B7.8070102@bath.ac.uk> References: <3FCCB8B7.8070102@bath.ac.uk> Message-ID: On Tue, 2 Dec 2003, Simon Willison wrote: [...] > Is there any way we could get a DOM tree from invalid HTML using pure > Python tools? The HTML tools in the Python standard library at the [...] No chance. A lot of work has gone into HTMLTidy / tidylib; reimplementing it would be a lot of work for little benefit. John From jjl at pobox.com Tue Dec 2 14:37:45 2003 From: jjl at pobox.com (John J Lee) Date: Tue Dec 2 14:37:52 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: References: <3FCCB8B7.8070102@bath.ac.uk> Message-ID: On Tue, 2 Dec 2003, Stuart Langridge wrote: > Simon Willison spoo'd forth: [...] > > Is there any way we could get a DOM tree from invalid HTML using pure > > Python tools? The HTML tools in the Python standard library at the [...] > Presumably we could (the existing things, like HtmlLib or microdom do > it); [...] No, they don't.
There's a whole wonderful world of invalid HTML out there, that sgmllib and xml.dom.ext.reader.HtmlLib know nothing about. John From jjl at pobox.com Tue Dec 2 14:39:10 2003 From: jjl at pobox.com (John J Lee) Date: Tue Dec 2 14:39:17 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: References: Message-ID: [...] > > wrapper first and then gradually develop a high-level interface to it, > > mostly written in Python. That might also insulate us from future API > > changes to tidy better. > > > I think we also want to consider seriously whether tidy is what we need. > Does it really provide a necessary function? And, even if it does, how > valuable would that function be? Parsing arbitrary (including broken) HTML reliably. Processing that HTML with XML tools. Whether that's "necessary" or valuable is a matter for debate, obviously. > I wasn't impressed with tidy in either > of the two attempts I made to use it. > > Then, of course, there's the question of prior art: > > http://www.lemburg.com/files/python/mxTidy.html > > might be worth looking at before you go too much further ... mxTidy and tidylib are based on the same code (HTMLTidy). tidylib is being actively maintained (though that may be a mixed blessing, depending on the relative proportions of old and newly-introduced bugs). John From jjl at pobox.com Tue Dec 2 14:44:10 2003 From: jjl at pobox.com (John J Lee) Date: Tue Dec 2 14:44:16 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: <20031202091652.18f2daea.casey@zope.com> References: <3FC4D804.70201@bath.ac.uk> <3FC61716.90909@bath.ac.uk> <20031201151833.4b9004fa.casey@zope.com> <20031201173627.741b88f9.casey@zope.com> <20031202091652.18f2daea.casey@zope.com> Message-ID: On Tue, 2 Dec 2003, Casey Duncan wrote: [...] > 1. A low-level wrapper that exposes the C API directly, with only small > pythonifications, like proper exception handling, simple type mapping, > etc. > > 2. 
A high-level OO API specifically designed for use with Python. > > I volunteer for phase 1. Actually I will do a phase 0 first which will > just be stupid wrapper that exposes the API and nothing else. From there > we can discuss what needs to be done to complete phase 1. Great! Maybe it's worth bouncing the idea off python-dev first, though, in case it gets ruled out by the BDFL (unlikely, I suspect, but I don't know). Unless you want it regardless of whether it's in the library, of course. > This looks like a good job for SWIG, does anyone oppose using it? That sounds like another question for python-dev. John From gward at python.net Tue Dec 2 22:28:03 2003 From: gward at python.net (Greg Ward) Date: Tue Dec 2 22:28:08 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: <20031202091652.18f2daea.casey@zope.com> References: <3FC61716.90909@bath.ac.uk> <20031201151833.4b9004fa.casey@zope.com> <20031201173627.741b88f9.casey@zope.com> <20031202091652.18f2daea.casey@zope.com> Message-ID: <20031203032803.GA2473@cthulhu.gerg.ca> On 02 December 2003, Casey Duncan said: > I volunteer for phase 1. Actually I will do a phase 0 first which will > just be stupid wrapper that exposes the API and nothing else. From > there we can discuss what needs to be done to complete phase 1. > > This looks like a good job for SWIG, does anyone oppose using it? Note that the current Berkeley DB wrapper did not get into the standard library until AMK rewrote it from hand with no hint of SWIG. (And even then, it took a year or two before the bsddb in 2.3 got in.) As I recall, there were Serious Reservations about the quality of code generated by SWIG. Grovel through the python-dev archives for more. If SWIG has changed much since then, it might be worth revisiting -- but I suspect you'd have a selling job to do to get SWIGged code past python-dev. Greg -- Greg Ward http://www.gerg.ca/ Don't hate yourself in the morning -- sleep till noon. 
From casey at zope.com Tue Dec 2 23:03:03 2003 From: casey at zope.com (Casey Duncan) Date: Tue Dec 2 23:05:20 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: <20031203032803.GA2473@cthulhu.gerg.ca> References: <3FC61716.90909@bath.ac.uk> <20031201151833.4b9004fa.casey@zope.com> <20031201173627.741b88f9.casey@zope.com> <20031202091652.18f2daea.casey@zope.com> <20031203032803.GA2473@cthulhu.gerg.ca> Message-ID: <20031202230303.0052c52e.casey@zope.com> On Tue, 2 Dec 2003 22:28:03 -0500 Greg Ward wrote: > On 02 December 2003, Casey Duncan said: > > I volunteer for phase 1. Actually I will do a phase 0 first which will > > just be stupid wrapper that exposes the API and nothing else. From > > there we can discuss what needs to be done to complete phase 1. > > > > This looks like a good job for SWIG, does anyone oppose using it? > > Note that the current Berkeley DB wrapper did not get into the standard > library until AMK rewrote it from hand with no hint of SWIG. (And even > then, it took a year or two before the bsddb in 2.3 got in.) And it still seems to break often due to the API instabilities of bsddb itself. Oh well. > As I recall, there were Serious Reservations about the quality of code > generated by SWIG. Grovel through the python-dev archives for more. If > SWIG has changed much since then, it might be worth revisiting -- but I > suspect you'd have a selling job to do to get SWIGged code past > python-dev. Yup, I have reservations of my own about it. I definitely don't want to do it by hand (and maintain it) if it will see little use, so I think we should discuss a bit more exactly what our needs are. From what I understand we want a DOM parser for real-world (aka broken) HTML code. From what I can see, tidylib will (or at least aspires to) do this. I think some testing is in order, now if only I could find some broken HTML code... ;^) Now the DOM api from tidylib is not W3C compliant.
If we were to use tidylib as a base for some new HTML DOM parser, would we desire a W3C compliant api? As much as I want to say no, it would probably help its credibility in terms of becoming part of the st lib. OTOH, if anyone has a better idea, I'm all ears. What kind of api do people want? So a revised plan A will be to vet tidylib as the solution to the HTML parser problem. I will do this, but can anyone already speak more specifically about their experiences good and bad? -Casey From aquarius-lists at kryogenix.org Wed Dec 3 05:01:34 2003 From: aquarius-lists at kryogenix.org (Stuart Langridge) Date: Wed Dec 3 04:59:12 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike References: <3FCCB8B7.8070102@bath.ac.uk> Message-ID: John J Lee spoo'd forth: > On Tue, 2 Dec 2003, Stuart Langridge wrote: >> Simon Willison spoo'd forth: >> > Is there any way we could get a DOM tree from invalid HTML using pure >> > Python tools? The HTML tools in the Python standard library at the >> Presumably we could (the existing things, like HtmlLib or microdom do >> it); > > No, they don't. There's a whole wonderful world of invalid HTML > out there, that sgmllib and xml.dom.ext.reader.HtmlLib know nothing about. Really? What sort of thing do they fail to parse? sil -- If hard data were the filtering criterion you could fit the entire contents of the Internet on a floppy disk. -- Cecil Adams From jjl at pobox.com Wed Dec 3 09:20:02 2003 From: jjl at pobox.com (John J Lee) Date: Wed Dec 3 09:20:34 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: References: <3FCCB8B7.8070102@bath.ac.uk> Message-ID: On Wed, 3 Dec 2003, Stuart Langridge wrote: > John J Lee spoo'd forth: > > On Tue, 2 Dec 2003, Stuart Langridge wrote: > >> Simon Willison spoo'd forth: > >> > Is there any way we could get a DOM tree from invalid HTML using pure > >> > Python tools? 
The HTML tools in the Python standard library at the > >> Presumably we could (the existing things, like HtmlLib or microdom do > >> it); > > > > No, they don't. There's a whole wonderful world of invalid HTML > > out there, that sgmllib and xml.dom.ext.reader.HtmlLib know nothing about. > > Really? What sort of thing do they fail to parse? Hmm, I thought microdom used tidylib, but it seems not. Haven't tried that yet. The problem is that tidylib has had a lot of input over many years from people reporting bugs (where "bug" is very widely defined to include failing to understand all kinds of bad HTML that one wouldn't imagine people would write or browsers would put up with). microdom hasn't. But maybe it works well enough. It's not a full DOM implementation, though. BTW, I had thought of tidylib simply as a way of transforming HTML into valid HTML or XHTML, not as a DOM implementation. You could just have a single tidy() function (like mxTidy, IIRC). Here's some valid HTML that xml.dom.ext.reader.HtmlLib (from PyXML, and based on sgmlop) fails to parse.

#!/usr/bin/env python
# Example from Martin v. Loewis (PyXML SF bug 409605).
# The missing optional tag is not inferred.
good_html = """

I prefer (all things being equal) regularity/orthogonality and logical syntax/semantics in a language because there is less to have to remember. (Of course I know all things are NEVER really equal!)

Guido van Rossum, 6 Dec 91

The details of that silly code are irrelevant.

Tim Peters, 4 Mar 92 & < > é ö
"""
from xml.dom.ext.reader.HtmlLib import FromHtml
from xml.dom.ext import XHtmlPrettyPrint

dom = FromHtml(good_html)
XHtmlPrettyPrint(dom)

That could be fixed. Nobody has, probably because there are better XML DOM parsers. IIRC HTMLParser still doesn't handle CDATA properly (this one has annoyed a lot of people, but I don't think anybody has fixed it yet). For invalid HTML, it's true that badly-matched tags tend to work OK with HTMLParser, but of course that just gives you "bad callbacks" instead of bad HTML, if you get what I mean -- if you want to build a DOM out of that, for example, good luck. I suppose this is really the most important issue. Browsers seem to be full of code to parse or ignore the weirdest stuff that even the underlying parsers (HTMLParser, etc.) choke on: I've seen things that look like SGML declarations but didn't even seem to be valid SGML, let alone HTML (but I don't know SGML). John From jjl at pobox.com Wed Dec 3 09:23:00 2003 From: jjl at pobox.com (John J Lee) Date: Wed Dec 3 09:23:23 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: <20031202230303.0052c52e.casey@zope.com> References: <3FC61716.90909@bath.ac.uk> <20031201151833.4b9004fa.casey@zope.com> <20031201173627.741b88f9.casey@zope.com> <20031202091652.18f2daea.casey@zope.com> <20031203032803.GA2473@cthulhu.gerg.ca> <20031202230303.0052c52e.casey@zope.com> Message-ID: On Tue, 2 Dec 2003, Casey Duncan wrote: [...] > OTOH, if anyone has a better idea, I'm all ears. What kind of api do people want? [...]

from tidy import tidy
xhtml = tidy(html)

...plus some optional args. mxTidy does this, more-or-less, I think (but is based on the old HTMLTidy, not tidylib, of course).
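The single-function interface John proposes is easy to picture. The hypothetical `tidy()` below is a rough sketch only. It assumes modern Python 3 and uses the standard library's `html.parser.HTMLParser` rather than tidylib. It turns the parser's callbacks back into well-formed XHTML by closing unclosed elements and dropping stray end tags. It does none of HTMLTidy's real repair work: it infers no optional tags, fixes no attributes, and simply discards comments and declarations. Treat it as an illustration of the call shape, not a substitute for tidylib.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have content; emitted as self-closing XHTML tags.
VOID = {"br", "hr", "img", "input", "meta", "link", "area", "base", "col", "param"}


class _Balancer(HTMLParser):
    """Re-emit parser callbacks as balanced XHTML (toy, hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the output document
        self.stack = []  # names of currently open (non-void) elements

    def handle_starttag(self, tag, attrs):
        # Minimised attributes such as <input disabled> become disabled="disabled".
        attr_text = "".join(
            f' {name}="{escape(value if value is not None else name)}"'
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text} />")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self.stack:
            return  # stray or meaningless end tag: drop it
        # Close any elements still open inside this one, then the element itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Text arrives with entities already decoded, so re-escape it for XML.
        self.out.append(escape(data, quote=False))

    def close_all(self):
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")


def tidy(markup):
    """Hypothetical tidy(html) -> xhtml: balances tags, nothing more."""
    parser = _Balancer()
    parser.feed(markup)
    parser.close()
    parser.close_all()
    return "".join(parser.out)
```

For example, `tidy("<b><i>x</b></i>")` returns `<b><i>x</i></b>`. The end tag for `b` implicitly closes the still-open `i`, and the later stray `</i>` is ignored. This is the "bad callbacks" problem in miniature: `HTMLParser` reports the tags faithfully, and the consumer has to guess what the author meant.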
John From casey at zope.com Wed Dec 3 10:08:59 2003 From: casey at zope.com (Casey Duncan) Date: Wed Dec 3 10:12:26 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: References: <3FC61716.90909@bath.ac.uk> <20031201151833.4b9004fa.casey@zope.com> <20031201173627.741b88f9.casey@zope.com> <20031202091652.18f2daea.casey@zope.com> <20031203032803.GA2473@cthulhu.gerg.ca> <20031202230303.0052c52e.casey@zope.com> Message-ID: <20031203100859.5c748589.casey@zope.com> On Wed, 3 Dec 2003 14:23:00 +0000 (GMT) John J Lee wrote: > On Tue, 2 Dec 2003, Casey Duncan wrote: > [...] > > OTOH, if anyone has a better idea, I'm all ears. What kind of api do people want? > [...] > > from tidy import tidy > xhtml = tidy(html) That would be a pretty easy wrapper methinks. At first that was pretty much all I thought tidylib would do, but it exposes its object model in such a way that you could parse HTML directly to a DOM if you wanted to. If you merely use tidy to create xhtml and then parse that, you are doing a DOM parse twice, and not only is that inefficient, it's probably lossy (depending on how strict the conversion is). Cycles are cheap so I'm willing to live with inefficiency if it means forward progress in functionality. The loss part might not be so great. So maybe the approach should be:

1. Expose the basic functionality that the tidy binary has as a python function and see how we like it. I think this is worthwhile regardless of whether it makes it into the stdlib.
2. Think about whether we want/need a direct HTML->DOM parser. And then decide how much we need it 8^)
3. Go get a beer and think about something entirely different.
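Casey's second option, a direct HTML-to-DOM parser, can also be sketched with the standard library. The hypothetical `html_to_dom()` below assumes modern Python 3. It feeds `HTMLParser` callbacks straight into an `xml.dom.minidom` document, so there is no intermediate XHTML string to re-parse. The recovery rules are assumptions made only for this sketch, nowhere near what tidylib or a browser does:

- void elements never take children;
- an end tag closes every element opened after its match;
- unmatched end tags are dropped;
- a literal `<html>` start tag reuses the root element.

```python
from html.parser import HTMLParser
from xml.dom.minidom import getDOMImplementation

# Elements that can never contain children (assumed list for this sketch).
VOID = {"br", "hr", "img", "input", "meta", "link", "area", "base", "col", "param"}


class DOMBuilder(HTMLParser):
    """Build a minidom tree directly from HTMLParser callbacks (hypothetical)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.doc = getDOMImplementation().createDocument(None, "html", None)
        self.stack = [self.doc.documentElement]  # stack[0] is always the root

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            node = self.doc.documentElement  # merge <html> attributes into the root
        else:
            node = self.doc.createElement(tag)
            self.stack[-1].appendChild(node)
        for name, value in attrs:
            node.setAttribute(name, value if value is not None else name)
        if tag not in VOID and tag != "html":
            self.stack.append(node)

    def handle_endtag(self, tag):
        open_names = [node.tagName for node in self.stack[1:]]
        if tag not in open_names:
            return  # unmatched end tag: ignore it
        # Pop implicitly closed elements, then the matching one.
        while self.stack[-1].tagName != tag:
            self.stack.pop()
        self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].appendChild(self.doc.createTextNode(data))


def html_to_dom(markup):
    builder = DOMBuilder()
    builder.feed(markup)
    builder.close()
    return builder.doc
```

For instance, `html_to_dom("<p>Hello <b>bold <i>both</b> after").documentElement.toxml()` gives `<html><p>Hello <b>bold <i>both</i></b> after</p></html>`. The result is a standard DOM, so code written against the W3C interfaces (`getElementsByTagName`, `getAttribute`) works unchanged. That is one argument for wrapping a parser's tree in a compliant API rather than exposing its native one.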
-Casey From jjl at pobox.com Wed Dec 3 10:40:58 2003 From: jjl at pobox.com (John J Lee) Date: Wed Dec 3 10:41:04 2003 Subject: [Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike In-Reply-To: <20031203100859.5c748589.casey@zope.com> References: <3FC61716.90909@bath.ac.uk> <20031201151833.4b9004fa.casey@zope.com> <20031201173627.741b88f9.casey@zope.com> <20031202091652.18f2daea.casey@zope.com> <20031203032803.GA2473@cthulhu.gerg.ca> <20031202230303.0052c52e.casey@zope.com> <20031203100859.5c748589.casey@zope.com> Message-ID: On Wed, 3 Dec 2003, Casey Duncan wrote: > On Wed, 3 Dec 2003 14:23:00 +0000 (GMT) John J Lee wrote: [...] > > from tidy import tidy > > xhtml = tidy(html) > > That would be a pretty easy wrapper methinks. At first that was pretty > much all I thought tidylib would do, but it exposes its object model in > such a way that you could parse HTML directly to a DOM if you wanted to. Loss is inevitable if you're tidying. How could it be otherwise? Usually you don't get huge DOMs from HTML documents, unlike XML, so that's not a major problem -- I hope! Marc-Andre's page talks about poor performance from HTMLTidy due to character-based operation, but I don't know how severe that is or whether it's been addressed in tidylib. 4DOM seems damn slow (I may be unfairly blaming 4DOM, since I'm using a hacked version with JavaScript interpretation on top, so it could easily be my fault, or the fault of the JS code I'm running), but of course there are faster, more compliant implementations, so that shouldn't be a problem. Finally, DOM *processing* might well be faster using tidylib just as a tidier than it would be as a DOM (especially if you wrap the tidy-DOM to get a real, compliant, DOM). John From pje at telecommunity.com Sun Dec 7 13:53:43 2003 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sun Dec 7 13:51:39 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 Message-ID: <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> Your comments and feedback are requested. Thanks. PEP: XXX Title: Python Web Container Interface v1.0 Version: $Revision: 1.1 $ Last-Modified: $Date: 2003/12/07 13:29:50 $ Author: Phillip J. Eby Discussions-To: Python Web-SIG Status: Draft Type: Informational Content-Type: text/x-rst Created: 07-Dec-2003 Post-History: 07-Dec-2003 Abstract ======== This document specifies a proposed standard interface between web applications and web application "containers" implemented in Python, making it possible to use a variety of application frameworks with a single container, and to use a variety of containers with a single application. Rationale ========= Python currently boasts a wide variety of web frameworks, such as Zope, Quixote, Webware, Skunkware, PSO, and Twisted -- to name just a few [1]_. This wide range of available choices would not be a problem, if only it weren't necessary to choose between them! Because few Python web frameworks can interoperate in the same process, users are generally forced to select one and only one framework. Making matters worse, not all frameworks support the same launching mechanisms. Some use an embedded webserver, others use CGI, FastCGI, or some custom server-to-application protocol. But, it is quite rare for a single framework to provide built-in support for all of these methods. Thus, the launching mechanism, or "container", becomes a key constraint for users selecting a web development tool. They are limited to the frameworks that support (or can be made to support) their desired runtime environment. This can narrow the field of choices considerably. This is a problem for framework authors as well as for framework users. For their framework to become popular, the author must at least implement container mechanisms for the most popular runtime environments. 
Although container implementation is not complex, it is tedious and sometimes riddled with platform-specific issues. Being able to separate container development from framework development would therefore benefit framework developers as well as users. This PEP, therefore, proposes a simple and universal interface between web "containers" and web "applications". The proposed interface is 100% framework neutral, and does not favor any development style over any other. Conformance to this interface will permit framework-neutral containers to be developed, independently of any application framework, and any application framework will then be usable with any container (potentially subject to certain environmental issues such as threading support). Finally, the interface also makes it potentially possible to combine the use of multiple web framework tools in a single application container. Specification Overview ====================== A "container" is a mechanism for executing Python code in response to a request made on a web server. The mechanism by which this occurs is specific to the container. For example, a CGI container would use the Common Gateway Interface, while a mod_python container would use Apache's internal API. An "application" is a Python object that does useful work in response to a request made on a web server. A container invokes an application by calling its ``runCGI`` method, whose signature is defined as follows (the ``self`` argument is omitted for clarity)::

    def runCGI(input, output, errors, environ):
        pass

In other words, a container calls ``app.runCGI(input,output,errors,environ)`` to invoke the application. The ``runCGI`` method should read from ``input``, if required, and write its response to ``output``, using the ``environ`` dictionary to obtain other information about the request. Error messages or log output may be written to ``errors``. The return value of ``runCGI`` is ignored by the container.
The contents and format of ``input``, ``output``, and ``environ`` are defined by the Common Gateway Interface [2]_. The application object *must* support repeated calls to ``runCGI``, as virtually all containers will make such repeated requests. Containers *should* trap and log exceptions raised by applications, and *may* continue to execute, or attempt to shut down gracefully. Applications *should* avoid allowing exceptions to escape their ``runCGI`` method, since the precise effect of this is container-dependent. Thread support, or lack thereof, is also container-dependent. Containers that can run multiple requests in parallel *should* also provide the option of running an application in a single-threaded fashion, so that applications or frameworks that are not thread-safe may still be used. This specification does not define how a container selects or obtains an application to invoke. These and other configuration options are highly container-specific matters. It is expected that container authors will document how to configure the container to execute a particular application object, and with what options (such as threading options, if applicable). Framework authors, on the other hand, will document how to create an application object that wraps their framework's functionality. The user, who has chosen both the container and the application framework, must connect the two together. However, since both the framework and the container now have a common interface, this should be merely a mechanical matter, rather than a significant engineering effort. Specification Details ===================== The ``input``, ``output``, and ``errors`` objects supplied to the ``runCGI`` method *must* be "file-like" objects, while the ``environ`` object *must* be a Python dictionary. The ``runCGI`` method is allowed to modify the dictionary in place, making it easier for authors to create simple "routing" components that forward ``runCGI`` calls to other components.
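The in-place modification rule is what makes routing cheap. What follows is a minimal sketch, not taken from the draft or from PEAK: ``PrefixRouter`` and ``HelloApp`` are hypothetical names, routing is assumed to key on the first ``PATH_INFO`` segment, and text streams (``io.StringIO``) stand in for the container's file-like objects.

```python
import io


class HelloApp:
    """Toy application: writes a CGI-style response (headers, blank line, body)."""

    def __init__(self, greeting):
        self.greeting = greeting

    def runCGI(self, input, output, errors, environ):
        output.write("Content-Type: text/plain\r\n\r\n")
        output.write("%s from %s\n" % (self.greeting, environ.get("PATH_INFO", "/")))


class PrefixRouter:
    """Hypothetical router: forwards runCGI calls by first PATH_INFO segment.

    The chosen segment is moved from PATH_INFO onto SCRIPT_NAME by
    modifying ``environ`` in place, which the draft explicitly permits.
    """

    def __init__(self, routes, default=None):
        self.routes = routes      # mapping of segment name -> application
        self.default = default    # used when no segment matches (may be None)

    def runCGI(self, input, output, errors, environ):
        path = environ.get("PATH_INFO", "")
        segment, sep, rest = path.lstrip("/").partition("/")
        target = self.routes.get(segment)
        if target is None:
            target = self.default
        else:
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + "/" + segment
            environ["PATH_INFO"] = sep + rest
        if target is None:
            # No match and no default: answer with a CGI "Status" header.
            output.write("Status: 404 Not Found\r\nContent-Type: text/plain\r\n\r\n")
            output.write("No application for %s\n" % path)
            return
        target.runCGI(input, output, errors, environ)


if __name__ == "__main__":
    router = PrefixRouter({"blog": HelloApp("Hello"), "wiki": HelloApp("Hi")})
    out = io.StringIO()
    env = {"REQUEST_METHOD": "GET", "SCRIPT_NAME": "", "PATH_INFO": "/wiki/FrontPage"}
    router.runCGI(io.StringIO(), out, io.StringIO(), env)
    print(out.getvalue())
```

Shifting the consumed segment onto ``SCRIPT_NAME`` is an assumed convention here; it lets a sub-application that builds links from ``SCRIPT_NAME`` keep working when mounted under a prefix.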
The rationale for requiring a dictionary is to maximize portability between containers. The alternative would be to define here some subset of a dictionary's methods as being the standard and portable interface. In practice, however, most containers will probably want to use a simple dictionary anyway, and some frameworks may end up relying upon the fact that most containers do this. So, in the interest of a simple specification, and because there is little need for a custom type here anyway, a Python dictionary is mandatory for communicating the CGI environment. The "file-like" objects are another matter, though. There is much more diversity between containers as to how these "file-like objects" are likely to be implemented. They may be pipes, or sockets, or buffered asynchronous communication objects of some kind. Therefore, we must define the following subset of file methods that containers are required to provide, and urge framework authors to use these, and only these, methods:

=================== ======================= ========
Method              Files                   Notes
=================== ======================= ========
``close()``         All
``read(size)``      ``input``
``readline()``      ``input``               1
``readlines(hint)`` ``input``               2
``__iter__()``      ``input``
``flush()``         ``output``, ``errors``  3
``write(str)``      ``output``, ``errors``
``writelines(seq)`` ``output``, ``errors``
=================== ======================= ========

The semantics of each method are as documented in the Python Library Reference, except for these notes as listed in the table above:

1. The optional "size" argument to ``readline()`` is not supported, as it may be complex for container authors to implement, and is not often used in practice.

2. Note that the ``hint`` argument to ``readlines()`` is optional for both caller and implementer. The application author is free not to supply it, and the container author is free to ignore it.

3. Since ``output`` and ``errors`` may not be rewound, a container is free to forward write operations immediately, without buffering. In this case, the ``flush()`` method may be a no-op. Portable applications, however, cannot assume that output is unbuffered or that ``flush()`` is a no-op. They must call ``flush()`` if they need to ensure that output has in fact been written. Luckily, the use of ``output.flush()`` is only an issue for applications performing "server push" operations, since closing ``output`` will also flush it. Applications writing logs or other output to ``errors``, however, may wish to perform a flush after each complete item is output, to minimize intermingling of data from multiple processes writing to the same log.

The methods listed in the table above *must* be supported by all containers conforming to this specification. Applications conforming to this specification *must not* use any other methods or attributes of the ``input``, ``output``, or ``errors`` objects. Implementation and Application Notes ==================================== Proofs-of-concept of this specification are currently available in the PEAK application framework [3]_. PEAK includes a CGI container and two FastCGI containers, as well as a sample non-framework application, and a ``peak.web`` framework application. Together, these components demonstrate the ability to mix and match containers and applications/frameworks by way of the interface specified here. (Note: the containers and applications were implemented prior to the creation of this specification, and so should not be taken as examples of conforming implementations at this time.) It is expected that future versions of Python will include updated versions of current "containers" so that they can support this interface. For example, the Python standard library now contains various web server implementations, and these could be modified to allow invoking application objects that conform to this specification.
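Because conforming applications may touch only the methods in the file-method table of the Specification Details section, a container or test harness could wrap real streams so that anything else fails loudly. The following is a sketch under that assumption; ``StrictInput`` and ``StrictOutput`` are hypothetical names, not part of the draft or of PEAK.

```python
import io


class StrictInput:
    """Exposes only the methods the draft requires for ``input``."""

    def __init__(self, stream):
        self._stream = stream

    def read(self, size=-1):
        return self._stream.read(size)

    def readline(self):                 # note 1: no "size" argument
        return self._stream.readline()

    def readlines(self, hint=None):     # note 2: the hint may be ignored
        return self._stream.readlines()

    def __iter__(self):
        while True:
            line = self.readline()
            if not line:
                return
            yield line

    def close(self):
        self._stream.close()


class StrictOutput:
    """Exposes only the methods the draft requires for ``output`` and ``errors``."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, s):
        self._stream.write(s)

    def writelines(self, seq):
        for s in seq:
            self.write(s)

    def flush(self):                    # note 3: a container may make this a no-op
        self._stream.flush()

    def close(self):
        self._stream.close()
```

Under this harness an application that calls, say, ``output.seek()`` or ``input.fileno()`` gets an ``AttributeError`` instead of silently depending on one particular container's stream type.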
Widespread adoption of this specification would also make it possible to implement simple "router" applications that forward ``runCGI`` calls to other application objects, using information in the ``environ`` to determine the recipient. Because the CGI environment variables include both URL path information and cookies, such "router" components could be very sophisticated, if desired. And, they would potentially allow more than one framework to be used in the same application, permitting Python developers to take the best from all possible worlds. For load balancing and remote processing, it would also be possible to write "bridge" applications, that forward a ``runCGI`` call over a network. Or, to add CGI capability to a Python webserver, one might write a bridge that simply invoked another process in response to ``runCGI``. Such bridges would again be usable in any container conforming to this specification. References ========== .. [1] The Python Wiki "Web Programming" topic (http://www.python.org/cgi-bin/moinmoin/WebProgramming) .. [2] The Common Gateway Interface Specification (http://hoohoo.ncsa.uiuc.edu/cgi/interface.html) .. [3] PEAK: The Python Enterprise Application Kit (http://peak.telecommunity.com/) Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 End: From amk at amk.ca Sun Dec 7 16:35:21 2003 From: amk at amk.ca (A.M. Kuchling) Date: Sun Dec 7 16:35:46 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> References: <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> Message-ID: <20031207213521.GB19481@rogue.amk.ca> On Sun, Dec 07, 2003 at 01:53:43PM -0500, Phillip J. Eby wrote: > to a request made on a Web server. 
A container invokes an application > by calling its ``runCGI`` method, whose signature is defined as Name nit: why include the irrelevant 'CGI' in the name? Just 'run()' would be fine. > Containers that can run multiple requests in parallel, *should* also > provide the option of running an application in a single-threaded > fashion, so that applications or frameworks that are not thread-safe > may still be used. Should there also be an is_thread_safe() method that returns a Boolean, so containers can serialize if necessary? > The rationale for requiring a dictionary is to maximize portability > between containers. The alternative would be to define here some > subset of a dictionary's methods as being the standard and portable > interface. In practice, however, most containers will probably want Note that the UserDict.DictMixin class implements all of the other dictionary methods as long as you implement __getitem__, __setitem__, __delitem__, and keys(). It seems unpythonic to require a particular class here. The spec looks very good, though -- simple, easy to implement, and useful. --amk From pje at telecommunity.com Sun Dec 7 19:05:26 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Dec 7 19:03:22 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <20031207213521.GB19481@rogue.amk.ca> References: <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20031207184325.03123440@mail.telecommunity.com> At 04:35 PM 12/7/03 -0500, A.M. Kuchling wrote: >On Sun, Dec 07, 2003 at 01:53:43PM -0500, Phillip J. Eby wrote: > > to a request made on a Web server. A container invokes an application > > by calling its ``runCGI`` method, whose signature is defined as > >Name nit: why include the irrelevant 'CGI' in the name? Just 'run()' would >be fine. Well, if you're going to go that route, why not just make it a callable?
:) My thought here was that many kinds of Python frameworks have objects with 'run' methods, and they all have different signatures. So, explicit being better than implicit, I chose a name that was midway between a nameless callable and, say, 'executeWebRequest'. :) I'm not too strongly attached to the name, but would like to keep it a bit more explicit than 'run()' or a bare callable. > > Containers that can run multiple requests in parallel, *should* also > > provide the option of running an application in a single-threaded > > fashion, so that applications or frameworks that are not thread-safe > > may still be used. > >Should there also be a is_thread_safe() method that returns a Boolean, >so containers can serialize if necessary? I thought about it. But there are going to be more applications than containers, so why put extra burden on the app side to benefit the few containers that will be threaded? My conclusion (which others might not share) was that such containers are going to need other per-app configuration settings anyway, like perhaps the path at which the app is located, how many threads maximum to use in a thread pool for that app, and of course how to get the app object in the first place. Thus, there's little added burden for the container to require explicit configuration for threadedness. It's also possible that what constitutes thread-safety might vary somewhat from container to container. Second, if container configuration becomes complex, there's always the possibility to go back and create some kind of "deployment descriptor" spec, to make apps deployable in a variety of containers. But I think that should wait until there's enough field experience with *this* spec, to know what's really needed for the deployment spec. And last, but far from least, the more things there are in the spec, the more things there are for people to disagree with or have different interpretations of. 
:) > > The rationale for requiring a dictionary is to maximize portability > > between containers. The alternative would be to define here some > > subset of a dictionary's methods as being the standard and portable > > interface. In practice, however, most containers will probably want > >Note that the UserDict.DictMixin class implements all of the other >dictionary methods as long as you implement __getitem__, __setitem__, >__delitem__, and keys(). It seems unpythonic to require a particular class >here. Maybe I'm overreacting to being burned by imperfect dictionary simulations in the past. OTOH, I noticed you haven't actually given a use case for *not* using a dictionary. :) However, there is ample precedent in Python for requiring at least a *subclass* of dictionary, and perhaps we could compromise there. >The spec looks very good, though -- simple, easy to implement, and useful. Thanks. I've often found this "plumbing" issue to be quite annoying. I know that I personally would likely experiment with more web app frameworks, if I knew that I could plug them into a container I was already familiar with. And, I recently finished developing a very nice multiprocess FastCGI container which I expect to be my main runtime environment for web applications in future. I don't want it to only be useful for myself and other PEAK users, though. Hence, the spec. From stuart at stuartbishop.net Sun Dec 7 22:20:28 2003 From: stuart at stuartbishop.net (Stuart Bishop) Date: Sun Dec 7 22:21:11 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> References: <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> Message-ID: <7904F90F-292D-11D8-A22F-000A95A06FC6@stuartbishop.net> On 08/12/2003, at 5:53 AM, Phillip J. Eby wrote: > In other words, an application calls > ``app.runCGI(input,output,errors,environ)`` to invoke the application.
> The ``runCGI`` method should read from ``input``, if required, and > write its response to ``output``, using the ``environ`` dictionary > to obtain other information about the request. Error messages or log > output may be written to ``errors``. The return value of ``runCGI`` > is ignored by the container. The contents and format of ``input``, > ``output``, and ``environ`` are defined by the Common Gateway > Interface [2]_. Should environ['REMOTE_USER'] return '', None, or raise a KeyError if the web server has performed no authentication on a request? Some keys should always have valid values available (REQUEST_METHOD), but others only for some requests (CONTENT_LENGTH, REMOTE_USER). We don't want applications raising KeyError exceptions when moved to different frameworks because of frameworks handling this differently. +1 for using None for missing/meaningless value, and accessing any variable defined at http://hoohoo.ncsa.uiuc.edu/cgi/env.html will never raise a KeyError. I don't think errors should be a file - we now have a logging package so we might as well use it. We could pass in a Logger instance, although I'd just scrap the argument and let the handler instantiate the Logger if it wants one. The container could define a Handler that sends the log messages to the 'standard' location (e.g. CGI's Handler would just be a StreamHandler that uses sys.stderr). > Thread support, or lack thereof, is also container-dependent. > Containers that can run multiple requests in parallel, *should* also > provide the option of running an application in a single-threaded > fashion, so that applications or frameworks that are not thread-safe > may still be used. A thread_safety method should be provided by the application. It should be specified only once, rather than in every container that invokes the application. The thread_level might be generated programmatically, e.g. by querying a DB-API driver module's threadsafety attribute.
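Whichever side ends up owning the decision (explicit container configuration, as Phillip argues, or an application-declared flag, as proposed here), a threaded container can honour the draft's single-threaded option by serializing calls. A minimal sketch; ``SerializedApp``, ``wrap_for_container`` and the ``thread_safe`` attribute are hypothetical and appear neither in the draft nor elsewhere in this thread.

```python
import threading


class SerializedApp:
    """Container-side wrapper: at most one runCGI call in progress at a time."""

    def __init__(self, app):
        self.app = app
        self._lock = threading.Lock()

    def runCGI(self, input, output, errors, environ):
        with self._lock:
            return self.app.runCGI(input, output, errors, environ)


def wrap_for_container(app, serialize=None):
    """Hypothetical helper deciding whether to serialize an application.

    ``serialize`` represents explicit container configuration.  If it is
    None, fall back on a hypothetical ``thread_safe`` attribute on the
    application, defaulting to the safe choice of serializing.
    """
    if serialize is None:
        serialize = not getattr(app, "thread_safe", False)
    return SerializedApp(app) if serialize else app
```

A single lock trades throughput for safety. It only serializes calls into this one application object, so it does not protect state the application shares with other code running in the same process.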
> The rationale for requiring a dictionary is to maximize portability > between containers. The alternative would be to define here some > subset of a dictionary's methods as being the standard and portable > interface. In practice, however, most containers will probably want > to use a simple dictionary anyway, and some frameworks may end up > relying upon the fact that most containers do this. So, in the > interest of a simple specification, and because there is little need > for a custom type here anyway, a Python dictionary is mandatory for > communicating the CGI environment. Or just 'environment should be a standard mapping, or subclass of ``map`` or ``UserDict``. -- Stuart Bishop http://www.stuartbishop.net/ From ngps at netmemetic.com Sun Dec 7 22:36:40 2003 From: ngps at netmemetic.com (Ng Pheng Siong) Date: Sun Dec 7 22:36:32 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <7904F90F-292D-11D8-A22F-000A95A06FC6@stuartbishop.net> References: <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <7904F90F-292D-11D8-A22F-000A95A06FC6@stuartbishop.net> Message-ID: <20031208033640.GA825@vista.netmemetic.com> On Mon, Dec 08, 2003 at 02:20:28PM +1100, Stuart Bishop wrote: > Should environ['REMOTE_USER'] return '', None, or raise a KeyError if > the > web server has performed no authentication on a request? +1 for None. Zope is able to use REMOTE_USER if the web server sets it, e.g., - ZServerSSL sets it to the client certificate's subject DN when available and asked to. - The RemoteUserFolder product was originally written to allow IIS to do Windows authentication. > accessing any variable defined at > http://hoohoo.ncsa.uiuc.edu/cgi/env.html will never raise a KeyError.
For HTTPS there are a bunch of additional variables. I suppose most people might consider mod_ssl's list canonical; I looked at it and copped out: ZServerSSL exports only SSL_CIPHER for now. -- Ng Pheng Siong http://firewall.rulemaker.net -+- All Your Rulebase Are Belong To You[tm] http://sandbox.rulemaker.net/ngps -+- Open Source Python Crypto & SSL From stuart at stuartbishop.net Sun Dec 7 22:50:22 2003 From: stuart at stuartbishop.net (Stuart Bishop) Date: Sun Dec 7 22:50:53 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <5.1.0.14.0.20031207184325.03123440@mail.telecommunity.com> References: <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207184325.03123440@mail.telecommunity.com> Message-ID: On 08/12/2003, at 11:05 AM, Phillip J. Eby wrote: > At 04:35 PM 12/7/03 -0500, A.M. Kuchling wrote: >> On Sun, Dec 07, 2003 at 01:53:43PM -0500, Phillip J. Eby wrote: >> > to a request made on a Web server. A container invokes an >> application >> > by calling its ``runCGI`` method, whose signature is defined as >> >> Name nit: why include the irrelevant 'CGI' in the name? Just 'run()' >> would >> be fine. > > Well, if you're going to go that route, why not just make it a > callable? :) Callable or something simpler and more obvious like 'run()' is good if you expect objects to only talk to one protocol. runCGI is ok (I'd prefer handle_CGI...) if you think a single object might also want to handle other protocols (XMLRPC, FTP), as defined by future PEPs. > I thought about it. But there are going to be more applications than > containers, so why put extra burden on the app side to benefit the few > containers that will be threaded?
My conclusion (which others might > not share) was that such containers are going to need other per-app > configuration settings anyway, like perhaps the path at which the app > is located, how many threads maximum to use in a thread pool for that > app, and of course how to get the app object in the first place. > Thus, there's little added burden for the container to require > explicit configuration for threadedness. It's also possible that what > constitutes thread-safety might vary somewhat from container to > container. Although there will be more applications than containers, I doubt that there will be many that actually implement the Web Container Interface - sane people will simply subclass StandardWebContainer (to be defined), since sane people generally don't want to rewrite header formatting, response buffering, cookie decoding/encoding, POST and QUERY_STRING decoding, gzip compression, i18n etc. > And last, but far from least, the more things there are in the spec, > the more things there are for people to disagree with or have > different interpretations of. :) I think it is good to define a bare interface between request brokers and applications, and CGI is a good common denominator to work from. The real arguing will be from wanting to have python ship with a higher level interface implementing this specification. I'm sure cookies, response headers, streaming & buffering, QUERY_STRING and POST decoding can all be agreed on without bloodshed, but getting people to agree that standalone Zope Page Templates should go in too might be more difficult :-) -- Stuart Bishop http://www.stuartbishop.net/ From amk at amk.ca Mon Dec 8 06:38:06 2003 From: amk at amk.ca (A.M.
Kuchling) Date: Mon Dec 8 06:38:32 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <5.1.0.14.0.20031207184325.03123440@mail.telecommunity.com> References: <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207184325.03123440@mail.telecommunity.com> Message-ID: <20031208113806.GB2689@rogue.amk.ca> On Sun, Dec 07, 2003 at 07:05:26PM -0500, Phillip J. Eby wrote: > Maybe I'm overreacting to being burned by imperfect dictionary simulations > in the past. OTOH, I noticed you haven't actually given a use case for > *not* using a dictionary. :) os.environ is not a dictionary (nor a subclass of dict), so the simplest CGI case would be runCGI(sys.stdin, sys.stdout, sys.stderr, os.environ.copy()). Seems silly. --amk From pje at telecommunity.com Mon Dec 8 09:55:02 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Dec 8 09:53:02 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <7904F90F-292D-11D8-A22F-000A95A06FC6@stuartbishop.net> References: <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20031208094130.03e80bd0@mail.telecommunity.com> At 02:20 PM 12/8/03 +1100, Stuart Bishop wrote: >Should environ['REMOTE_USER'] return '', None, or raise a KeyError if the >web server has performed no authentication on a request? Some keys >should always have valid values available (REQUEST_METHOD), but others >only for some requests (CONTENT_LENGTH, REMOTE_USER). We don't want >applications raising KeyError exceptions when moved to different >frameworks because of frameworks handling this differently. +1 for using >None for missing/meaningless value, and accessing any variable defined at >http://hoohoo.ncsa.uiuc.edu/cgi/env.html will never raise a KeyError. I'm -1 on it. 
This interface is intended to support *existing* application frameworks with minimal glue. For example, I've successfully run both Zope 2 ZPublisher and Zope 3 zope.publisher under this gateway interface. Putting in 'None' where a sane CGI environment lacks the variable is asking for trouble. >I don't think errors should be a file - we now have a logging package >so we might as well use it. We could pass in a Logger instance, although >I'd just scrap the argument and let the handler instantiate the Logger >if it wants one. The container could define a Handler that sends the log >messages to the 'standard' location (eg. CGI's Handler would just be a >StreamHandler that uses sys.stderr). "errors" is intended to allow access to the web server's error log, as FastCGI and other protocols permit. There are times when it is very useful to see application errors in the same context as server errors, so this is included for completeness. A container is free to provide a different destination for the errors stream. Also, this again is for the greatest possible compatibility with existing applications and containers. >A thread_safety method should be provided by the application. It should >be specified only once, rather than in every container that invokes the >application. The thread_level might be generated programatically, eg. >by querying a DB-API database Connection's thread_safety attribute. AFAIK, there are only maybe 2 or 3 threaded containers currently available, and I don't believe any of them have an option *not* to run threading. So, this seems like a YAGNI to me. I would prefer there to be actual field experience with the minimal spec, in order to decide what kind of threading categories would be appropriate. For example, suppose that a threaded container wishes to configure, instead of one application object, a factory for returning new application objects, so that there is no threading problem? 
I think that a premature attempt to define threading models in advance of experience/experimentation would not only hold up delivery of a usable spec, but could also close off fruitful lines of experimentation for container developers. I'm similarly concerned about other forms of deployment parameterization. >Or just 'environment should be a standard mapping, or subclass of ``map`` >or ``UserDict``. Dunno why everyone feels so strongly about that one, but if that's what it takes to get through, then perhaps we can decide on a small set of required methods. Or, maybe we could simply require that environ.copy() must always *return* a dictionary, and then portable apps would only use the copy. :) From pje at telecommunity.com Mon Dec 8 09:57:34 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Dec 8 09:55:32 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <20031208033640.GA825@vista.netmemetic.com> References: <7904F90F-292D-11D8-A22F-000A95A06FC6@stuartbishop.net> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <7904F90F-292D-11D8-A22F-000A95A06FC6@stuartbishop.net> Message-ID: <5.1.0.14.0.20031208095517.01e63020@mail.telecommunity.com> At 11:36 AM 12/8/03 +0800, Ng Pheng Siong wrote: >On Mon, Dec 08, 2003 at 02:20:28PM +1100, Stuart Bishop wrote: > > Should environ['REMOTE_USER'] return '', None, or raise a KeyError if > > the > > web server has performed no authentication on a request? > >+1 for None. > >Zope is able to use REMOTE_USER if the web server sets it, e.g., But what does it do if it's set to 'None'? And even if it's happy with this, will the fifty or so other existing application frameworks be happy with it? Compatibility with the vast existing app framework code base demands that environ values *must* be strings, or else not present. (Guess I should add that to the spec.) From amk at amk.ca Mon Dec 8 10:05:12 2003 From: amk at amk.ca (A.M. 
Kuchling) Date: Mon Dec 8 10:05:36 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <5.1.0.14.0.20031208094130.03e80bd0@mail.telecommunity.com> References: <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031208094130.03e80bd0@mail.telecommunity.com> Message-ID: <20031208150512.GA3373@rogue.amk.ca> On Mon, Dec 08, 2003 at 09:55:02AM -0500, Phillip J. Eby wrote: > interface. Putting in 'None' where a sane CGI environment lacks the > variable is asking for trouble. Agreed; leave the environment alone, and leave stderr as a file. If we start defining logger objects, we're now building yet another framework. Bonus: most frameworks probably have a method matching this signature already. For example, in Quixote you could just add a 'runCGI = publish' assignment to the Publisher class and voila, it's now compatible. > For example, suppose that a threaded container wishes to configure, instead > of one application object, a factory for returning new application objects, > so that there is no threading problem? I think that a premature attempt to Only the application knows if it can handle threads, though; if there's some unthreaded global cache, creating new application objects is not going to make everything threadsafe. I don't use threads and think their use is brain-damaged 95% of the time, so I don't really care if there's a thread-safety mechanism in the spec or not. --amk From pje at telecommunity.com Mon Dec 8 10:18:25 2003 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Mon Dec 8 10:16:25 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: References: <5.1.0.14.0.20031207184325.03123440@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207184325.03123440@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20031208095809.03e80150@mail.telecommunity.com> At 02:50 PM 12/8/03 +1100, Stuart Bishop wrote: >Although there will be more applications than containers, I doubt that >there will be many that actually implement the Web Container Interface - >sane people will simply subclass StandardWebContainer (to be defined), I presume you mean StandardWebApp, since the container is the component that *invokes* the proposed interface. >since sane people generally don't want to rewrite header formatting, >response buffering, cookie decoding/encoding, POST and QUERY_STRING >decoding, gzip compression, i18n etc. Right. But, again, consider the existing fifty or so frameworks that do this stuff. With the interface as specified, those framework authors can slap a few lines of code on top of their existing setup, and have instant comformance. But, the framework author -- except in rare cases -- is probably *not* going to be able to specify thread compliance on behalf of the actual user application. Thus, they're going to have to also design some way for the framework's user to specify the level of threading support to be flagged by the application object. That's an unnecessary burden, when the container is already going to have to manage other kinds of configuration. Also, consider this: only a very few containers will support threading. mod_python on Apache 1.3 won't be threaded. Most FastCGI implementations aren't. CGI definitely isn't. That pretty much leaves half-async webservers written in Python, like those belonging to Zope and Twisted. 
And, it's not clear to me at this point if they will even *care* about this. Even if they do, the thread pool models used by Twisted and Zope are probably different in interesting ways that are completely outside the scope of this proposal. Easy backward compatibility is extremely important to this interface. If users have to change their apps to make this work, it's not going to fly. If, on the other hand, a framework developer puts a wrapper on their framework, then the app is portable. What's not portable is configuration of the container. It's one thing for a user to learn how to configure a container to run their existing app, and another thing to make them have to change the existing app to support a thread safety indicator. What's more, I have the nightmare vision of an app needing to specify different thread safety levels for different containers, because of the way those containers handle different threading levels. Explicit (configuration of the container) is better than implicit (funnelling a safety flag up from an app, through a framework to the interface, for the container to then interpret according to its own schema). >>And last, but far from least, the more things there are in the spec, the >>more things there are for people to disagree with or have different >>interpretations of. :) > >I think it is good to define a bare interface between request brokers >and applications, and CGI is a good common denominator to work from. Great. >The real arguing will be from wanting to have python ship with >a higher level interface implementing this specification. You mean on the application side, I presume? Containers in the standard library (or adapters from the existing containers to allow invocation of conforming apps) should be non-controversial. 
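The "wrapper on their framework" described above can be quite thin. The sketch below assumes a framework whose existing entry point is a callable taking ``(environ, stdin, stdout)``; ``FrameworkAdapter`` and ``legacy_publish`` are hypothetical stand-ins, not any real framework's API. The adapter also follows the draft's advice that exceptions should not escape ``runCGI``, reporting them on ``errors`` using only ``write()`` and ``flush()``.

```python
import traceback


class FrameworkAdapter:
    """Expose a (hypothetical) existing framework entry point through runCGI.

    ``publish`` stands in for whatever callable a framework already uses
    to handle one CGI request; it is assumed to take (environ, stdin, stdout).
    """

    def __init__(self, publish):
        self.publish = publish

    def runCGI(self, input, output, errors, environ):
        try:
            self.publish(environ, input, output)
        except Exception:
            # Keep the exception from escaping into the container and log it
            # on the errors stream; print_exc only calls errors.write().
            traceback.print_exc(file=errors)
            errors.flush()


def legacy_publish(environ, stdin, stdout):
    """Toy stand-in for an existing framework's request handler."""
    if environ.get("PATH_INFO") == "/boom":
        raise RuntimeError("handler failed")
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write("method=%s\n" % environ.get("REQUEST_METHOD", "GET"))
```

No error page is written to ``output`` on failure, because by then the framework may already have sent headers or part of the body.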
>I'm >sure cookies, response headers, streaming & buffering, QUERY_STRING >and POST decoding can all be agreed on without bloodshed, but getting >people to agree that standalone Zope Page Templates should go in too >might be more difficult :-) I'd assume that a trivial "CGIApp" class would be written so as to simply create a cgi.FieldStorage, and invoke an abstract method. Anything more than that would be encroaching on highly disputed and disputable territory. :) (And perhaps not really needed, anyway.) But I care almost nothing about the stdlib effects of this proposal for the moment. Anything that happens, won't happen until 2.4. But, if this becomes the "community standard" interface *now*, then framework developers can start splitting their container code from their framework code, and expand the reach of both their containers and their frameworks. And they can do it with existing code, today, on older Pythons. That, I think, is something worth working towards. From pje at telecommunity.com Mon Dec 8 10:29:40 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Dec 8 10:27:39 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <20031208113806.GB2689@rogue.amk.ca> References: <5.1.0.14.0.20031207184325.03123440@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207184325.03123440@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20031208102419.03707ec0@mail.telecommunity.com> At 06:38 AM 12/8/03 -0500, A.M. Kuchling wrote: >On Sun, Dec 07, 2003 at 07:05:26PM -0500, Phillip J. Eby wrote: > > Maybe I'm overreacting to being burned by imperfect dictionary simulations > > in the past. OTOH, I noticed you haven't actually given a use case for > > *not* using a dictionary. 
:) > >os.environ is not a dictionary (nor a subclass of dict), so the simplest CGI >case would be runCGI(sys.stdin, sys.stdout, sys.stderr, os.environ.copy()). >Seems silly. The copy() in that case would arguably be necessary anyway. Remember that the spec requires the caller to be allowed to *modify* environ in place. Anyway, as per my response to Stuart, I suppose I could further compromise to having the spec require that the copy() method return a dictionary. Then people who want to be sure their manipulations are portable, can simply take a copy of environ. (Or, alternatively, .items() could be required, and the portable mechanism would be to use 'dict(environ.items())'.) But, given how simple it is for the container to use a dictionary in the first place, it seems silly to force every layer to do a copy "just in case" to be portable. And, I think that os.environ really is the exception rather than the rule. How many existing containers use non-dictionaries now? From pje at telecommunity.com Mon Dec 8 10:35:47 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Dec 8 10:33:45 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <20031208150512.GA3373@rogue.amk.ca> References: <5.1.0.14.0.20031208094130.03e80bd0@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031208094130.03e80bd0@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20031208103018.03e847b0@mail.telecommunity.com> At 10:05 AM 12/8/03 -0500, A.M. Kuchling wrote: >On Mon, Dec 08, 2003 at 09:55:02AM -0500, Phillip J. Eby wrote: > > For example, suppose that a threaded container wishes to configure, > instead > > of one application object, a factory for returning new application > objects, > > so that there is no threading problem? 
I think that a premature attempt to > >Only the application knows if it can handle threads, though; if there's some >unthreaded global cache, creating new application objects is not going to >make everything threadsafe. My point is that no matter what, if you use a container, you have to configure it with a bunch of other facts about your application. So, you might as well explicitly configure any threading-related settings in their *native form*. That is, whatever threading settings the *container* has, whatever they might be. Making the app or framework declare their safety through a narrow interface on the application object seen by the container incurs needlessly "lossy" transfer of information. So, IMO, threading configuration should be part of container configuration, not part of the application interface. > I don't use threads and think their use is >brain-damaged 95% of the time, +1. >so I don't really care if there's a >thread-safety mechanism in the spec or not. And I actively *don't* want it, because it will interfere with the ability of container authors to add support for this interface, especially if they *do* support threading now. From ngps at post1.com Mon Dec 8 10:39:01 2003 From: ngps at post1.com (Ng Pheng Siong) Date: Mon Dec 8 10:40:44 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <5.1.0.14.0.20031208095517.01e63020@mail.telecommunity.com> References: <7904F90F-292D-11D8-A22F-000A95A06FC6@stuartbishop.net> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <7904F90F-292D-11D8-A22F-000A95A06FC6@stuartbishop.net> <5.1.0.14.0.20031208095517.01e63020@mail.telecommunity.com> Message-ID: <20031208153901.GA367@vista.netmemetic.com> On Mon, Dec 08, 2003 at 09:57:34AM -0500, Phillip J. Eby wrote: > >+1 for None. > > > >Zope is able to use REMOTE_USER if the web server sets it, e.g., > > But what does it do if it's set to 'None'? Let's see... 
    ~/pkg/zope262/lib/python/ZPublisher$ egrep -i remote_user *.py

    BaseRequest.py: elif request.environ.has_key('REMOTE_USER'):
    BaseRequest.py: name=request.environ['REMOTE_USER']
    HTTPRequest.py: 'REMOTE_USER' : 1,
    Publish.py: if realm and not request.get('REMOTE_USER',None):

If it is None, Zope does nothing about it, I suppose.

ZServerSSL...

    def get_environment(self, request):
        env = zhttps0_handler.get_environment(self, request)
        peer = request.channel.get_peer_cert()
        if peer is not None:
            env['REMOTE_USER'] = str(peer.get_subject())
        return env

(Oh, ok, it's just a setter. I'd forgotten.) ZServerSSL sets REMOTE_USER for RemoteUserFolder's consumption.

> Compatibility with the vast existing app framework code base demands that
> environ values *must* be strings, or else not present. (Guess I should add
> that to the spec.)

Looking at RemoteUserFolder:

    name = request.environ.get('REMOTE_USER', None)
    name = self.normalizeName(name)
    #LOG('RemoteUserFolder', INFO, 'validate %s' % str(name) )
    if name is None:
        ...

Well, the plural of `anecdote' is not `data' ;-), but it does seem to me the following 2 styles will be dominant:

1)
    x = dict.get('XX', None)
    if x is None:
        ...

2)
    if dict.has_key('XX'):
        ...

So I'm guessing it is not terribly important whether it is '' or None.

Cheers. -- Ng Pheng Siong http://firewall.rulemaker.net -+- All Your Rulebase Are Belong To You[tm] http://sandbox.rulemaker.net/ngps -+- Open Source Python Crypto & SSL From amk at amk.ca Mon Dec 8 11:18:00 2003 From: amk at amk.ca (A.M.
Kuchling) Date: Mon Dec 8 11:18:25 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <5.1.0.14.0.20031208103018.03e847b0@mail.telecommunity.com> References: <5.1.0.14.0.20031208094130.03e80bd0@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031208094130.03e80bd0@mail.telecommunity.com> <5.1.0.14.0.20031208103018.03e847b0@mail.telecommunity.com> Message-ID: <20031208161800.GA4146@rogue.amk.ca> On Mon, Dec 08, 2003 at 10:35:47AM -0500, Phillip J. Eby wrote: > whatever they might be. Making the app or framework declare their safety > through a narrow interface on the application object seen by the container > incurs needlessly "lossy" transfer of information. And that convinces me; forget about threading. --amk From pje at telecommunity.com Mon Dec 8 12:09:17 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Dec 8 12:09:40 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <20031208153901.GA367@vista.netmemetic.com> References: <5.1.0.14.0.20031208095517.01e63020@mail.telecommunity.com> <7904F90F-292D-11D8-A22F-000A95A06FC6@stuartbishop.net> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <7904F90F-292D-11D8-A22F-000A95A06FC6@stuartbishop.net> <5.1.0.14.0.20031208095517.01e63020@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20031208120413.02a813b0@telecommunity.com> At 11:39 PM 12/8/03 +0800, Ng Pheng Siong wrote: >On Mon, Dec 08, 2003 at 09:57:34AM -0500, Phillip J. Eby wrote: > > >+1 for None. > > > > > >Zope is able to use REMOTE_USER if the web server sets it, e.g., > > > > But what does it do if it's set to 'None'? > >Let's see... 
>
>  ~/pkg/zope262/lib/python/ZPublisher$ egrep -i remote_user *.py
>
>  BaseRequest.py: elif request.environ.has_key('REMOTE_USER'):
>  BaseRequest.py: name=request.environ['REMOTE_USER']
>  HTTPRequest.py: 'REMOTE_USER' : 1,
>  Publish.py: if realm and not request.get('REMOTE_USER',None):
>
>If it is None, Zope does nothing about it, I suppose.

Did you trace every use of 'name' after it's set from REMOTE_USER, to be sure that it's okay for it to be None? I'm not saying there's a problem, I'm saying that it's silly to force the authors of every framework to go hunt down every existing use of *every* environment variable to be sure they're safe with them being None.

>Well, the plural of `anecdote' is not `data' ;-), but it does seem to me

No kidding. Even if "some" set of frameworks are okay with None, that's not the same as "all" frameworks. OTOH, any currently correct code will work if we *don't* use None or '', making that approach immeasurably superior from an "immediate adoption ability" point of view.

>the following 2 styles will be dominant:
>
>1)
>    x = dict.get('XX', None)
>    if x is None:
>        ...
>
>2)
>    if dict.has_key('XX'):
>        ...
>
>So I'm guessing it is not terribly important whether it is '' or None.

Actually, you've just given evidence that it is VERY important. Code that currently uses 'has_key' (or 'in') will BREAK if we put None OR '' for non-existent keys. Non-existent keys are clearly critical for backward compatibility with the second style you show above.
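The difference between the two styles is easy to check directly. The sketch below uses hypothetical function names, and `in` stands in for Python 2's `has_key`, which behaves the same way here. It runs both styles against a missing key, a None value, and an empty string:

```python
def style_get(environ):
    # Style 1: look the key up and treat None as "no user".
    user = environ.get("REMOTE_USER", None)
    if user is None:
        return "anonymous"
    return user


def style_in(environ):
    # Style 2: test for presence ('has_key' in old code, 'in' today).
    if "REMOTE_USER" in environ:
        return environ["REMOTE_USER"]
    return "anonymous"


# Three ways a container might say "no authenticated user".
cases = {
    "absent": {},
    "None": {"REMOTE_USER": None},
    "empty": {"REMOTE_USER": ""},
}

for label, env in cases.items():
    print(label, repr(style_get(env)), repr(style_in(env)))
# absent 'anonymous' 'anonymous'
# None 'anonymous' None
# empty '' ''
```

Only the "absent" case gives the same answer under both styles, which is the backward-compatibility argument for leaving unset variables out of environ entirely.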
From gstein at lyra.org Mon Dec 8 19:54:00 2003 From: gstein at lyra.org (Greg Stein) Date: Mon Dec 8 19:56:55 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <20031208161800.GA4146@rogue.amk.ca>; from amk@amk.ca on Mon, Dec 08, 2003 at 11:18:00AM -0500 References: <5.1.0.14.0.20031208094130.03e80bd0@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031208094130.03e80bd0@mail.telecommunity.com> <5.1.0.14.0.20031208103018.03e847b0@mail.telecommunity.com> <20031208161800.GA4146@rogue.amk.ca> Message-ID: <20031208165400.G15042@lyra.org> On Mon, Dec 08, 2003 at 11:18:00AM -0500, A.M. Kuchling wrote: > On Mon, Dec 08, 2003 at 10:35:47AM -0500, Phillip J. Eby wrote: > > whatever they might be. Making the app or framework declare their safety > > through a narrow interface on the application object seen by the container > > incurs needlessly "lossy" transfer of information. > > And that convinces me; forget about threading. I'm not convinced. If an application is designed with a per-process model in mind (e.g. CGI), and then you drop it into a threaded model... BOOM! The application needs to declare whether it is thread-safe. The container can then verify whether that application can be run within the container and the container's current configuration. For example, if you drop a non-thread-safe app into a threaded mod_python, then I would expect an error to be thrown, and the app to *not* be loaded. The simple fact is that threading (and the execution model, in general) is part of the environment. You can't limit it to just the three streams plus some "environ" dictionary. There is a *very* real impact on the application, based on how the container is executing those apps. 
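A minimal sketch of the load-time check described above, assuming the application advertises itself through an attribute. `thread_safe`, `load_app`, and `ThreadSafetyError` are hypothetical names; the draft spec defines none of them, and the sketch treats an application that says nothing as unsafe.

```python
class ThreadSafetyError(Exception):
    """Raised when an app is not declared safe for a threaded container."""


def load_app(app, container_is_threaded):
    # 'thread_safe' is a hypothetical attribute; an app that says nothing
    # is treated as unsafe, which is the conservative default.
    declared_safe = getattr(app, "thread_safe", False)
    if container_is_threaded and not declared_safe:
        raise ThreadSafetyError(
            "%s is not declared thread-safe; refusing to load it"
            % type(app).__name__)
    return app


class CgiStyleApp:
    """Written for one-process-per-request CGI; declares nothing."""


class ReentrantApp:
    thread_safe = True
```

A single boolean like this is exactly the "lossy" kind of declaration debated in the thread: it cannot say whether an app is safe only with one instance per thread, for example.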
Cheers, -g -- Greg Stein, http://www.lyra.org/ From titus at caltech.edu Mon Dec 8 20:12:02 2003 From: titus at caltech.edu (Titus Brown) Date: Mon Dec 8 20:12:05 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <20031208165400.G15042@lyra.org> References: <5.1.0.14.0.20031208094130.03e80bd0@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031208094130.03e80bd0@mail.telecommunity.com> <5.1.0.14.0.20031208103018.03e847b0@mail.telecommunity.com> <20031208161800.GA4146@rogue.amk.ca> <20031208165400.G15042@lyra.org> Message-ID: <20031209011202.GA1822@caltech.edu> -> > > whatever they might be. Making the app or framework declare their safety -> > > through a narrow interface on the application object seen by the container -> > > incurs needlessly "lossy" transfer of information. -> > -> > And that convinces me; forget about threading. -> -> I'm not convinced. If an application is designed with a per-process model -> in mind (e.g. CGI), and then you drop it into a threaded model... BOOM! -> -> The application needs to declare whether it is thread-safe. The container -> can then verify whether that application can be run within the container -> and the container's current configuration. -> -> For example, if you drop a non-thread-safe app into a threaded mod_python, -> then I would expect an error to be thrown, and the app to *not* be loaded. -> -> The simple fact is that threading (and the execution model, in general) is -> part of the environment. You can't limit it to just the three streams plus -> some "environ" dictionary. There is a *very* real impact on the -> application, based on how the container is executing those apps. I agree; often only a little bit of thought is needed to make sure something is thread safe, but that thought should be added into the framework ahead of time. 
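As an example of the "little bit of thought" involved, a per-process cache shared by all requests needs only a lock around its check-then-set step to be safe in a threaded container. `PageCache` is a hypothetical name; holding the lock while rendering is the simplest correct choice, though it serializes cache misses.

```python
import threading


class PageCache:
    """Per-process cache shared by every request handled in this process."""

    def __init__(self, render):
        self._render = render
        self._pages = {}
        self._lock = threading.Lock()

    def get(self, path):
        # Without the lock, two threads could both miss, both render,
        # and race to store their results.
        with self._lock:
            page = self._pages.get(path)
            if page is None:
                page = self._render(path)
                self._pages[path] = page
            return page
```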
cheers, --titus From grisha at modpython.org Tue Dec 9 12:29:58 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Tue Dec 9 12:30:04 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 Message-ID: <20031209122505.T62979@onyx.ispol.com> I must say this is very well written, and seeing such a level of thoroughness on this list makes me very hopeful, because it produces a substantive discussion. My hat's off to Phillip for making the effort to write this. Having said that, I'm -1 on this PEP. I think it does a very good job of stating the problem (and this in itself is immensely valuable), but I do not agree with the solution. I think it trades efficiency for simplicity, and to paraphrase Franklin, if you give up one for the other you will get neither :-) The approach this spec takes is modeled after CGI, which was designed with shell scripts in mind and condenses things down to the UNIX primitives of stdin, stdout, stderr, environ (and cwd). On the surface this appears fine, but consider setting an HTTP header. Headers do not fit into the above-mentioned primitives, so CGI requires the application to send them to stdout. Writing headers to stdout is much more cumbersome than passing them in a mapping object of some sort. And most web servers' CGI implementations do not pass the header portion of stdout straight to the client. They actually parse those headers, optionally alter them and adjust their own behavior based on the header information, then add the resulting data to the server's header structure (e.g. the headers_out table in the case of Apache). This is inefficient and ugly. And it is a direct consequence of the way CGI is specified. It is understandable why CGI does it, given that CGI was meant for running executables in a separate process on UNIX to serve a request. But there is no reason why such limitations should be carried over to environments that do not have the constraints of CGI.
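For reference, the parsing step described above (turning CGI-style output back into a status and a header structure) can be sketched as follows. This is a simplified illustration, not any server's actual code: it assumes no header continuation lines and omits other CGI rules, such as the special handling of a Location header. Headers come back as a list of pairs rather than a plain dict, so repeated headers such as Set-Cookie survive.

```python
def split_cgi_response(raw):
    """Split CGI-style output into (status, headers, body)."""
    # Accept either CRLF or bare LF as the header/body separator. If neither
    # occurs, partition() leaves head=raw and body="".
    for sep in ("\r\n\r\n", "\n\n"):
        head, found, body = raw.partition(sep)
        if found:
            break
    status = "200 OK"
    headers = []
    for line in head.splitlines():
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name.lower() == "status":
            # CGI's Status pseudo-header becomes the HTTP status line.
            status = value
        else:
            headers.append((name, value))
    return status, headers, body
```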
Especially considering that the whole idea of running an executable to serve an HTTP request looks pretty weird as a way to develop web applications these days. Whatever spec we come up with should, IMO, deal in terms of the HTTP protocol: request, headers, body, etc. Trying to narrow it down to input, output and environment is fitting a square peg into a round hole. Three other notes:

1. On the threading point - aside from thread-safety there is another big issue: the shared memory space. Some frameworks assume that they are running in one process and take it for granted that making something global will make it available to all other requests, which obviously isn't going to work on per-process servers.

2. If we're going to refer to a CGI specification, then we should rely on the RFC draft at http://cgi-spec.golux.com/. The stuff at NCSA's hoohoo page is more of a joke than a spec.

3. Mod_python *can* be threaded on apache 1.3, because 1.3 is threaded on Windows. Considering that Apache, IIS and iPlanet (or whatever it's called now) account for the vast majority of the web servers out there, there are likely more threaded servers than not threaded, so I wouldn't throw out thread-safety as a non-consideration.

Grisha From pje at telecommunity.com Tue Dec 9 14:32:06 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Dec 9 14:32:17 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <20031209122505.T62979@onyx.ispol.com> Message-ID: <5.1.1.6.0.20031209134549.02a98ec0@telecommunity.com> At 12:29 PM 12/9/03 -0500, Gregory (Grisha) Trubetskoy wrote: >On the surface this appears fine, but consider setting an HTTP header. >Headers do not fit into the above-mentioned primitives, so CGI requires >the application to send them to stdout. Writing headers to stdout is much >more cumbersome than passing them in a mapping object of some sort. This is a non-problem.
You can't wave your hand without hitting at least half a dozen *already written*, documented, even supported libraries that handle this in as many ways as one might like. And plenty of people obviously find them to be of adequate performance and usability. > And >most web server's CGI implementations do not pass the header portion of >stdout straight to the client. They actually parse those headers, >optionally alter them and adjust their own behavior based on the header >information, then add the resulting data to the server header structure >(e.g. headers_out table in case of Apache). This is inefficient, and ugly. ...and implemented, and documented, and portable, and highly available, and widely accepted. Practicality beats purity. >Whatever spec we come up with, IMO should deal in terms of the HTTP >protocol request, headers, body, etc. Trying to narrow it down to input, >output and environment is fitting a square peg into a round hole. I think perhaps there's some confusion about the PEP's goals here. It is in no way intended to be an ideal spec, a pure spec, or an efficient spec. It's *absolutely* not trying to be another framework. It is aimed solely at being an *implemented* and *universally available* spec -- right now, today, without waiting for another version of Python or trying to convince people to use it *in place of* their existing working tools. Rather, the spec should enable people to use other tools in *addition* to their existing ones. Note that this does not preclude the existence of other specifications for more advanced capabilities. However, such specs will naturally be less frequently available or implemented. Meanwhile, there's scarcely a server in existence that doesn't support CGI. >Three other notes: > >1. On the threading point - aside from thread-safety there is another big >issue, it's the shared memory space. 
Some frameworks assume that they are >running in one process and take it for granted that making something >global will make it available to all other requests, which obviously isn't >going to work on per-process servers. That's true. Such frameworks, however, will need to document that they will only work in single-process containers. Users will then correctly perceive this as a limitation of the framework. But it's an important point to add to the spec. Thanks for pointing it out. >2. If we're going to refer to a CGI specification, then we should rely on >the RFC draft at http://cgi-spec.golux.com/. The stuff at NCSA's hoohoo >page is more of a joke than a spec. Thanks for the reference; I took the first thing that came up in Google that seemed informative. :) >3. Mod_python *can* be threaded on apache 1.3, because 1.3 is threaded on >Windows. My mistake, sorry. >Considering that Apache, IIS and iPlanet (or whatever it's called >now) account for vast majority of the web servers out there, there are >likely more threaded servers than not threaded, so I wouldn't through out >thread-safety as a non-consideration. I was referring to existing, available containers written in/for Python code, but I can certainly see that might be the case. But it still doesn't change the possibility of containers having different threading models. For example, some servers may use dedicated per-application thread pools. Others might have a generic thread pool. Some might pre-allocate application objects, others might allocate on demand. Whatever the model, these are things that a container must configure. Explicit being better than implicit, it would be better to configure these things in the container. And because "in the face of ambiguity, refuse the temptation to guess," I don't want to guess what threading settings can or should exist and define a spec for them, nor should containers try to guess their settings from an ambiguous "I'm (not) threadsafe" flag. 
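As an illustration (not a proposal for the spec), here is a hypothetical container in which the threading model is purely a container setting. It covers two of the models mentioned above: one shared application object with serialized requests, and application objects allocated on demand, one per worker thread, from a configured factory. `ConfiguredContainer` and its `mode` values are invented names built on the standard library's `ThreadPoolExecutor` and `threading.local`. The per-thread mode helps only frameworks that become safe once each thread has its own application object; it does nothing for code that shares module-level globals.

```python
import io
import threading
from concurrent.futures import ThreadPoolExecutor


class ConfiguredContainer:
    """Hypothetical container whose threading model is its own setting.

    mode="serialized": one shared app object, one request at a time.
    mode="per-thread": the factory builds one app object per worker thread.
    The application only implements runCGI; it declares nothing.
    """

    def __init__(self, app_factory, mode="serialized", workers=4):
        if mode not in ("serialized", "per-thread"):
            raise ValueError("unknown mode: %r" % (mode,))
        self._factory = app_factory
        self._mode = mode
        self._lock = threading.Lock()
        self._local = threading.local()
        self._shared = app_factory() if mode == "serialized" else None
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def _per_thread_app(self):
        # Allocate on demand: a thread's first request builds its app object.
        app = getattr(self._local, "app", None)
        if app is None:
            app = self._local.app = self._factory()
        return app

    def _run(self, environ, body):
        # Fresh streams and a private environ copy for every request, since
        # the application is allowed to modify environ in place.
        stdin, stdout, stderr = io.StringIO(body), io.StringIO(), io.StringIO()
        environ = dict(environ)
        if self._mode == "serialized":
            with self._lock:
                self._shared.runCGI(stdin, stdout, stderr, environ)
        else:
            self._per_thread_app().runCGI(stdin, stdout, stderr, environ)
        return stdout.getvalue()

    def submit(self, environ, body=""):
        """Queue one request; the Future's result is the raw CGI output."""
        return self._pool.submit(self._run, environ, body)

    def shutdown(self):
        self._pool.shutdown(wait=True)
```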
Thus, the intent to merely provide a transport conduit, rather than a configuration mechanism. I'd prefer to leave a threading spec to version 2.0, *after* there's widespread adoption -- and therefore widespread experience with -- the needs of containers and the issues of applications operating under the 1.0 spec. From pje at telecommunity.com Tue Dec 9 14:43:59 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Dec 9 14:44:05 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <20031208165400.G15042@lyra.org> References: <20031208161800.GA4146@rogue.amk.ca> <5.1.0.14.0.20031208094130.03e80bd0@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031207133404.023e1070@mail.telecommunity.com> <5.1.0.14.0.20031208094130.03e80bd0@mail.telecommunity.com> <5.1.0.14.0.20031208103018.03e847b0@mail.telecommunity.com> <20031208161800.GA4146@rogue.amk.ca> Message-ID: <5.1.1.6.0.20031209143316.00aba080@telecommunity.com> At 04:54 PM 12/8/03 -0800, Greg Stein wrote: >I'm not convinced. If an application is designed with a per-process model >in mind (e.g. CGI), and then you drop it into a threaded model... BOOM! > >The application needs to declare whether it is thread-safe. The container >can then verify whether that application can be run within the container >and the container's current configuration. > >For example, if you drop a non-thread-safe app into a threaded mod_python, >then I would expect an error to be thrown, and the app to *not* be loaded. > >The simple fact is that threading (and the execution model, in general) is >part of the environment. You can't limit it to just the three streams plus >some "environ" dictionary. There is a *very* real impact on the >application, based on how the container is executing those apps. 
I'll add a "Threading and Process Issues" section to the PEP, explicitly addressing the types of issues that could occur (that we know of at present), and recommending what framework and container authors should document about their framework or container's requirements or capabilities. However, I think that trying to establish a metadata standard for these issues is premature, and should be left to a version 2.0, similar to the way the DBAPI 2.0 added a threading metadata specification, after driver authors and users had some experience with what kinds of issues existed. (Note that although lots of people have so far said "threading is important and should be in the spec", nobody has said, "this is what the spec should say about it." I'm taking this as an indication that nobody really knows what it should say, and that it's therefore premature to specify it.) Anyway, attempting to summarize the issues raised so far: * A framework that uses globals for inter-request communication will fail in a multi-process container * A framework that uses files or shared memory as an IPC mechanism will fail in a multi-server cluster container * Some frameworks are not thread-safe unless multiple application objects are created * Some frameworks are not thread-safe *even if* multiple application objects are created * Some frameworks may require explicit flagging, or other special coding practices in order to be thread-safe Have I missed anything? From paul.boddie at ementor.no Wed Dec 10 10:48:39 2003 From: paul.boddie at ementor.no (Paul Boddie) Date: Wed Dec 10 10:48:44 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 Message-ID: Gregory (Grisha) Trubetskoy wrote: > > The approach this spec takes is modeled after CGI, which was designed with > shell scripts in mind and condenses things down to the UNIX primitives of > stdin, stdout, stderr, environ (and cwd). 
I would have thought that this kind of interface would have been more suitable between the environment and the container, or possibly between components within the container. > On the surface this appears fine, but consider setting an HTTP header. > Headers do not fit into the above-mentioned primitives, so CGI requires > the application to send them to stdout. Writing headers to stdout is much > more cumbersome than passing them in a mapping object of some sort. And I can imagine that for many applications in many of the current frameworks, they would need some kind of "insulating wrapper" to comply with this interface. Certainly, I don't recall Webware, mod_python, Twisted or Zope applications sending headers to the same output stream as the data (or even using an output stream for the headers at all). [...] > Whatever spec we come up with, IMO should deal in terms of the HTTP > protocol request, headers, body, etc. Trying to narrow it down to input, > output and environment is fitting a square peg into a round hole. Agreed. I think we also need to consider where this interface "surfaces" in the application or framework; ie. where you would expect to find it, and what might sit on top. As I noted above, right now, many applications would need a few framework calls between the invocation of the runCGI function and an actual entry point into the application itself. This pre-PEP seems to serve an important purpose: it attempts to make a certain part of the Web request handling "stack" explicit. I'd certainly be interested in trying to make other parts of that "stack" more obvious, too. For example, it would be nice to consider the resolution of requests according to information contained within them, and the dispatching of such requests to resources. 
Right now, each framework seems to have its own ideology which states that requests using particular paths must get resolved in a particular way - it would be great if an API appeared that let developers rewire frameworks without resorting to external hacks to get the desired behaviour. Paul From jjl at pobox.com Wed Dec 10 12:15:35 2003 From: jjl at pobox.com (John J Lee) Date: Wed Dec 10 12:16:01 2003 Subject: [Web-SIG] [Python-Dev] PEP 292 and templating (fwd) Message-ID: ---------- Forwarded message ---------- Date: Tue, 9 Dec 2003 21:55:55 -0500 From: Raymond Hettinger Reply-To: python@rcn.com To: python-dev@python.org Subject: [Python-Dev] PEP 292 and templating Is there interest in having a templating module with two functions one for simple substitutions and the other with more tools? The first would be Barry's simple substitutions using only $name or ${name} for templates exposed to the user. The second would extend the first with Cheetah style dotted names for more advanced templates controlled by the programmer. Raymond Hettinger -------------- next part -------------- _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/jjl%40pobox.com From pje at telecommunity.com Wed Dec 10 13:17:36 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Dec 10 13:17:49 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: Message-ID: <5.1.1.6.0.20031210113456.02b183c0@telecommunity.com> At 04:48 PM 12/10/03 +0100, Paul Boddie wrote: >Gregory (Grisha) Trubetskoy wrote: > > > > The approach this spec takes is modeled after CGI, which was designed with > > shell scripts in mind and condenses things down to the UNIX primitives of > > stdin, stdout, stderr, environ (and cwd). 
> >I would have thought that this kind of interface would have been more >suitable between the environment and the container, or possibly between >components within the container. I'm guessing that for mod_python, the proposed interface isn't as suitable, doubtless prompting some of Grisha's concerns. From a mod_python point of view, the proposed interface is "lossy", at least from a performance point of view, and probably also from a power/flexibility point of view. But the flip side is that if this "lossy" interface were available in mod_python, it would actually bring many more users to mod_python, since they'd be able to use a wider variety of frameworks with it. If those users then came to want things that weren't available through the narrow "runCGI" interface, then they could consider doing additional work to use mod_python's native interface. I know that I, for one, would be more likely to experiment with other mod_python capabilities once I had my "foot in the door" via the simple interface. > > On the surface this appears fine, but consider setting an HTTP header. > > Headers do not fit into the above-mentioned primitives, so CGI requires > > the application to send them to stdout. Writing headers to stdout is much > > more cumbersome than passing them in a mapping object of some sort. > >And I can imagine that for many applications in many of the current >frameworks, they would need some kind of "insulating wrapper" to comply with >this interface. Certainly, I don't recall Webware, mod_python, Twisted or >Zope applications sending headers to the same output stream as the data (or >even using an output stream for the headers at all). Zope definitely does, and from Andrew's comments, so does Quixote. Twisted is a web server, so it won't, but I believe it already has a CGI interface for running external programs, that could be used for this purpose (presumably by running the application in a separate thread). 
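A rough analogue of running a blocking application from an event-driven server is sketched below using asyncio's `run_in_executor`. Twisted's own mechanism would differ (for example `twisted.internet.threads.deferToThread`), and `call_app` and `serve_one` are hypothetical names.

```python
import asyncio
import io


def call_app(app, environ, body=""):
    """Run one blocking runCGI call and return its raw CGI-style output."""
    out, err = io.StringIO(), io.StringIO()
    app.runCGI(io.StringIO(body), out, err, dict(environ))
    return out.getvalue()


async def serve_one(app, environ, body=""):
    # Hand the blocking call to the loop's default thread pool so the
    # event loop keeps servicing other connections in the meantime.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, call_app, app, environ, body)
```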
Here are example wrappers for Zope 2 and Zope 3 (untested, but based on existing code I use in production (Z2) and dev (Z3)):

    class Zope2App:

        def __init__(self, modulename):
            self.moduleToPublish = modulename

        def runCGI(self, input, output, errors, environ):
            from ZPublisher.Publish import publish_module
            # publish_module() takes the CGI-style streams and environ directly.
            publish_module(
                self.moduleToPublish,
                stdin=input, stdout=output, stderr=errors, environ=environ
            )


    class Zope3App:

        # Methods served by the browser (or XML-RPC) request types.
        _browser_methods = 'GET', 'HEAD', 'POST'

        def __init__(self, publication):
            self.policy = publication

        def runCGI(self, input, output, errors, environ):
            from zope.publisher import http, browser, xmlrpc, publish
            method = environ.get('REQUEST_METHOD', 'GET').upper()
            if method in self._browser_methods:
                # A POST carrying an XML body is treated as XML-RPC.
                if (method == 'POST' and
                        environ.get('CONTENT_TYPE', '').lower().startswith('text/xml')):
                    request_type = xmlrpc.XMLRPCRequest
                else:
                    request_type = browser.BrowserRequest
            else:
                request_type = http.HTTPRequest
            request = request_type(input, output, environ)
            request.setPublication(self.policy)
            publish.publish(request)

>This pre-PEP seems to serve an important purpose: it attempts to make a >certain part of the Web request handling "stack" explicit. I'd certainly be >interested in trying to make other parts of that "stack" more obvious, too. >For example, it would be nice to consider the resolution of requests >according to information contained within them, and the dispatching of such >requests to resources. Right now, each framework seems to have its own >ideology which states that requests using particular paths must get resolved >in a particular way - it would be great if an API appeared that let >developers rewire frameworks without resorting to external hacks to get the >desired behaviour.

The proposed interface actually allows that too; in fact, it's why environ must be modifiable by the "app". It should be easy to create a "router" app that accepts a runCGI call and forwards it to other application objects implementing the interface.
Thus, multiple frameworks, apps, or other objects can be "mounted" within a container even at a single virtual "mount point". From neel at mediapulse.com Wed Dec 10 14:15:10 2003 From: neel at mediapulse.com (Michael C. Neel) Date: Wed Dec 10 14:15:16 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 Message-ID: > I'm guessing that for mod_python, the proposed interface > isn't as suitable, > doubtless prompting some of Grisha's concerns. From a > mod_python point of > view, the proposed interface is "lossy", at least from a > performance point > of view, and probably also from a power/flexibility point of view. > > But the flip side is that if this "lossy" interface were available in > mod_python, it would actually bring many more users to > mod_python, since > they'd be able to use a wider variety of frameworks with it. > If those > users then came to want things that weren't available through > the narrow > "runCGI" interface, then they could consider doing additional > work to use > mod_python's native interface. > > I know that I, for one, would be more likely to experiment with other > mod_python capabilities once I had my "foot in the door" via > the simple > interface. This highlights my two concerns with this PEP, which I've been following the thread on. One is how willing are developers of the current systems to rewrite or provide a wrapper for this new one? Off the top of my head I know mod_python has for it: (its own) PSP and Publisher, Albatross, Spyce, and Draco. Can we really expect all of these to update to use this new standard? Or do we just want mod_python to expose another interface? Which leads to my other concern; should this even be a concern? The goal here is to update/add to the stdlib. Since the odds of mod_python becoming part of the stdlib are nil, should we even worry about a spec for things like mod_python and Zope? I freely admit I don't "get it" yet, and may be missing the bigger picture.
This sounds to me like a Java server type of thing - a generic enough framework that I can take my app from one system to another with no changes needed. While I need my client side to be as flexible as possible, it's extremely rare that in practice it's needed at the server side because it's rare the whole platform changes (and usually when it does, it's along with a rewrite/upgrade to the app anyway, making keeping the code even less useful). That said, I want anything in the stdlib to jibe, so that if I change from one class to another (for the same role), they both expose the same interface. So in that scope, I see something like this being very helpful. Mike From amk at amk.ca Wed Dec 10 17:50:57 2003 From: amk at amk.ca (A.M. Kuchling) Date: Wed Dec 10 17:51:28 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: References: Message-ID: <20031210225057.GA13911@rogue.amk.ca> On Wed, Dec 10, 2003 at 02:15:10PM -0500, Michael C. Neel wrote: > Which leads to my other concern; should this even be a concern? The > goal here is to update/add to the stdlib. Since the odds of mod_python > becoming part of the stdlib are nil, should we even worry about a spec > for things like mod_python and Zope? No, adding to the stdlib is not necessarily the goal. The DB-API isn't represented in the stdlib either, yet it's still useful for ensuring a certain amount of consistency between database modules. Authors of modules can follow the API or not, and they're only responsible to their users about whether they do. Similarly, this PEP is an informational document describing a certain convention that web frameworks can follow or not, as they see fit. And it helps alleviate the O(n**2) problem of connecting various publishing schemes together. Want to run Quixote under Twisted? Go write an adapter. Want to run Webware under SCGI? Go write an adapter.
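To put rough numbers on the O(n**2) point: without a shared interface, every framework/container pair needs its own adapter, while with one, each framework and each container implements the interface once. Below is a tiny sketch of that count. The function name is hypothetical, and the figures are illustrative, not a survey of actual projects.

```python
def adapters_needed(frameworks, containers, shared_interface):
    """Count the glue pieces needed to run every framework in every container."""
    if shared_interface:
        # Each framework exposes the common interface once, and each
        # container implements it once: n + m pieces.
        return frameworks + containers
    # Without a shared interface, every (framework, container) pair
    # needs its own adapter: n * m pieces, i.e. O(n**2) when n == m.
    return frameworks * containers
```

For example, six frameworks and five containers would need 30 pairwise adapters, but only 11 interface implementations.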
If each piece supported this interface, at least it would be fairly easy to combine tools without having to write a different chunk of adapter code for each possible pair. --amk From pje at telecommunity.com Wed Dec 10 18:42:39 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Dec 10 18:42:46 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: Message-ID: <5.1.1.6.0.20031210181755.02a64ca0@telecommunity.com> At 02:15 PM 12/10/03 -0500, Michael C. Neel wrote: >One is how willing are developers of the current systems to rewrite or >provide a wrapper for this new one? Off the top of my head I know >mod_python has for it: (its own) PSP and Publisher, Albatross, Spyce, >and Draco. Can we really expect all of these to update to use this new >standard? Or do we just want mod_python to expose another interface? Yes. But note that it's not necessarily the authors of mod_python who have to provide it. Somebody who wants to run PyWCI apps under mod_python could write a PyWCI container that runs under the existing mod_python API. However, somebody would only need to write this once, for everybody to be able to take advantage of it under mod_python. And, other frameworks would need only to expose a PyWCI-compliant 'runCGI' method, to be able to run in that container (assuming that their process model was compatible). >Which leads to my other concern; should this even be a concern? The >goal here is to update/add to the stdlib. That's a minor and mostly tangential concern for the proposal as such. I posted the proposal here before putting it out in the wider world of python-list, because: 1) the proposal offers some direction for an interface between any new stdlib container pieces and any application-like pieces; 2) there are lots of web framework and container authors here, who presumably have some interest in Python "web standards". So, I assumed that the best peer review for early feedback would be found here.
So, my goals for the proposal are really orthogonal to the standard library goals of the Web-SIG, but are nonetheless of interest to the Web-SIG membership, if that makes sense. >I freely admit I don't "get it" yet, and may be missing the bigger >picture. This sounds to me like a Java server type of thing - a generic >enough framework that I can take my app from one system to another with >no changes needed. Assuming that your threading and/or process model are compatible, yes, you should have your choice of containers for physical deployment of the app. But there are bigger gains than that to be had. See below. > While I need my client side to be as flexible as >possible, it's extremely rare that in practice it's needed at the server >side because it's rare the whole platform changes (and usually when it >does, it's along with a rewrite/upgrade to the app anyway, making keeping >the code even less useful). That's all true, but not the point of the proposal. The issue is user choice when initially *selecting* the container. Right now, your runtime platform needs can drastically affect your options for what kind of framework you can use, because what frameworks you can use depends heavily on what kind of runtime container you need to support. With widespread adoption of PyWCI, your container choice would not significantly narrow your framework choice, and you would also have the option of mixing frameworks by using a PyWCI-based request router. So, it's not so much about being able to *move* your application (although it's nice to know you can "move up" or "move sideways" as needed), as it is about being able to have more choices in the first place. The thing that creates user uncertainty about Python web programming right now is *not* that there are dozens of choices. It's that you have to pick *one*, and then you're probably stuck with it. And *none* of your learning or runtime environment may stay with you if you switch.
The mere *existence* of a widely-supported container interface will be a significant peace-of-mind booster for PHBs and developers alike. >That said, I want anything in the stdlib to jibe, so that if I change >from one class to another (for the same role), they both expose the same >interface. So in that scope, I see something like this being very >helpful. Yes, and this ties into my point about having a widely-supported "standard". But my intent is to bootstrap the standard into widespread use, without necessarily going through the stdlib first. In the past, Guido has seemed to me to prefer to base the stdlib on "de facto" standards representing community experience, over "de jure" standards representing what people think might be a good idea. Thus, if PyWCI were widely implemented, that would be in itself a justification for its use in the standard library, and thus beneficial to the Web-SIG's efforts in that regard. From grisha at modpython.org Wed Dec 10 23:52:45 2003 From: grisha at modpython.org (Gregory (Grisha) Trubetskoy) Date: Wed Dec 10 23:52:48 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <20031210225057.GA13911@rogue.amk.ca> References: <20031210225057.GA13911@rogue.amk.ca> Message-ID: <20031210233452.D92881@onyx.ispol.com> On Wed, 10 Dec 2003, A.M. Kuchling wrote: > Similarly, this PEP is an informational document describing a certain > convention that web frameworks can follow or not, as they see fit. And it > helps alleviate the O(n**2) problem of connecting various publishing schemes > together. Want to run Quixote under Twisted? Go write an adapter. Want to > run Webware under SCGI? Go write an adapter. If each piece supported this > interface, at least it would be fairly easy to combine tools without having > to write a different chunk of adapter code for each possible pair.
The PEP will help with this problem, and as such I'm willing to support it, but at the same time, in all honesty, I won't be able to say "problem solved" in the best possible way, or even that we are moving in that direction. (But I think we agree with Phillip on this.) I really liked the problem statement in the PEP; perhaps we can add a note to it that the problem can have a much more comprehensive solution, and that the solution described, although simple, isn't the most efficient and is in many ways deficient. This will shut up people like me who will read the PEP and say "But this is just the old lame CGI?". The real solution IMHO is going to be something similar to the Java Servlet specification. It's a pretty complex issue, probably complex enough to start a whole separate SIG on. Grisha From pje at telecommunity.com Thu Dec 11 00:10:31 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Dec 11 00:08:45 2003 Subject: [Web-SIG] Pre-PEP: Python Web Container Interface v1.0 In-Reply-To: <20031210233452.D92881@onyx.ispol.com> References: <20031210225057.GA13911@rogue.amk.ca> <20031210225057.GA13911@rogue.amk.ca> Message-ID: <5.1.0.14.0.20031210235844.03b8aec0@mail.telecommunity.com> At 11:52 PM 12/10/03 -0500, Gregory (Grisha) Trubetskoy wrote: >I really liked the problem statement in the PEP; perhaps we can add a note >to it that the problem can have a much more comprehensive solution, and >that the solution described, although simple, isn't the most efficient and >is in many ways deficient. This will shut up people like me who will read >the PEP and say "But this is just the old lame CGI?". I'll make sure this viewpoint is included when I do the next draft (probably this weekend). It will probably be by saying something like, "this spec doesn't give the application any direct control over a container, and so may be unsatisfactory for some more-demanding applications.
In practice, such applications today must interact directly with a web server, as via mod_python, or via the internal API of a web server written in Python. It is possible that future versions of this specification, or another specification, will address these more demanding needs. "However, in the interests of providing the greatest good to the greatest number as soon as practical, this version of the specification will focus on simplicity and ease of implementation (to encourage rapid adoption), and high portability (to encourage widespread adoption). Once this occurs, container and application/framework developers will be in a better position to define requirements for a complementary application-to-container interface to supplement this container-to-application interface." Something like that, anyway. I'll probably work that in with some of the threading stuff. So far, there's going to be a new Goals/Scope section that'll deal with these and other scope issues that people found confusing. There'll be a section added on threading and process model issues. There'll need to be an expanded rationale regarding the whole dictionary thing. And, I'll add a "Discussion and Dissension" section to cover the positive and negative feedback so far.