From iwan at reahl.org Fri Jul 4 13:13:57 2008
From: iwan at reahl.org (Iwan Vosloo)
Date: Fri, 04 Jul 2008 13:13:57 +0200
Subject: [Web-SIG] Alternative to threading.local, based on the stack
Message-ID: <1215170037.17590.29.camel@easymoney>

Hi,

Many web frameworks and ORM tools have the need to propagate data
depending on some or other context within which a request is dealt with.
Passing it all via parameters to every nook of your code is cumbersome.

A lot of the frameworks use a thread local context to solve this
problem. I'm assuming these are based on threading.local.

(See, for example:
http://www.sqlalchemy.org/docs/05/session.html#unitofwork_contextual )

Such usage assumes that one request is served per thread.

This is not necessarily the case. (Twisted would perhaps be an example,
but I have not checked how the twisted people deal with the issue.)

The bottom line for me is that if you build a WSGI app, you'd not want
to restrict it to being able to run in a one request-per-thread setup.

So I've been playing with the idea to use something that creates a
context local to the current call stack instead. I.e. a context (dict)
which is inserted into the call stack at some point, and can be accessed
by any method/function deeper in the stack.

The normal use case for this is to propagate a database connection. But
it can also be used to propagate other things, such as information about
the user who is currently logged in, etc.

Since this is one way of creating objects that are global to a context
(the call stack), I'm sure it is in some ways evil and can be abused.
But that criticism can be levelled against the thread-local solution
too...

I attach some code to illustrate - and would appreciate some feedback on
the idea and its implementation.

-i
-------------- next part --------------
A non-text attachment was scrubbed...
Name: callcontext.py
Type: text/x-python
Size: 3397 bytes
Desc: not available
URL:

From faassen at startifact.com Fri Jul 4 13:31:30 2008
From: faassen at startifact.com (Martijn Faassen)
Date: Fri, 4 Jul 2008 13:31:30 +0200
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <1215170037.17590.29.camel@easymoney>
References: <1215170037.17590.29.camel@easymoney>
Message-ID: <8928d4e90807040431nb12790bwb6f9084772599019@mail.gmail.com>

Hi there,

2008/7/4 Iwan Vosloo :
[snip]
> A lot of the frameworks use a thread local context to solve this
> problem. I'm assuming these are based on threading.local.
>
> (See, for example:
> http://www.sqlalchemy.org/docs/05/session.html#unitofwork_contextual )

scoped_session is actually, I think, a bad example, as SQLAlchemy uses
the thread id to scope things per session, not threading.local. As
long as there's a way to uniquely identify "context", scoped_session
could also be scoped differently, as long as it has a way to identify
the context that doesn't need any non-global parameters.

Zope 3 may be a better example, as it does use thread locals to scope
things per thread (I believe this requirement by Zope was actually one
of the reasons this feature was moved into Python). There may also be
other parts of SQLAlchemy that indeed use thread local variables.

Regards,

Martijn

From iwan at reahl.org Fri Jul 4 13:37:15 2008
From: iwan at reahl.org (Iwan Vosloo)
Date: Fri, 04 Jul 2008 13:37:15 +0200
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <8928d4e90807040431nb12790bwb6f9084772599019@mail.gmail.com>
References: <1215170037.17590.29.camel@easymoney>
 <8928d4e90807040431nb12790bwb6f9084772599019@mail.gmail.com>
Message-ID: <1215171435.17590.32.camel@easymoney>

Hi Martijn,

On Fri, 2008-07-04 at 13:31 +0200, Martijn Faassen wrote:
> scoped_session is actually, I think, a bad example, as SQLAlchemy uses
> the thread id to scope things per session, not threading.local. As
> long as there's a way to uniquely identify "context", scoped_session
> could also be scoped differently, as long as it has a way to identify
> the context that doesn't need any non-global parameters.
>
> Zope 3 may be a better example, as it does use thread locals to scope
> things per thread (I believe this requirement by Zope was actually one
> of the reasons this feature was moved into Python). There may also be
> other parts of SQLAlchemy that indeed use thread local variables.

Point taken, I'm not familiar with the implementation of scoped_session.
But still, it is the same idea as that implemented in threading.local,
isn't it?

-i

From manlio_perillo at libero.it Fri Jul 4 13:42:07 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Fri, 04 Jul 2008 13:42:07 +0200
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <1215170037.17590.29.camel@easymoney>
References: <1215170037.17590.29.camel@easymoney>
Message-ID: <486E0C8F.90503@libero.it>

Iwan Vosloo wrote:
> Hi,
>
> Many web frameworks and ORM tools have the need to propagate data
> depending on some or other context within which a request is dealt with.
> Passing it all via parameters to every nook of your code is cumbersome.
>
> A lot of the frameworks use a thread local context to solve this
> problem. I'm assuming these are based on threading.local.
>
> (See, for example:
> http://www.sqlalchemy.org/docs/05/session.html#unitofwork_contextual )
>
> Such usage assumes that one request is served per thread.
>
> This is not necessarily the case. (Twisted would perhaps be an example,
> but I have not checked how the twisted people deal with the issue.)
>

The natural solution with WSGI is to store objects in the environ
dictionary.

In fact in my web applications I always pass the environ dictionary
explicitly to every function.

> [...]
Manlio Perillo

From iwan at reahl.org Fri Jul 4 13:56:24 2008
From: iwan at reahl.org (Iwan Vosloo)
Date: Fri, 04 Jul 2008 13:56:24 +0200
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <486E0C8F.90503@libero.it>
References: <1215170037.17590.29.camel@easymoney> <486E0C8F.90503@libero.it>
Message-ID: <1215172584.17590.35.camel@easymoney>

On Fri, 2008-07-04 at 13:42 +0200, Manlio Perillo wrote:
> Iwan Vosloo wrote:
> > Hi,
> >
> > Many web frameworks and ORM tools have the need to propagate data
> > depending on some or other context within which a request is dealt with.
> > Passing it all via parameters to every nook of your code is cumbersome.
> >
> The natural solution with WSGI is to store objects in the environ
> dictionary.
>
> In fact in my web applications I always pass the environ dictionary
> explicitly to every function.

But, this passing of the environ dictionary to every function in your
web app is exactly what I'd want to avoid?

-i

From faassen at startifact.com Fri Jul 4 14:05:15 2008
From: faassen at startifact.com (Martijn Faassen)
Date: Fri, 4 Jul 2008 14:05:15 +0200
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <1215171435.17590.32.camel@easymoney>
References: <1215170037.17590.29.camel@easymoney>
 <8928d4e90807040431nb12790bwb6f9084772599019@mail.gmail.com>
 <1215171435.17590.32.camel@easymoney>
Message-ID: <8928d4e90807040505p1491f92ve820d1507d30f1a2@mail.gmail.com>

Hey,

On Fri, Jul 4, 2008 at 1:37 PM, Iwan Vosloo wrote:
> On Fri, 2008-07-04 at 13:31 +0200, Martijn Faassen wrote:
>> scoped_session is actually, I think, a bad example, as SQLAlchemy uses
>> the thread id to scope things per session, not threading.local. As
>> long as there's a way to uniquely identify "context", scoped_session
>> could also be scoped differently, as long as it has a way to identify
>> the context that doesn't need any non-global parameters.
>>
>> Zope 3 may be a better example, as it does use thread locals to scope
>> things per thread (I believe this requirement by Zope was actually one
>> of the reasons this feature was moved into Python). There may also be
>> other parts of SQLAlchemy that indeed use thread local variables.
>
> Point taken, I'm not familiar with the implementation of scoped_session.
> But still, it is the same idea as that implemented in threading.local,
> isn't it?

Yes, I think so, except that scoped_session is more flexible than that
and could actually be convinced to use your technique for
identification of scope as well.

Regards,

Martijn

From manlio_perillo at libero.it Fri Jul 4 14:17:55 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Fri, 04 Jul 2008 14:17:55 +0200
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <1215172584.17590.35.camel@easymoney>
References: <1215170037.17590.29.camel@easymoney> <486E0C8F.90503@libero.it>
 <1215172584.17590.35.camel@easymoney>
Message-ID: <486E14F3.10603@libero.it>

Iwan Vosloo wrote:
> On Fri, 2008-07-04 at 13:42 +0200, Manlio Perillo wrote:
>> Iwan Vosloo wrote:
>>> Hi,
>>>
>>> Many web frameworks and ORM tools have the need to propagate data
>>> depending on some or other context within which a request is dealt with.
>>> Passing it all via parameters to every nook of your code is cumbersome.
>>>
>> The natural solution with WSGI is to store objects in the environ
>> dictionary.
>>
>> In fact in my web applications I always pass the environ dictionary
>> explicitly to every function.
>
> But, this passing of the environ dictionary to every function in your
> web app is exactly what I'd want to avoid?
>

Yes, but you only need to pass the environ dictionary and not N
parameters.

I think this is a good compromise. Using thread local storage is not
the solution to every problem (as you have noted, it cannot be used
when the server handles more than one request per thread).

> -i
>

Manlio Perillo

From matt at pollenation.net Fri Jul 4 14:39:25 2008
From: matt at pollenation.net (Matt Goodall)
Date: Fri, 04 Jul 2008 13:39:25 +0100
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <1215170037.17590.29.camel@easymoney>
References: <1215170037.17590.29.camel@easymoney>
Message-ID: <486E19FD.7070207@pollenation.net>

Iwan Vosloo wrote:
> Hi,
>
> Many web frameworks and ORM tools have the need to propagate data
> depending on some or other context within which a request is dealt with.
> Passing it all via parameters to every nook of your code is cumbersome.
>
> A lot of the frameworks use a thread local context to solve this
> problem. I'm assuming these are based on threading.local.
>
> (See, for example:
> http://www.sqlalchemy.org/docs/05/session.html#unitofwork_contextual )
>
> Such usage assumes that one request is served per thread.
>
> This is not necessarily the case. (Twisted would perhaps be an example,
> but I have not checked how the twisted people deal with the issue.)

You're correct that Twisted Web does not allocate a thread per request.
All requests are handled by an event loop in the main thread.

However, the WSGI request handler in Twisted does actually spawn a
thread to run the WSGI application because most WSGI applications are
blocking.

>
> The bottom line for me is that if you build a WSGI app, you'd not want
> to restrict it to being able to run in a one request-per-thread setup.
>
> So I've been playing with the idea to use something that creates a
> context local to the current call stack instead. I.e. a context (dict)
> which is inserted into the call stack at some point, and can be accessed
> by any method/function deeper in the stack.

In Twisted, the call stack tends to get fragmented during a sequence of
asynchronous calls because of its callback mechanism. Basically, you're
hopping in and out of the Twisted reactor (the event mainloop) all the
time.
Leaving something in the call stack would not work at all.

The ideal solution is, of course, to pass everything around to whatever
needs it. However, that's really tedious at times.

Whatever the architecture of the web server there is always a request
or, in case of WSGI, an env dict. Therefore, request-scope objects
should be associated with the request.

>
> The normal use case for this is to propagate a database connection. But
> it can also be used to propagate other things, such as information about
> the user who is currently logged in, etc.
>
> Since this is one way of creating objects that are global to a context
> (the call stack), I'm sure it is in some ways evil and can be abused.
> But that criticism can be levelled against the thread-local solution
> too...

Yep, thread and call stack locals are both bad. Think in terms of
request locals instead and things start getting better.

>
> I attach some code to illustrate - and would appreciate some feedback on
> the idea and its implementation.
>
> -i

--
Matt Goodall
Technical Director, Pollenation Internet Ltd

Registered Number: 4382123
Registered Office: 237 Lidgett Lane, Leeds, West Yorkshire, LS17 6QR
A member of the Brunswick MCL Group of Companies

w: http://www.pollenation.net/
e: matt at pollenation.net
t: +44 (0) 113 2252500

This message may be confidential and the views expressed may not reflect
the views of my employers. Please read http://eudaimon-group.com/email
if you are uncertain what this means.
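
One way to read "request-scope objects should be associated with the
request" in WSGI terms is to hang them on the environ dict. The sketch
below is only an illustration of that reading, not code from any poster
in this thread. The key name DB_KEY, the with_connection middleware and
hello_app are hypothetical, and sqlite3 merely stands in for whatever
database a real application would use.

```python
import sqlite3

# Hypothetical environ key; any application-specific name would do.
DB_KEY = "example.db_connection"


def with_connection(app, database=":memory:"):
    """WSGI middleware that ties one connection to each request's environ."""
    def middleware(environ, start_response):
        conn = sqlite3.connect(database)
        environ[DB_KEY] = conn
        try:
            # Materialise the body so the connection can be closed safely
            # before the response is handed back to the server.
            return list(app(environ, start_response))
        finally:
            conn.close()
    return middleware


def hello_app(environ, start_response):
    """Application code finds the connection on the request, nowhere else."""
    conn = environ[DB_KEY]
    (value,) = conn.execute("SELECT 40 + 2").fetchone()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [str(value).encode("ascii")]


application = with_connection(hello_app)
```

Because the connection lives on the request, this works the same way
whether the server runs one request per thread or many requests in one
thread. The cost is the one discussed above: environ has to reach every
function that needs the connection.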
From iwan at reahl.org Fri Jul 4 15:23:09 2008
From: iwan at reahl.org (Iwan Vosloo)
Date: Fri, 04 Jul 2008 15:23:09 +0200
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <486E19FD.7070207@pollenation.net>
References: <1215170037.17590.29.camel@easymoney>
 <486E19FD.7070207@pollenation.net>
Message-ID: <1215177789.17590.49.camel@easymoney>

On Fri, 2008-07-04 at 13:39 +0100, Matt Goodall wrote:
> Iwan Vosloo wrote:
> You're correct that Twisted Web does not allocate a thread per request.
> All requests are handled by an event loop in the main thread.

> In Twisted, the call stack tends to get fragmented during a sequence of
> asynchronous calls because of its callback mechanism. Basically, you're
> hopping in and out of the Twisted reactor (the event mainloop) all the
> time. Leaving something in the call stack would not work at all.

Couldn't you put something in the call stack each time in the main loop,
before calling a callback (which will be popped again when that callback
returns to the main loop)?

> The ideal solution is, of course, to pass everything around to whatever
> needs it. However, that's really tedious at times.
>
> Whatever the architecture of the web server there is always a request
> or, in case of WSGI, an env dict. Therefore, request-scope objects
> should be associated with the request.

True, but even passing a request or env dict around to everyone gets
tedious don't you think?
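
The callcontext.py attachment was scrubbed from the archive, so the
following is not that code. It is a minimal sketch of what a context
"inserted into the call stack" could look like, assuming CPython (it
relies on sys._getframe, which other interpreters may not provide). The
names run_with_context, current_context and the marker variable are all
hypothetical.

```python
import sys

# Hypothetical name of the local variable that marks a context frame.
_MARKER = "__call_context__"


def run_with_context(context, func, *args, **kwargs):
    """Call func with `context` visible to everything deeper in the stack."""
    __call_context__ = context  # stored in this frame's locals on purpose
    return func(*args, **kwargs)


def current_context():
    """Walk up the call stack and return the nearest context dict."""
    frame = sys._getframe(1)
    while frame is not None:
        if _MARKER in frame.f_locals:
            return frame.f_locals[_MARKER]
        frame = frame.f_back
    raise LookupError("no call context on the stack")


def deep():
    # No parameters were passed down to here.
    return current_context()["user"]


def middle():
    return deep()


found = run_with_context({"user": "iwan"}, middle)
```

An event loop could wrap each callback in run_with_context to re-insert
the context, as asked above. The lookup cost grows with stack depth, and
the context disappears as soon as the wrapped call returns.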

-i

From benji at benjiyork.com Fri Jul 4 16:13:31 2008
From: benji at benjiyork.com (Benji York)
Date: Fri, 4 Jul 2008 10:13:31 -0400
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <1215177789.17590.49.camel@easymoney>
References: <1215170037.17590.29.camel@easymoney>
 <486E19FD.7070207@pollenation.net>
 <1215177789.17590.49.camel@easymoney>
Message-ID:

On Fri, Jul 4, 2008 at 9:23 AM, Iwan Vosloo wrote:
> On Fri, 2008-07-04 at 13:39 +0100, Matt Goodall wrote:
>> The ideal solution is, of course, to pass everything around to whatever
>> needs it. However, that's really tedious at times.
>>
>> Whatever the architecture of the web server there is always a request
>> or, in case of WSGI, an env dict. Therefore, request-scope objects
>> should be associated with the request.
>
> True, but even passing a request or env dict around to everyone gets
> tedious don't you think?

It can. Zope 3 makes a pretty good compromise here. The "top level"
object involved in handling the request -- a view -- gets the request
object explicitly passed as a parameter. If the view wants to pass the
request to function calls or other objects, then it's free to do so.

But, if at some point you find yourself without a reference to the
current request and really need it, you can get it "out of thin air" by
calling (essentially) get_request().

The Zope 3 publisher processes requests using a thread pool, so
get_request() is implemented by stashing the request object in the
thread-local storage prior to processing the request and digging it back
out if requested.

Other implementations could store the request somewhere else, but the
idea is the same.
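
The stash-and-retrieve pattern described above can be sketched with the
standard library alone. This is not Zope 3's actual code: publish, view
and helper_without_arguments are hypothetical names, and a plain dict
stands in for the request object.

```python
import threading

_state = threading.local()


def get_request():
    """Return the request being handled by the current thread."""
    try:
        return _state.request
    except AttributeError:
        raise RuntimeError("no request is being processed in this thread") from None


def publish(request, view):
    """Hypothetical publisher step: stash the request, call the view, clean up."""
    _state.request = request
    try:
        return view(request)
    finally:
        del _state.request


def helper_without_arguments():
    # Obtains the request "out of thin air".
    return "user=%s" % get_request()["user"]


def view(request):
    # The top-level view receives the request explicitly.
    return helper_without_arguments()
```

Each worker thread sees only the request it stashed itself, which is
why the pattern depends on one request being served per thread at a
time.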

--
Benji York

From fumanchu at aminus.org Fri Jul 4 19:52:56 2008
From: fumanchu at aminus.org (Robert Brewer)
Date: Fri, 4 Jul 2008 10:52:56 -0700
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To:
References: <1215170037.17590.29.camel@easymoney>
 <486E19FD.7070207@pollenation.net>
 <1215177789.17590.49.camel@easymoney>
Message-ID:

Benji York wrote:
> On Fri, Jul 4, 2008 at 9:23 AM, Iwan Vosloo wrote:
> > On Fri, 2008-07-04 at 13:39 +0100, Matt Goodall wrote:
> >> The ideal solution is, of course, to pass everything around to
> >> whatever needs it. However, that's really tedious at times.
> >>
> >> Whatever the architecture of the web server there is always a
> >> request or, in case of WSGI, an env dict. Therefore, request-scope
> >> objects should be associated with the request.
> >
> > True, but even passing a request or env dict around to everyone gets
> > tedious don't you think?
>
> It can. Zope 3 makes a pretty good compromise here. The "top level"
> object involved in handling the request -- a view -- gets the request
> object explicitly passed as a parameter. If the view wants to pass the
> request to function calls or other objects, then it's free to do so.
>
> But, if at some point you find yourself without a reference to the
> current request and really need it, you can get it "out of thin air" by
> calling (essentially) get_request().
>
> The Zope 3 publisher processes requests using a thread pool, so
> get_request() is implemented by stashing the request object in the
> thread-local storage prior to processing the request and digging it back
> out if requested.
>
> Other implementations could store the request somewhere else, but the
> idea is the same.

CherryPy does something similar. The "top level" object involved in
handling the request -- cherrypy.serving -- gets the request and
response objects set as attributes.
But instead of calling get_request() as in Zope 3, there are proxy
objects sitting at cherrypy.request and cherrypy.response which shuttle
getattr and setattr to cherrypy.serving.request/response. That allows
app code to just "import cherrypy" and have access everywhere.

Now, cherrypy.serving _is_ a threadlocal object. But I don't imagine it
would be difficult for a non-threaded HTTP server to replace
cherrypy.serving with some other-context-local if they liked.


Robert Brewer
fumanchu at aminus.org

From ianb at colorstudy.com Fri Jul 4 21:10:27 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 04 Jul 2008 14:10:27 -0500
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <1215170037.17590.29.camel@easymoney>
References: <1215170037.17590.29.camel@easymoney>
Message-ID: <486E75A3.6040905@colorstudy.com>

Iwan Vosloo wrote:
> Many web frameworks and ORM tools have the need to propagate data
> depending on some or other context within which a request is dealt with.
> Passing it all via parameters to every nook of your code is cumbersome.
>
> A lot of the frameworks use a thread local context to solve this
> problem. I'm assuming these are based on threading.local.
>
> (See, for example:
> http://www.sqlalchemy.org/docs/05/session.html#unitofwork_contextual )
>
> Such usage assumes that one request is served per thread.
>
> This is not necessarily the case. (Twisted would perhaps be an example,
> but I have not checked how the twisted people deal with the issue.)

The Spawning server
(http://ulaluma.com/pyx/archives/2008/06/spawning_01_rel.html) would
indeed get things mixed up this way, as it uses greenlets to make (at
least some) blocking calls async. So it would encounter this problem
full-force.

To throw another wrench in things, with the Paste/WebError evalexception
interactive exception handler, it restores this thread-local context so
you can later execute expressions in the same context.
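
The mix-up can be reproduced without greenlets or Spawning. In the
sketch below a plain generator stands in for a handler that yields
control in the middle of a request, which is an assumption made purely
for illustration. Two "requests" share one thread, so they share one
threading.local.

```python
import threading

_state = threading.local()


def handle(name):
    """A handler that gives up control mid-request, as a greenlet might."""
    _state.user = name  # stash "request-local" data in the thread-local
    yield               # another request runs here, in the same thread
    yield "%s saw %s" % (name, _state.user)


first, second = handle("alice"), handle("bob")
next(first)
next(second)  # both requests have started before either has finished
report = [next(first), next(second)]
# report == ["alice saw bob", "bob saw bob"]
```

The first request reads back the second request's data, because
thread-local storage distinguishes threads and not requests.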

--
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From jason.baker at ttu.edu Sun Jul 6 09:19:35 2008
From: jason.baker at ttu.edu (Baker, Jason)
Date: Sun, 6 Jul 2008 02:19:35 -0500
Subject: [Web-SIG] Web-SIG Digest, Vol 57, Issue 2
References:
Message-ID: <42A63B2A29346D489E0D84692CC8F7BF029320@CALYPSO.net.ttu.edu>

help
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/ms-tnef
Size: 2216 bytes
Desc: not available
URL:

From matt at pollenation.net Mon Jul 7 14:48:03 2008
From: matt at pollenation.net (Matt Goodall)
Date: Mon, 07 Jul 2008 13:48:03 +0100
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <1215177789.17590.49.camel@easymoney>
References: <1215170037.17590.29.camel@easymoney>
 <486E19FD.7070207@pollenation.net>
 <1215177789.17590.49.camel@easymoney>
Message-ID: <48721083.7060402@pollenation.net>

Iwan Vosloo wrote:
> On Fri, 2008-07-04 at 13:39 +0100, Matt Goodall wrote:
>> Iwan Vosloo wrote:
>> You're correct that Twisted Web does not allocate a thread per request.
>> All requests are handled by an event loop in the main thread.
>
>> In Twisted, the call stack tends to get fragmented during a sequence of
>> asynchronous calls because of its callback mechanism. Basically, you're
>> hopping in and out of the Twisted reactor (the event mainloop) all the
>> time. Leaving something in the call stack would not work at all.
>
> Couldn't you put something in the call stack each time in the main loop,
> before calling a callback (which will be popped again when that callback
> returns to the main loop)?

Yes, that's probably achievable by subclassing Deferred (the callback
class) and using a closure to reinstate the context before the callback
function is called. Perhaps I'll give it a go out of interest.

However, I'm not convinced it's a good idea and I suspect the Twisted
developers would sooner pluck out their eyeballs (or worse still, mine!)
than allow it into Twisted core ;-).

>
>> The ideal solution is, of course, to pass everything around to whatever
>> needs it. However, that's really tedious at times.
>>
>> Whatever the architecture of the web server there is always a request
>> or, in case of WSGI, an env dict. Therefore, request-scope objects
>> should be associated with the request.
>
> True, but even passing a request or env dict around to everyone gets
> tedious don't you think?

Yes, it can be tedious but I believe explicit arg passing is necessary
to make code readable, testable and reusable.

If it's web-related code then give it the request, it will almost
certainly need it. Otherwise, don't.

I would even advocate extracting request-scope objects, e.g. a database
connection, the current user, etc, as early as possible and passing them
around explicitly (along with the request, if necessary).

I've made the mistake of relying on magic contexts in the past. I'm
still trying to fix things.

- Matt

--
Matt Goodall
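
A small sketch of the "extract early, pass explicitly" advice follows.
It is an illustration only. The environ keys example.db_connection and
example.user_id, the users table and both functions are hypothetical.

```python
import sqlite3


def load_display_name(connection, user_id):
    """Plain function: takes a connection and an id, knows nothing of the web."""
    row = connection.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else "anonymous"


def application(environ, start_response):
    # Extract request-scope objects once, at the edge of the application.
    # Both environ keys are assumptions made for this sketch.
    connection = environ["example.db_connection"]
    user_id = environ.get("example.user_id")

    # From here on everything is passed explicitly.
    greeting = "Hello, %s" % load_display_name(connection, user_id)
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [greeting.encode("utf-8")]
```

Since load_display_name never touches the request, it can be exercised
with nothing but an in-memory database, which is the testability
argument made above.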

From iwan at reahl.org Mon Jul 7 15:10:12 2008
From: iwan at reahl.org (Iwan Vosloo)
Date: Mon, 07 Jul 2008 15:10:12 +0200
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <48721083.7060402@pollenation.net>
References: <1215170037.17590.29.camel@easymoney>
 <486E19FD.7070207@pollenation.net>
 <1215177789.17590.49.camel@easymoney>
 <48721083.7060402@pollenation.net>
Message-ID: <1215436212.6811.79.camel@easymoney>

On Mon, 2008-07-07 at 13:48 +0100, Matt Goodall wrote:
> Iwan Vosloo wrote:
> > On Fri, 2008-07-04 at 13:39 +0100, Matt Goodall wrote:
> >> The ideal solution is, of course, to pass everything around to whatever
> >> needs it. However, that's really tedious at times.
> >>
> >> Whatever the architecture of the web server there is always a request
> >> or, in case of WSGI, an env dict. Therefore, request-scope objects
> >> should be associated with the request.
> >
> > True, but even passing a request or env dict around to everyone gets
> > tedious don't you think?
>
> Yes, it can be tedious but I believe explicit arg passing is necessary
> to make code readable, testable and reusable.
>
> If it's web-related code then give it the request, it will almost
> certainly need it. Otherwise, don't.
>
> I would even advocate extracting request-scope objects, e.g. a database
> connection, the current user, etc, as early as possible and passing them
> around explicitly (along with the request, if necessary).

I understand the arguments for explicit passing. However, if you pass a
particular argument to _each and every_ little method,
readability/testability/reusability are adversely affected too.

And sometimes you need to pass, say, the request to the strangest
little methods just because one of them somewhere needs to do something
with the request which you did not anticipate. It may not even be
logically related to what that method does.
Or, worse, it may be that you sit with a bunch of polymorphic methods,
and one of their implementations needs to have a request - forcing you
to add a request parameter to all of them.

Bottom line for me is that if you add, say "request" to a method
signature, it must make sense from a caller's perspective to have to
give a request. (And I want to add: to give a request to a method named
xxx.)

Otherwise the interface of the method contains illogical bits
necessitated by its implementation. Isn't that bad too?

-i

From manlio_perillo at libero.it Mon Jul 7 15:40:15 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Mon, 07 Jul 2008 15:40:15 +0200
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <48721083.7060402@pollenation.net>
References: <1215170037.17590.29.camel@easymoney>
 <486E19FD.7070207@pollenation.net>
 <1215177789.17590.49.camel@easymoney>
 <48721083.7060402@pollenation.net>
Message-ID: <48721CBF.2020103@libero.it>

Matt Goodall wrote:
> [...]
>> True, but even passing a request or env dict around to everyone gets
>> tedious don't you think?
>
> Yes, it can be tedious but I believe explicit arg passing is necessary
> to make code readable, testable and reusable.
>
> If it's web-related code then give it the request, it will almost
> certainly need it. Otherwise, don't.
>
> I would even advocate extracting request-scope objects, e.g. a database
> connection, the current user, etc, as early as possible and passing them
> around explicitly (along with the request, if necessary).
>

This is exactly what I too have realized!

I'm developing a WSGI framework with all these (and other) ideas:
http://hg.mperillo.ath.cx/wsgix

It's still not documented, so I have not yet made an official
announcement.

The main design goal is to keep the level of the interface as low level
as possible.

I don't like additional interface objects (like Request and Response)
around the WSGI dictionary, and I don't like frameworks like Django
that completely hide the WSGI interface.

> [...]

Manlio Perillo

From manlio_perillo at libero.it Mon Jul 7 18:36:14 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Mon, 07 Jul 2008 18:36:14 +0200
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <487240D1.8030403@colorstudy.com>
References: <1215170037.17590.29.camel@easymoney>
 <486E19FD.7070207@pollenation.net>
 <1215177789.17590.49.camel@easymoney>
 <48721083.7060402@pollenation.net> <48721CBF.2020103@libero.it>
 <487240D1.8030403@colorstudy.com>
Message-ID: <487245FE.30706@libero.it>

Ian Bicking wrote:
> Manlio Perillo wrote:

I'm adding web-sig in Cc.

> [...]
>> I'm developing a WSGI framework with all these (and other) ideas:
>> http://hg.mperillo.ath.cx/wsgix
>>
>> It's still not documented, so I have not yet made an official
>> announcement.
>>
>> The main design goal is to keep the level of the interface as low
>> level as possible.
>>
>> I don't like additional interface objects (like Request and Response)
>> around the WSGI dictionary, and I don't like frameworks like Django
>> that completely hide the WSGI interface.
>
> Have you tried webob? My first run at Paste avoided wrappers around
> those objects, but an object interface has been very helpful.
>

I have not tried it, but I have read the code (as I have read the code
of Paste).

In principle I'm against using additional interfaces, and one of the
reasons I wrote wsgix is to have a proof of concept, for trying to
understand if it is feasible to write a WSGI application using an
alternative framework.

wsgix (+ mod_wsgi for Nginx) has the same role as Paste, but I have
decided to use a rather different approach.

As an example, in Paste you have chosen to use a config dictionary for
middleware configuration, that is, you have middleware factories.
In wsgix it is very different.
As an example:
http://hg.mperillo.ath.cx/wsgix/file/tip/wsgix/contrib/messages.py
http://hg.mperillo.ath.cx/wsgix/file/tip/wsgix/contrib/error_page.py

There are no factories.
The configuration is read (and globally cached) at request time from the
environ dictionary.

With Nginx, configuration parameters can be defined in the server
configuration.

There is a helper class:
http://hg.mperillo.ath.cx/wsgix/file/tip/wsgix/options.py
that helps with the parsing.

There is also a middleware:
http://hg.mperillo.ath.cx/wsgix/file/tip/wsgix/conf/middleware.py
that reads the configuration from a YAML file, and merges it into the
environ dictionary.

Of course it's all a matter of personal taste :).

The goal is to have the possibility to write "truly" reusable
middlewares that are easy to "plug" inside any WSGI server (almost all
of the configuration parameters have default values).


Manlio Perillo

From ianb at colorstudy.com Mon Jul 7 19:54:28 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 07 Jul 2008 12:54:28 -0500
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <487245FE.30706@libero.it>
References: <1215170037.17590.29.camel@easymoney>
 <486E19FD.7070207@pollenation.net>
 <1215177789.17590.49.camel@easymoney>
 <48721083.7060402@pollenation.net> <48721CBF.2020103@libero.it>
 <487240D1.8030403@colorstudy.com> <487245FE.30706@libero.it>
Message-ID: <48725854.1060305@colorstudy.com>

Manlio Perillo wrote:
> Ian Bicking wrote:
>> Manlio Perillo wrote:
>
> I'm adding web-sig in Cc.
>
>> [...]
>>> I'm developing a WSGI framework with all these (and other) ideas:
>>> http://hg.mperillo.ath.cx/wsgix
>>>
>>> It's still not documented, so I have not yet made an official
>>> announcement.
>>>
>>> The main design goal is to keep the level of the interface as low
>>> level as possible.
>>>
>>> I don't like additional interface objects (like Request and
>>> Response) around the WSGI dictionary, and I don't like frameworks
>>> like Django that completely hide the WSGI interface.
>>
>> Have you tried webob? My first run at Paste avoided wrappers around
>> those objects, but an object interface has been very helpful.
>>
>
> I have not tried it, but I have read the code (as I have read the code
> of Paste).
>
> In principle I'm against using additional interfaces, and one of the
> reasons I wrote wsgix is to have a proof of concept, for trying to
> understand if it is feasible to write a WSGI application using an
> alternative framework.
>
> wsgix (+ mod_wsgi for Nginx) has the same role as Paste, but I have
> decided to use a rather different approach.
>
> As an example, in Paste you have chosen to use a config dictionary for
> middleware configuration, that is, you have middleware factories.

I think this is a red herring. WebOb specifically doesn't do anything
related to configuration or the setup of the stack. What it does do is
stuff like:

     expires = http.format_time(0)
     http.generate_cookie(
         environ, headers, name, '', expires=expires,
         domain=cookie_domain(environ), path=path,
         max_age=0)

which would be resp.delete_cookie(name) (well, cookie_domain seems to be
derived from a setting, but that's mostly unrelated). This isn't a
particularly substantial difference, but these small conveniences add up.
-- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From fumanchu at aminus.org Mon Jul 7 21:07:22 2008 From: fumanchu at aminus.org (Robert Brewer) Date: Mon, 7 Jul 2008 12:07:22 -0700 Subject: [Web-SIG] Alternative to threading.local, based on the stack In-Reply-To: <48721083.7060402@pollenation.net> References: <1215170037.17590.29.camel@easymoney> <486E19FD.7070207@pollenation.net><1215177789.17590.49.camel@easymoney> <48721083.7060402@pollenation.net> Message-ID: Matt Goodall wrote: > Yes, it can be tedious but I believe explicit arg passing > is necessary to make code readable, testable and reusable. > ... > I've made the mistake of relying on magic contexts in the > past. I'm still trying to fix things. Can you elaborate? Robert Brewer fumanchu at aminus.org From manlio_perillo at libero.it Mon Jul 7 21:31:05 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Mon, 07 Jul 2008 21:31:05 +0200 Subject: [Web-SIG] Alternative to threading.local, based on the stack In-Reply-To: <48725854.1060305@colorstudy.com> References: <1215170037.17590.29.camel@easymoney> <486E19FD.7070207@pollenation.net> <1215177789.17590.49.camel@easymoney> <48721083.7060402@pollenation.net> <48721CBF.2020103@libero.it> <487240D1.8030403@colorstudy.com> <487245FE.30706@libero.it> <48725854.1060305@colorstudy.com> Message-ID: <48726EF9.6060206@libero.it> Ian Bicking ha scritto: > Manlio Perillo wrote: > [...] >> >> As an example, in Paste you have choosed to using config dictionary >> for middleware configuration, that is, you have middleware factories. > > I think this is a red herring. WebOb specifically doesn't do anything > related to configuration or the setup of the stack. 
What it does do is > stuff like: > > expires = http.format_time(0) > http.generate_cookie( > environ, headers, name, '', expires=expires, > domain=cookie_domain(environ), path=path, > max_age=0) > > which would be resp.delete_cookie(name) (well, cookie_domain seems to be > derived from a setting, but that's mostly unrelated). This isn't a > particularly substantial difference, but these small conveniences add up. > As I have said, this is a personal taste, I don't like the "architecture" used by WebOb and prefer to directly use the environ dictionary without introducing other abstractions. This is possible, I'm writing a "not simple" application using wsgix. I'm still evaluating if I can reuse WebOb parsing functions (and this would be a great thing: I think that we *really* need a package with *only* low *level* parsing functions for the HTTP protocol). From what I can see, WebOb *does* not offer a low level interface for the parsers: you *have* to use the Request object. I really like multilevel architectures, instead. Manlio Perillo From manlio_perillo at libero.it Mon Jul 7 21:58:59 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Mon, 07 Jul 2008 21:58:59 +0200 Subject: [Web-SIG] help with the implementation of a WSGI middleware Message-ID: <48727583.2020302@libero.it> As I have informally written in previous messages, I'm writing a small WSGI framework. The framework is available here (a Mercurial repository): http://hg.mperillo.ath.cx/wsgix In wsgix I have written two middleware that I find interesting since I have learned a bit more about how to write middlewares (and Eby concerns about WSGI 1.0). One of this middleware is wsgix.contrib.messages: http://hg.mperillo.ath.cx/wsgix/file/tip/wsgix/contrib/messages.py The purpose of this middleware is to support sending messages to a client. 
The idea originates from Django; however, in wsgix I use cookies (since I
don't think it is a good idea to use a database for this), and messages
can be sent to every user (Django sends messages only to authenticated
users, if I'm correct).

The wsgix support for messages consists of two parts.
The first is the implementation of a simple API for sending and
retrieving messages (only Unicode strings are supported):

    message_push(environ, message)
    message_pop(environ)  # this returns and removes the messages

These functions do not actually manage cookies: the messages are stored
in environ['wsgix.messages'], as a list.

The second is the implementation of a middleware that takes care of
cookie handling.


The problem is that, if I have understood correctly, a middleware is
allowed to entirely replace the environ dictionary.
This means that if such a middleware is present between the messages
middleware and the application, messages are not sent to the client.

Is this true?
In this case the first solution is to use this middleware as a
decorator, instead of a full middleware.

The other solution is to implement an additional interface:

    message_push(environ, start_response, headers, message)

that explicitly handles the cookie (this is possible but harder to
implement and less flexible to use).


Any suggestions?
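
The two helpers and the failure mode described above can be sketched
roughly as follows. This is only an illustration, not the wsgix code: the
names message_push, message_pop and the 'wsgix.messages' key come from
the message, while app, replacing_middleware and the sample message are
hypothetical.

```python
MESSAGES_KEY = 'wsgix.messages'  # environ key named in the message above


def message_push(environ, message):
    """Queue a message for the client in the per-request environ."""
    environ.setdefault(MESSAGES_KEY, []).append(message)


def message_pop(environ):
    """Return all queued messages and remove them from the environ."""
    return environ.pop(MESSAGES_KEY, [])


def app(environ, start_response):
    # Hypothetical application that queues one message.
    message_push(environ, 'profile saved')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok']


def replacing_middleware(application):
    # Hypothetical middleware that hands a *copy* of the environ downstream.
    def wrapper(environ, start_response):
        return application(dict(environ), start_response)
    return wrapper


# An outer "messages" middleware would only ever see this dictionary.
outer_environ = {}
replacing_middleware(app)(outer_environ, lambda status, headers: None)
lost = MESSAGES_KEY not in outer_environ  # the pushed message never arrives
```

Because the application wrote into the copy, anything that inspects
outer_environ afterwards finds no messages, which is the situation the
question is about.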
Thanks Manlio Perillo From ianb at colorstudy.com Mon Jul 7 22:45:40 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 07 Jul 2008 15:45:40 -0500 Subject: [Web-SIG] Alternative to threading.local, based on the stack In-Reply-To: <48726EF9.6060206@libero.it> References: <1215170037.17590.29.camel@easymoney> <486E19FD.7070207@pollenation.net> <1215177789.17590.49.camel@easymoney> <48721083.7060402@pollenation.net> <48721CBF.2020103@libero.it> <487240D1.8030403@colorstudy.com> <487245FE.30706@libero.it> <48725854.1060305@colorstudy.com> <48726EF9.6060206@libero.it> Message-ID: <48728074.50103@colorstudy.com> Manlio Perillo wrote: > Ian Bicking ha scritto: >> Manlio Perillo wrote: >> [...] >>> >>> As an example, in Paste you have choosed to using config dictionary >>> for middleware configuration, that is, you have middleware factories. >> >> I think this is a red herring. WebOb specifically doesn't do anything >> related to configuration or the setup of the stack. What it does do >> is stuff like: >> >> expires = http.format_time(0) >> http.generate_cookie( >> environ, headers, name, '', expires=expires, >> domain=cookie_domain(environ), path=path, >> max_age=0) >> >> which would be resp.delete_cookie(name) (well, cookie_domain seems to >> be derived from a setting, but that's mostly unrelated). This isn't a >> particularly substantial difference, but these small conveniences add up. >> > > As I have said, this is a personal taste, I don't like the > "architecture" used by WebOb and prefer to directly use the environ > dictionary without introducing other abstractions. > This is possible, I'm writing a "not simple" application using wsgix. > > > I'm still evaluating if I can reuse WebOb parsing functions (and this > would be a great thing: I think that we *really* need a package with > *only* low *level* parsing functions for the HTTP protocol). 
> > From what I can see, WebOb *does* not offer a low level interface for > the parsers: you *have* to use the Request object. > > I really like multilevel architectures, instead. This was the deliberate approach of Paste, and it does have several functions for doing things similar to how you describe. As I said, I went down exactly this path, but I think WebOb solves the problem better. You can think of WebOb as a way of currying functions. All the request functions take an environ argument, curried through instantiation of webob.Request. All response functions take status/headers/app_iter, curried through webob.Response. State is never held outside the environment or the status/headers/app_iter of the response. So think of webob.Request as the module of request-parsing routines, and webob.Response as the module of response-parsing routines. (There are underlying functions for things like parsing dates, but they are only exposed through those classes.) -- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From pje at telecommunity.com Mon Jul 7 23:06:20 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 07 Jul 2008 17:06:20 -0400 Subject: [Web-SIG] help with the implementation of a WSGI middleware In-Reply-To: <48727583.2020302@libero.it> References: <48727583.2020302@libero.it> Message-ID: <20080707210538.E75E43A403A@sparrow.telecommunity.com> At 09:58 PM 7/7/2008 +0200, Manlio Perillo wrote: >In this case the first solution is to use this middleware as a >decorator, instead of a full middleware. This is the correct way to implement non-transparent middleware; i.e., so-called middleware which is in fact an application API. See: http://dirtsimple.org/2007/02/wsgi-middleware-considered-harmful.html for more about this. Basically, if a piece of middleware has to be there for the application to run, it's not really "middleware"; it's a misnamed decorator. 
In the original WSGI spec, I overestimated the usefulness of adding extension APIs to the environ... or more likely, I went along with some of Ian's overenthusiasm for the idea. ;-) Extension APIs in the environ just mean you have to write your code to handle the case where the API isn't there -- in which case you might as well have used a library. Extension APIs really only make sense if they are true *server* features, not application features; otherwise, you are better off using a library rather than "middleware" per se. Under WSGI 2.0, it's even easier since you don't need decorators to manipulate your response: you can just "return someapi(...)" where the "..." is whatever you were going to return directly. From manlio_perillo at libero.it Mon Jul 7 23:21:11 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Mon, 07 Jul 2008 23:21:11 +0200 Subject: [Web-SIG] help with the implementation of a WSGI middleware In-Reply-To: <20080707210538.E75E43A403A@sparrow.telecommunity.com> References: <48727583.2020302@libero.it> <20080707210538.E75E43A403A@sparrow.telecommunity.com> Message-ID: <487288C7.8060903@libero.it> Phillip J. Eby ha scritto: > At 09:58 PM 7/7/2008 +0200, Manlio Perillo wrote: >> In this case the first solution is to use this middleware as a >> decorator, instead of a full middleware. > > This is the correct way to implement non-transparent middleware; i.e., > so-called middleware which is in fact an application API. See: > > http://dirtsimple.org/2007/02/wsgi-middleware-considered-harmful.html > > for more about this. > > Basically, if a piece of middleware has to be there for the application > to run, it's not really "middleware"; it's a misnamed decorator. > Right, this what I thought (and yes, I have read your article). 
However, as a "justification" I used the following argument:
OK, the application does not "fully" work without the middleware, but
it "mainly" works, and it's not a big problem if messages are not
actually sent to the client.

Fortunately, in wsgix a "middleware" is very easy to use both in a full
middleware stack and as a decorator (since all the state is maintained
in the environ dictionary and there is no need for factory functions).

In Nginx you can do, in the server config:

    wsgi_middleware wsgix.contrib.messages;

However, I want to document that this is not a "good" middleware.

"Non-transparent middleware" is a good term, thanks.

> In the original WSGI spec, I overestimated the usefulness of adding
> extension APIs to the environ... or more likely, I went along with some
> of Ian's overenthusiasm for the idea. ;-) Extension APIs in the
> environ just mean you have to write your code to handle the case where
> the API isn't there -- in which case you might as well have used a library.
>
> Extension APIs really only make sense if they are true *server*
> features, not application features; otherwise, you are better off using
> a library rather than "middleware" per se.
>

Yes.
However, my messages middleware does not "inject" an API into the WSGI
environment.
The API uses the environ to store state; the middleware is only required
to "activate" the cookies to actually send messages to the client.

So this is not a "bad" middleware, IMHO.


By the way, is a middleware that is responsible for user authentication:
http://hg.mperillo.ath.cx/wsgix/file/tip/wsgix/auth/http_middleware.py

a good middleware?

To keep it simple, the middleware checks whether there is an
authorization header and whether the credentials are correct.

If so, it executes the WSGI application (setting
environ['REMOTE_USER']); otherwise it returns a forbidden response.
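
As a rough sketch of the authentication middleware just described
(assuming HTTP Basic credentials; this is not the code in
http_middleware.py, and check_credentials with its hard-coded user is
purely hypothetical):

```python
import base64


def check_credentials(user, password):
    # Hypothetical credential check, for illustration only.
    return (user, password) == ('alice', 'secret')


def basic_auth(application):
    """Wrap a WSGI application; usable in a stack or as a decorator."""
    def middleware(environ, start_response):
        header = environ.get('HTTP_AUTHORIZATION', '')
        scheme, _, payload = header.partition(' ')
        if scheme.lower() == 'basic':
            try:
                decoded = base64.b64decode(payload).decode('utf-8')
            except (ValueError, UnicodeDecodeError):
                decoded = ''
            user, sep, password = decoded.partition(':')
            if sep and check_credentials(user, password):
                # Credentials are valid: run the wrapped application.
                environ['REMOTE_USER'] = user
                return application(environ, start_response)
        # Missing or wrong credentials: "forbidden", as in the message.
        start_response('403 Forbidden', [('Content-Type', 'text/plain')])
        return [b'Forbidden']
    return middleware
```

Since the wrapper keeps no state outside the environ, the same function
can be placed in a middleware stack or applied directly as @basic_auth on
a single application.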
> Under WSGI 2.0, it's even easier since you don't need decorators to > manipulate your response: you can just "return someapi(...)" where the > "..." is whatever you were going to return directly. > return someapi() from inside the WSGI application? Thanks Manlio Perillo From ianb at colorstudy.com Mon Jul 7 23:22:37 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 07 Jul 2008 16:22:37 -0500 Subject: [Web-SIG] Alternative to threading.local, based on the stack In-Reply-To: <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> References: <1215170037.17590.29.camel@easymoney> <486E75A3.6040905@colorstudy.com> <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> Message-ID: <4872891D.5040302@colorstudy.com> Donovan Preston wrote: >> To throw another wrench in things, with the Paste/WebError >> evalexception interactive exception handler, it restores this >> thread-local context so you can later execute expressions in the same >> context. > > It seems to me that what is really needed here is an extension of wsgi > that specifies how to get, set, and list request local storage, and for > people to use that instead of the threadlocal module. Of course, for > threaded servers, they will just use the threadlocal module, but for > Spawning running in single-threaded cooperative mode it would use a > greenlet-local implementation, and for a hypothetical Twisted server > running a hypothetical asynchronous wsgi application it would just use a > random request id. Well, it's really call-local, i.e., dynamic scoping. Another option would be something like attaching this dynamic scoping to the frame objects themselves, in a way that evalexception could be aware (restoring them when trying to execute code in the context of some frame) and potentially greenlets could do the same thing. It could be done in a WSGI-specific way, and that might be useful, but the general issue is applicable to more than WSGI. 
Generally the problems we are talking about only occur when some kind of (semi-)transparent concurrency other than threads are used. This includes greenlets, restoring a frame like in evalexception, and potentially generators with the app_iter. -- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From ianb at colorstudy.com Mon Jul 7 23:36:32 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 07 Jul 2008 16:36:32 -0500 Subject: [Web-SIG] help with the implementation of a WSGI middleware In-Reply-To: <20080707210538.E75E43A403A@sparrow.telecommunity.com> References: <48727583.2020302@libero.it> <20080707210538.E75E43A403A@sparrow.telecommunity.com> Message-ID: <48728C60.9000004@colorstudy.com> Phillip J. Eby wrote: > At 09:58 PM 7/7/2008 +0200, Manlio Perillo wrote: >> In this case the first solution is to use this middleware as a >> decorator, instead of a full middleware. > > This is the correct way to implement non-transparent middleware; i.e., > so-called middleware which is in fact an application API. See: > > http://dirtsimple.org/2007/02/wsgi-middleware-considered-harmful.html > > for more about this. > > Basically, if a piece of middleware has to be there for the application > to run, it's not really "middleware"; it's a misnamed decorator. > > In the original WSGI spec, I overestimated the usefulness of adding > extension APIs to the environ... or more likely, I went along with some > of Ian's overenthusiasm for the idea. ;-) Extension APIs in the > environ just mean you have to write your code to handle the case where > the API isn't there -- in which case you might as well have used a library. Eh, personally I remain unconvinced. Or, at least, while the possibility of abuse exists, the extensibility still has many valid uses, and we're better off with it than with a more object-based system (e.g., CherryPy hooks, Django middleware, Zope's Acquisition, and arguably even Zope 3's giant-ball-of-context). 
Also, using a *just* library supposes robust and transparent request-local storage in a manner that works comfortably with the WSGI call stack, which like any call stack can be recursive and complex. Lacking such storage, stuffing objects in the environment is better than the alternatives. > Extension APIs really only make sense if they are true *server* > features, not application features; otherwise, you are better off using > a library rather than "middleware" per se. What server features? Servers are dull. Often middleware is used to implement policy separate from the application. Libraries require another kind of abstraction, and implementing policy in libraries is, IMHO, messier than the middleware alternative for many important use cases. Also there exists no neutral ground for libraries in Python. Maybe egg entry points, but they aren't all that neutral, and aren't all that applicable either. zope.interface would like to be neutral ground, but of course is not. So multiple implementations can at least possibly congeal around a WSGI request. Also of course "server" is a vague term. Request in, response out, that's the minimal abstraction for HTTP, and there is no "server" in there. If we're talking about "things that call WSGI applications", well I have a ton of those that never use sockets and you'd be hard pressed to classify them as "servers". 
-- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From dsposx at mac.com Mon Jul 7 23:12:40 2008 From: dsposx at mac.com (Donovan Preston) Date: Mon, 07 Jul 2008 14:12:40 -0700 Subject: [Web-SIG] Alternative to threading.local, based on the stack In-Reply-To: <486E75A3.6040905@colorstudy.com> References: <1215170037.17590.29.camel@easymoney> <486E75A3.6040905@colorstudy.com> Message-ID: <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> On Jul 4, 2008, at 12:10 PM, Ian Bicking wrote: > Iwan Vosloo wrote: >> Many web frameworks and ORM tools have the need to propagate data >> depending on some or other context within which a request is dealt >> with. >> Passing it all via parameters to every nook of your code is >> cumbersome. >> A lot of the frameworks use a thread local context to solve this >> problem. I'm assuming these are based on threading.local. (See, >> for example: >> http://www.sqlalchemy.org/docs/05/ >> session.html#unitofwork_contextual ) >> Such usage assumes that one request is served per thread. >> This is not necessarily the case. (Twisted would perhaps be an >> example, >> but I have not checked how the twisted people deal with the issue.) > > The Spawning server (http://ulaluma.com/pyx/archives/2008/06/spawning_01_rel.html > ) would indeed get things mixed up this way, as uses greenlets to > make (at least some) blocking calls async. So it would encounter > this problem full-force. With the latest version of Spawning (http://pypi.python.org/pypi/Spawning/0.6 ) this is only true if specifically configured to do so (by passing -- threads=0 or including num_threads = 0 in the ini file). In this case Spawning monkey-patches the threadlocal module with a version that stores things in greenlet-local storage. This makes Pylons applications and other applications that use thread-local storage work as long as the application does not do any blocking database operations. 
However, by default Spawning now uses a threadpool to execute wsgi applications, since the vast majority of wsgi applications probably block. This makes it functionally identical to the Twisted server which executes the actual wsgi application in a threadpool. > To throw another wrench in things, with the Paste/WebError > evalexception interactive exception handler, it restores this thread- > local context so you can later execute expressions in the same > context. It seems to me that what is really needed here is an extension of wsgi that specifies how to get, set, and list request local storage, and for people to use that instead of the threadlocal module. Of course, for threaded servers, they will just use the threadlocal module, but for Spawning running in single-threaded cooperative mode it would use a greenlet-local implementation, and for a hypothetical Twisted server running a hypothetical asynchronous wsgi application it would just use a random request id. Donovan From pje at telecommunity.com Tue Jul 8 03:05:25 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 07 Jul 2008 21:05:25 -0400 Subject: [Web-SIG] help with the implementation of a WSGI middleware In-Reply-To: <487288C7.8060903@libero.it> References: <48727583.2020302@libero.it> <20080707210538.E75E43A403A@sparrow.telecommunity.com> <487288C7.8060903@libero.it> Message-ID: <20080708010444.989553A403A@sparrow.telecommunity.com> At 11:21 PM 7/7/2008 +0200, Manlio Perillo wrote: >So this is not a "bad" middleware, IMHO. True, but it's part of the application, rather than being transparent. >By the way, a middleware that is responsible for user authentication: >http://hg.mperillo.ath.cx/wsgix/file/tip/wsgix/auth/http_middleware.py > >is a good middleware? > >To keep it simple, the middleware check if there is an authorization >header and the credentials are correct. > >If this is true, execute the WSGI application (setting >environ['REMOTE_USER']), otherwise return a forbidden response. 
Right - that's transparent middleware: the application doesn't need to know it's there. >>Under WSGI 2.0, it's even easier since you don't need decorators to >>manipulate your response: you can just "return someapi(...)" where >>the "..." is whatever you were going to return directly. > >return someapi() from inside the WSGI application? Yes. From pje at telecommunity.com Tue Jul 8 03:10:31 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 07 Jul 2008 21:10:31 -0400 Subject: [Web-SIG] help with the implementation of a WSGI middleware In-Reply-To: <48728C60.9000004@colorstudy.com> References: <48727583.2020302@libero.it> <20080707210538.E75E43A403A@sparrow.telecommunity.com> <48728C60.9000004@colorstudy.com> Message-ID: <20080708010950.4CE463A403A@sparrow.telecommunity.com> At 04:36 PM 7/7/2008 -0500, Ian Bicking wrote: >Phillip J. Eby wrote: >>At 09:58 PM 7/7/2008 +0200, Manlio Perillo wrote: >>>In this case the first solution is to use this middleware as a >>>decorator, instead of a full middleware. >>This is the correct way to implement non-transparent middleware; >>i.e., so-called middleware which is in fact an application API. See: >>http://dirtsimple.org/2007/02/wsgi-middleware-considered-harmful.html >>for more about this. >>Basically, if a piece of middleware has to be there for the >>application to run, it's not really "middleware"; it's a misnamed decorator. >>In the original WSGI spec, I overestimated the usefulness of adding >>extension APIs to the environ... or more likely, I went along with >>some of Ian's overenthusiasm for the idea. ;-) Extension APIs in >>the environ just mean you have to write your code to handle the >>case where the API isn't there -- in which case you might as well >>have used a library. > >Eh, personally I remain unconvinced. 
Or, at least, while the >possibility of abuse exists, the extensibility still has many valid >uses, and we're better off with it than with a more object-based >system (e.g., CherryPy hooks, Django middleware, Zope's Acquisition, >and arguably even Zope 3's giant-ball-of-context). > >Also, using a *just* library supposes robust and transparent >request-local storage in a manner that works comfortably with the >WSGI call stack, which like any call stack can be recursive and >complex. Lacking such storage, stuffing objects in the environment >is better than the alternatives. I don't object to stuffing things in the environment; I object to: 1. Putting APIs in there (the API should be regular functions or objects, thanks) 2. Wrapping middleware around an app to put in APIs that it's going to have to know about anyway. >>Extension APIs really only make sense if they are true *server* >>features, not application features; otherwise, you are better off >>using a library rather than "middleware" per se. > >What server features? Servers are dull. Which is why there's not much call for extension APIs. :) >Often middleware is used to implement policy separate from the application. And that kind of middleware is therefore (one hopes) transparent to the application. > Libraries require another kind of abstraction, and implementing > policy in libraries is, IMHO, messier than the middleware > alternative for many important use cases. Also there exists no > neutral ground for libraries in Python. Maybe egg entry points, > but they aren't all that neutral, and aren't all that applicable > either. zope.interface would like to be neutral ground, but of > course is not. So multiple implementations can at least possibly > congeal around a WSGI request. Standards for data in the environ may be a good idea. But APIs in the environ are generally *not* a good idea. >Also of course "server" is a vague term. 
Request in, response out, >that's the minimal abstraction for HTTP, and there is no "server" in >there. If we're talking about "things that call WSGI applications", Nope, I mean actual servers. From pje at telecommunity.com Tue Jul 8 03:11:59 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 07 Jul 2008 21:11:59 -0400 Subject: [Web-SIG] Alternative to threading.local, based on the stack In-Reply-To: <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> References: <1215170037.17590.29.camel@easymoney> <486E75A3.6040905@colorstudy.com> <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> Message-ID: <20080708011118.768B83A403A@sparrow.telecommunity.com> At 02:12 PM 7/7/2008 -0700, Donovan Preston wrote: >It seems to me that what is really needed here is an extension of wsgi >that specifies how to get, set, and list request local storage, and >for people to use that instead of the threadlocal module. I don't follow why you wouldn't just put that in the environ. (If you need it to be carried back from the application, use mutable objects in the environ.) From ianb at colorstudy.com Tue Jul 8 04:42:24 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 07 Jul 2008 21:42:24 -0500 Subject: [Web-SIG] help with the implementation of a WSGI middleware In-Reply-To: <20080708010950.4CE463A403A@sparrow.telecommunity.com> References: <48727583.2020302@libero.it> <20080707210538.E75E43A403A@sparrow.telecommunity.com> <48728C60.9000004@colorstudy.com> <20080708010950.4CE463A403A@sparrow.telecommunity.com> Message-ID: <4872D410.4070604@colorstudy.com> Phillip J. Eby wrote: > I don't object to stuffing things in the environment; I object to: > > 1. Putting APIs in there (the API should be regular functions or > objects, thanks) > 2. Wrapping middleware around an app to put in APIs that it's going to > have to know about anyway. Well, sometimes this occurs because you want the middleware at a different level. 
E.g., something like the transaction handler in repoze.tm (http://svn.repoze.org/repoze.tm/trunk/) -- you expect it to be there, and for it to put an object with a certain API in the environment, and it implements an outer transaction boundary. It's something you can put in fairly speculatively, so that some consumer can make use of it. It's also a case where objects seemingly well outside the scope of the controller/web need access to some transaction manager, and that manager's most obvious scope is for the request, and so some common means to "get the current transaction manager" would be nice. Anyway, arguably a good example of both an API in the environment, and an API that would be nice if you could easily access without being bound to any particular framework's convention for how to get the current request. >> Often middleware is used to implement policy separate from the >> application. > > And that kind of middleware is therefore (one hopes) transparent to the > application. Often *some* implementation must be present. E.g., if you check REMOTE_USER you implicitly expect *something* to set REMOTE_USER. >> Libraries require another kind of abstraction, and implementing >> policy in libraries is, IMHO, messier than the middleware alternative >> for many important use cases. Also there exists no neutral ground for >> libraries in Python. Maybe egg entry points, but they aren't all that >> neutral, and aren't all that applicable either. zope.interface would >> like to be neutral ground, but of course is not. So multiple >> implementations can at least possibly congeal around a WSGI request. > > Standards for data in the environ may be a good idea. But APIs in the > environ are generally *not* a good idea. Yes, generally I agree. >> Also of course "server" is a vague term. Request in, response out, >> that's the minimal abstraction for HTTP, and there is no "server" in >> there. 
If we're talking about "things that call WSGI applications", > > Nope, I mean actual servers. Well, as I was implying, anything that calls an app is in some sense a server. -- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From manlio_perillo at libero.it Tue Jul 8 09:38:17 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Tue, 08 Jul 2008 09:38:17 +0200 Subject: [Web-SIG] Alternative to threading.local, based on the stack In-Reply-To: <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> References: <1215170037.17590.29.camel@easymoney> <486E75A3.6040905@colorstudy.com> <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> Message-ID: <48731969.4040308@libero.it> Donovan Preston ha scritto: > [...] > It seems to me that what is really needed here is an extension of wsgi > that specifies how to get, set, and list request local storage, and for > people to use that instead of the threadlocal module. There seems to be something that I don't understand: why not just store the values inside the WSGI environ dictionary? It is a per request dictionary, so it is really what you want. > [...] Manlio Perillo From manlio_perillo at libero.it Tue Jul 8 09:55:13 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Tue, 08 Jul 2008 09:55:13 +0200 Subject: [Web-SIG] help with the implementation of a WSGI middleware In-Reply-To: <20080708010444.989553A403A@sparrow.telecommunity.com> References: <48727583.2020302@libero.it> <20080707210538.E75E43A403A@sparrow.telecommunity.com> <487288C7.8060903@libero.it> <20080708010444.989553A403A@sparrow.telecommunity.com> Message-ID: <48731D61.30006@libero.it> Phillip J. Eby ha scritto: > At 11:21 PM 7/7/2008 +0200, Manlio Perillo wrote: >> So this is not a "bad" middleware, IMHO. > > True, but it's part of the application, rather than being transparent. > Ok, I agree. 
Does this mean that such non-transparent middleware must not be
inserted inside the "gateway middleware stack", even if this is done
only as a convenience (so that you don't have to use a decorator for
every function)?

>> By the way, a middleware that is responsible for user authentication:
>> http://hg.mperillo.ath.cx/wsgix/file/tip/wsgix/auth/http_middleware.py
>>
>> is a good middleware?
>>
>> To keep it simple, the middleware check if there is an authorization
>> header and the credentials are correct.
>>
>> If this is true, execute the WSGI application (setting
>> environ['REMOTE_USER']), otherwise return a forbidden response.
>
> Right - that's transparent middleware: the application doesn't need to
> know it's there.
>

I think that it's rather subtle.
If you remove the middleware, the application is no longer able to
handle authenticated users.

This is not a problem, since the application is still able to work
correctly, but the same applies to my messages middleware, IMHO.

>
>>> Under WSGI 2.0, it's even easier since you don't need decorators to
>>> manipulate your response: you can just "return someapi(...)" where
>>> the "..." is whatever you were going to return directly.
>>
>> return someapi() from inside the WSGI application?
>
> Yes.
>

Do you have a working example?

Also, can you post an example of a middleware that needs to replace the
environ dictionary?


Thanks

Manlio Perillo

From dsposx at mac.com Tue Jul 8 20:35:36 2008
From: dsposx at mac.com (Donovan Preston)
Date: Tue, 08 Jul 2008 11:35:36 -0700
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <20080708011118.768B83A403A@sparrow.telecommunity.com>
References: <1215170037.17590.29.camel@easymoney> <486E75A3.6040905@colorstudy.com> <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> <20080708011118.768B83A403A@sparrow.telecommunity.com>
Message-ID:

On Jul 7, 2008, at 6:11 PM, Phillip J.
Eby wrote: > At 02:12 PM 7/7/2008 -0700, Donovan Preston wrote: >> It seems to me that what is really needed here is an extension of >> wsgi >> that specifies how to get, set, and list request local storage, and >> for people to use that instead of the threadlocal module. > > I don't follow why you wouldn't just put that in the environ. (If > you need it to be carried back from the application, use mutable > objects in the environ.) Yes, the logical place to store it is in the environ, but this whole thread is about having an api for doing request-local storage that doesn't involve passing the request everywhere. Here's what I am imagining: There's just a module, called requestlocal or something. It has an API just like threading.local(), except the implementation can be changed by the wsgi server. I personally don't like the idea of having magical context, but I think this is a practicality versus purity issue. Obviously plenty of people have a desire to have a place to store request-local data without passing the environment everywhere. Using threading.local is a good way to do that, unless the server is not using one thread per request. Giving people an interface to write to that doesn't specifically mention threads and is customizable by the wsgi server is what I am suggesting. Donovan From manlio_perillo at libero.it Tue Jul 8 20:45:00 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Tue, 08 Jul 2008 20:45:00 +0200 Subject: [Web-SIG] Alternative to threading.local, based on the stack In-Reply-To: References: <1215170037.17590.29.camel@easymoney> <486E75A3.6040905@colorstudy.com> <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> <20080708011118.768B83A403A@sparrow.telecommunity.com> Message-ID: <4873B5AC.3040904@libero.it> Donovan Preston ha scritto: > > On Jul 7, 2008, at 6:11 PM, Phillip J. 
Eby wrote: > >> At 02:12 PM 7/7/2008 -0700, Donovan Preston wrote: >>> It seems to me that what is really needed here is an extension of wsgi >>> that specifies how to get, set, and list request local storage, and >>> for people to use that instead of the threadlocal module. >> >> I don't follow why you wouldn't just put that in the environ. (If you >> need it to be carried back from the application, use mutable objects >> in the environ.) > > Yes, the logical place to store it is in the environ, but this whole > thread is about having an api for doing request-local storage that > doesn't involve passing the request everywhere. > > Here's what I am imagining: > > There's just a module, called requestlocal or something. It has an API > just like threading.local(), except the implementation can be changed by > the wsgi server. > Using greenlets, there is always a current greenlet, so you can use this for local storage. A library function can check if there is an active greenlet, and use it as data key; otherwise it will use the current thread id. However this will not work if you have an asynchronous server that does not make use of greenlets. > [...] Manlio Perillo From dsposx at mac.com Tue Jul 8 21:34:44 2008 From: dsposx at mac.com (Donovan Preston) Date: Tue, 08 Jul 2008 12:34:44 -0700 Subject: [Web-SIG] Alternative to threading.local, based on the stack In-Reply-To: <4873B5AC.3040904@libero.it> References: <1215170037.17590.29.camel@easymoney> <486E75A3.6040905@colorstudy.com> <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> <20080708011118.768B83A403A@sparrow.telecommunity.com> <4873B5AC.3040904@libero.it> Message-ID: <8F57FB05-8563-4175-8D8A-82372BDC9510@mac.com> On Jul 8, 2008, at 11:45 AM, Manlio Perillo wrote: > Using greenlets, there is always a current greenlet, so you can use > this for local storage. > > A library function can check if there is an active greenlet, and use > it as data key; otherwise it will use the current thread id. 
Yes, this is exactly what I did in the wrap_threading_local_with_coro_local here: http://donovanpreston.com:8888/eventlet/file/b6f9627e88df/eventlet/util.py > However this will not work if you have an asynchronous server that > does not make use of greenlets. Exactly, which is why I am proposing just standardizing something that does exactly what people use threading.local for, but whose implementation is pluggable by the wsgi server. Donovan From manlio_perillo at libero.it Tue Jul 8 21:47:09 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Tue, 08 Jul 2008 21:47:09 +0200 Subject: [Web-SIG] Alternative to threading.local, based on the stack In-Reply-To: <8F57FB05-8563-4175-8D8A-82372BDC9510@mac.com> References: <1215170037.17590.29.camel@easymoney> <486E75A3.6040905@colorstudy.com> <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> <20080708011118.768B83A403A@sparrow.telecommunity.com> <4873B5AC.3040904@libero.it> <8F57FB05-8563-4175-8D8A-82372BDC9510@mac.com> Message-ID: <4873C43D.4050506@libero.it> Donovan Preston ha scritto: > > On Jul 8, 2008, at 11:45 AM, Manlio Perillo wrote: > >> Using greenlets, there is always a current greenlet, so you can use >> this for local storage. >> >> A library function can check if there is an active greenlet, and use >> it as data key; otherwise it will use the current thread id. > > Yes, this is exactly what I did in the > wrap_threading_local_with_coro_local here: > > http://donovanpreston.com:8888/eventlet/file/b6f9627e88df/eventlet/util.py > Ok. >> However this will not work if you have an asynchronous server that >> does not make use of greenlets. > > Exactly, which is why I am proposing just standardizing something that > does exactly what people use threading.local for, but whose > implementation is pluggable by the wsgi server. > But this will be not easy to implement, especially if it should go in a separate module. 
Maybe its better to have something like: wsgiorg.local_scope a function that returns the current request id. The function itself is not bound to the current request, so it can be safely stored. Maybe this should be more easy to implement, I'm not sure. Manlio Perillo From pje at telecommunity.com Tue Jul 8 23:31:09 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 08 Jul 2008 17:31:09 -0400 Subject: [Web-SIG] Alternative to threading.local, based on the stack In-Reply-To: References: <1215170037.17590.29.camel@easymoney> <486E75A3.6040905@colorstudy.com> <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> <20080708011118.768B83A403A@sparrow.telecommunity.com> Message-ID: <20080708213027.75DDC3A404D@sparrow.telecommunity.com> At 11:35 AM 7/8/2008 -0700, Donovan Preston wrote: >On Jul 7, 2008, at 6:11 PM, Phillip J. Eby wrote: >>At 02:12 PM 7/7/2008 -0700, Donovan Preston wrote: >>>It seems to me that what is really needed here is an extension of >>>wsgi >>>that specifies how to get, set, and list request local storage, and >>>for people to use that instead of the threadlocal module. >> >>I don't follow why you wouldn't just put that in the environ. (If >>you need it to be carried back from the application, use mutable >>objects in the environ.) > >Yes, the logical place to store it is in the environ, but this whole >thread is about having an api for doing request-local storage that >doesn't involve passing the request everywhere. > >Here's what I am imagining: > >There's just a module, called requestlocal or something. It has an API >just like threading.local(), except the implementation can be changed >by the wsgi server. > >I personally don't like the idea of having magical context, but I >think this is a practicality versus purity issue. Yes... and the practicality of simply storing things in the environ wins. :) Don't get me wrong: I use "magical" contexts in my libraries, both thread-local and otherwise. 
Indeed, I've got one that solves the sort of problems you guys are talking about here, at least insofar as being able to handle Twisted or greenlets' context-swapping needs. But for stuff you could just put in a WSGI environ, it seems like ludicrous overkill to me. > Obviously plenty of >people have a desire to have a place to store request-local data >without passing the environment everywhere. Using threading.local is a >good way to do that, unless the server is not using one thread per >request. Giving people an interface to write to that doesn't >specifically mention threads and is customizable by the wsgi server is >what I am suggesting. Er, and how do you propose people *access* that interface rather than a specific implementation of it? Wouldn't we need to pass it in the environ, thereby rendering the whole thing even more obviously moot? :) From dsposx at mac.com Wed Jul 9 00:10:17 2008 From: dsposx at mac.com (Donovan Preston) Date: Tue, 08 Jul 2008 15:10:17 -0700 Subject: [Web-SIG] Alternative to threading.local, based on the stack In-Reply-To: <20080708213027.75DDC3A404D@sparrow.telecommunity.com> References: <1215170037.17590.29.camel@easymoney> <486E75A3.6040905@colorstudy.com> <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> <20080708011118.768B83A403A@sparrow.telecommunity.com> <20080708213027.75DDC3A404D@sparrow.telecommunity.com> Message-ID: <323DBFD9-07D9-467A-857A-A83241AF4535@mac.com> On Jul 8, 2008, at 2:31 PM, Phillip J. Eby wrote: > At 11:35 AM 7/8/2008 -0700, Donovan Preston wrote: >> Obviously plenty of >> people have a desire to have a place to store request-local data >> without passing the environment everywhere. Using threading.local >> is a >> good way to do that, unless the server is not using one thread per >> request. Giving people an interface to write to that doesn't >> specifically mention threads and is customizable by the wsgi server >> is >> what I am suggesting. 
>
> Er, and how do you propose people *access* that interface rather
> than a specific implementation of it?  Wouldn't we need to pass it
> in the environ, thereby rendering the whole thing even more
> obviously moot?  :)

You're right. A standard specific implementation is what I am
suggesting. Here, code should help:


## requestlocal.py

## use thread-local storage as the default
from threading import local

def set_local_implementation(imp):
    global local
    local = imp


If a wsgi server wants to implement request-local storage by using the
environ, it would call set_local_implementation with an imp function
that closes over the environ for each request.

Donovan

From fumanchu at aminus.org  Wed Jul  9 00:19:15 2008
From: fumanchu at aminus.org (Robert Brewer)
Date: Tue, 8 Jul 2008 15:19:15 -0700
Subject: [Web-SIG] Alternative to threading.local, based on the stack
In-Reply-To: <323DBFD9-07D9-467A-857A-A83241AF4535@mac.com>
References: <1215170037.17590.29.camel@easymoney><486E75A3.6040905@colorstudy.com><53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com><20080708011118.768B83A403A@sparrow.telecommunity.com><20080708213027.75DDC3A404D@sparrow.telecommunity.com> <323DBFD9-07D9-467A-857A-A83241AF4535@mac.com>
Message-ID:

Donovan Preston wrote:
> On Jul 8, 2008, at 2:31 PM, Phillip J. Eby wrote:
> > Er, and how do you propose people *access* that interface rather
> > than a specific implementation of it?  Wouldn't we need to pass it
> > in the environ, thereby rendering the whole thing even more
> > obviously moot?  :)
>
> You're right. A standard specific implementation is what I am
> suggesting. Here, code should help:
>
>
> ## requestlocal.py
>
> ## use thread-local storage as the default
> from threading import local
>
> def set_local_implementation(imp):
>     global local
>     local = imp
>
>
> If a wsgi server wants to implement request-local storage by using the
> environ, it would call set_local_implementation with an imp function
> that closes over the environ for each request.
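
[Editorial sketch, not part of the original message.] The requestlocal
proposal quoted above could be fleshed out roughly as follows. This is a
minimal sketch under stated assumptions: the accessor function `local()` and
the private name `_local_factory` are hypothetical additions, introduced so
that code which has already imported the module still sees a replacement
implementation. Rebinding a module-level `local` name, as in the quoted
snippet, would not affect callers that did `from requestlocal import local`
earlier.

```python
import threading

# Hypothetical module-private default: thread-local storage, as proposed.
_local_factory = threading.local


def set_local_implementation(factory):
    """Let a WSGI server swap in its own request-local storage class."""
    global _local_factory
    _local_factory = factory


def local():
    """Create a storage object using whichever implementation is current."""
    return _local_factory()
```

Storage objects created before a call to set_local_implementation keep the
old implementation, so a server would presumably have to install its own
implementation before any application code is imported.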
And what package does requestlocal.py live in? Robert Brewer fumanchu at aminus.org From ianb at colorstudy.com Wed Jul 9 05:46:07 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 08 Jul 2008 22:46:07 -0500 Subject: [Web-SIG] Alternative to threading.local, based on the stack In-Reply-To: <20080708213027.75DDC3A404D@sparrow.telecommunity.com> References: <1215170037.17590.29.camel@easymoney> <486E75A3.6040905@colorstudy.com> <53C67A5E-CD54-4B45-9316-CBEAF77EF49D@mac.com> <20080708011118.768B83A403A@sparrow.telecommunity.com> <20080708213027.75DDC3A404D@sparrow.telecommunity.com> Message-ID: <4874347F.1060002@colorstudy.com> Phillip J. Eby wrote: >> Obviously plenty of >> people have a desire to have a place to store request-local data >> without passing the environment everywhere. Using threading.local is a >> good way to do that, unless the server is not using one thread per >> request. Giving people an interface to write to that doesn't >> specifically mention threads and is customizable by the wsgi server is >> what I am suggesting. > > Er, and how do you propose people *access* that interface rather than a > specific implementation of it? Wouldn't we need to pass it in the > environ, thereby rendering the whole thing even more obviously moot? :) I can't decide what the question is here. You mean, how can a greenlet request-local provider indicate that they are providing a way of getting the current request? Or, how can a consumer get access, given that it can live in any module, and the consumer presumably doesn't have an environ? 
I imagine from what Donovan says that there would actually be one
module, requestlocal, and one implementation, and that implementation
would be awesome and support greenlets and threads, and whatever else
comes along (which luckily is not much else), and I guess maybe has a
middleware that would register the request on entry and deregister it on
exit, and consumers would do:

import requestlocal
def whatever():
    environ = requestlocal.get_request()

and we'd just all agree on this singular implementation, because I don't
see any way around that.

--
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From tseaver at palladion.com  Fri Jul 11 04:50:00 2008
From: tseaver at palladion.com (Tres Seaver)
Date: Thu, 10 Jul 2008 22:50:00 -0400
Subject: [Web-SIG] help with the implementation of a WSGI middleware
In-Reply-To: <4872D410.4070604@colorstudy.com>
References: <48727583.2020302@libero.it> <20080707210538.E75E43A403A@sparrow.telecommunity.com> <48728C60.9000004@colorstudy.com> <20080708010950.4CE463A403A@sparrow.telecommunity.com> <4872D410.4070604@colorstudy.com>
Message-ID: <4876CA58.90808@palladion.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Bicking wrote:
> Phillip J. Eby wrote:
>> I don't object to stuffing things in the environment; I object to:
>>
>> 1. Putting APIs in there (the API should be regular functions or
>> objects, thanks)
>> 2. Wrapping middleware around an app to put in APIs that it's going to
>> have to know about anyway.
>
> Well, sometimes this occurs because you want the middleware at a
> different level.  E.g., something like the transaction handler in
> repoze.tm (http://svn.repoze.org/repoze.tm/trunk/) -- you expect it to
> be there, and for it to put an object with a certain API in the
> environment, and it implements an outer transaction boundary.
I have occasionally left repoze.tm out of the stack in front of an
application which would normally be configured to use it, precisely to
guarantee that no transactions can be implicitly committed:  I think
this is a perfect case of "transparent" middleware:

- - The application uses the 'transaction' library (if desired) to
  annotate transactions, and to register "dirty" objects;  note that
  the library will silently start a thread-local transaction if one is
  not already present.

- - The middleware, if present, begins a transaction on entry, then
  commits the transaction on non-exceptional exit, or aborts it on
  exceptional exit.

- - The application doesn't need to change at all if the middleware is
  absent.  It doesn't use or expect any API to be jammed into the
  WSGI environment at all.

> It's
> something you can put in fairly speculatively, so that some consumer can
> make use of it.  It's also a case where objects seemingly well outside
> the scope of the controller/web need access to some transaction manager,
> and that manager's most obvious scope is for the request, and so some
> common means to "get the current transaction manager" would be nice.
> Anyway, arguably a good example of both an API in the environment, and
> an API that would be nice if you could easily access without being bound
> to any particular framework's convention for how to get the current request.
>
>>> Often middleware is used to implement policy separate from the
>>> application.
>> And that kind of middleware is therefore (one hopes) transparent to the
>> application.
>
> Often *some* implementation must be present.  E.g., if you check
> REMOTE_USER you implicitly expect *something* to set REMOTE_USER.

As long as the application does the Right Thing if it is missing
(raising Unauthorized, or whatever), any middleware to set that variable
is purely optional.


Tres.
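
[Editorial sketch, not part of the original message.] The begin / commit /
abort behaviour described earlier in this message might look roughly like the
following. It is only an illustration: `RecordingTransactionManager` is a
hypothetical stand-in that logs calls, not the real 'transaction' package,
and `transaction_middleware` is not the repoze.tm implementation.

```python
class RecordingTransactionManager:
    """Hypothetical stand-in that just records what was asked of it."""

    def __init__(self):
        self.events = []

    def begin(self):
        self.events.append("begin")

    def commit(self):
        self.events.append("commit")

    def abort(self):
        self.events.append("abort")


def transaction_middleware(app, manager):
    """Wrap a WSGI app in an outer transaction boundary."""
    def wrapped(environ, start_response):
        manager.begin()
        try:
            body = app(environ, start_response)
            try:
                # Consume the body so that errors raised while iterating
                # still happen inside the transaction boundary.
                chunks = list(body)
            finally:
                close = getattr(body, "close", None)
                if close is not None:
                    close()
        except Exception:
            manager.abort()
            raise
        manager.commit()
        return chunks
    return wrapped
```

Nothing is placed in the environ here, so the wrapped application runs
unchanged whether or not the middleware is present.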
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIdspY+gerLs4ltQ4RAoofAKCTIHPfnfDjuOrVkTgvvKB1nmndygCfT4C1
Wvs9oj5YLR5uv4NCsKbYrRk=
=9udz
-----END PGP SIGNATURE-----

From robillard.etienne at gmail.com  Mon Jul 14 22:09:18 2008
From: robillard.etienne at gmail.com (Etienne Robillard)
Date: Mon, 14 Jul 2008 16:09:18 -0400
Subject: [Web-SIG] Using decorators to add objects in a thread-local store..
Message-ID: <20080714160918.19519101@fluke>

Hi all,

I'd like to have your input and comments on using decorator
functions for adding extra options to the request.environ object.

For instance, here's a decorator which adds a "scoped" session
object into request.environ:

def with_session(engine=None):
    """
    Decorator function for attaching a `Session` instance
    as a keyword argument in `request.environ`.
    """
    def decorator(view_func):
        def _wrapper(request, *args, **kwargs):
            scoped_session.set_session(engine)
            request.environ['_scoped_session'] = getattr(scoped_session, 'sessio
            return view_func(request, *args, **kwargs)
        return wraps(view_func)(_wrapper)
    return decorator

Then it can be used as follows:

@with_session(engine=engine)
def view_blog_list(request, *args, **kwargs):
    # get the local session object for this
    # request (thread-local)
    sess = request.environ['_scoped_session']
    # do stuff with the Session object here...
    ...

Is this a good approach, or can this be adapted to work
in multithreaded environments ?
For details, you can check out the source code of notmm, which
holds the current implementation of the with_session decorator:

$ hg clone -r tip http://gthc.org/projects/notmm/repo/ notmm

For more details about notmm, please see here: http://gthc.org/projects/notmm/

Thanks and Regards,

Etienne

From ianb at colorstudy.com  Wed Jul 16 04:32:39 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue, 15 Jul 2008 21:32:39 -0500
Subject: [Web-SIG] Using decorators to add objects in a thread-local store..
In-Reply-To: <20080714160918.19519101@fluke>
References: <20080714160918.19519101@fluke>
Message-ID: <487D5DC7.1070802@colorstudy.com>

Etienne Robillard wrote:
>
> Hi all,
>
> I'd like to have your input and comments on using decorators
> functions for adding extra options to the request.environ object.
>
> For instance, here's a decorator whichs adds a "scoped" session
> object into request.environ:
>
> def with_session(engine=None):
>     """
>     Decorator function for attaching a `Session` instance
>     as a keyword argument in `request.environ`.
>     """
>     def decorator(view_func):
>         def _wrapper(request, *args, **kwargs):
>             scoped_session.set_session(engine)
>             request.environ['_scoped_session'] = getattr(scoped_session, 'sessio

You should always use a namespace, e.g.,
request.environ['something._scoped_session'] = ...

In the context of a Pylons controller you could do it this way.  Of
course with just WSGI it would be better to wrap it via WSGI, which is
almost equivalent to a decorator:

def with_session(engine=None):
    def decorator(app):
        def engine_wsgi_app(environ, start_response):
            environ['...'] = ...
            return app(environ, start_response)
        return engine_wsgi_app
    return decorator

Pylons controllers aren't *quite* WSGI applications, but instances of
those controller classes are.  So wrapping an individual controller with
middleware requires a bit more work.
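
[Editorial sketch, not part of the original message.] A runnable variant of
the WSGI-level wrapper described above, using a namespaced environ key. The
key "myapp.session" and the `session_factory` argument are hypothetical
placeholders; a plain dict stands in for whatever session object a real
application would create, and nothing here is taken from notmm, Pylons, or
SQLAlchemy.

```python
def with_session(session_factory, key="myapp.session"):
    """Return a decorator that stores a fresh session in the WSGI environ."""
    def decorator(app):
        def session_wsgi_app(environ, start_response):
            # A namespaced key avoids collisions with other environ users.
            environ[key] = session_factory()
            return app(environ, start_response)
        return session_wsgi_app
    return decorator


@with_session(dict)
def blog_list_app(environ, start_response):
    # The session travels in the per-request environ, so no thread-local
    # lookup is needed to find it.
    session = environ["myapp.session"]
    session["visited"] = True
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]
```

Because the session is created per call and passed along in the environ, two
requests handled concurrently never share it, whatever the server's
threading model.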
> return view_func(request, *args, **kwargs) > return wraps(view_func)(_wrapper) > return decorator > > Then it can be used as follows: > > @with_session(engine=engine): > def view_blog_list(request, *args, **kwargs): > # get the local session object for this > # request (thread-local) > sess = request.environ['_scoped_session'] > # do stuff with the Session object here... > ... > > Is this a good approach, or can this be adapted to work > in multithreaded environments ? Since you are passing around arguments to functions it should be fine in threaded environments. -- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From robillard.etienne at gmail.com Wed Jul 16 18:12:22 2008 From: robillard.etienne at gmail.com (Etienne Robillard) Date: Wed, 16 Jul 2008 09:12:22 -0700 (PDT) Subject: [Web-SIG] Using decorators to add objects in a thread-local store.. In-Reply-To: <6e9196d20807151344s436603d0s11f150767ccb40c3@mail.gmail.com> References: <20080714160918.19519101@fluke> <20080715164202.7906520f@fluke> <6e9196d20807151344s436603d0s11f150767ccb40c3@mail.gmail.com> Message-ID: <82812c99-dbca-4d43-9f0d-87a47d0d3a86@59g2000hsb.googlegroups.com> On Jul 15, 4:44 pm, "Mike Orr" wrote: > On Tue, Jul 15, 2008 at 1:42 PM, Etienne Robillard > > > > wrote: > > > On Mon, 14 Jul 2008 16:09:18 -0400 > > Etienne Robillard wrote: > > >> Hi all, > > >> I'd like to have your input and comments on using decorators > >> functions for adding extra options to the request.environ object. > > >> For instance, here's a decorator whichs adds a "scoped" session > >> object into request.environ: > > >> def with_session(engine=None): > >> """ > >> Decorator function for attaching a `Session` instance > >> as a keyword argument in `request.environ`. 
> >> """ > >> def decorator(view_func): > >> def _wrapper(request, *args, **kwargs): > >> scoped_session.set_session(engine) > >> request.environ['_scoped_session'] = getattr(scoped_session, 'sessio > >> return view_func(request, *args, **kwargs) > >> return wraps(view_func)(_wrapper) > >> return decorator > > >> Then it can be used as follows: > > >> @with_session(engine=engine): > >> def view_blog_list(request, *args, **kwargs): > >> # get the local session object for this > >> # request (thread-local) > >> sess = request.environ['_scoped_session'] > >> # do stuff with the Session object here... > >> ... > > >> Is this a good approach, or can this be adapted to work > >> in multithreaded environments ? > > >> For details, you can checkout the source code of notmm, which > >> holds the current implementation of the with_session decorator: > > >> $ hg clone -r tiphttp://gthc.org/projects/notmm/repo/notmm > >> For more details about notmm, please see here:http://gthc.org/projects/notmm/ > > >> Thanks and Regards, > > >> Etienne > > > Hi, > > > I'm forwarding this on pylons-discuss. I'd be interested in > > feedback on how to integrate SQLAlchemy in Pylons. Can this > > decorator (with_session) works on/with Pylons controllers too ? > > This is the "standard" way.http://wiki.pylonshq.com/display/pylonsdocs/Using+SQLAlchemy+with+Pylons Ah yes. I forgot that document, it explains really closely what I was trying to do with the with_session decorator... Some notable differences: - myapp/model/meta.py: I just throw that stuff in a file named myapp/ config/environment.py. - myapp/model/__init__.py : Likewise, I defined a get_model() which is essentially a clone of init_model. It just returns a `Table` instance for a given table_name. > It puts a scoped session object at myapp.model.meta.Session Interesting. To put this in constrast with the with_session object, the only difference I see is the place to store and retrieve the Session object. (request.environ vs meta..) 
> I suppose the decorator would work, but it's not typical for an action > to read things directly from the environment unless it's something > Pylons doesn't support any other way. Well, I like refering to a web project by its name, rather than refering to it as a framework X project. For that reason I think 'typical' doesn't apply here, since supporting Pylons might not be it. I just read Pylons code for inspiration and technical guidance... :) > -- > Mike Orr Thanks! Etienne From robillard.etienne at gmail.com Wed Jul 16 18:55:41 2008 From: robillard.etienne at gmail.com (Etienne Robillard) Date: Wed, 16 Jul 2008 12:55:41 -0400 Subject: [Web-SIG] Using decorators to add objects in a thread-local store.. In-Reply-To: <487D5DC7.1070802@colorstudy.com> References: <20080714160918.19519101@fluke> <487D5DC7.1070802@colorstudy.com> Message-ID: <20080716125541.1893c483@fluke> On Tue, 15 Jul 2008 21:32:39 -0500 Ian Bicking wrote: > Etienne Robillard wrote: > > > > Hi all, > > > > I'd like to have your input and comments on using decorators > > functions for adding extra options to the request.environ object. > > > > For instance, here's a decorator whichs adds a "scoped" session > > object into request.environ: > > > > def with_session(engine=None): > > """ > > Decorator function for attaching a `Session` instance > > as a keyword argument in `request.environ`. > > """ > > def decorator(view_func): > > def _wrapper(request, *args, **kwargs): > > scoped_session.set_session(engine) > > request.environ['_scoped_session'] = getattr(scoped_session, 'sessio > > You should always use a namespace, e.g., > request.environ['something._scoped_session'] = ... > > In the context of a Pylons controller you could do it this way. Of > course with just WSGI it would be better to wrap it via WSGI, which is > almost equivalent to a decorator: > > def with_session(engine=None): > def decorator(app): > def engine_wsgi_app(environ, start_response): > environ['...'] = ... 
> return app(environ, start_response) > return engine_wsgi_app > return decorator Tried, but its giving me headaches.. I think I will stick with the former... Plus, in webob, it is my assumption that Request objects returns a wsgi application with get_response(). This indicates most likely that both approaches are the same. :) > Pylons controllers aren't *quite* WSGI applications, but instances of > those controller classes are. So wrapping an individual controller with > middleware requires a bit more work. Ah. I just have a conflicting view regarding the definition of a 'middleware'. For me, middlewares shouldn't even exists, as I'm not even sure where they should fit in the WSGI pipeline.. In fact, I consider middlewares as a potential security hole... > > @with_session(engine=engine): > > def view_blog_list(request, *args, **kwargs): > > # get the local session object for this > > # request (thread-local) > > sess = request.environ['_scoped_session'] > > # do stuff with the Session object here... > > ... > > > > Is this a good approach, or can this be adapted to work > > in multithreaded environments ? > > Since you are passing around arguments to functions it should be fine in > threaded environments. Thanks. I'd definitely like to see more discussion regarding threads in the next version of WSGI. > -- > Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org Kind Regards, Etienne From tibor at infinit.sk Mon Jul 21 17:40:03 2008 From: tibor at infinit.sk (Tibor Arpas) Date: Mon, 21 Jul 2008 17:40:03 +0200 Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network In-Reply-To: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com> References: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com> Message-ID: <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com> Hi, I'm quite new to python and I ran into a performance problem with wsgiref.simple_server. I'm running this little program. 
from wsgiref import simple_server

def app(environ, start_response):
    start_response('200 OK', [('content-type', 'text/html')])
    return ['*'*50000]

httpd = simple_server.make_server('',8080,app)
try:
    httpd.serve_forever()
except KeyboardInterrupt:
    pass


I get many hundreds of responses/second on my local computer, which is fine.
But when I access this server through our VPN it performs very badly.

I get 0.33 requests/second as compared to 7 responses/second when
accessing a 50kB static file served by IIS.

I also tried the same little program using paste.httpserver and that
version works fast as expected.

I cannot really understand this behavior. My only thought is that the
wsgiref version is sending the data in many chunks, and therefore the
latency of the VPN comes into play. But I don't really know how to
test this.

This is Python 2.5.2 on Windows Server 2003 (same behavior on Windows
XP), testing with Apache AB as well as Firefox...

Any help would be appreciated.

Tibor

From fumanchu at aminus.org  Mon Jul 21 18:28:46 2008
From: fumanchu at aminus.org (Robert Brewer)
Date: Mon, 21 Jul 2008 09:28:46 -0700
Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network
In-Reply-To: <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com>
References: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com> <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com>
Message-ID:

Tibor Arpas wrote:
> I'm quite new to python and I ran into a performance problem with
> wsgiref.simple_server. I'm running this little program.
>
> from wsgiref import simple_server
>
> def app(environ, start_response):
>     start_response('200 OK', [('content-type', 'text/html')])
>     return ['*'*50000]
>
> httpd = simple_server.make_server('',8080,app)
> try:
>     httpd.serve_forever()
> except KeyboardInterrupt:
>     pass
>
>
> I get many hundreds of responses/second on my local computer, which is
> fine.
> But when I access this server through our VPN it performs very bad.
>
> I get 0.33 requests/second as compared to 7 responses/second when
> accessing 50kB static file served by IIS.
>
> I also tried the same little program using paste.httpserver and that
> version works fast as expected.
>
> I cannot really understand this behavior. My only thought is that the
> wsgiref version is sending the data in many chunks, and therefore the
> latency of the VPN comes into play. But I don't really know how to
> test this.

One possible answer is that wsgiref doesn't disable the Nagle algorithm
[1]. Try changing WSGIServer.server_bind to read:

    def server_bind(self):
        """Override server_bind to store the server name."""
        import socket
        self.socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        HTTPServer.server_bind(self)
        self.setup_environ()


Robert Brewer
fumanchu at aminus.org

[1] http://en.wikipedia.org/wiki/Nagle's_algorithm

From tibor at infinit.sk  Mon Jul 21 19:58:02 2008
From: tibor at infinit.sk (Tibor Arpas)
Date: Mon, 21 Jul 2008 19:58:02 +0200
Subject: [Web-SIG] Fwd: Fwd: wsgiref.simple_server slow on slow network
In-Reply-To: <19a7c8c20807211043m6d8efae2s73932ae0e77056b2@mail.gmail.com>
References: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com> <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com> <19a7c8c20807211043m6d8efae2s73932ae0e77056b2@mail.gmail.com>
Message-ID: <19a7c8c20807211058o5b9b26b2j35245875c1707826@mail.gmail.com>

I can see the 3 seconds / 150 milliseconds difference in a browser and
to get exact numbers I use Apache ab.

On Mon, Jul 21, 2008 at 6:26 PM, Ionel Maries Cristian wrote:
> how are you benchmarking?
>
> On Mon, Jul 21, 2008 at 18:40, Tibor Arpas wrote:
>>
>> Hi,
>> I'm quite new to python and I ran into a performance problem with
>> wsgiref.simple_server. I'm running this little program.
>> >> from wsgiref import simple_server >> >> def app(environ, start_response): >> start_response('200 OK', [('content-type', 'text/html')]) >> return ['*'*50000] >> >> httpd = simple_server.make_server('',8080,app) >> try: >> httpd.serve_forever() >> except KeyboardInterrupt: >> pass >> >> >> I get many hundreds of responses/second on my local computer, which is >> fine. >> But when I access this server through our VPN it performs very bad. >> >> I get 0.33 requests/second as compared to 7 responses/second when >> accessing 50kB static file served by IIS. >> >> I also tried the same little program using paste.httpserver and that >> version works fast as expected. >> >> I cannot really understand this behavior. My only thought is that the >> wsgiref version is sending the data in many chunks, and therefore the >> latency of the VPN comes into play. But I don't really know how to >> test this. >> >> This is Python 2.5.2 on Windows Server 2003 (same behavior on Windows >> XP), testing with Apache AB as well as Firefox... >> >> Any help would be appriciated. >> >> Tibor >> _______________________________________________ >> Web-SIG mailing list >> Web-SIG at python.org >> Web SIG: http://www.python.org/sigs/web-sig >> Unsubscribe: >> http://mail.python.org/mailman/options/web-sig/ionel.mc%40gmail.com > > > > -- > ionel maries cristian > From tibor at infinit.sk Mon Jul 21 19:58:41 2008 From: tibor at infinit.sk (Tibor Arpas) Date: Mon, 21 Jul 2008 19:58:41 +0200 Subject: [Web-SIG] Fwd: Fwd: wsgiref.simple_server slow on slow network In-Reply-To: <19a7c8c20807211054u761fc228h9bb74f7f594e9df9@mail.gmail.com> References: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com> <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com> <19a7c8c20807211054u761fc228h9bb74f7f594e9df9@mail.gmail.com> Message-ID: <19a7c8c20807211058m6ebf77a3o6ccc19d0b9581280@mail.gmail.com> Thanks Robert, I tried this but no difference :-(. I made sure I changed the right source code. 
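
[Editorial sketch, not part of the original message.] For anyone repeating
this experiment without editing the standard library source, the same socket
option could be set in a subclass and passed to make_server via its
server_class argument. The class name `NoDelayWSGIServer` and the helper
`make_nodelay_server` are hypothetical. The body is written as bytes for
current Python versions, whereas the thread uses Python 2.5 str. Whether
accepted connections inherit TCP_NODELAY from the listening socket is
platform-dependent, so this is not a guaranteed fix, and the reply above
reports that the equivalent patch made no difference in this case.

```python
import socket
from wsgiref.simple_server import WSGIServer, make_server


class NoDelayWSGIServer(WSGIServer):
    """WSGIServer variant that sets TCP_NODELAY on its listening socket."""

    def server_bind(self):
        # Set the option before binding, mirroring the suggested patch.
        self.socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        WSGIServer.server_bind(self)


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"*" * 50000]


def make_nodelay_server(host="127.0.0.1", port=0):
    # Port 0 asks the operating system for any free port.
    return make_server(host, port, app, server_class=NoDelayWSGIServer)
```

Calling make_nodelay_server().serve_forever() would then serve the same test
page as the program in the original report.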
On Mon, Jul 21, 2008 at 6:28 PM, Robert Brewer wrote: > Tibor Arpas wrote: >> I'm quite new to python and I ran into a performance problem with >> wsgiref.simple_server. I'm running this little program. >> >> from wsgiref import simple_server >> >> def app(environ, start_response): >> start_response('200 OK', [('content-type', 'text/html')]) >> return ['*'*50000] >> >> httpd = simple_server.make_server('',8080,app) >> try: >> httpd.serve_forever() >> except KeyboardInterrupt: >> pass >> >> >> I get many hundreds of responses/second on my local computer, which is >> fine. >> But when I access this server through our VPN it performs very bad. >> >> I get 0.33 requests/second as compared to 7 responses/second when >> accessing 50kB static file served by IIS. >> >> I also tried the same little program using paste.httpserver and that >> version works fast as expected. >> >> I cannot really understand this behavior. My only thought is that the >> wsgiref version is sending the data in many chunks, and therefore the >> latency of the VPN comes into play. But I don't really know how to >> test this. > > One possible answer is that wsgiref doesn't disable the Nagle algorithm > [1]. 
> Try changing WSGIServer.server_bind to read: > > def server_bind(self): > """Override server_bind to store the server name.""" > import socket > self.socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, > 1) > HTTPServer.server_bind(self) > self.setup_environ() > > > > Robert Brewer > fumanchu at aminus.org > > [1] http://en.wikipedia.org/wiki/Nagle's_algorithm > > From graham.dumpleton at gmail.com Tue Jul 22 01:51:30 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Tue, 22 Jul 2008 09:51:30 +1000 Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network In-Reply-To: <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com> References: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com> <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com> Message-ID: <88e286470807211651g206087c0i8634b37d05c4b1a2@mail.gmail.com> And what happens if you actually supply a content length in your response? 2008/7/22 Tibor Arpas : > Hi, > I'm quite new to python and I ran into a performance problem with > wsgiref.simple_server. I'm running this little program. > > from wsgiref import simple_server > > def app(environ, start_response): > start_response('200 OK', [('content-type', 'text/html')]) > return ['*'*50000] > > httpd = simple_server.make_server('',8080,app) > try: > httpd.serve_forever() > except KeyboardInterrupt: > pass > > > I get many hundreds of responses/second on my local computer, which is fine. > But when I access this server through our VPN it performs very bad. > > I get 0.33 requests/second as compared to 7 responses/second when > accessing 50kB static file served by IIS. > > I also tried the same little program using paste.httpserver and that > version works fast as expected. > > I cannot really understand this behavior. My only thought is that the > wsgiref version is sending the data in many chunks, and therefore the > latency of the VPN comes into play. But I don't really know how to > test this. 
>
> This is Python 2.5.2 on Windows Server 2003 (same behavior on Windows
> XP), testing with Apache AB as well as Firefox...
>
> Any help would be appreciated.
>
> Tibor
>

From tibor at infinit.sk Tue Jul 22 09:58:06 2008
From: tibor at infinit.sk (Tibor Arpas)
Date: Tue, 22 Jul 2008 09:58:06 +0200
Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network
In-Reply-To: <88e286470807211651g206087c0i8634b37d05c4b1a2@mail.gmail.com>
References: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com>
 <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com>
 <88e286470807211651g206087c0i8634b37d05c4b1a2@mail.gmail.com>
Message-ID: <19a7c8c20807220058s34d48650vc680c76828ad434b@mail.gmail.com>

I added the Content-Length and it made no difference. The important
thing I noticed is that I get the same request/response rate with only
ONE byte of content. So it looks like a constant delay of 3 seconds per
request. Now my script reads:

from wsgiref import simple_server

def app(environ, start_response):
    send = '*'*1
    start_response('200 OK', [('content-type', 'text/html'),
                              ('Content-Length', str(len(send)))])
    # Return a list so the body is sent as one block; a bare string
    # would be iterated one character at a time.
    return [send]

port = 8080
httpd = simple_server.WSGIServer(('', port),
                                 simple_server.WSGIRequestHandler)
httpd.set_app(app)
try:
    httpd.serve_forever()
except KeyboardInterrupt:
    pass

#import paste.httpserver
#paste.httpserver.serve(app, host='10.0.0.230', port='8079') #this works fast!


On Tue, Jul 22, 2008 at 1:51 AM, Graham Dumpleton wrote:
> And what happens if you actually supply a content length in your response?
>
> 2008/7/22 Tibor Arpas :
>> Hi,
>> I'm quite new to python and I ran into a performance problem with
>> wsgiref.simple_server. I'm running this little program.
>> >> from wsgiref import simple_server >> >> def app(environ, start_response): >> start_response('200 OK', [('content-type', 'text/html')]) >> return ['*'*50000] >> >> httpd = simple_server.make_server('',8080,app) >> try: >> httpd.serve_forever() >> except KeyboardInterrupt: >> pass >> >> >> I get many hundreds of responses/second on my local computer, which is fine. >> But when I access this server through our VPN it performs very bad. >> >> I get 0.33 requests/second as compared to 7 responses/second when >> accessing 50kB static file served by IIS. >> >> I also tried the same little program using paste.httpserver and that >> version works fast as expected. >> >> I cannot really understand this behavior. My only thought is that the >> wsgiref version is sending the data in many chunks, and therefore the >> latency of the VPN comes into play. But I don't really know how to >> test this. >> >> This is Python 2.5.2 on Windows Server 2003 (same behavior on Windows >> XP), testing with Apache AB as well as Firefox... >> >> Any help would be appriciated. >> >> Tibor >> _______________________________________________ >> Web-SIG mailing list >> Web-SIG at python.org >> Web SIG: http://www.python.org/sigs/web-sig >> Unsubscribe: http://mail.python.org/mailman/options/web-sig/graham.dumpleton%40gmail.com >> > From exarkun at divmod.com Tue Jul 22 15:54:59 2008 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Tue, 22 Jul 2008 09:54:59 -0400 Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network In-Reply-To: <19a7c8c20807220058s34d48650vc680c76828ad434b@mail.gmail.com> Message-ID: <20080722135459.29191.490789256.divmod.quotient.1037@ohm> On Tue, 22 Jul 2008 09:58:06 +0200, Tibor Arpas wrote: >I added the Content-Length and no difference. Important thing I >noticed is that I get the same request/response rate with only ONE >byte of content. So it looks like a constant delay of 3 seconds per >request.. 

wsgiref seems to run an HTTP 1.0 server without persistent connections.
Perhaps paste is running an HTTP server with persistent connections.
High latency will tank performance of TCP connections.

Jean-Paul

From tibor at infinit.sk Tue Jul 22 17:50:33 2008
From: tibor at infinit.sk (Tibor Arpas)
Date: Tue, 22 Jul 2008 17:50:33 +0200
Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network
In-Reply-To: <20080722135459.29191.490789256.divmod.quotient.1037@ohm>
References: <19a7c8c20807220058s34d48650vc680c76828ad434b@mail.gmail.com>
 <20080722135459.29191.490789256.divmod.quotient.1037@ohm>
Message-ID: <19a7c8c20807220850h730b6e04o36e2a9760addc530@mail.gmail.com>

Mhm.. No, That doesn't seem to be THE reason. Paste is HTTP/1.0 too.
See the detailed server-client communication below. BTW the VPN is
not that slow. It's 4Mb/s with pings of 5-7 ms. Thanks guys for the
suggestions, I appreciate it. If you run out of them, the most
effective way would probably be to strip down the script even further
and use the underlying lower level libraries directly. I'll try to get
back to it later once I have more time...

Benchmarking 10.0.0.230 (be patient)...INFO: POST header ==
---
GET / HTTP/1.0
Host: 10.0.0.230:8079
User-Agent: ApacheBench/2.3
Accept: */*


---
LOG: header received:
HTTP/1.0 200 OK
Server: PasteWSGIServer/0.5 Python/2.5.1
Date: Tue, 22 Jul 2008 15:36:53 GMT
content-type: text/html
Content-Length: 1

*
LOG: Response code = 200
..done

===================================================================
Benchmarking 10.0.0.230 (be patient)...INFO: POST header ==
---
GET / HTTP/1.0
Host: 10.0.0.230:8078
User-Agent: ApacheBench/2.3
Accept: */*


---
LOG: header received:
HTTP/1.0 200 OK

LOG: header received:
HTTP/1.0 200 OK
Date: Tue, 22 Jul 2008 15:33:57 GMT
Server: WSGIServer/0.1 Python/2.5.1
content-type: text/html
Content-Length: 1

*
LOG: Response code = 200
..done


On Tue, Jul 22, 2008 at 3:54 PM, Jean-Paul Calderone wrote:
> On Tue, 22 Jul 2008 09:58:06 +0200, Tibor Arpas wrote:
>>
>> I added the Content-Length and no difference. Important thing I
>> noticed is that I get the same request/response rate with only ONE
>> byte of content. So it looks like a constant delay of 3 seconds per
>> request..
>
> wsgiref seems to run an HTTP 1.0 server without persistent connections.
> Perhaps paste is running an HTTP server with persistent connections.
> High latency will tank performance of TCP connections.
>
> Jean-Paul
>

From manlio_perillo at libero.it Tue Jul 22 18:02:35 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Tue, 22 Jul 2008 18:02:35 +0200
Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network
In-Reply-To: <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com>
References: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com>
 <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com>
Message-ID: <4886049B.9050408@libero.it>

Tibor Arpas wrote:
> Hi,
> I'm quite new to python and I ran into a performance problem with
> wsgiref.simple_server. I'm running this little program.
>
> from wsgiref import simple_server
>
> def app(environ, start_response):
>     start_response('200 OK', [('content-type', 'text/html')])
>     return ['*'*50000]
>
> httpd = simple_server.make_server('',8080,app)
> try:
>     httpd.serve_forever()
> except KeyboardInterrupt:
>     pass
>
>
> I get many hundreds of responses/second on my local computer, which is fine.
> But when I access this server through our VPN it performs very bad.
>

wsgiref is an iterative server, if I'm not wrong; it serves only one
request at a time.

On the loopback interface this is not a problem, but over the Internet
the latency of the connection makes each single request take a long
time.

paste.httpserver uses a thread pool.

> [...]
Manlio Perillo From fumanchu at aminus.org Tue Jul 22 18:03:52 2008 From: fumanchu at aminus.org (Robert Brewer) Date: Tue, 22 Jul 2008 09:03:52 -0700 Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network In-Reply-To: <19a7c8c20807220850h730b6e04o36e2a9760addc530@mail.gmail.com> References: <19a7c8c20807220058s34d48650vc680c76828ad434b@mail.gmail.com><20080722135459.29191.490789256.divmod.quotient.1037@ohm> <19a7c8c20807220850h730b6e04o36e2a9760addc530@mail.gmail.com> Message-ID: A tcpdump would be more helpful at this point, but I'm not sure the ML is the right place for that. Robert Brewer fumanchu at aminus.org > -----Original Message----- > From: web-sig-bounces+fumanchu=aminus.org at python.org [mailto:web-sig- > bounces+fumanchu=aminus.org at python.org] On Behalf Of Tibor Arpas > Sent: Tuesday, July 22, 2008 8:51 AM > To: Jean-Paul Calderone > Cc: web-sig at python.org > Subject: Re: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network > > Mhm.. No, That doesn't seem to be THE reason. Paste is HTTP/1.0 too. > See the detailed server-client communication below. BTW the VPN is > not that slow. It's 4Mb/s with pings of 5-7 ms. Thanks guys for the > suggestions, I appreciate it. If you run out of them, the most > effective way would probably be to strip down the script even further > and use the underlying lower level libraries directly. I'll try to get > back to it later once I have more time... 
> > Benchmarking 10.0.0.230 (be patient)...INFO: POST header == > --- > GET / HTTP/1.0 > Host: 10.0.0.230:8079 > User-Agent: ApacheBench/2.3 > Accept: */* > > > --- > LOG: header received: > HTTP/1.0 200 OK > Server: PasteWSGIServer/0.5 Python/2.5.1 > Date: Tue, 22 Jul 2008 15:36:53 GMT > content-type: text/html > Content-Length: 1 > > * > LOG: Response code = 200 > ..done > > =================================================================== > Benchmarking 10.0.0.230 (be patient)...INFO: POST header == > --- > GET / HTTP/1.0 > Host: 10.0.0.230:8078 > User-Agent: ApacheBench/2.3 > Accept: */* > > > --- > LOG: header received: > HTTP/1.0 200 OK > > LOG: header received: > HTTP/1.0 200 OK > Date: Tue, 22 Jul 2008 15:33:57 GMT > Server: WSGIServer/0.1 Python/2.5.1 > content-type: text/html > Content-Length: 1 > > * > LOG: Response code = 200 > ..done > > > On Tue, Jul 22, 2008 at 3:54 PM, Jean-Paul Calderone > wrote: > > On Tue, 22 Jul 2008 09:58:06 +0200, Tibor Arpas > wrote: > >> > >> I added the Content-Length and no difference. Important thing I > >> noticed is that I get the same request/response rate with only ONE > >> byte of content. So it looks like a constant delay of 3 seconds per > >> request.. > > > > wsgiref seems to run an HTTP 1.0 server without persistent > connections. > > Perhaps paste is running an HTTP server with persistent connections. > > High latency will tank performance of TCP connections. 
> >
> > Jean-Paul
> >

From irmen at xs4all.nl Wed Jul 23 00:21:00 2008
From: irmen at xs4all.nl (Irmen de Jong)
Date: Wed, 23 Jul 2008 00:21:00 +0200
Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network
In-Reply-To: <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com>
References: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com>
 <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com>
Message-ID: <48865D4C.9090307@xs4all.nl>

Tibor Arpas wrote:
> Hi,
> I'm quite new to python and I ran into a performance problem with
> wsgiref.simple_server. I'm running this little program.
> [...]
>
> I get many hundreds of responses/second on my local computer, which is fine.
> But when I access this server through our VPN it performs very bad.

Could it be that the wsgiref is doing a reverse DNS lookup for every
incoming call?
(for instance to determine the remote server hostname for some reason
such as logging)
That could be a very slow operation.

Just an idea, I have no idea about the workings of wsgiref, I've just
seen this happening in other situations.

--irmen

From pje at telecommunity.com Wed Jul 23 01:34:42 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 22 Jul 2008 19:34:42 -0400
Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network
In-Reply-To: <48865D4C.9090307@xs4all.nl>
References: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com>
 <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com>
 <48865D4C.9090307@xs4all.nl>
Message-ID: <20080722233434.44CC23A409B@sparrow.telecommunity.com>

At 12:21 AM 7/23/2008 +0200, Irmen de Jong wrote:
>Tibor Arpas wrote:
>>Hi,
>>I'm quite new to python and I ran into a performance problem with
>>wsgiref.simple_server. I'm running this little program.
>[...]
>>I get many hundreds of responses/second on my local computer, which is fine.
>>But when I access this server through our VPN it performs very bad.
>
>Could it be that the wsgiref is doing a reverse DNS lookup for every
>incoming call?
>(for instance to determine the remote server hostname for some
>reason such as logging)
>That could be a very slow operation.
>
>Just an idea, I have no idea about the workings of wsgiref, I've
>just seen this happening in other situations.

It isn't really even wsgiref-related. First, wsgiref.simple_server
is based on the other stdlib HTTP server modules.

Second, it's not intended for production use.

Third, it's not multi-threaded, which is likely to be a factor if the
performance tests are done using concurrent requests.

From tibor at infinit.sk Wed Jul 23 09:41:15 2008
From: tibor at infinit.sk (Tibor Arpas)
Date: Wed, 23 Jul 2008 09:41:15 +0200
Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network
In-Reply-To: <48865D4C.9090307@xs4all.nl>
References: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com>
 <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com>
 <48865D4C.9090307@xs4all.nl>
Message-ID: <19a7c8c20807230041o1591ce4agcb70035aa4cbf7e7@mail.gmail.com>

Reverse DNS lookup is THE reason. Thank you very much Irmen. I put my
remote computer into windows/system32/drivers/etc/hosts and the
problem disappeared.
The DNS name is indeed in the log which is written
to the console. Thanks again.

Is there a way to disable the reverse DNS lookup in the
wsgiref.simple_server? Quick googling didn't reveal much.

Tibor



On Wed, Jul 23, 2008 at 12:21 AM, Irmen de Jong wrote:
> Tibor Arpas wrote:
>>
>> Hi,
>> I'm quite new to python and I ran into a performance problem with
>> wsgiref.simple_server. I'm running this little program.
>>
> [...]
>>
>> I get many hundreds of responses/second on my local computer, which is
>> fine.
>> But when I access this server through our VPN it performs very bad.
>
> Could it be that the wsgiref is doing a reverse DNS lookup for every
> incoming call?
> (for instance to determine the remote server hostname for some reason such
> as logging)
> That could be a very slow operation.
>
> Just an idea, I have no idea about the workings of wsgiref, I've just seen
> this happening in other situations.
>
> --irmen
>

From tibor at infinit.sk Wed Jul 23 10:12:18 2008
From: tibor at infinit.sk (Tibor Arpas)
Date: Wed, 23 Jul 2008 10:12:18 +0200
Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network
In-Reply-To: <20080722233434.44CC23A409B@sparrow.telecommunity.com>
References: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com>
 <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com>
 <48865D4C.9090307@xs4all.nl>
 <20080722233434.44CC23A409B@sparrow.telecommunity.com>
Message-ID: <19a7c8c20807230112k5553091ci74d72bd199180c27@mail.gmail.com>

A quick comment now that we know where the problem is (see previous
mail). Being a newbie I wrote to this list in the first place. I
couldn't really tell whether or not it was wsgiref related.
Also, sorry not to mention it sooner: at first I tried both the
single-threaded and multithreaded scenarios and it made no difference.
So I based the test cases on a single-threaded server and a
single-threaded client only.

I bumped into this problem using a library called Tilecache
(tilecache.org), which provides a little standalone script to try
things out using simple_server. It's a library to speed things up, so
the 3 seconds/request overhead was very bad.

I think it would be nice to have a production quality HTTP-WSGI server
in the standard library, at least for low traffic sites.

Tibor

>> Just an idea, I have no idea about the workings of wsgiref, I've just seen
>> this happening in other situations.
>
> It isn't really even wsgiref-related. First, wsgiref.simple_server is based
> on the other stdlib HTTP server modules.
>
> Second, it's not intended for production use.
>
> Third, it's not multi-threaded, which is likely to be a factor if the
> performance tests are done using concurrent requests.
>

From tibor at infinit.sk Wed Jul 23 12:58:57 2008
From: tibor at infinit.sk (Tibor Arpas)
Date: Wed, 23 Jul 2008 12:58:57 +0200
Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network
In-Reply-To:
References: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com>
 <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com>
 <48865D4C.9090307@xs4all.nl>
 <19a7c8c20807230041o1591ce4agcb70035aa4cbf7e7@mail.gmail.com>
Message-ID: <19a7c8c20807230358t88aa253ofd52f71a707ff28d@mail.gmail.com>

Dear all,

I think the method
BaseHTTPServer.BaseHTTPRequestHandler.address_string() is to blame for
the delay. DNS reverse lookup is a feature, not a bug. I think on most
production servers it's turned off by default.
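
One way to switch the lookup off for a single server, without editing
the standard library, is to override address_string() in a request
handler subclass. The sketch below is only an illustration and was not
posted in this thread: NoLookupHandler and build_server are
hypothetical names, and it is written for current Python, where the
module layout differs from the Python 2.5 BaseHTTPServer discussed here
(and where, as far as I know, address_string() already returns the
numeric address).

```python
from wsgiref import simple_server


class NoLookupHandler(simple_server.WSGIRequestHandler):
    """Request handler that never resolves the client's hostname."""

    def address_string(self):
        # Return the numeric client address directly instead of asking
        # the resolver for a fully qualified name (the slow step).
        return self.client_address[0]


def app(environ, start_response):
    body = b'*'
    start_response('200 OK', [('Content-Type', 'text/html'),
                              ('Content-Length', str(len(body)))])
    return [body]


def build_server(host='', port=8080):
    # handler_class is a keyword argument of wsgiref's make_server().
    return simple_server.make_server(host, port, app,
                                     handler_class=NoLookupHandler)
```

Calling build_server().serve_forever() would then serve requests while
logging only IP addresses.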
And after my experience I would prefer that it's turned off in BaseHTTPServer. It's already documented in the very last lines of http://www.python.org/doc/lib/module-BaseHTTPServer.html If I can help any further let me know. Unfortunately I'll not be able to put more time into this before weekend. Cheers, Tibor On Wed, Jul 23, 2008 at 11:54 AM, Massimo Di Pierro wrote: > Hi Tibor, > > Could you send me a detailed paragraph about this? I believe it should go on > the manaual! > > Massimo > > > On Jul 23, 2008, at 2:41 AM, Tibor Arpas wrote: > >> Reverse DNS lookup is THE reason. Thank you very much Irmen. I put my >> remote computer into windows/system32/drivers/etc/hosts and the >> problem disapeared. The DNS name is indeed in the log which is written >> to the console. Thanks again. >> >> Is there a way to disable the reverse DNS lookup in the >> wsgiref.simple_server? Quick googling didn't reveal much. >> >> Tibor >> >> >> >> On Wed, Jul 23, 2008 at 12:21 AM, Irmen de Jong wrote: >>> >>> Tibor Arpas wrote: >>>> >>>> Hi, >>>> I'm quite new to python and I ran into a performance problem with >>>> wsgiref.simple_server. I'm running this little program. >>>> >>> [...] >>>> >>>> I get many hundreds of responses/second on my local computer, which is >>>> fine. >>>> But when I access this server through our VPN it performs very bad. >>> >>> Could it be that the wsgiref is doing a reverse DNS lookup for every >>> incoming call? >>> (for instance to determine the remote server hostname for some reason >>> such >>> as logging) >>> That could be a very slow operation. >>> >>> Just an idea, I have no idea about the workings of wsgiref, I've just >>> seen >>> this happening in other situations. 
>>>
>>> --irmen
>>>
>
>

From manlio_perillo at libero.it Wed Jul 23 15:45:43 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Wed, 23 Jul 2008 15:45:43 +0200
Subject: [Web-SIG] problem with wsgiref.util.request_uri and decoded uri
Message-ID: <48873607.3030307@libero.it>

I'm having a nightmare with encoded/decoded URIs and the request_uri
function:

    >>> from wsgiref.util import request_uri
    >>> environ = {
    ...     'HTTP_HOST': 'www.test.org',
    ...     'SCRIPT_NAME': '',
    ...     'PATH_INFO': '/b%40x/',
    ...     'wsgi.url_scheme': 'http'
    ...     }
    >>> print request_uri(environ)
    http://www.test.org/b%2540x/

Here I'm assuming that the WSGI gateway does *not* decode the URI.
The result of request_uri is incorrect in this case.

On the other hand, if the WSGI gateway *does* decode the URI, I can no
longer handle '/' in the URI.

I can usually avoid having '/' in the URI, but right now I'm
implementing a WSGI application that implements a RESTful interface to
an SQL database:
http://hg.mperillo.ath.cx/wsgix/file/tip/wsgix/contrib/sqltables.py

so I cannot avoid fields with a '/' character in them.

The proposed solution in a previous thread
http://mail.python.org/pipermail/web-sig/2008-January/003122.html
is to implement a custom encoding scheme (like done in MoinMoin).

Are there really no other good solutions?

Assuming that WSGI requires the URI to not be encoded, is the solution
then to modify the request_uri function, replacing:
    quote(SCRIPT_NAME)
with:
    quote(unquote(SCRIPT_NAME))
?
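
The double encoding above, and the effect of the proposed
quote(unquote(...)) change, can be reproduced with the quoting
functions alone. This is a minimal sketch, assuming the Python 3
location of quote and unquote (urllib.parse rather than the Python 2
urllib module); reencode is a hypothetical helper name, not part of
wsgiref.

```python
from urllib.parse import quote, unquote


def reencode(path):
    """Normalise a path that may or may not already be percent-encoded."""
    # Decode first, then encode once, so an existing %40 is not turned
    # into %2540 by a second round of quoting.
    return quote(unquote(path))


# Quoting an already-encoded path escapes the '%' itself.
print(quote('/b%40x/'))       # /b%2540x/

# Decoding before quoting avoids the double encoding.
print(reencode('/b%40x/'))    # /b%40x/

# The round trip still cannot preserve an encoded slash: %2F is decoded
# to '/', which quote() treats as safe and leaves alone.
print(reencode('/a%2Fb/'))    # /a/b/
```

The last case is the limitation described above: once the path has been
decoded, a '/' that was part of a field value can no longer be told
apart from a path separator.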

Where can I find information about alternate encoding schemes?


Thanks

Manlio Perillo

From wilk at flibuste.net Wed Jul 23 23:06:13 2008
From: wilk at flibuste.net (William Dode)
Date: Wed, 23 Jul 2008 21:06:13 +0000 (UTC)
Subject: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network
References: <19a7c8c20807210837u61e57a8cm3479bbb382f64635@mail.gmail.com>
 <19a7c8c20807210840t408dd89ahcb671e07e0a45a3d@mail.gmail.com>
 <48865D4C.9090307@xs4all.nl>
 <19a7c8c20807230041o1591ce4agcb70035aa4cbf7e7@mail.gmail.com>
Message-ID:

On 23-07-2008, Tibor Arpas wrote:
> Reverse DNS lookup is THE reason. Thank you very much Irmen. I put my
> remote computer into windows/system32/drivers/etc/hosts and the
> problem disappeared. The DNS name is indeed in the log which is written
> to the console. Thanks again.
>
> Is there a way to disable the reverse DNS lookup in the
> wsgiref.simple_server? Quick googling didn't reveal much.

I had this kind of problem on Windows, try this:

def getfqdn(name=''):
    return name

import socket
socket.getfqdn = getfqdn

--
William Dodé - http://flibuste.net
Informaticien indépendant

From manlio_perillo at libero.it Mon Jul 28 20:48:35 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Mon, 28 Jul 2008 20:48:35 +0200
Subject: [Web-SIG] parsing of urlencoded data and Unicode
Message-ID: <488E1483.2010204@libero.it>

Hi.

In my WSGI framework:
http://hg.mperillo.ath.cx/wsgix

I have, in the `http` module, the functions `parse_query_string` and
`parse_simple_post_data`.

The first parses the query string and returns a dictionary of strings;
the latter parses the application/x-www-form-urlencoded client body and
returns a dictionary of strings and the charset used by the client for
the unicode encoding.


Now, I'm wondering whether these two functions should return Unicode
strings instead of plain strings.

I think that Unicode strings should be returned, but I would like to
know what other web frameworks do.

Django seems to convert to Unicode, but the Python standard library does
not (and I would like to know if changes are planned for Python 3.x).



Thanks

Manlio Perillo

From janssen at parc.com Mon Jul 28 21:08:43 2008
From: janssen at parc.com (Bill Janssen)
Date: Mon, 28 Jul 2008 12:08:43 PDT
Subject: [Web-SIG] some much-deferred admin of web-sig list...
Message-ID: <08Jul28.120843pdt."58698"@synergy1.parc.xerox.com>

I've just cleared the queue of admin tasks for the Web-SIG list, so
don't be surprised to see some old messages appear...

Bill

From robillard.etienne at gmail.com Mon Jul 28 21:52:17 2008
From: robillard.etienne at gmail.com (Etienne Robillard)
Date: Mon, 28 Jul 2008 15:52:17 -0400
Subject: [Web-SIG] Could WSGI handle Asynchronous response?
In-Reply-To: <6c3c7df3-4ad3-4917-853c-b40050a6a7da@h11g2000prf.googlegroups.com>
References: <6c3c7df3-4ad3-4917-853c-b40050a6a7da@h11g2000prf.googlegroups.com>
Message-ID: <20080728155217.6c6c1175@fluke>

On Mon, 18 Feb 2008 04:23:38 -0800 (PST)
est wrote:

> I am writing a small 'comet'-like app using flup, something like
> this:
>
> def myapp(environ, start_response):
>     start_response('200 OK', [('Content-Type', 'text/plain')])
>     return ['Flup works!\n']  <-------------Could this be part
> of response output? Could I time.sleep() for a while then write other
> outputs?
>
>
> if __name__ == '__main__':
>     from flup.server.fcgi import WSGIServer
>     WSGIServer(myapp, multiplexed=True, bindAddress=('0.0.0.0',
> 8888)).run()
>
>
> So is WSGI really synchronous? How can I handle asynchronous outputs
> with flup/WSGI ?

maybe start by looking here:
http://twistedmatrix.com/trac/browser/trunk/twisted/web2/wsgi.py

Regards,

Etienne

From ianb at colorstudy.com Mon Jul 28 22:40:42 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 28 Jul 2008 15:40:42 -0500
Subject: [Web-SIG] parsing of urlencoded data and Unicode
In-Reply-To: <488E1483.2010204@libero.it>
References: <488E1483.2010204@libero.it>
Message-ID: <488E2ECA.1020008@colorstudy.com>

Manlio Perillo wrote:
> Hi.
>
> In my WSGI framework:
> http://hg.mperillo.ath.cx/wsgix
>
> I have, in the `http` module, the functions `parse_query_string` and
> `parse_simple_post_data`.
>
> The first parses the query string and returns a dictionary of strings;
> the latter parses the application/x-www-form-urlencoded client body and
> returns a dictionary of strings and the charset used by the client for
> the unicode encoding.
>
>
> Now, I'm wondering whether these two functions should return Unicode
> strings instead of plain strings.
>
> I think that Unicode strings should be returned, but I would like to
> know what other web frameworks do.
>
> Django seems to convert to Unicode, but the Python standard library does
> not (and I would like to know if changes are planned for Python 3.x).

WebOb decodes the request data to str, then lazily decodes to unicode
based on the request encoding. The request encoding is a bit fuzzy to
calculate, which is part of why the decoding is lazy, so that the
request encoding can be set or changed at any time.

--
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From manlio_perillo at libero.it Mon Jul 28 23:40:00 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Mon, 28 Jul 2008 23:40:00 +0200
Subject: [Web-SIG] Could WSGI handle Asynchronous response?
In-Reply-To: <6c3c7df3-4ad3-4917-853c-b40050a6a7da@h11g2000prf.googlegroups.com>
References: <6c3c7df3-4ad3-4917-853c-b40050a6a7da@h11g2000prf.googlegroups.com>
Message-ID: <488E3CB0.4050402@libero.it>

est wrote:
> I am writing a small 'comet'-like app using flup, something like
> this:
>
> def myapp(environ, start_response):
>     start_response('200 OK', [('Content-Type', 'text/plain')])
>     return ['Flup works!\n']  <-------------Could this be part
> of response output?

What do you mean by "part of response output"?

> Could I time.sleep() for a while then write other
> outputs?
>

Not with flup.

>
> if __name__ == '__main__':
>     from flup.server.fcgi import WSGIServer
>     WSGIServer(myapp, multiplexed=True, bindAddress=('0.0.0.0',
> 8888)).run()
>
>
> So is WSGI really synchronous?

Not really.
Since you can return a generator, it's possible to support asynchronous
programming, but the WSGI gateway must support it; Nginx mod_wsgi and
some other implementations are examples (search the mailing list
archive).

But this support has not been standardized.

> How can I handle asynchronous outputs
> with flup/WSGI ?


Regards  Manlio Perillo

From manlio_perillo at libero.it Mon Jul 28 23:42:26 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Mon, 28 Jul 2008 23:42:26 +0200
Subject: [Web-SIG] parsing of urlencoded data and Unicode
In-Reply-To: <488E2ECA.1020008@colorstudy.com>
References: <488E1483.2010204@libero.it>
 <488E2ECA.1020008@colorstudy.com>
Message-ID: <488E3D42.5010707@libero.it>

Ian Bicking wrote:
> Manlio Perillo wrote:
>> Hi.
>>
>> In my WSGI framework:
>> http://hg.mperillo.ath.cx/wsgix
>>
>> I have, in the `http` module, the functions `parse_query_string` and
>> `parse_simple_post_data`.
>>
>> The first parses the query string and returns a dictionary of strings;
>> the latter parses the application/x-www-form-urlencoded client body and
>> returns a dictionary of strings and the charset used by the client for
>> the unicode encoding.
>> >> >> Now, I'm thinking if these two function should instead return Unicode >> strings instead of plain strings. >> >> I think that Unicode strings should be returned, but I would like to >> know what other web frameworks do. >> >> Django seems to convert to Unicode, but the Python standard library >> does not (and I would like to know if changes are planned for Python >> 3.x). > > WebOb decodes to request data to str, then lazily decodes to unicode > based on the request encoding. The request encoding is a bit fuzzy to > calculate, which is part of why the decoding is lazy, so that the > request encoding can be set or changed at any time. > Ok, thanks. In wsgix I use utf-8 for decoding the QUERY_STRING, and the charset specified in the POST'ed data (utf-8 or the charset found in the special _charset_ field). Manlio Perillo From dsposx at mac.com Tue Jul 29 01:57:19 2008 From: dsposx at mac.com (Donovan Preston) Date: Mon, 28 Jul 2008 16:57:19 -0700 Subject: [Web-SIG] Could WSGI handle Asynchronous response? In-Reply-To: <20080728155217.6c6c1175@fluke> References: <6c3c7df3-4ad3-4917-853c-b40050a6a7da@h11g2000prf.googlegroups.com> <20080728155217.6c6c1175@fluke> Message-ID: On Jul 28, 2008, at 12:52 PM, Etienne Robillard wrote: > On Mon, 18 Feb 2008 04:23:38 -0800 (PST) > est wrote: > >> I am writing a small 'comet'-like app using flup, something like >> this: >> So is WSGI really synchronous? How can I handle asynchronous outputs >> with flup/WSGI ? WSGI says that the entire body should be written by the time the wsgi application returns. So yes it is really synchronous; as Manlio Perillo said in another message it is possible to abuse generators to allow a wsgi application to operate in the fashion you desire, but both the server and the application have to know how to do this and there is no standardization yet. 
> maybe start by looking here: http://twistedmatrix.com/trac/browser/trunk/twisted/web2/wsgi.py web2.wsgi's server doesn't really get around the problem. While it does non-blocking i/o for the http request and response, it actually calls the wsgi application in a threadpool, because there's no way for the wsgi application to return before having generated all of the response, and even if there were people's wsgi applications don't work this way. You might want to check out orbited (http://www.orbited.org/), which doesn't have anything to do with wsgi, but is a Python comet server implemented entirely with non-blocking i/o (using libevent). However, if you are willing to spend some time getting a custom comet server up and running, you could take a look at eventlet (http://pypi.python.org/pypi/eventlet/ ) and spawning (http://pypi.python.org/pypi/Spawning/). I've been working on eventlet for a couple of years precisely to make implementing scalable and easy to maintain comet applications possible. Here's a simple Comet server that uses spawning and eventlet. This will give you a comet server that scales to tons of simultaneous connections, because eventlet mashes together greenlet (coroutines, or light-weight cooperative threads) with non-blocking i/o (select, poll, libevent, or libev). This is how Spawning can be used to get around the wsgi restriction that the entire body should be written by the time the wsgi application returns; since spawning uses greenlets instead of posix threads for each wsgi request when --threads=0 is passed, many simultaneous wsgi applications can be running waiting for Comet events with very little memory and CPU overhead. 
Save it in a file called spawningcomet.py and run it with:

    spawn spawningcomet.wsgi_application --threads=0

Then, visit http://localhost:8080 in your browser and run this in another terminal:

    python spawningcomet.py hello world

## spawningcomet.py

import struct
import sys
import uuid

from eventlet import api
from eventlet import coros


SEND_EVENT_INTERFACE = ''
SEND_EVENT_PORT = 4200


HTML_TEMPLATE = """

Dynamic content will appear below

"""


class Comet(object):
    def __init__(self):
        api.spawn(
            api.tcp_server,
            api.tcp_listener((SEND_EVENT_INTERFACE, SEND_EVENT_PORT)),
            self.read_events_forever)

        self.current_event = {'event': coros.event(), 'next': None}
        self.first_event_id = str(uuid.uuid1())
        self.events = {self.first_event_id: self.current_event}

    def read_events_forever(self, (sock, addr)):
        reader = sock.makefile('r')
        try:
            while True:
                ## Read the next event value out of the socket
                valuelen = reader.read(4)
                if not valuelen:
                    break

                valuelen, = struct.unpack('!L', valuelen)
                value = reader.read(valuelen)

                ## Make a new event and link the current event to it
                old_event = self.current_event
                old_event['next'] = str(uuid.uuid1())
                self.current_event = {
                    'event': coros.event(), 'next': None}
                self.events[old_event['next']] = self.current_event

                ## Send the event value to any waiting http requests
                old_event['event'].send(value)
        finally:
            reader.close()
            sock.close()

    def __call__(self, env, start_response):
        if env['REQUEST_METHOD'] != 'GET':
            start_response('405 Method Not Allowed',
                [('Content-type', 'text/plain')])
            return ['Method Not Allowed\n']

        if not env['PATH_INFO'] or env['PATH_INFO'] == '/':
            start_response('200 OK', [('Content-type', 'text/html')])
            return HTML_TEMPLATE % (self.first_event_id, )

        event = self.events.get(env['PATH_INFO'][1:], None)
        if event is None:
            start_response('404 Not Found',
                [('Content-type', 'text/plain')])
            return ['Not Found\n']

        value = event['event'].wait()
        start_response('200 OK', [
            ('Content-type', 'text/plain'),
            ('X-Next-Event', event['next'])])
        return [value, '\n']


def send_event(where, value):
    sock = api.connect_tcp(where)
    writer = sock.makefile('w')
    writer.write('%s%s' % (struct.pack('!L', len(value)), value))


if __name__ == '__main__':
    if len(sys.argv) > 1:
        value = ' '.join(sys.argv[1:])
    else:
        value = sys.stdin.read()
    send_event((SEND_EVENT_INTERFACE, SEND_EVENT_PORT), value)
else:
    wsgi_application = Comet()

From pje at telecommunity.com Tue Jul 29 03:16:26 2008 From: pje at
telecommunity.com (Phillip J. Eby) Date: Mon, 28 Jul 2008 21:16:26 -0400 Subject: [Web-SIG] Could WSGI handle Asynchronous response? In-Reply-To: <6c3c7df3-4ad3-4917-853c-b40050a6a7da@h11g2000prf.googlegro ups.com> References: <6c3c7df3-4ad3-4917-853c-b40050a6a7da@h11g2000prf.googlegroups.com> Message-ID: <20080729011539.8DE773A40A0@sparrow.telecommunity.com> At 04:23 AM 2/18/2008 -0800, est wrote: >I am writing a small 'comet'-like app using flup, something like >this: > >def myapp(environ, start_response): > start_response('200 OK', [('Content-Type', 'text/plain')]) > return ['Flup works!\n'] <-------------Could this be part >of response output? Could I time.sleep() for a while then write other >outputs? > > >if __name__ == '__main__': > from flup.server.fcgi import WSGIServer > WSGIServer(myapp, multiplexed=True, bindAddress=('0.0.0.0', >8888)).run() > > >So is WSGI really synchronous? How can I handle asynchronous outputs >with flup/WSGI ? You are confusing "asynchronous" with "streaming". WSGI is synchronous, but allows streaming and "server push". Instead of returning a sequence, code your application as an iterator that yields output chunks. It is "synchronous" in the sense that if you sleep or do processing in between yielded output chunks, you will prevent the server from freeing any resources associated with your application, or from doing any other work in the current thread. A properly-designed WSGI server should continue to function, as long as all available resources aren't consumed... which in the case of "push" apps could easily make your box fall over, regardless of whether WSGI is involved. :) From pje at telecommunity.com Tue Jul 29 03:21:18 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 28 Jul 2008 21:21:18 -0400 Subject: [Web-SIG] Could WSGI handle Asynchronous response? 
In-Reply-To: References: <6c3c7df3-4ad3-4917-853c-b40050a6a7da@h11g2000prf.googlegroups.com> <20080728155217.6c6c1175@fluke> Message-ID: <20080729012029.1221B3A40A0@sparrow.telecommunity.com> At 04:57 PM 7/28/2008 -0700, Donovan Preston wrote: >On Jul 28, 2008, at 12:52 PM, Etienne Robillard wrote: > >>On Mon, 18 Feb 2008 04:23:38 -0800 (PST) >>est wrote: >> >>>I am writing a small 'comet'-like app using flup, something like >>>this: > >>>So is WSGI really synchronous? How can I handle asynchronous outputs >>>with flup/WSGI ? > >WSGI says that the entire body should be written by the time the wsgi >application returns. No, it doesn't. It says that all your write() calls must be done by then, which is not at all the same thing. If the application returns an iterator, that iterator can keep yielding outputs until the (figurative) cows come home. > So yes it is really synchronous; as Manlio >Perillo said in another message it is possible to abuse generators to >allow a wsgi application to operate in the fashion you desire, but >both the server and the application have to know how to do this and >there is no standardization yet. This is confusing asynchronous APIs, non-blocking behavior, and streaming output. A WSGI application can avoid blocking the server by yielding empty strings until it is ready to produce more output. (This may not provide any performance benefit over sleep() however, and may in some circumstances be worse.) There is no async API that's part of WSGI itself, and it's unlikely there will ever be one unless there ends up being an async API for Python as well. (By the way, using a generator to produce streaming output is not abuse: it is the *intended* use of iterables in WSGI!) 
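A minimal sketch of the streaming style described above: the application returns a generator, and each yielded chunk can be handed to the client before the next one is produced. This is an illustration rather than code from the thread. The names `streaming_app` and `CHUNK_DELAY` and the chunk text are assumptions, and it is written for Python 3 (byte-string bodies), which postdates this discussion. As noted above, any pause between chunks still occupies the server thread or worker that is handling the request.

```python
import time

# Hypothetical pause between chunks; stands in for "waiting for an event".
CHUNK_DELAY = 0.01


def streaming_app(environ, start_response):
    """WSGI application that streams its body one chunk at a time."""
    start_response('200 OK', [('Content-Type', 'text/plain')])

    def body():
        for i in range(3):
            # The server may send this chunk before the next one exists.
            yield ('chunk %d\n' % i).encode('ascii')
            # Sleeping here blocks the current thread or worker, which is
            # the sense in which WSGI remains synchronous.
            time.sleep(CHUNK_DELAY)

    # Returning an iterable (here a generator) instead of a list is what
    # allows the response to be produced incrementally.
    return body()
```

For a quick local trial, the standard library's `wsgiref.simple_server.make_server('', 8000, streaming_app).serve_forever()` can host it; whether chunks reach the browser immediately depends on the server and on any buffering in between.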
From janssen at parc.com Tue Jul 29 03:25:54 2008 From: janssen at parc.com (Bill Janssen) Date: Mon, 28 Jul 2008 18:25:54 PDT Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <488E1483.2010204@libero.it> References: <488E1483.2010204@libero.it> Message-ID: <08Jul28.182555pdt."58698"@synergy1.parc.xerox.com> > The first parse the query string and return a dictionary of strings, the > latter parse the application/x-www-form-urlencoded client body and > return a dictionary of strings and the charset used by the client for > the unicode encoding. > Now, I'm thinking if these two function should instead return Unicode > strings instead of plain strings. I'd say, yes. I do this in my framework, which also decodes query strings and post bodies (and handles multipart/form-data as well as x-www-form-urlencoded). Note that while x-www-form-urlencoded is generally restricted to ASCII values by the HTML 4.01 spec, multipart/form-data can contain arbitrary Unicode strings. In Python 3.x, strings are all Unicode. Bill From janssen at parc.com Tue Jul 29 03:32:44 2008 From: janssen at parc.com (Bill Janssen) Date: Mon, 28 Jul 2008 18:32:44 PDT Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <488E3D42.5010707@libero.it> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> Message-ID: <08Jul28.183250pdt."58698"@synergy1.parc.xerox.com> > In wsgix I use utf-8 for decoding the QUERY_STRING, and the charset > specified in the POST'ed data (utf-8 or the charset found in the special > _charset_ field). That's probably wrong. We went through this recently on the python-dev list. While it's possible to tell the encoding of multipart/form-data, the query_string and x-www-form-urlencoded data may be in arbitrary character set encodings (see RFC 3986).
It's probably best to not try to map them to strings; instead, return byte arrays for the value, and only return strings for data that can be correctly decoded. Otherwise, you lose information that the app cannot recover. Bill From dsposx at mac.com Tue Jul 29 04:16:27 2008 From: dsposx at mac.com (Donovan Preston) Date: Mon, 28 Jul 2008 19:16:27 -0700 Subject: [Web-SIG] Could WSGI handle Asynchronous response? In-Reply-To: <20080729012029.1221B3A40A0@sparrow.telecommunity.com> References: <6c3c7df3-4ad3-4917-853c-b40050a6a7da@h11g2000prf.googlegroups.com> <20080728155217.6c6c1175@fluke> <20080729012029.1221B3A40A0@sparrow.telecommunity.com> Message-ID: <28A0D38D-93D1-4791-9B80-9EA6D7E4B42E@mac.com> On Jul 28, 2008, at 6:21 PM, Phillip J. Eby wrote: > At 04:57 PM 7/28/2008 -0700, Donovan Preston wrote: > >> On Jul 28, 2008, at 12:52 PM, Etienne Robillard wrote: >> >>> On Mon, 18 Feb 2008 04:23:38 -0800 (PST) >>> est wrote: >>> >>>> I am writing a small 'comet'-like app using flup, something like >>>> this: >> >>>> So is WSGI really synchronous? How can I handle asynchronous >>>> outputs >>>> with flup/WSGI ? >> >> WSGI says that the entire body should be written by the time the wsgi >> application returns. > > No, it doesn't. It says that all your write() calls must be done by > then, which is not at all the same thing. If the application > returns an iterator, that iterator can keep yielding outputs until > the (figurative) cows come home. Hmm, I see what you are saying. I hadn't thought about returning an iterable instead of just using a generator. Cool. >> So yes it is really synchronous; as Manlio >> Perillo said in another message it is possible to abuse generators to >> allow a wsgi application to operate in the fashion you desire, but >> both the server and the application have to know how to do this and >> there is no standardization yet. > > This is confusing asynchronous APIs, non-blocking behavior, and > streaming output. 
A WSGI application can avoid blocking the server > by yielding empty strings until it is ready to produce more output. > (This may not provide any performance benefit over sleep() however, > and may in some circumstances be worse.) You're right. But continually yielding empty strings is basically busy- waiting, which would result in terrible performance, as you mention. > There is no async API that's part of WSGI itself, and it's unlikely > there will ever be one unless there ends up being an async API for > Python as well. I know this has been discussed before on the list and I wasn't really paying attention enough to know what was proposed, but it seems to me that just having a well-defined way for the application to tell the server when to resume the iterable is possible. Manlio has come up with an API for this in his nginx mod_wsgi. For example, something like the interface to select could be used: def foo(env, start_response): my_sock = socket.socket() my_sock.setblocking(0) my_sock.connect((...)) r, w, e = yield [[my_sock.fileno()], [], [my_sock.fileno()]] if e: ... bytes = my_sock.recv(4096) This requires 2.5's extended generators, but the file descriptor readiness lists could be put in the environ before resuming the iterator for people who don't want to or can't move to 2.5. This is just an example, I think Manlio's api (which is more like poll if I remember correctly) is better. Really I don't actually care, since eventlet and greenlet let me mash together wsgi applications written with blocking i/o style with an http server that does non-blocking i/o. > (By the way, using a generator to produce streaming output is not > abuse: it is the *intended* use of iterables in WSGI!) Nice. 
Donovan From manlio_perillo at libero.it Tue Jul 29 08:06:17 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Tue, 29 Jul 2008 08:06:17 +0200 Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <08Jul28.183250pdt."58698"@synergy1.parc.xerox.com> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <08Jul28.183250pdt."58698"@synergy1.parc.xerox.com> Message-ID: <488EB359.5090509@libero.it> Bill Janssen ha scritto: >> In wsgix I use utf-8 for decoding the QUERY_STRING, and the charset >> specified in the POST'ed data (utf-8 or the charset found in the special >> _charset_ field). > > That's probably wrong. We went through this recently on the > python-dev list. While it's possible to tell the encoding of > multipart/form-data, With multipart/form-data the problem should be the same. The content type is defined only for file fields. > the query_string and x-www-form-urlencoded data > may be in arbitary character set encodings (see RFC 3986). It's > probably best to not try to map them to strings; instead, return byte > arrays for the value, and only return strings for data that can be > correctly decoded. Otherwise, you lose information that the app > cannot recover. > Interesting, thanks. I have read Django code and, as far as I can tell, it always decode data to strings, but using "replace" error handling. Can you point me to the discussion on python-dev list? > Bill > Manlio Perillo From janssen at parc.com Tue Jul 29 18:21:09 2008 From: janssen at parc.com (Bill Janssen) Date: Tue, 29 Jul 2008 09:21:09 PDT Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <488EB359.5090509@libero.it> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <08Jul28.183250pdt."58698"@synergy1.parc.xerox.com> <488EB359.5090509@libero.it> Message-ID: <08Jul29.092116pdt."58698"@synergy1.parc.xerox.com> > > That's probably wrong. 
We went through this recently on the > > python-dev list. While it's possible to tell the encoding of > > multipart/form-data, > > With multipart/form-data the problem should be the same. > The content type is defined only for file fields. Actually, it's defined for all fields, isn't it? From RFC 2388: ``As with all multipart MIME types, each part has an optional "Content-Type", which defaults to text/plain.'' So the type is "text/plain" unless it says something else. And, according to RFC 2046, the default charset for "text/plain" is "US-ASCII". > Can you point me to the discussion on python-dev list? See http://mail.python.org/pipermail/python-dev/2008-July/081013.html and the subsequent conversation. And http://mail.python.org/pipermail/python-dev/2008-July/081066.html and the reply to that. Bill From manlio_perillo at libero.it Tue Jul 29 18:39:10 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Tue, 29 Jul 2008 18:39:10 +0200 Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <08Jul29.092116pdt."58698"@synergy1.parc.xerox.com> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <08Jul28.183250pdt."58698"@synergy1.parc.xerox.com> <488EB359.5090509@libero.it> <08Jul29.092116pdt."58698"@synergy1.parc.xerox.com> Message-ID: <488F47AE.7000708@libero.it> Bill Janssen ha scritto: >>> That's probably wrong. We went through this recently on the >>> python-dev list. While it's possible to tell the encoding of >>> multipart/form-data, >> With multipart/form-data the problem should be the same. >> The content type is defined only for file fields. > > Actually, it's defined for all fields, isn't it? From RFC 2388: > > ``As with all multipart MIME types, each part has an optional > "Content-Type", which defaults to text/plain.'' > > So the type is "text/plain" unless it says something else. And, > according to RFC 2046, the default charset for "text/plain" is > "US-ASCII". > Ok with theory. 
But in practice:

Content-Type: multipart/form-data; boundary=abcde

abcde
Content-Disposition: form-data; name="Title"

hello
abcde
Content-Disposition: form-data; name="body"

? ????????
abcde

In theory I should assume ascii encoded data for the body field; and since this data can not be decoded, I should assume it as byte string. However the body field is encoded in utf-8, and if I add an hidden _charset_ field, FF and IE add this field in the response, with the charset used in the encoding. I think that it is safe to decode data from the QUERY_STRING and POST data to Unicode, and to return Bad Request in case of errors. If the user have specialized needs, he can use low level parsing functions. In wsgix the "high" level functions are parse_query_string and parse_simple_post_data; the "low" level function is parse_qs. > [...] Thanks Manlio Perillo From janssen at parc.com Tue Jul 29 19:14:05 2008 From: janssen at parc.com (Bill Janssen) Date: Tue, 29 Jul 2008 10:14:05 PDT Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <488F47AE.7000708@libero.it> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <08Jul28.183250pdt."58698"@synergy1.parc.xerox.com> <488EB359.5090509@libero.it> <08Jul29.092116pdt."58698"@synergy1.parc.xerox.com> <488F47AE.7000708@libero.it> Message-ID: <08Jul29.101407pdt."58698"@synergy1.parc.xerox.com> > Ok with theory. > But in practice: Seems like you're looking at a broken browser there. Can anyone point to where a W3C standard or IETF RFC describes this behavior? > I think that it is safe to decode data from the QUERY_STRING and POST > data to Unicode, and to return Bad Request in case of errors. It's clearly not safe to do so generally. If you do decide to do this, please tell me what framework you're building so that I can avoid it :-).
Bill From janssen at parc.com Tue Jul 29 19:40:25 2008 From: janssen at parc.com (Bill Janssen) Date: Tue, 29 Jul 2008 10:40:25 PDT Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <08Jul29.101407pdt."58698"@synergy1.parc.xerox.com> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <08Jul28.183250pdt."58698"@synergy1.parc.xerox.com> <488EB359.5090509@libero.it> <08Jul29.092116pdt."58698"@synergy1.parc.xerox.com> <488F47AE.7000708@libero.it> <08Jul29.101407pdt."58698"@synergy1.parc.xerox.com> Message-ID: <08Jul29.104027pdt."58698"@synergy1.parc.xerox.com> > > Ok with theory. > > But in practice: > > Seems like you're looking at a broken browser there. Ah, I see that the Firefox people, at least, are aware that this is a bug in Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=116346 But they haven't found a fix for it yet, because of the large number of badly implemented server frameworks that are out there. Bill From manlio_perillo at libero.it Tue Jul 29 19:55:11 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Tue, 29 Jul 2008 19:55:11 +0200 Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <08Jul29.101407pdt."58698"@synergy1.parc.xerox.com> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <08Jul28.183250pdt."58698"@synergy1.parc.xerox.com> <488EB359.5090509@libero.it> <08Jul29.092116pdt."58698"@synergy1.parc.xerox.com> <488F47AE.7000708@libero.it> <08Jul29.101407pdt."58698"@synergy1.parc.xerox.com> Message-ID: <488F597F.3000102@libero.it> Bill Janssen ha scritto: >> Ok with theory. >> But in practice: > > Seems like you're looking at a broken browser there. > Right. It's Firefox. But it's the same with IE 6 and Opera. > Can anyone point to where a W3C standard or IETF RFC describes this > behavior? 
> >> I think that it is safe to decode data from the QUERY_STRING and POST=20 >> data to Unicode, and to return Bad Request in case of errors. > > It's clearly not safe to do so generally. If you do decide to do > this, please tell me what framework you're building so that I can > avoid it :-). > No, wait. I don't blindly guess the encoding. I first try the content-type header, then the special _charset_ field, and finally utf-8. If there is a problem in the decoding, the client is broken (or there is a bug in the application). So the correct response is Bad Request, IMHO. > Bill > Manlio Perillo From manlio_perillo at libero.it Tue Jul 29 20:41:52 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Tue, 29 Jul 2008 20:41:52 +0200 Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <648398F9-6D51-49C8-965F-2175BBCFAFB7@fuhm.net> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <08Jul28.183250pdt."58698"@synergy1.parc.xerox.com> <488EB359.5090509@libero.it> <08Jul29.092116pdt."58698"@synergy1.parc.xerox.com> <488F47AE.7000708@libero.it> <08Jul29.101407pdt."58698"@synergy1.parc.xerox.com> <648398F9-6D51-49C8-965F-2175BBCFAFB7@fuhm.net> Message-ID: <488F6470.3000404@libero.it> James Y Knight ha scritto: > On Jul 29, 2008, at 1:14 PM, Bill Janssen wrote: > >>> Ok with theory. >>> But in practice: >> >> Seems like you're looking at a broken browser there. >> >> Can anyone point to where a W3C standard or IETF RFC describes this >> behavior? > > You seem to be under the mistaken impression that form post content is > MIME. It is not. It looks kinda like it should be, and maybe it's even > specified to be [rfc2388], but actually treating it as MIME is a rather > critical error. RFC2388 is just wrong, don't believe a thing it says. > But, at this point, can one consider the content of form post to be encoded "text" string? Or it should be considered encoded "byte" string? > [...] 
Manlio Perillo From deron.meranda at gmail.com Tue Jul 29 20:58:26 2008 From: deron.meranda at gmail.com (Deron Meranda) Date: Tue, 29 Jul 2008 14:58:26 -0400 Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <488F47AE.7000708@libero.it> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <488EB359.5090509@libero.it> <488F47AE.7000708@libero.it> Message-ID: <5c06fa770807291158l37c09464kc5c61aa66e5e66f5@mail.gmail.com> On Tue, Jul 29, 2008 at 12:39 PM, Manlio Perillo wrote: > Bill Janssen ha scritto: >> Actually, it's defined for all fields, isn't it? From RFC 2388: >> >> ``As with all multipart MIME types, each part has an optional >> "Content-Type", which defaults to text/plain.'' >> >> So the type is "text/plain" unless it says something else. And, >> according to RFC 2046, the default charset for "text/plain" is >> "US-ASCII". > > Ok with theory. > But in practice: > > enctype="multipart/form-data"> > [...] > > In theory I should assume ascii encoded data for the body field; and since > this data can not be decoded, I should assume it as byte string. > > However the body field is encoded in utf-8, and if I add an hidden _charset_ > field, FF and IE add this field in the response, with the charset used in > the encoding. From what I've seen, most user agents fail to send a Content-Type, much less a charset parameter. Many will also ignore the accept-charset attribute. However most browsers will respectfully send the text fields in a POST response in the same character set in which the page that contained the form element was sent to the browser to begin with. So if you output HTML pages in UTF-8, the text portions of post messages will be returned in UTF-8. It's not following any standard, but it's the way things seem to work.
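One way to act on the hints discussed in this thread is a small helper that picks a charset in priority order: a charset declared on the part itself, then the `_charset_` field if the browser sent one, then the encoding the page was served in. This is only a sketch under those assumptions. `decode_field` and its parameter names are hypothetical and are not part of wsgix, WebOb, or the standard library, and the code is written for Python 3. Decoding errors are deliberately left to propagate so that the caller decides how to respond.

```python
def decode_field(raw, declared_charset=None, form_charset=None,
                 page_charset='us-ascii'):
    """Decode one raw form value (bytes) to text.

    declared_charset: charset from the part's own Content-Type, if any.
    form_charset:     value of the special _charset_ field, if present.
    page_charset:     encoding the form's page was served in; defaults to
                      US-ASCII when the caller gives no other default.
    """
    # Use the most specific hint that is actually available.
    charset = declared_charset or form_charset or page_charset
    # UnicodeDecodeError is not caught here: a value that does not match
    # the chosen charset is reported to the caller instead of being
    # silently replaced.
    return raw.decode(charset)
```

A caller that serves its pages in UTF-8 would pass `page_charset='utf-8'` and could map a `UnicodeDecodeError` to whatever response it considers appropriate.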
I would think it most useful if the decoding framework would strictly follow the RFC and assume "text/plain; charset=US-ASCII"; but also allow the caller some means of indicating a different default. Obviously, if a user agent does provide a complete Content-Type, it should be used. -- Deron Meranda From foom at fuhm.net Tue Jul 29 20:31:51 2008 From: foom at fuhm.net (James Y Knight) Date: Tue, 29 Jul 2008 14:31:51 -0400 Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <08Jul29.101407pdt."58698"@synergy1.parc.xerox.com> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <08Jul28.183250pdt."58698"@synergy1.parc.xerox.com> <488EB359.5090509@libero.it> <08Jul29.092116pdt."58698"@synergy1.parc.xerox.com> <488F47AE.7000708@libero.it> <08Jul29.101407pdt."58698"@synergy1.parc.xerox.com> Message-ID: <648398F9-6D51-49C8-965F-2175BBCFAFB7@fuhm.net> On Jul 29, 2008, at 1:14 PM, Bill Janssen wrote: >> Ok with theory. >> But in practice: > > Seems like you're looking at a broken browser there. > > Can anyone point to where a W3C standard or IETF RFC describes this > behavior? You seem to be under the mistaken impression that form post content is MIME. It is not. It looks kinda like it should be, and maybe it's even specified to be [rfc2388], but actually treating it as MIME is a rather critical error. RFC2388 is just wrong, don't believe a thing it says. At this point, calling it a bug in any particular browser is rather foolish, since none of them actually write proper MIME output. It should really be considered a bug in the RFC, instead. 
James From deron.meranda at gmail.com Tue Jul 29 21:18:43 2008 From: deron.meranda at gmail.com (Deron Meranda) Date: Tue, 29 Jul 2008 15:18:43 -0400 Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <488F6470.3000404@libero.it> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <488EB359.5090509@libero.it> <488F47AE.7000708@libero.it> <648398F9-6D51-49C8-965F-2175BBCFAFB7@fuhm.net> <488F6470.3000404@libero.it> Message-ID: <5c06fa770807291218v6ebb0436o4436a6e13e75ae7c@mail.gmail.com> On Tue, Jul 29, 2008 at 2:41 PM, Manlio Perillo wrote: > James Y Knight ha scritto: >> You seem to be under the mistaken impression that form post content is >> MIME. It is not. It looks kinda like it should be, and maybe it's even >> specified to be [rfc2388], but actually treating it as MIME is a rather >> critical error. RFC2388 is just wrong, don't believe a thing it says. In what way is RFC 2388 wrong or not MIME? Per RFC 2388 sect. 3: "The media-type multipart/form-data follows the rules of all multipart MIME data streams as outlined in [RFC 2046]." So it is MIME, right? You may be referring to the much older "experimental" RFC 1867, upon which 2388 is based. It merely said it was a "MIME compatible representation". But even then the intent was clearly to be MIME. Now you can successfully argue that many user agents do not follow the RFC carefully enough. But that's not a problem with the RFC itself. > But, at this point, can one consider the content of form post to be encoded > "text" string? > > Or it should be considered encoded "byte" string? Both/either. I'd say follow the RFC, but perhaps allow a caller to provide an override default. So yes, you should assume an encoded string if the subpart has a text/* Content-Type, or if it has no content type at all (which must then be assumed to be text/plain US-ASCII). 
That is the intent of the MIME text/* media type after all; that it should be interpreted as a character string and not a byte string. In other cases, I would say returning a byte string is the correct thing to do. Also I'd say that if you're dealing with text (text/*) and no charset is provided (or the caller hasn't given an override default charset); then you must assume US-ASCII. And you should allow any UnicodeDecodeErrors to bubble up to the caller. In other words if a user agent sent text in ISO-8859-x and didn't say it was doing so, then an error should be raised when non-ASCII data is seen. -- Deron Meranda From manlio_perillo at libero.it Tue Jul 29 21:50:45 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Tue, 29 Jul 2008 21:50:45 +0200 Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <5c06fa770807291218v6ebb0436o4436a6e13e75ae7c@mail.gmail.com> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <488EB359.5090509@libero.it> <488F47AE.7000708@libero.it> <648398F9-6D51-49C8-965F-2175BBCFAFB7@fuhm.net> <488F6470.3000404@libero.it> <5c06fa770807291218v6ebb0436o4436a6e13e75ae7c@mail.gmail.com> Message-ID: <488F7495.5010500@libero.it> Deron Meranda ha scritto: > [...] >> But, at this point, can one consider the content of form post to be encoded >> "text" string? >> >> Or it should be considered encoded "byte" string? > > Both/either. > > I'd say follow the RFC, but perhaps allow a caller to provide > an override default. So yes, you should assume an encoded > string if the subpart has a text/* Content-Type, or if it has no > content type at all (which must then be assumed to be text/plain > US-ASCII). That is the intent of the MIME text/* media type > after all; that it should be interpreted as a character string > and not a byte string. > > In other cases, I would say returning a byte string is the > correct thing to do. > I'm not sure to understand. 
If you want non text data in the POST request body, you can use the file control. I can't really see use cases of normal input fields having byte strings. > [...] Manlio Perillo From foom at fuhm.net Tue Jul 29 22:04:04 2008 From: foom at fuhm.net (James Y Knight) Date: Tue, 29 Jul 2008 16:04:04 -0400 Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <5c06fa770807291218v6ebb0436o4436a6e13e75ae7c@mail.gmail.com> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <488EB359.5090509@libero.it> <488F47AE.7000708@libero.it> <648398F9-6D51-49C8-965F-2175BBCFAFB7@fuhm.net> <488F6470.3000404@libero.it> <5c06fa770807291218v6ebb0436o4436a6e13e75ae7c@mail.gmail.com> Message-ID: On Jul 29, 2008, at 3:18 PM, Deron Meranda wrote: > In what way is RFC 2388 wrong or not MIME? > > Per RFC 2388 sect. 3: > "The media-type multipart/form-data follows the rules of all > multipart > MIME data streams as outlined in [RFC 2046]." > > So it is MIME, right? No: RFC 2388 says it is MIME, but in real life it is not. RFC 2388 is wrong. > > Now you can successfully argue that many user agents do not > follow the RFC carefully enough. But that's not a problem with > the RFC itself. Common practice is by now long established, and cannot simply be changed 10 years after the fact to conform to what the standard says it should've been. Therefore, it *is* now a problem with the standard: the standard is wrong. If you follow it, you're going to create totally broken software. For instance, treating form posts as being 7bit unless they have a Content-Transfer-Encoding. The RFC says you should do that. But it's an absolutely nonsensical thing to do. Your code would not work with any existing web browser if you did. Or, if you're writing a web browser: don't even think of using Content-Transfer-Encoding to encode your response. Few servers/frameworks would understand your submission if you tried. 
> But, at this point, can one consider the content of form post to be > encoded "text" string? > > Or it should be considered encoded "byte" string? I'd recommend that it should be, certainly at the lower levels. A higher level API can look at the hints available to figure out how to decode the non-file fields: e.g.: if the magic _charset_ parameter is present, use that, otherwise use what the developer tells you they put in accept-charset / what encoding they sent the page in. James From deron.meranda at gmail.com Tue Jul 29 22:12:10 2008 From: deron.meranda at gmail.com (Deron Meranda) Date: Tue, 29 Jul 2008 16:12:10 -0400 Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <488F7495.5010500@libero.it> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <488EB359.5090509@libero.it> <488F47AE.7000708@libero.it> <648398F9-6D51-49C8-965F-2175BBCFAFB7@fuhm.net> <488F6470.3000404@libero.it> <5c06fa770807291218v6ebb0436o4436a6e13e75ae7c@mail.gmail.com> <488F7495.5010500@libero.it> Message-ID: <5c06fa770807291312q15fc40eaj6e79392edc5d9b2a@mail.gmail.com> On Tue, Jul 29, 2008 at 3:50 PM, Manlio Perillo wrote: > Deron Meranda ha scritto: >> >> [...] >>> >>> But, at this point, can one consider the content of form post to be >>> encoded >>> "text" string? >>> >>> Or it should be considered encoded "byte" string? >> >> Both/either. >> >> I'd say follow the RFC, but perhaps allow a caller to provide >> an override default. So yes, you should assume an encoded >> string if the subpart has a text/* Content-Type, or if it has no >> content type at all (which must then be assumed to be text/plain >> US-ASCII). That is the intent of the MIME text/* media type >> after all; that it should be interpreted as a character string >> and not a byte string. >> >> In other cases, I would say returning a byte string is the >> correct thing to do. >> > > I'm not sure to understand. 
> If you want non text data in the POST request body, you can use the file > control. I don't think we're disagreeing. In HTML, an input element with type=file will result in non-text; i.e., it should result in a byte stream (ignoring the possibility of uploading text files, which are permitted but not required to have a text/* content type). But on the other hand an input with type=text or type=password should definitely result in a character string, not a byte string. Same with a textarea element. It's less clear what input type=checkbox or type=radio should give, but I think it's safe to assume a character string. Either way, the parser of the multipart/form-data has no idea what the original HTML looked like; it only has the posted MIME structure and headers to go by. In my suggestion, you would return a byte string only if there is a Content-Type header on the subpart, and only if it is not text/*. Everything else should result in a character string. But you can't pick just one return type; sometimes you have bytes and other times you have characters. > I can't really see use cases of normal input fields having byte strings. In HTML, no. Only input with type=file should ever result in a content type other than text. However, don't forget that not all POSTs with multipart/form-data have to be the result of an HTML page. So a generic consumer of multipart/form-data can't make such assumptions, which is why it should just follow the RFC, with possible caller-specified overrides to compensate for the real world not matching the RFC spec.
-- Deron Meranda From janssen at parc.com Tue Jul 29 22:18:51 2008 From: janssen at parc.com (Bill Janssen) Date: Tue, 29 Jul 2008 13:18:51 PDT Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <5c06fa770807291158l37c09464kc5c61aa66e5e66f5@mail.gmail.com> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <488EB359.5090509@libero.it> <488F47AE.7000708@libero.it> <5c06fa770807291158l37c09464kc5c61aa66e5e66f5@mail.gmail.com> Message-ID: <08Jul29.131858pdt."58698"@synergy1.parc.xerox.com> > I would think it most useful if the decoding framework would strictly > follow the RFC and assume "text/plain; charset=US-ASCII"; but > also allow the caller some means of indicating a different default. > Obviously, if a user agent does provide a complete Content-Type, > it should be used. Yes, I agree. Bill From janssen at parc.com Tue Jul 29 22:17:53 2008 From: janssen at parc.com (Bill Janssen) Date: Tue, 29 Jul 2008 13:17:53 PDT Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <488F597F.3000102@libero.it> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <08Jul28.183250pdt."58698"@synergy1.parc.xerox.com> <488EB359.5090509@libero.it> <08Jul29.092116pdt."58698"@synergy1.parc.xerox.com> <488F47AE.7000708@libero.it> <08Jul29.101407pdt."58698"@synergy1.parc.xerox.com> <488F597F.3000102@libero.it> Message-ID: <08Jul29.131802pdt."58698"@synergy1.parc.xerox.com> > I first try the content-type header, Right. > then the special _charset_ field, I don't know what that is. Can you explain a bit more? > and finally utf-8. That's wrong. Should be ASCII. You could add an "encoding" field to let the application override this, though. But the default is ASCII. > If there is a problem in the decoding, the client is broken (or there is > a bug in the application). > So the correct response is Bad Request, IMHO. Yes, I think that's right. 
Bill From janssen at parc.com Tue Jul 29 22:20:17 2008 From: janssen at parc.com (Bill Janssen) Date: Tue, 29 Jul 2008 13:20:17 PDT Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: <5c06fa770807291218v6ebb0436o4436a6e13e75ae7c@mail.gmail.com> References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <488EB359.5090509@libero.it> <488F47AE.7000708@libero.it> <648398F9-6D51-49C8-965F-2175BBCFAFB7@fuhm.net> <488F6470.3000404@libero.it> <5c06fa770807291218v6ebb0436o4436a6e13e75ae7c@mail.gmail.com> Message-ID: <08Jul29.132023pdt."58698"@synergy1.parc.xerox.com> > Also I'd say that if you're dealing with text (text/*) and no > charset is provided (or the caller hasn't given an override > default charset); then you must assume US-ASCII. And > you should allow any UnicodeDecodeErrors to bubble > up to the caller. In other words if a user agent sent text > in ISO-8859-x and didn't say it was doing so, then an > error should be raised when non-ASCII data is seen. Yep. Bill From janssen at parc.com Tue Jul 29 22:22:51 2008 From: janssen at parc.com (Bill Janssen) Date: Tue, 29 Jul 2008 13:22:51 PDT Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <488EB359.5090509@libero.it> <488F47AE.7000708@libero.it> <648398F9-6D51-49C8-965F-2175BBCFAFB7@fuhm.net> <488F6470.3000404@libero.it> <5c06fa770807291218v6ebb0436o4436a6e13e75ae7c@mail.gmail.com> Message-ID: <08Jul29.132258pdt."58698"@synergy1.parc.xerox.com> > Common practice is by now long established, and cannot simply be > changed 10 years after the fact to conform to what the standard says > it should've been. Therefore, it *is* now a problem with the standard: > the standard is wrong. If you follow it, you're going to create > totally broken software. 
> > For instance, treating form posts as being 7bit unless they have a > Content-Transfer-Encoding. The RFC says you should do that. But it's > an absolutely nonsensical thing to do. Your code would not work with > any existing web browser if you did. Or, if you're writing a web > browser: don't even think of using Content-Transfer-Encoding to encode > your response. Few servers/frameworks would understand your submission > if you tried. I had lots of various charset errors with UpLib, as people tried various broken browsers, because I was trying to guess "common practice" and follow it, until I actually read the RFCs and made the server follow them. Now that it does, almost all of those errors have gone away. So, my experience seems to differ from yours. Bill From deron.meranda at gmail.com Tue Jul 29 23:02:10 2008 From: deron.meranda at gmail.com (Deron Meranda) Date: Tue, 29 Jul 2008 17:02:10 -0400 Subject: [Web-SIG] parsing of urlencoded data and Unicode In-Reply-To: References: <488E1483.2010204@libero.it> <488E2ECA.1020008@colorstudy.com> <488E3D42.5010707@libero.it> <488EB359.5090509@libero.it> <488F47AE.7000708@libero.it> <648398F9-6D51-49C8-965F-2175BBCFAFB7@fuhm.net> <488F6470.3000404@libero.it> <5c06fa770807291218v6ebb0436o4436a6e13e75ae7c@mail.gmail.com> Message-ID: <5c06fa770807291402v26cc5935t81d29d1451d6daf6@mail.gmail.com> On Tue, Jul 29, 2008 at 4:04 PM, James Y Knight wrote: >> So it is MIME, right? > > No: RFC 2388 says it is MIME, but in real life it is not. RFC 2388 is wrong. I think this is a problem of semantics; what you mean by "wrong". The RFC is not wrong, in terms of it having a technical inaccuracy or needing an erratum. By the way, none have been issued so far: http://www.rfc-editor.org/errata_search.php?rfc=2388 It may be "wrong" only in terms of it being ignored by the authors of software. I'd tend to use a different, less misleading term though.
I think it more appropriate to call the software which purports to adhere to HTTP 1.1 (and hence its dependent specs like RFC 2388) "wrong". >> Now you can successfully argue that many user agents do not >> follow the RFC carefully enough. But that's not a problem with >> the RFC itself. > > Common practice is by now long established, and cannot simply be changed 10 > years after the fact to conform to what the standard says it should've been. I'm not so sure. Granted this is a problem for the browser guys and not us Python people. Regarding timelines: the multipart/form-data RFC 2388 was written in 1998. HTTP 1.1 came after that. And both of these specs are around 10 years old, while most browsers today are in fact the newcomers; not the other way around. The RFC isn't trying to rewrite facts; it came first. I'm sure there's lots of other places where browsers today do not adhere to the RFC specs; so do we say the specs are wrong or that the browsers have bugs? (I'm not talking about W3C stuff; that's clearly not as straightforward as RFCs) > Therefore, it *is* now a problem with the standard: the standard is wrong. > If you follow it, you're going to create totally broken software. I don't think we're there. Although many real world browsers may not conform strictly to the RFC, I fail to see why that means that the server can't be conformant in this case. I just don't see "totally broken" as an inevitable outcome. > For instance, treating form posts as being 7bit unless they have a > Content-Transfer-Encoding. The RFC says you should do that. Um, no. HTTP 1.1 specifically grants an exemption to that 7-bit restriction in MIME. The tricky part is that with web software you're dealing with a whole bunch of standards, and even ignoring W3C stuff, there's even a whole bunch of RFCs. Sometimes one RFC will override part of another; and that's what the HTTP RFC does to the MIME RFCs. Yes, it's confusing and prone to interpretation errors.
> But it's an > absolutely nonsensical thing to do. Your code would not work with any > existing web browser if you did. Or, if you're writing a web browser: don't > even think of using Content-Transfer-Encoding to encode your response. Again, the RFCs already account for that. In web software, the primary RFC is the HTTP 1.1 spec; not the MIME spec. This can be confusing because HTTP borrows say 90% of MIME, but overrides other parts of it. So I guess in a pedantic way, yes, this is not strictly "MIME". If it were, you'd be dealing with email, not web. But inasmuch as it's the parts of MIME that the HTTP spec says to use, it is still MIME. And the parts we're dealing with, the multipart/form-data type and what to do with the presence or absence of content-type headers on the subparts, are pretty explicitly stated. >> Or it should be considered encoded "byte" string? > > I'd recommend that it should be, certainly at the lower levels. A higher > level API can look at the hints available to figure out how to decode the > non-file fields: e.g.: if the magic _charset_ parameter is present, use > that, otherwise use what the developer tells you they put in accept-charset > / what encoding they sent the page in. I don't think any library should be applying those heuristics. Hasn't everybody been annoyed by IE's content type sniffing heuristics? This would be the same idea but on the server side. Heuristics though may be a perfectly suitable thing for some applications to do. But you also have to remember that not all HTTP transactions involve browsers, or even HTML, and that deviations from the RFC should have explicit consequences in those cases in terms of a standard library. I think that perhaps allowing the application to provide an override (default content type) as input might be enough in this case; although even that could be argued.
It might be sufficient that the library follow the RFC strictly; and well, if the posted data doesn't follow the spec we raise an error along with the original byte string and let the application deal with it. An override is I think a reasonable compromise to allow one to deal with real-world non-conforming browsers; while not throwing out the RFC or adding complex fragile heuristics into the library. You certainly don't want to break when/if you get a user agent that DOES follow the RFC. -- Deron Meranda
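
To make the proposal above concrete, here is a minimal sketch in Python of the policy as I read it: a subpart with a text/* Content-Type (or with no Content-Type at all) is decoded to a character string, using its declared charset or else a caller-supplied default that starts out as US-ASCII; anything else is returned as bytes; and a decoding failure raises an error that carries the original byte string. The names decode_part, PartDecodeError and default_charset are hypothetical and come from no existing library; only the standard email.message.Message class is used, to parse the header value.

```python
from email.message import Message


class PartDecodeError(ValueError):
    """Hypothetical error type: keeps the undecodable bytes for the caller."""

    def __init__(self, raw, charset, cause):
        super().__init__(f"cannot decode form part as {charset}: {cause}")
        self.raw = raw          # the original byte string, untouched
        self.charset = charset  # the charset that was attempted


def decode_part(payload, content_type=None, default_charset="us-ascii"):
    """Return str for text/* (or untyped) parts, bytes for everything else.

    payload         -- the raw bytes of one multipart/form-data subpart
    content_type    -- the subpart's Content-Type header value, if any
    default_charset -- caller override (an assumption of this sketch) used
                       when the part declares no charset
    """
    header = Message()
    # A subpart without a Content-Type header is treated as text/plain.
    header["Content-Type"] = content_type or "text/plain"

    if header.get_content_maintype() != "text":
        return payload  # e.g. a file upload: hand back the bytes as-is

    charset = header.get_content_charset() or default_charset
    try:
        return payload.decode(charset)
    except (UnicodeDecodeError, LookupError) as exc:
        # No guessing: report the failure together with the raw bytes.
        raise PartDecodeError(payload, charset, exc) from exc
```

An application that knows its pages are served in ISO-8859-1 could call decode_part(raw, content_type, default_charset="iso-8859-1"); a strict consumer would leave the default alone and turn PartDecodeError into a 400 Bad Request. No sniffing or _charset_ handling is attempted here, on the assumption that such heuristics belong in the application rather than in the library.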