From jeffpeery at yahoo.com  Mon Nov  5 02:14:31 2007
From: jeffpeery at yahoo.com (Jeff Peery)
Date: Sun, 4 Nov 2007 17:14:31 -0800 (PST)
Subject: [Web-SIG] how to post from a cgi script and not a html form??
Message-ID: <130077.49372.qm@web43134.mail.sp1.yahoo.com>

hello, I'm pretty new to using python on the web, I've got a bit of code
that works pretty well to get form inputs and such. Now I need to post
some info to a gateway service (for credit card processing) and then
receive their response and do something with it. I can do this no
problem... except that I'm not sure how to post my dictionary (name
value pairs from form inputs i.e., credit card num, expire dates etc)
from the cgi script. I had been using the html forms to submit data to
the server, but now I need to do this from my cgi script. I think this is
pretty straightforward but I didn't see anything in the cgi module.
where do I start, or does anyone have some sample code? thanks!!

Jeff

From alex at puddlejumper.foxybanana.com  Mon Nov  5 03:38:02 2007
From: alex at puddlejumper.foxybanana.com (Alex Botero-Lowry)
Date: Sun, 4 Nov 2007 18:38:02 -0800
Subject: [Web-SIG] how to post from a cgi script and not a html form??
In-Reply-To: <130077.49372.qm@web43134.mail.sp1.yahoo.com>
References: <130077.49372.qm@web43134.mail.sp1.yahoo.com>
Message-ID: <20071105023801.GA99643@puddlejumper.foxybanana.com>

On Sun, Nov 04, 2007 at 05:14:31PM -0800, Jeff Peery wrote:
> hello,
> I'm pretty new to using python on the web, I've got a bit of code
> that works pretty well to get form inputs and such. Now I need to post
> some info to a gateway service (for credit card processing) and then
> receive their response and do something with it. I can do this no
> problem... except that I'm not sure how to post my dictionary (name
> value pairs from form inputs i.e., credit card num, expire dates etc)
> from the cgi script. I had been using the html forms to submit data to
> the server, but now I need to do this from my cgi script. I think this is
> pretty straight forward but I didn't see anything in the cgi module.
> where do I start, or does anyone have some sample code? thanks!!

You'll need httplib, which luckily comes with the stdlib, so there is no
need to install anything.

Something like this should get you going:

    import httplib

    conn = httplib.HTTPConnection(remote_server)
    values = '&'.join([ '%s=%s' % a for a in values.items() ])
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    conn.request(method, url, values, headers=headers)
    res = conn.getresponse()
    data = res.read()
    return (res.status, res.reason, data)

The important bits here are our crappy makeshift
application/x-www-form-urlencoded encoder, which is the values line, and
our setting of the Content-Type header. We also need to make sure the
method passed to conn.request is 'POST' or 'PUT' (almost certainly POST)
as these are the only ones that accept a body. I think the cgi module may
have a better way of doing the encoding, but I've never found it.

Alex

From graham.dumpleton at gmail.com  Mon Nov  5 04:08:25 2007
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Mon, 5 Nov 2007 14:08:25 +1100
Subject: [Web-SIG] how to post from a cgi script and not a html form??
In-Reply-To: <20071105023801.GA99643@puddlejumper.foxybanana.com> References: <130077.49372.qm@web43134.mail.sp1.yahoo.com> <20071105023801.GA99643@puddlejumper.foxybanana.com> Message-ID: <88e286470711041908k111932eegae11144f56643454@mail.gmail.com> On 05/11/2007, Alex Botero-Lowry wrote: > On Sun, Nov 04, 2007 at 05:14:31PM -0800, Jeff Peery wrote: > > hello, > > I'm pretty new to using python on the web, I've got a bit of code > > that works pretty well to get form inputs and such. Now I need to post > > some info to a gateway service (for credit card processing) and then > > receive their response and do something with it. I can do this no > > problem... except that I'm not sure how to post my dictionary (name > > value pairs from form inputs i.e., credit card num, expire dates etc) > > from the cgi script. I had been using the html forms to submit data to > > the server, but now I need to do this from my cgi script. I think this is > > pretty straight forward but I didn't see anything in the cgi module. > > where do I start, or does anyone have some sample code? thanks!! > > You'll need httplib which luckily come with the stdlib so no need to install > anything. > > Something like this should get you going: > > conn = httplib.HTTPConnection(remote_server) > values = '&'.join([ '%s=%s' % a for a in values.items() ]) >From memory, better off using urllib.urlencode() for this as it will properly quote and convert special characters. > headers={'Content-Type':'application/x-www-form-urlencoded'} > conn.request(method, url, values, headers=headers) > res = conn.getresponse() > data = res.read() > return (res.status, res.reason, output) > > the important bits here are our crappy makeshift > application/x-www-form-urlencoded rncoder which > is the values line and our setting of the content-type. We also > need to make sure the method passed to conn.request is 'POST' or > 'PUT' (almost certainly POST) as these are the only ones that accept > a body. I think the cgi module may have a better way of doing the > encoding, but i've never found it. > > Alex > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/graham.dumpleton%40gmail.com > From jeffpeery at yahoo.com Mon Nov 5 05:43:23 2007 From: jeffpeery at yahoo.com (Jeff Peery) Date: Sun, 4 Nov 2007 20:43:23 -0800 (PST) Subject: [Web-SIG] how to post from a cgi script and not a html form?? In-Reply-To: <88e286470711041908k111932eegae11144f56643454@mail.gmail.com> Message-ID: <740853.48341.qm@web43141.mail.sp1.yahoo.com> Thanks, thats a big help! only two things I don't understand well. when I create a http object with HTTPConnection() do I want this to be to my web host server (hostway.com) or to the server I'm posting to (authorize.net)? and what are the headers used for? again, thanks! Jeff Graham Dumpleton wrote: On 05/11/2007, Alex Botero-Lowry wrote: > On Sun, Nov 04, 2007 at 05:14:31PM -0800, Jeff Peery wrote: > > hello, > > I'm pretty new to using python on the web, I've got a bit of code > > that works pretty well to get form inputs and such. Now I need to post > > some info to a gateway service (for credit card processing) and then > > receive their response and do something with it. I can do this no > > problem... except that I'm not sure how to post my dictionary (name > > value pairs from form inputs i.e., credit card num, expire dates etc) > > from the cgi script. 
I had been using the html forms to submit data to > > the server, but now I need to do this from my cgi script. I think this is > > pretty straight forward but I didn't see anything in the cgi module. > > where do I start, or does anyone have some sample code? thanks!! > > You'll need httplib which luckily come with the stdlib so no need to install > anything. > > Something like this should get you going: > > conn = httplib.HTTPConnection(remote_server) > values = '&'.join([ '%s=%s' % a for a in values.items() ]) >From memory, better off using urllib.urlencode() for this as it will properly quote and convert special characters. > headers={'Content-Type':'application/x-www-form-urlencoded'} > conn.request(method, url, values, headers=headers) > res = conn.getresponse() > data = res.read() > return (res.status, res.reason, output) > > the important bits here are our crappy makeshift > application/x-www-form-urlencoded rncoder which > is the values line and our setting of the content-type. We also > need to make sure the method passed to conn.request is 'POST' or > 'PUT' (almost certainly POST) as these are the only ones that accept > a body. I think the cgi module may have a better way of doing the > encoding, but i've never found it. > > Alex > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/graham.dumpleton%40gmail.com > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/web-sig/attachments/20071104/9f4bf088/attachment.htm From alex at puddlejumper.foxybanana.com Mon Nov 5 06:10:08 2007 From: alex at puddlejumper.foxybanana.com (Alex Botero-Lowry) Date: Sun, 4 Nov 2007 21:10:08 -0800 Subject: [Web-SIG] how to post from a cgi script and not a html form?? In-Reply-To: <740853.48341.qm@web43141.mail.sp1.yahoo.com> References: <88e286470711041908k111932eegae11144f56643454@mail.gmail.com> <740853.48341.qm@web43141.mail.sp1.yahoo.com> Message-ID: <20071105051008.GB99643@puddlejumper.foxybanana.com> On Sun, Nov 04, 2007 at 08:43:23PM -0800, Jeff Peery wrote: > Thanks, thats a big help! > > only two things I don't understand well. when I create a http object with HTTPConnection() do I want this to be to my web host server (hostway.com) or to the server I'm posting to (authorize.net)? > The server you are posting to, it's just basically an HTTP client. > and what are the headers used for? > It tells it that the content type is application/x-www-form-urlencoded, which will be checked for on the remote side. Alex > again, thanks! > > Jeff > > Graham Dumpleton wrote: > On 05/11/2007, Alex Botero-Lowry wrote: > > On Sun, Nov 04, 2007 at 05:14:31PM -0800, Jeff Peery wrote: > > > hello, > > > I'm pretty new to using python on the web, I've got a bit of code > > > that works pretty well to get form inputs and such. Now I need to post > > > some info to a gateway service (for credit card processing) and then > > > receive their response and do something with it. I can do this no > > > problem... except that I'm not sure how to post my dictionary (name > > > value pairs from form inputs i.e., credit card num, expire dates etc) > > > from the cgi script. 
I had been using the html forms to submit data to > > > the server, but now I need to do this from my cgi script. I think this is > > > pretty straight forward but I didn't see anything in the cgi module. > > > where do I start, or does anyone have some sample code? thanks!! > > > > You'll need httplib which luckily come with the stdlib so no need to install > > anything. > > > > Something like this should get you going: > > > > conn = httplib.HTTPConnection(remote_server) > > values = '&'.join([ '%s=%s' % a for a in values.items() ]) > > >From memory, better off using urllib.urlencode() for this as it will > properly quote and convert special characters. > > > headers={'Content-Type':'application/x-www-form-urlencoded'} > > conn.request(method, url, values, headers=headers) > > res = conn.getresponse() > > data = res.read() > > return (res.status, res.reason, output) > > > > the important bits here are our crappy makeshift > > application/x-www-form-urlencoded rncoder which > > is the values line and our setting of the content-type. We also > > need to make sure the method passed to conn.request is 'POST' or > > 'PUT' (almost certainly POST) as these are the only ones that accept > > a body. I think the cgi module may have a better way of doing the > > encoding, but i've never found it. > > > > Alex > > _______________________________________________ > > Web-SIG mailing list > > Web-SIG at python.org > > Web SIG: http://www.python.org/sigs/web-sig > > Unsubscribe: http://mail.python.org/mailman/options/web-sig/graham.dumpleton%40gmail.com > > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/alex%40foxybanana.com From mikeal at osafoundation.org Mon Nov 5 21:04:02 2007 From: mikeal at osafoundation.org (Mikeal Rogers) Date: Mon, 5 Nov 2007 12:04:02 -0800 Subject: [Web-SIG] Windmill -- Automated WebUI testing framework Message-ID: <1C3BEAF3-D9FC-4D38-B72B-CDE2505DA9E7@osafoundation.org> The QA Developers at OSAF have been working for some time on Windmill, a framework for complete automated WebUI testing across all target browsers and all target operating systems. It is course 100% open source ( Apache 2 License ) and maintained at the Open Source Applications Foundation where it is used to test the Chandler Server web interface. Windmill is implemented in Python (we're a heavy consumer of WSGI ) and JavaScript. We just reached 0.2.6 and think it's stable enough for the web world at large. Tomorrow we'll be hosting an IRC Sprint in #windmill of irc.freenode.org from 10am to 5pm PST. Please come and join us, we'd like to gather your feedback and help with any issues you might encounter from install to continuous integration. http://windmill.osafoundation.org Hope to see you tomorrow. -Mikeal From ianb at colorstudy.com Sat Nov 10 19:05:06 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Sat, 10 Nov 2007 12:05:06 -0600 Subject: [Web-SIG] [Paste] GeneratorExit In-Reply-To: <4735B65B.6090909@nwsnet.de> References: <4735B65B.6090909@nwsnet.de> Message-ID: <4735F2D2.8070604@colorstudy.com> Jochen Kupperschmidt wrote: > Hi Ian, > > when using your Paste suite and its HTTP server, I sporadically come > across a traceback related to a GeneratorExit. 
It does not seem to break > stuff, but it confuses me and fills up my log. > > I put the traceback, together with some description and related links > that might help examining and fixing it, at > http://paste.pocoo.org/show/9976/ > It should be easy to fix, as far as I can tell. Please let me know what > you think. I'm guessing this is some interaction between the extensions to the generator protocol in Python 2.5, and its overlap with app_iter.close() in WSGI. I'm not sure what the proper behavior here is. Just swallow the error? Maybe PJE has an idea of what should happen here. -- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From pje at telecommunity.com Wed Nov 14 14:46:22 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 14 Nov 2007 08:46:22 -0500 Subject: [Web-SIG] [Paste] GeneratorExit In-Reply-To: <4735F2D2.8070604@colorstudy.com> References: <4735B65B.6090909@nwsnet.de> <4735F2D2.8070604@colorstudy.com> Message-ID: <20071114134621.F36833A40AE@sparrow.telecommunity.com> At 12:05 PM 11/10/2007 -0600, Ian Bicking wrote: >Jochen Kupperschmidt wrote: > > Hi Ian, > > > > when using your Paste suite and its HTTP server, I sporadically come > > across a traceback related to a GeneratorExit. It does not seem to break > > stuff, but it confuses me and fills up my log. > > > > I put the traceback, together with some description and related links > > that might help examining and fixing it, at > > http://paste.pocoo.org/show/9976/ > > It should be easy to fix, as far as I can tell. Please let me know what > > you think. > >I'm guessing this is some interaction between the extensions to the >generator protocol in Python 2.5, and its overlap with app_iter.close() >in WSGI. I'm not sure what the proper behavior here is. Just swallow >the error? Maybe PJE has an idea of what should happen here. What should happen here is that the person who wrote the generator such that it catches and ignores GeneratorExit needs to fix it. The error shown in that traceback is: "RuntimeError: generator ignored GeneratorExit" Which means it's the generator that's broken. It's presumably got a try/except block that either doesn't re-raise the error, or that contains a yield. From MDiPierro at cti.depaul.edu Thu Nov 15 00:08:55 2007 From: MDiPierro at cti.depaul.edu (Massimo Di Pierro) Date: Wed, 14 Nov 2007 17:08:55 -0600 Subject: [Web-SIG] Gluon 1.12 Message-ID: <14F760D1-5BE7-4265-831A-5429D44099E8@cti.depaul.edu> Hello everybody. Just wanted to let you know that Gluon 1.12 (GPL2) is out with lots of new stuff: better database administrative interface, JSON, CSV, RTF, RSS, etc. (find examples in the web page) http://mdp.cti.depaul.edu We also have a google group: http://groups.google.com/group/gluon?hl=en a wiki: http://www.bithawk.net/cgi-bin/moin.cgi/GluonNotes a youtube video: http://www.youtube.com/watch?v=VBjja6N6IYk and a cookbook tutorial: http://mdp.cti.depaul.edu/examples/static/ cookbook.pdf and a sample controller for registration/authentication: http:// gluon.googlegroups.com/web/identity.py Notice: The Linux version requires Python2.5 and sqlite3 or postgresql. The Windows and Mac binary version should have no dependencies. Thanks to everybody who is contributing. Massimo From manlio_perillo at libero.it Fri Nov 16 21:16:07 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Fri, 16 Nov 2007 21:16:07 +0100 Subject: [Web-SIG] about WSGI adoption Message-ID: <473DFA87.402@libero.it> In these days I have to install a Trac instance. 
Trac needs a TRAC_ENV variable, and it seems that the only ways to set this variable is to: 1) Set the TRAC_ENV environment variable (CGI) 2) Use the TracEnv mod_wsgi option I found it remarkable that Trac does not has support for WSGI (as an example defining a `trac.trac_env` WSGI variable). This problem is not only present in Trac; Mercurial too uses enviroment variables only (as far as I know). Moreover Trac and Mercurial have a .cgi and a .fcgi script, but not a .wsgi script. What's the cause of this? Regards Manlio Perillo From manlio_perillo at libero.it Fri Nov 16 21:26:17 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Fri, 16 Nov 2007 21:26:17 +0100 Subject: [Web-SIG] about WSGI adoption In-Reply-To: <473DFA87.402@libero.it> References: <473DFA87.402@libero.it> Message-ID: <473DFCE9.4010608@libero.it> Manlio Perillo ha scritto: > In these days I have to install a Trac instance. > > Trac needs a TRAC_ENV variable, and it seems that the only ways to set > this variable is to: > 1) Set the TRAC_ENV environment variable (CGI) > 2) Use the TracEnv mod_wsgi option > > > I found it remarkable that Trac does not has support for WSGI (as an > example defining a `trac.trac_env` WSGI variable). > My bad, Trac *has* such a variable: trac.env_path Unfortunately this is not documented. In fact http://trac.edgewall.org/wiki/TracModWSGI suggests to set the environment in the application script. Manlio Perillo From graham.dumpleton at gmail.com Fri Nov 16 22:47:05 2007 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Sat, 17 Nov 2007 08:47:05 +1100 Subject: [Web-SIG] about WSGI adoption In-Reply-To: <473DFCE9.4010608@libero.it> References: <473DFA87.402@libero.it> <473DFCE9.4010608@libero.it> Message-ID: <88e286470711161347v1c79f061i127214c509f94547@mail.gmail.com> On 17/11/2007, Manlio Perillo wrote: > Manlio Perillo ha scritto: > > In these days I have to install a Trac instance. > > > > Trac needs a TRAC_ENV variable, and it seems that the only ways to set > > this variable is to: > > 1) Set the TRAC_ENV environment variable (CGI) > > 2) Use the TracEnv mod_wsgi option > > > > > > I found it remarkable that Trac does not has support for WSGI (as an > > example defining a `trac.trac_env` WSGI variable). > > > > My bad, Trac *has* such a variable: > trac.env_path > > > Unfortunately this is not documented. > In fact http://trac.edgewall.org/wiki/TracModWSGI suggests to set the > environment in the application script. FWIW, the Trac instructions on the Apache mod_wsgi site go into a lot more detail than those on the Trac site, including mentioning all the WSGI environment specific options. Graham From manlio_perillo at libero.it Sun Nov 18 18:59:50 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Sun, 18 Nov 2007 18:59:50 +0100 Subject: [Web-SIG] about WSGI adoption In-Reply-To: <88e286470711161347v1c79f061i127214c509f94547@mail.gmail.com> References: <473DFA87.402@libero.it> <473DFCE9.4010608@libero.it> <88e286470711161347v1c79f061i127214c509f94547@mail.gmail.com> Message-ID: <47407D96.6000904@libero.it> Graham Dumpleton ha scritto: > On 17/11/2007, Manlio Perillo wrote: >> Manlio Perillo ha scritto: >>> In these days I have to install a Trac instance. 
>>> >>> Trac needs a TRAC_ENV variable, and it seems that the only ways to set >>> this variable is to: >>> 1) Set the TRAC_ENV environment variable (CGI) >>> 2) Use the TracEnv mod_wsgi option >>> >>> >>> I found it remarkable that Trac does not has support for WSGI (as an >>> example defining a `trac.trac_env` WSGI variable). >>> >> My bad, Trac *has* such a variable: >> trac.env_path >> >> >> Unfortunately this is not documented. >> In fact http://trac.edgewall.org/wiki/TracModWSGI suggests to set the >> environment in the application script. > > FWIW, the Trac instructions on the Apache mod_wsgi site go into a lot > more detail than those on the Trac site, including mentioning all the > WSGI environment specific options. > > Graham > Thanks, very good guide. However I still consider remarkable that there is not a "trac.wsgi" script. Can this be caused by the lack of a standardized deployment of WSGI applications? Regards Manlio Perillo From titus at caltech.edu Sun Nov 18 20:05:39 2007 From: titus at caltech.edu (Titus Brown) Date: Sun, 18 Nov 2007 11:05:39 -0800 Subject: [Web-SIG] about WSGI adoption In-Reply-To: <47407D96.6000904@libero.it> References: <473DFA87.402@libero.it> <473DFCE9.4010608@libero.it> <88e286470711161347v1c79f061i127214c509f94547@mail.gmail.com> <47407D96.6000904@libero.it> Message-ID: <20071118190539.GB4173@caltech.edu> -> Thanks, very good guide. -> -> -> However I still consider remarkable that there is not a "trac.wsgi" script. -> -> Can this be caused by the lack of a standardized deployment of WSGI -> applications? What would a trac.wsgi script contain? WSGI is a programming interface, not a script interface like CGI. Are you talking about a paste-compatible script or some such? cheers, --titus From manlio_perillo at libero.it Sun Nov 18 21:03:23 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Sun, 18 Nov 2007 21:03:23 +0100 Subject: [Web-SIG] about WSGI adoption In-Reply-To: <20071118190539.GB4173@caltech.edu> References: <473DFA87.402@libero.it> <473DFCE9.4010608@libero.it> <88e286470711161347v1c79f061i127214c509f94547@mail.gmail.com> <47407D96.6000904@libero.it> <20071118190539.GB4173@caltech.edu> Message-ID: <47409A8B.60100@libero.it> Titus Brown ha scritto: > -> Thanks, very good guide. > -> > -> > -> However I still consider remarkable that there is not a "trac.wsgi" script. > -> > -> Can this be caused by the lack of a standardized deployment of WSGI > -> applications? > > What would a trac.wsgi script contain? import trac.web.main application = trac.web.main.dispatch_request > WSGI is a programming interface, > not a script interface like CGI. > Right, but a WSGI server/gateway just needs a simple script to execute the WSGI application. > Are you talking about a paste-compatible script or some such? > No. Regards Manlio Perillo From titus at caltech.edu Sun Nov 18 21:10:22 2007 From: titus at caltech.edu (Titus Brown) Date: Sun, 18 Nov 2007 12:10:22 -0800 Subject: [Web-SIG] about WSGI adoption In-Reply-To: <47409A8B.60100@libero.it> References: <473DFA87.402@libero.it> <473DFCE9.4010608@libero.it> <88e286470711161347v1c79f061i127214c509f94547@mail.gmail.com> <47407D96.6000904@libero.it> <20071118190539.GB4173@caltech.edu> <47409A8B.60100@libero.it> Message-ID: <20071118201021.GC25792@caltech.edu> On Sun, Nov 18, 2007 at 09:03:23PM +0100, Manlio Perillo wrote: -> Titus Brown ha scritto: -> > -> -> > -> However I still consider remarkable that there is not a "trac.wsgi" script. 
-> > -> Can this be caused by the lack of a standardized deployment of WSGI -> > -> applications? -> > -> > What would a trac.wsgi script contain? -> -> import trac.web.main -> -> application = trac.web.main.dispatch_request So this is something that can be 'execfile'd, I guess... -> > WSGI is a programming interface, -> > not a script interface like CGI. -> -> Right, but a WSGI server/gateway just needs a simple script to execute -> the WSGI application. That might be useful for some WSGI deployment techniques and less useful for others. For example, if you're using an SCGI-based WSGI server, you need a command-line executable; for mod_python, you probably need an importable module with a function; for CGI, you need a CGI script; etc. So I think you're talking about something that is very specific to your own deployment technique. This is out of the scope of the WSGI proposal, for good reasons -- there are many ways of configuring and deploying WSGI apps and I don't know that we've settled on only one way. Paste is an effort to standardize deployment of WSGI applications, I think. cheers, --titus From manlio_perillo at libero.it Sun Nov 18 22:56:01 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Sun, 18 Nov 2007 22:56:01 +0100 Subject: [Web-SIG] about WSGI adoption In-Reply-To: <20071118201021.GC25792@caltech.edu> References: <473DFA87.402@libero.it> <473DFCE9.4010608@libero.it> <88e286470711161347v1c79f061i127214c509f94547@mail.gmail.com> <47407D96.6000904@libero.it> <20071118190539.GB4173@caltech.edu> <47409A8B.60100@libero.it> <20071118201021.GC25792@caltech.edu> Message-ID: <4740B4F1.2020309@libero.it> Titus Brown ha scritto: > On Sun, Nov 18, 2007 at 09:03:23PM +0100, Manlio Perillo wrote: > -> Titus Brown ha scritto: > -> > -> > -> > -> However I still consider remarkable that there is not a "trac.wsgi" script. > -> > -> Can this be caused by the lack of a standardized deployment of WSGI > -> > -> applications? > -> > > -> > What would a trac.wsgi script contain? > -> > -> import trac.web.main > -> > -> application = trac.web.main.dispatch_request > > So this is something that can be 'execfile'd, I guess... > No. It provides an application callable that the WSGI gateway/server can execute. > -> > WSGI is a programming interface, > -> > not a script interface like CGI. > -> > -> Right, but a WSGI server/gateway just needs a simple script to execute > -> the WSGI application. > > That might be useful for some WSGI deployment techniques and less useful > for others. For example, if you're using an SCGI-based WSGI server, you > need a command-line executable; This is not fully correct. The sample script I have posted can be used by a SCGI-based WSGI server too. I think that the "deployment" must be done by the WSGI gateway/server and not by the application. That is, the "application" should only expose the callable object, and should not "start a server", opening logging and configuration files, or stacking middlewares. > for mod_python, you probably need an > importable module with a function; for CGI, you need a CGI script; etc. > So I think you're talking about something that is very specific to your > own deployment technique. This is out of the scope of the WSGI > proposal, for good reasons -- there are many ways of configuring and > deploying WSGI apps and I don't know that we've settled on only one way. > Right. But in the WSGI spec there is a propose to standardize a deployment method. 
As an example, WSGI says nothing about what happens when an application module is imported (and the Python application process is created). It can be useful if the gateway can execute an init_application(enviroment) function, where environment contains the same objects of the request enviroment, excluding the HTTP headers and the input object, and with a separate errors object. Logging is another thing that should be clarified. How should an application do logging? As an example for a WSGI gateway embedded in an existing server (like Apache and Nginx) it can be useful and convenient to keep logging in an unique log file. And if the server logging system uses "log levels", this should be usable by the WSGI application. The same is valid for application configuration. > Paste is an effort to standardize d eployment of WSGI applications, I > think. > Regards Manlio Perillo From pje at telecommunity.com Sun Nov 18 23:51:54 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 18 Nov 2007 17:51:54 -0500 Subject: [Web-SIG] about WSGI adoption In-Reply-To: <47409A8B.60100@libero.it> References: <473DFA87.402@libero.it> <473DFCE9.4010608@libero.it> <88e286470711161347v1c79f061i127214c509f94547@mail.gmail.com> <47407D96.6000904@libero.it> <20071118190539.GB4173@caltech.edu> <47409A8B.60100@libero.it> Message-ID: <20071118225157.E04B23A405E@sparrow.telecommunity.com> At 09:03 PM 11/18/2007 +0100, Manlio Perillo wrote: >Titus Brown ha scritto: > > -> Thanks, very good guide. > > -> > > -> > > -> However I still consider remarkable that there is not a > "trac.wsgi" script. > > -> > > -> Can this be caused by the lack of a standardized deployment of WSGI > > -> applications? > > > > What would a trac.wsgi script contain? > >import trac.web.main > >application = trac.web.main.dispatch_request What's the point of that, if it's already importable from trac.web.main? >Right, but a WSGI server/gateway just needs a simple script to execute >the WSGI application. Or it can just import an object from a module. No script needed. From graham.dumpleton at gmail.com Sun Nov 18 23:56:01 2007 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Mon, 19 Nov 2007 09:56:01 +1100 Subject: [Web-SIG] about WSGI adoption In-Reply-To: <4740B4F1.2020309@libero.it> References: <473DFA87.402@libero.it> <473DFCE9.4010608@libero.it> <88e286470711161347v1c79f061i127214c509f94547@mail.gmail.com> <47407D96.6000904@libero.it> <20071118190539.GB4173@caltech.edu> <47409A8B.60100@libero.it> <20071118201021.GC25792@caltech.edu> <4740B4F1.2020309@libero.it> Message-ID: <88e286470711181456u7c4ff677xa25da44c66b355cf@mail.gmail.com> On 19/11/2007, Manlio Perillo wrote: > Titus Brown ha scritto: > > On Sun, Nov 18, 2007 at 09:03:23PM +0100, Manlio Perillo wrote: > > -> Titus Brown ha scritto: > > -> > -> > > -> > -> However I still consider remarkable that there is not a "trac.wsgi" script. > > -> > -> Can this be caused by the lack of a standardized deployment of WSGI > > -> > -> applications? > > -> > > > -> > What would a trac.wsgi script contain? > > -> > > -> import trac.web.main > > -> > > -> application = trac.web.main.dispatch_request > > > > So this is something that can be 'execfile'd, I guess... > > > > No. > It provides an application callable that the WSGI gateway/server can > execute. > > > -> > WSGI is a programming interface, > > -> > not a script interface like CGI. > > -> > > -> Right, but a WSGI server/gateway just needs a simple script to execute > > -> the WSGI application. 
> > > > That might be useful for some WSGI deployment techniques and less useful > > for others. For example, if you're using an SCGI-based WSGI server, you > > need a command-line executable; > > This is not fully correct. > The sample script I have posted can be used by a SCGI-based WSGI server too. > > I think that the "deployment" must be done by the WSGI gateway/server > and not by the application. > > That is, the "application" should only expose the callable object, and > should not "start a server", opening logging and configuration files, or > stacking middlewares. This would require the WSGI adapter layer to encompass the means of loading the script file (as Python module) when required the first time. The only thing that really does it that way at present is mod_wsgi. Current CGI-WSGI adapters expect the WSGI application entry point to effectively be in the same file as the main for the CGI script. Ie., #!/usr/bin/python def application(environ, start_response): status = '200 OK' output = 'Hello World!\n' response_headers = [('Content-Type', 'text/plain'), ('Content-Length', str(len(output)))] start_response(status, response_headers) return [output] if __name__ == '__main__': from paste.script.cgi_server import run_with_cgi run_with_cgi(application) This doesn't mean though that you couldn't develop a CGI-WSGI adapter which separated the two parts, but not really easy to make it completely transparent. This is because you still have to at least create the CGI file which refers to the application script file in a different location. The Action directive in Apache can be made to make it a bit more transparent by mapping a .wsgi or .py extension to a single CGI script, with that script looking at the filename target of the request to work out actual application file to load. Similar thing could be done for FASTGCI and SCGI with Apache. Problem with this though is that using Action directive in Apache in this way looses the correct value for SCRIPT_NAME from memory. There is also no equivalent in other servers such as lighttpd and nginx. Anyway, hope this at least half illustrates that it isn't necessarily that simple to come up with one concept of having a single WSGI application script file which knows nothing about the means in which it is launched. In mod_wsgi it has made this as seamless as possible, but with other hosting mechanisms such as CGI, FASTCGI and SCGI where the WSGI adapter isn't actually embedded within the web server itself, but is within the process launched, it is much harder to make it transparent to the point where one could just throw a whole lot of WSGI application scripts in a directory and have it work. In Python based web servers it gets more complicated again as in that case it is the Python web server that is providing the top level URL mapping to a WSGI application entry point, whereas in Apache, Apache can do that automatically at least down to the initial entry point before things go into Python code. One could technically write a Python based web server whose top level URL to application mapping was file system based like Apache is, but most probably wouldn't see the point of it. > > for mod_python, you probably need an > > importable module with a function; for CGI, you need a CGI script; etc. > > So I think you're talking about something that is very specific to your > > own deployment technique. 
This is out of the scope of the WSGI > > proposal, for good reasons -- there are many ways of configuring and > > deploying WSGI apps and I don't know that we've settled on only one way. > > > > Right. > But in the WSGI spec there is a propose to standardize a deployment method. > > As an example, WSGI says nothing about what happens when an application > module is imported (and the Python application process is created). And it can't easily do so as the differences in hosting technology make it hard to come up with one system which would work for everything. For some ideas put up previously, see thread about Web Site Process bus in: http://mail.python.org/pipermail/web-sig/2007-June/thread.html Some of the things that make it difficult are multi process web servers, plus web servers that only load applications on demand and not at the start when the processes are started up. Some hosting technologies from memory allow a logical application to be stopped and started within the context of the same process, whereas others don't. So, where as atexit() may be a reasonable of doing shutdown actions for some hosting technologies, it isn't for others. > It can be useful if the gateway can execute an > > init_application(enviroment) > > function, where environment contains the same objects of the request > enviroment, excluding the HTTP headers and the input object, and with a > separate errors object. The closest you can probably get to portable application initialisation is for the application itself to track whether it has been called before and do something special if it hasn't. Even this is tricky because of multithreading issues. > Logging is another thing that should be clarified. > How should an application do logging? > > As an example for a WSGI gateway embedded in an existing server (like > Apache and Nginx) it can be useful and convenient to keep logging in an > unique log file. > And if the server logging system uses "log levels", this should be > usable by the WSGI application. There is always the Python 'logging' module. Where things get interesting with this is how to configure the logging. In Pylons, provided you use 'paster', it will note that the .ini file mentions 'loggers' and so will push the config automatically to the 'logging' module. Run a Pylons application under mod_wsgi though and this doesn't happen so Pylons logging doesn't work. Thus need to make the magic Pylons call to get it to push the config to the 'logging' module manually. Use of log levels is almost impossible. If using CGI your only logging mechanism is sys.stderr and that gets logged as ERR in Apache. Same for mod_wsgi, and similar for SCGI and FASTCGI I think. In mod_python its sys.stderr is broken in that output isn't automatically flush. Yes WSGI specification says that error output needs to be flushed to ensure it is displayed, but usually isn't done. > The same is valid for application configuration. And you will probably never get everyone to agree on that. The whole thing with WSGI was that it defined as little as possible so it left enough room for people to experiment with how to do all the other issues. I doubt you will ever seem a single solution, instead, you will though see different ways come together into a number of different frameworks. (or no frameworks). Overall, that probably isn't a bad thing. 
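Coming back to the initialisation point above, the "check on first call" approach would look something like the following. This is only a rough, untested sketch (the names are made up); the lock is there purely because of the multithreading issue already mentioned:

    import threading

    _init_lock = threading.Lock()
    _initialised = False

    def _initialise(environ):
        # One-off application setup: read configuration, set up logging, etc.
        # Only rely on keys that are present for every request.
        pass

    def application(environ, start_response):
        global _initialised
        if not _initialised:
            _init_lock.acquire()
            try:
                if not _initialised:      # re-check now that we hold the lock
                    _initialise(environ)
                    _initialised = True
            finally:
                _init_lock.release()

        start_response('200 OK', [('Content-Type', 'text/plain')])
        return ['OK\n']

Even then it only runs when the first request arrives, not when the process starts, which is exactly the limitation being discussed.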
Graham From manlio_perillo at libero.it Mon Nov 19 13:06:07 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Mon, 19 Nov 2007 13:06:07 +0100 Subject: [Web-SIG] about WSGI adoption In-Reply-To: <88e286470711181456u7c4ff677xa25da44c66b355cf@mail.gmail.com> References: <473DFA87.402@libero.it> <473DFCE9.4010608@libero.it> <88e286470711161347v1c79f061i127214c509f94547@mail.gmail.com> <47407D96.6000904@libero.it> <20071118190539.GB4173@caltech.edu> <47409A8B.60100@libero.it> <20071118201021.GC25792@caltech.edu> <4740B4F1.2020309@libero.it> <88e286470711181456u7c4ff677xa25da44c66b355cf@mail.gmail.com> Message-ID: <47417C2F.3070008@libero.it> Graham Dumpleton ha scritto: > [...] >> I think that the "deployment" must be done by the WSGI gateway/server >> and not by the application. >> >> That is, the "application" should only expose the callable object, and >> should not "start a server", opening logging and configuration files, or >> stacking middlewares. > > This would require the WSGI adapter layer to encompass the means of > loading the script file (as Python module) when required the first > time. The only thing that really does it that way at present is > mod_wsgi. > Right. > Current CGI-WSGI adapters expect the WSGI application entry point to > effectively be in the same file as the main for the CGI script. Ie., > Ok. > [...] > > Anyway, hope this at least half illustrates that it isn't necessarily > that simple to come up with one concept of having a single WSGI > application script file which knows nothing about the means in which > it is launched. In mod_wsgi it has made this as seamless as possible, > but with other hosting mechanisms such as CGI, FASTCGI and SCGI where > the WSGI adapter isn't actually embedded within the web server itself, > but is within the process launched, it is much harder to make it > transparent to the point where one could just throw a whole lot of > WSGI application scripts in a directory and have it work. > Not sure here. As an example, in the trac.fcgi example, the code that run the server can be moved to a separate file. It is true, however, that this make things more complicated, but maybe one can write a generic flup server "launcher" script: flup_run -p 4030 -b 127.0.0.1 -.script=/usr/local/bin/myapp.wsgi \ --application=application --protocol=fastcgi --daemon \ --user=x --group=x --log=/var/log/myapp.log > [...] > >> As an example, WSGI says nothing about what happens when an application >> module is imported (and the Python application process is created). > > And it can't easily do so as the differences in hosting technology > make it hard to come up with one system which would work for > everything. For some ideas put up previously, see thread about Web > Site Process bus in: > > http://mail.python.org/pipermail/web-sig/2007-June/thread.html > Thanks for the link. However a function called at module first import should suffice, for now. > Some of the things that make it difficult are multi process web > servers, plus web servers that only load applications on demand and > not at the start when the processes are started up. The server can just execute the function when the module is imported (the problem is what should be done when the module script is reloaded in the same process). An application can execute startup code at module level, but a function is necessary since the application may need more informations from the web server (the log object, as an example). I don't see any problems with multiprocess web servers. 
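To make the proposal concrete, an application script under this convention could look more or less like this (only a sketch: the init_application hook is the proposal itself, no gateway implements it today):

    # myapp.wsgi -- hypothetical layout under the proposed convention.

    log = None

    def init_application(environ):
        # Called once by the gateway when the module is first imported.
        # Under the proposal this environ has no HTTP headers and no
        # wsgi.input, but it does carry a process-wide errors object.
        global log
        log = environ['wsgi.errors']
        log.write('application initialised\n')

    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return ['Hello from myapp\n']

The application still exposes only callables; starting servers, daemonizing and so on remain the gateway's job.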
> Some hosting > technologies from memory allow a logical application to be stopped and > started within the context of the same process, whereas others don't. > So, where as atexit() may be a reasonable of doing shutdown actions > for some hosting technologies, it isn't for others. > Ok. >> It can be useful if the gateway can execute an >> >> init_application(enviroment) >> >> function, where environment contains the same objects of the request >> enviroment, excluding the HTTP headers and the input object, and with a >> separate errors object. > > The closest you can probably get to portable application > initialisation is for the application itself to track whether it has > been called before and do something special if it hasn't. Even this is > tricky because of multithreading issues. > >> Logging is another thing that should be clarified. >> How should an application do logging? >> >> As an example for a WSGI gateway embedded in an existing server (like >> Apache and Nginx) it can be useful and convenient to keep logging in an >> unique log file. >> And if the server logging system uses "log levels", this should be >> usable by the WSGI application. > > There is always the Python 'logging' module. Where things get > interesting with this is how to configure the logging. Right, this is exactly the problem. But there is one more bigger problem: if I want to use the server logging (for Apache or Nginx) I have to use a non portable solution. > In Pylons, > provided you use 'paster', it will note that the .ini file mentions > 'loggers' and so will push the config automatically to the 'logging' > module. Run a Pylons application under mod_wsgi though and this > doesn't happen so Pylons logging doesn't work. This is the reason why I think is it necessary to standardize a deployment method. > Thus need to make the > magic Pylons call to get it to push the config to the 'logging' module > manually. Use of log levels is almost impossible. If using CGI your > only logging mechanism is sys.stderr and that gets logged as ERR in > Apache. Same for mod_wsgi, and similar for SCGI and FASTCGI I think. > In mod_python its sys.stderr is broken in that output isn't > automatically flush. Yes WSGI specification says that error output > needs to be flushed to ensure it is displayed, but usually isn't done. > Here is an idea. First of all, the wsgi.errors should have an additional log_level attribute. It must be an integer, and its value is not specified (but should use the log levels value from the standard logging module?). For mod_wsgi, we can add a log_level directive (instead of using a fixed value) and a log_level_map to map server error levels to Python logging levels. As an example (for nginx): wsgi_log_level NGX_LOG_INFO; wsgi_log_level_map 7:20; wsgi_log_level_order asc; The second directive maps NGX_LOG_INFO to logging.INFO and the last directive is necessary since in nginx the more criticals error have low values (but I'm not sure if this information is needed) A WSGI application now can setup the logging module: logging.basicConfig(level=wsgi.errors.log_level, format=%(levelname)s %(message)s', stream=wsgi.errors.) This is not perfect, since a log entry will be something like: 2007/10/29 19:41:09 [info] 29902#0: *1 CRITICAL ops That is: the log level is duplicated. There one more problem here: the log object *MUST* be stored in the wsgi dictionary, since it is defined on a per request basis. What happens if an "external" module in the application make use of the global logging object? 
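One possible bridge, just to make the problem concrete (an untested sketch, and not thread safe as written):

    import logging

    class WSGIErrorsHandler(logging.Handler):
        # Routes records from the global logging hierarchy to the errors
        # stream of whatever request is currently being handled.
        def __init__(self):
            logging.Handler.__init__(self)
            self.stream = None

        def emit(self, record):
            if self.stream is not None:
                self.stream.write(self.format(record) + '\n')

    _handler = WSGIErrorsHandler()
    logging.getLogger().addHandler(_handler)

    class LoggingMiddleware(object):
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            _handler.stream = environ['wsgi.errors']   # per-request stream
            return self.app(environ, start_response)
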
And what should I do if, as an example, I use SQLAlchemy and want to enable logging for the connection pool? >> The same is valid for application configuration. > > And you will probably never get everyone to agree on that. > Options should be stored in the wsgi dictionary. For mod_wsgi, options can be set from the server configuration file, paste can instead read a config file and copy the values to the wsgi environment dictionary: [aaa] x = 2 { y = 5 'aaa.x': '2', ===> 'aaa.y': '5', [bbb] 'bbb.a': '1', a = 1 'bbb.b': 2 b = 2 } > The whole thing with WSGI was that it defined as little as possible so > it left enough room for people to experiment with how to do all the > other issues. I doubt you will ever seem a single solution, instead, > you will though see different ways come together into a number of > different frameworks. (or no frameworks). Overall, that probably isn't > a bad thing. > This is true, but this has some problems. Good logging is one of these problems, IMHO. Regards Manlio Perillo From manlio_perillo at libero.it Fri Nov 23 11:57:28 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Fri, 23 Nov 2007 11:57:28 +0100 Subject: [Web-SIG] again about logging in WSGI Message-ID: <4746B218.7040700@libero.it> Hi. As I have written in a previous thread, I would like to use nginx logging system in a WSGI application (of course the same is valid for Apache) A first problem is that the wsgi.errors stream defined in the environment dictionary is valid only for the current request, but I want to use a stream valid for the entire process lifetime. I think that there are two solutions: 1) call an application supplied `init_application(environ)` callable, where the environ dictionary contains the "right" wsgi.errors stream object 2) add to the environ dictionary a `wsgi.global_errors` stream object Any suggestions? Thanks Manlio Perillo From pje at telecommunity.com Fri Nov 23 13:18:27 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 23 Nov 2007 07:18:27 -0500 Subject: [Web-SIG] again about logging in WSGI In-Reply-To: <4746B218.7040700@libero.it> References: <4746B218.7040700@libero.it> Message-ID: <20071123121832.D73723A40AC@sparrow.telecommunity.com> At 11:57 AM 11/23/2007 +0100, Manlio Perillo wrote: >Hi. > >As I have written in a previous thread, I would like to use nginx >logging system in a WSGI application (of course the same is valid for >Apache) > >A first problem is that the wsgi.errors stream defined in the >environment dictionary is valid only for the current request, but I want >to use a stream valid for the entire process lifetime. > >I think that there are two solutions: >1) call an application supplied `init_application(environ)` callable, > where the environ dictionary contains the "right" wsgi.errors stream > object >2) add to the environ dictionary a `wsgi.global_errors` stream object > > >Any suggestions? Yes: provide an 'nginx.global_errors' stream object, as a server-specific extension. From manlio_perillo at libero.it Fri Nov 23 13:25:27 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Fri, 23 Nov 2007 13:25:27 +0100 Subject: [Web-SIG] again about logging in WSGI In-Reply-To: <20071123121832.D73723A40AC@sparrow.telecommunity.com> References: <4746B218.7040700@libero.it> <20071123121832.D73723A40AC@sparrow.telecommunity.com> Message-ID: <4746C6B7.2040808@libero.it> Phillip J. Eby ha scritto: > At 11:57 AM 11/23/2007 +0100, Manlio Perillo wrote: >> Hi. 
>> >> As I have written in a previous thread, I would like to use nginx >> logging system in a WSGI application (of course the same is valid for >> Apache) >> >> A first problem is that the wsgi.errors stream defined in the >> environment dictionary is valid only for the current request, but I want >> to use a stream valid for the entire process lifetime. >> >> I think that there are two solutions: >> 1) call an application supplied `init_application(environ)` callable, >> where the environ dictionary contains the "right" wsgi.errors stream >> object >> 2) add to the environ dictionary a `wsgi.global_errors` stream object >> >> >> Any suggestions? > > Yes: provide an 'nginx.global_errors' stream object, as a > server-specific extension. > Ok, thanks. I think that I will use the `mod_wsgi` "namespace", since this same interface can be used by other WSGI gateway implementations embeded in a web server. By the way: any proposal for "standardize" common "namespaces"? Manlio Perillo From pje at telecommunity.com Fri Nov 23 14:09:57 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 23 Nov 2007 08:09:57 -0500 Subject: [Web-SIG] again about logging in WSGI In-Reply-To: <4746C6B7.2040808@libero.it> References: <4746B218.7040700@libero.it> <20071123121832.D73723A40AC@sparrow.telecommunity.com> <4746C6B7.2040808@libero.it> Message-ID: <20071123130958.1EBA63A40AC@sparrow.telecommunity.com> At 01:25 PM 11/23/2007 +0100, Manlio Perillo wrote: >Phillip J. Eby ha scritto: >>At 11:57 AM 11/23/2007 +0100, Manlio Perillo wrote: >>>Hi. >>> >>>As I have written in a previous thread, I would like to use nginx >>>logging system in a WSGI application (of course the same is valid for >>>Apache) >>> >>>A first problem is that the wsgi.errors stream defined in the >>>environment dictionary is valid only for the current request, but I want >>>to use a stream valid for the entire process lifetime. >>> >>>I think that there are two solutions: >>>1) call an application supplied `init_application(environ)` callable, >>> where the environ dictionary contains the "right" wsgi.errors stream >>> object >>>2) add to the environ dictionary a `wsgi.global_errors` stream object >>> >>> >>>Any suggestions? >>Yes: provide an 'nginx.global_errors' stream object, as a >>server-specific extension. > >Ok, thanks. > >I think that I will use the `mod_wsgi` "namespace", since this same >interface can be used by other WSGI gateway implementations embeded >in a web server. Er, no, that's precisely why you should NOT use that namespace. That goes against the very reason for having namespaces in the first place -- to ensure that each project is free to add its own extensions without colliding with those created by another project. >By the way: any proposal for "standardize" common "namespaces"? Yes: use your own private namespaces for anything you create. Once you've implemented your extension under your private name, and published a spec for it, *then*, if other people commit to implementing that spec, then it can begin the process for getting a wsgi.org standardized name. Until then, however, extensions must be kept in a private, project-specific namespace, as per the WSGI spec. 
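Concretely, a server-specific extension of the kind suggested above would be used like this on the application side (a sketch only; no released gateway sets an 'nginx.global_errors' key today):

    def application(environ, start_response):
        # Prefer the gateway's process-wide stream if it is provided,
        # otherwise fall back to the per-request wsgi.errors stream.
        log = environ.get('nginx.global_errors', environ['wsgi.errors'])
        log.write('handling %s\n' % environ.get('PATH_INFO', '/'))

        start_response('200 OK', [('Content-Type', 'text/plain')])
        return ['logged\n']

Code written this way keeps working unchanged whether or not the extension is present, and again if the key is ever promoted to a standardized wsgi.org name.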
From graham.dumpleton at gmail.com Sat Nov 24 02:14:16 2007 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Sat, 24 Nov 2007 12:14:16 +1100 Subject: [Web-SIG] again about logging in WSGI In-Reply-To: <4746C6B7.2040808@libero.it> References: <4746B218.7040700@libero.it> <20071123121832.D73723A40AC@sparrow.telecommunity.com> <4746C6B7.2040808@libero.it> Message-ID: <88e286470711231714k3805c74eq82ac31092a18e642@mail.gmail.com> On 23/11/2007, Manlio Perillo wrote: > Phillip J. Eby ha scritto: > > At 11:57 AM 11/23/2007 +0100, Manlio Perillo wrote: > >> Hi. > >> > >> As I have written in a previous thread, I would like to use nginx > >> logging system in a WSGI application (of course the same is valid for > >> Apache) > >> > >> A first problem is that the wsgi.errors stream defined in the > >> environment dictionary is valid only for the current request, but I want > >> to use a stream valid for the entire process lifetime. > >> > >> I think that there are two solutions: > >> 1) call an application supplied `init_application(environ)` callable, > >> where the environ dictionary contains the "right" wsgi.errors stream > >> object > >> 2) add to the environ dictionary a `wsgi.global_errors` stream object > >> > >> > >> Any suggestions? > > > > Yes: provide an 'nginx.global_errors' stream object, as a > > server-specific extension. > > > > Ok, thanks. > > I think that I will use the `mod_wsgi` "namespace", since this same > interface can be used by other WSGI gateway implementations embeded in a > web server. Please don't use 'mod_wsgi', use 'nginx' as you originally said. There is already going to be enough confusion around you using the 'mod_wsgi' name when the Apache WSGI implementation came first. I really wish now that I had insisted you specifically call it 'nginx_wsgi' even though you based it on Apache mod_wsgi. To try to be compatible is one thing, to use the same name in a way that is confusing is just going to cause more and more problems down the track if nginx mod_wsgi does get to a point of being usuable. Whatever you do, please do not go releasing any distinct Python module/package called 'mod_wsgi' as the Apache mod_wsgi code is all set up around it being able to do that already, with the assumption that it has ownership of that namespace because it started using the name first. I am sure that others will possibly say that I shouldn't even be using it, but I am just following the tradition set by Apache in how it names its modules and it seems reasonable that I should be able to use that same name in Python module/package space much like mod_python already does. I would hope that Apache mod_wsgi is getting enough exposure now that it would be accepted that it has reasonable priority on the name. If people want to argue the point, then by all means suggest something else. At the moment, for Apache mod_wsgi 1.X it uses of the name is internal only and its use mainly symbolic, but in Apache mod_wsgi 2.0 release candidates it does extend outside of the process in as much as it will attempt to import a Python module/package of that name as a means of extending Apache mod_wsgi with additional optional extensions, albeit no such package of extensions has yet been publicly released. So, it isn't necessarily too late to change it. Note that 'apache' can't be used in place of 'mod_wsgi' for Python module/package name, as there is already Python code available using that for SWIG bindings for internal Apache APIs. 
Any suggestions on a consensus on how we resolve all this and avoid arguments down the track are more than welcome. Presuming that is that people want to object to Apache mod_wsgi assuming that it can use 'mod_wsgi' for the name of a Python module/package. :-) Graham From ianb at colorstudy.com Sat Nov 24 06:57:20 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 23 Nov 2007 23:57:20 -0600 Subject: [Web-SIG] again about logging in WSGI In-Reply-To: <4746C6B7.2040808@libero.it> References: <4746B218.7040700@libero.it> <20071123121832.D73723A40AC@sparrow.telecommunity.com> <4746C6B7.2040808@libero.it> Message-ID: <4747BD40.7070608@colorstudy.com> Manlio Perillo wrote: > By the way: any proposal for "standardize" common "namespaces"? Yes, see: http://wsgi.org/wsgi/Specifications Note that it also requires some enthusiasm from more than one person to actually move through this otherwise fairly casual process; several attempts to standardize pieces have kind of withered on the vine for lack of interest. So don't be too surprised if that happens. -- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From graham.dumpleton at gmail.com Sat Nov 24 08:40:56 2007 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Sat, 24 Nov 2007 18:40:56 +1100 Subject: [Web-SIG] again about logging in WSGI In-Reply-To: <4746B218.7040700@libero.it> References: <4746B218.7040700@libero.it> Message-ID: <88e286470711232340l57480db9nd0f63a7fb161046d@mail.gmail.com> On 23/11/2007, Manlio Perillo wrote: > Hi. > > As I have written in a previous thread, I would like to use nginx > logging system in a WSGI application (of course the same is valid for > Apache) > > A first problem is that the wsgi.errors stream defined in the > environment dictionary is valid only for the current request, but I want > to use a stream valid for the entire process lifetime. > > I think that there are two solutions: > 1) call an application supplied `init_application(environ)` callable, > where the environ dictionary contains the "right" wsgi.errors stream > object > 2) add to the environ dictionary a `wsgi.global_errors` stream object > > Any suggestions? Getting back to what you were originally asking about, I don't really see why you need anything extra in the WSGI application environment. For starters, one would remap sys.stderr to send to the web server log system as level ERROR. This should include ensuring that output is flushed after each newline so that it appears promptly in the error logs. This does mean the wrapper has to do some buffering and be a bit more intelligent. Doing this remapping of sys.stderr ensures though that any modules which use it directly to output errors will still have their output go somewhere sensible. As for everything else, as suggested in a previous email, you would be better off basing anything else beyond sys.stderr and wsgi.errors off the Python 'logging' module. In doing this, nothing would be required in the WSGI environment. Any code would just log output using the 'logging' module. The magic missing bit would then be the WSGI adapter somehow making available a type of LogHandler class, eg., ApacheLogHandler, which can be referenced in configuration mechanism used to setup the 'logging' module. 
As necessary the LogHandler can be tied into the WSGI adapter dispatch mechanism so that it can know when something is logged in the context of a request handler, as opposed to at global scope on module import of WSGI application and correctly log details against the internal request structure, thereby enabling of client IP details against the request. In other words, it should be possible to do it all transparently without requiring extensions to be added to the WSGI environment. At most, may need a configuration option in the WSGI adapter to setup configuration of 'logging' module somehow. The benefit of this approach is that it is more easily portable to a different WSGI hosting environment. The most that would be required is a change in the 'logging' module configuration, no user code would need to be changed. Anyway, I'll have to think about it properly. I have been meaning to basically do just this for Apache mod_wsgi for a while, but just haven't got around to it. Graham From manlio_perillo at libero.it Sat Nov 24 10:02:42 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Sat, 24 Nov 2007 10:02:42 +0100 Subject: [Web-SIG] again about logging in WSGI In-Reply-To: <88e286470711231714k3805c74eq82ac31092a18e642@mail.gmail.com> References: <4746B218.7040700@libero.it> <20071123121832.D73723A40AC@sparrow.telecommunity.com> <4746C6B7.2040808@libero.it> <88e286470711231714k3805c74eq82ac31092a18e642@mail.gmail.com> Message-ID: <4747E8B2.1080506@libero.it> Graham Dumpleton ha scritto: > [...] > > Please don't use 'mod_wsgi', use 'nginx' as you originally said. > > There is already going to be enough confusion around you using the > 'mod_wsgi' name when the Apache WSGI implementation came first. I > really wish now that I had insisted you specifically call it > 'nginx_wsgi' even though you based it on Apache mod_wsgi. T No, I called it mod_wsgi because it is "module wsgi". To reduce confusion, as an example, in my Mercurial repository, mod_wsgi is under the 'nginx' directory. I will try to make sure that the "nginx" prefix will always well visible. > o try to be > compatible is one thing, to use the same name in a way that is > confusing is just going to cause more and more problems down the track > if nginx mod_wsgi does get to a point of being usuable. > > Whatever you do, please do not go releasing any distinct Python > module/package called 'mod_wsgi' as the Apache mod_wsgi code is all > set up around it being able to do that already, with the assumption > that it has ownership of that namespace because it started using the > name first. But this is a "no problem" :). A WSGI application can be embedded in Apache *or* in nginx. I would like to use a common name/interface so applications can easily be ported from Nginx to Apache and viceversa. > [...] > > Any suggestions on a consensus on how we resolve all this and avoid > arguments down the track are more than welcome. Presuming that is that > people want to object to Apache mod_wsgi assuming that it can use > 'mod_wsgi' for the name of a Python module/package. 
:-) > > Graham > Manlio Perillo From manlio_perillo at libero.it Sat Nov 24 10:30:54 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Sat, 24 Nov 2007 10:30:54 +0100 Subject: [Web-SIG] again about logging in WSGI In-Reply-To: <88e286470711232340l57480db9nd0f63a7fb161046d@mail.gmail.com> References: <4746B218.7040700@libero.it> <88e286470711232340l57480db9nd0f63a7fb161046d@mail.gmail.com> Message-ID: <4747EF4E.8020506@libero.it> Graham Dumpleton ha scritto: > On 23/11/2007, Manlio Perillo wrote: >> Hi. >> >> As I have written in a previous thread, I would like to use nginx >> logging system in a WSGI application (of course the same is valid for >> Apache) >> >> A first problem is that the wsgi.errors stream defined in the >> environment dictionary is valid only for the current request, but I want >> to use a stream valid for the entire process lifetime. >> >> I think that there are two solutions: >> 1) call an application supplied `init_application(environ)` callable, >> where the environ dictionary contains the "right" wsgi.errors stream >> object >> 2) add to the environ dictionary a `wsgi.global_errors` stream object >> >> Any suggestions? > > Getting back to what you were originally asking about, I don't really > see why you need anything extra in the WSGI application environment. > > For starters, one would remap sys.stderr to send to the web server log > system as level ERROR. I have discarded any solution based on sys.stderr since the WSGI spec says nothing about the behaviour of stderr/stdout. But this seems a reasonable solution, thanks. > This should include ensuring that output is > flushed after each newline so that it appears promptly in the error > logs. This does mean the wrapper has to do some buffering and be a bit > more intelligent. Doing this remapping of sys.stderr ensures though > that any modules which use it directly to output errors will still > have their output go somewhere sensible. > > As for everything else, as suggested in a previous email, you would be > better off basing anything else beyond sys.stderr and wsgi.errors off > the Python 'logging' module. In doing this, nothing would be required > in the WSGI environment. Any code would just log output using the > 'logging' module. > This is what I want to do. But instead of setup the 'logging' module in nginx mod_wsgi, I would like to offer a minimal support so that this can be done by the WSGI application (or middleware). In this way, the application is not forced to use the standard logging system. Moreover I do not want to make nginx mod_wsgi too "complex". > The magic missing bit would then be the WSGI adapter somehow making > available a type of LogHandler class, eg., ApacheLogHandler, which can > be referenced in configuration mechanism used to setup the 'logging' > module. As necessary the LogHandler can be tied into the WSGI adapter > dispatch mechanism so that it can know when something is logged in the > context of a request handler, as opposed to at global scope on module > import of WSGI application and correctly log details against the > internal request structure, thereby enabling of client IP details > against the request. > Too complex, IMHO. I think that it is better to have two separate logging objects. wsgi.errors and sys.stderr, as an example. With these two objects a middleware/application can: - setup the global logging object using sys.stderr - setup(?) 
a logging object in the wsgi dictionary using wsgi.errors A first improvement is to add to the wsgi dictionary something like nginx.log_level (this must be done in the server configuration file, using, as an example, the `wsgi_param` directive) so that the middleware knows the log level used by the server. In this way, as an example, if the log level is 'NGX_ERROR' and a WSGI application do something like: log.debug("a debug message") this message does not will be written to the server log. A second improvement is to provide support, as you suggest, to a LogHandler. This too can be done by a middleware (IMHO), if the WSGI middleware expose an usable interface. However the first improvement is all I need, for now. > In other words, it should be possible to do it all transparently > without requiring extensions to be added to the WSGI environment. At > most, may need a configuration option in the WSGI adapter to setup > configuration of 'logging' module somehow. The benefit of this > approach is that it is more easily portable to a different WSGI > hosting environment. The most that would be required is a change in > the 'logging' module configuration, no user code would need to be > changed. > > Anyway, I'll have to think about it properly. I have been meaning to > basically do just this for Apache mod_wsgi for a while, but just > haven't got around to it. > > Graham > Manlio Perillo From graham.dumpleton at gmail.com Sat Nov 24 10:43:43 2007 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Sat, 24 Nov 2007 20:43:43 +1100 Subject: [Web-SIG] again about logging in WSGI In-Reply-To: <4747E8B2.1080506@libero.it> References: <4746B218.7040700@libero.it> <20071123121832.D73723A40AC@sparrow.telecommunity.com> <4746C6B7.2040808@libero.it> <88e286470711231714k3805c74eq82ac31092a18e642@mail.gmail.com> <4747E8B2.1080506@libero.it> Message-ID: <88e286470711240143g3ffaac66g98b30e4082ac500e@mail.gmail.com> On 24/11/2007, Manlio Perillo wrote: > Graham Dumpleton ha scritto: > > Whatever you do, please do not go releasing any distinct Python > > module/package called 'mod_wsgi' as the Apache mod_wsgi code is all > > set up around it being able to do that already, with the assumption > > that it has ownership of that namespace because it started using the > > name first. > > But this is a "no problem" :). > A WSGI application can be embedded in Apache *or* in nginx. > > I would like to use a common name/interface so applications can easily > be ported from Nginx to Apache and viceversa. I am not sure you understood what I meant. I already put out software called 'mod_wsgi'. This contains an Apache module only. I also already put out software called 'ap_swig_py'. The contents of this when compiled and installed, will create in Python site-packages directory a Python package called 'apache'. This contains SWIG bindings for internal Apache APIs. It is usable from WSGI applications running under Apache mod_wsgi. Technically the SWIG bindings are also usable from mod_python, giving it much better access to internal Apache APIs that it itself provides. I also am preparing some software called 'mod_wsgi_py'. The contents of this package when installed, will create in Python site-packages directory a corresponding Python package called 'mod_wsgi', matching the name of the Apache module. This package will contain a mix of generic WSGI components as well as some which will only work when used in the context of Apache mod_wsgi. 
In some cases the WSGI components will use a generic mechanism when not run under Apache mod_wsgi, but when Apache mod_wsgi is used a better more efficient mechanism will be used which hooks in to the internals of Apache to do stuff. When something hooks into Apache, it may be through using 'ap_swig_py' bindings, or through direct C extensions custom built for the purpose. However it is done, the intent is that it will be all transparent to the code using it. Examples of components that would be in the Python package installed by 'mod_wsgi_py' would be things like a WSGI component for performing a sub request back into the same Apache web server and then being able to filter the response just like it was returned from any other WSGI component. A component such as the ApacheLogHandler could also be placed in this package, with Apache mod_wsgi, when the corresponding Python mod_wsgi package was installed, triggering some initialisation code which would install that as the root handler for the 'logging' module, thus allowing the 'logging' module to log to Apache error logs with levels specified by user code. So, in other words, the 'mod_wsig_py' software would not be required to use Apache mod_wsgi, but if you have it present, you will have additional features. At the same time, you could use 'mod_wsgi_py' even if you aren't running under Apache mod_wsgi, but could use it in other WSGI hosting solutions, even under wsgiref server if you wanted to. It would either replace components with something equivalent that works outside of Apache, or the component wouldn't be available if no other choice. This would be useful for testing or development outside of Apache. My concerns are that if you were to separately produce a package which installs into site-packages a Python module or package called 'mod_wsgi' there will be a direct clash. The other concern is that if you have followed the Apache mod_wsgi approach of internally creating a 'mod_wsgi' Python module in sys.modules as a place holder and storing of version information etc, then anyone using the installed Python 'mod_wsgi' package from 'mod_wsgi_py' wouldn't be able to run their application with the nginx WSGI adapter. As your placeholder will prevent it from being imported. In latest Apache mod_wsgi I do various checks and only create the placeholder module if the real package in site-packages is present. Thus, to avoid problems you really want to be avoiding using the 'mod_wsgi' name in any Python context, whether that be in the names of Python modules or as a prefix in the WSGI environment passed to applications. Using the same name in your version as what Apache mod_wsgi uses isn't going to gain you anything and because there will inevitably be subtle differences or incompatibilities between the two, using the same naming schemes will just make it harder where user code does have to be able to distinguish between the two. Graham From manlio_perillo at libero.it Sat Nov 24 10:45:10 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Sat, 24 Nov 2007 10:45:10 +0100 Subject: [Web-SIG] again about logging in WSGI In-Reply-To: <4747BD40.7070608@colorstudy.com> References: <4746B218.7040700@libero.it> <20071123121832.D73723A40AC@sparrow.telecommunity.com> <4746C6B7.2040808@libero.it> <4747BD40.7070608@colorstudy.com> Message-ID: <4747F2A6.2040200@libero.it> Ian Bicking ha scritto: > Manlio Perillo wrote: >> By the way: any proposal for "standardize" common "namespaces"? 
> > Yes, see: http://wsgi.org/wsgi/Specifications > > Note that it also requires some enthusiasm from more than one person to > actually move through this otherwise fairly casual process; several > attempts to standardize pieces have kind of withered on the vine for > lack of interest. So don't be too surprised if that happens. > Lack of interest in sharing common solutions seems to be a distinguishing feature of the Python community :-). Maybe because it is so easy to write "good" software in the Python language? Manlio Perilo From manlio_perillo at libero.it Sat Nov 24 10:50:58 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Sat, 24 Nov 2007 10:50:58 +0100 Subject: [Web-SIG] again about logging in WSGI In-Reply-To: <88e286470711240143g3ffaac66g98b30e4082ac500e@mail.gmail.com> References: <4746B218.7040700@libero.it> <20071123121832.D73723A40AC@sparrow.telecommunity.com> <4746C6B7.2040808@libero.it> <88e286470711231714k3805c74eq82ac31092a18e642@mail.gmail.com> <4747E8B2.1080506@libero.it> <88e286470711240143g3ffaac66g98b30e4082ac500e@mail.gmail.com> Message-ID: <4747F402.1060800@libero.it> Graham Dumpleton ha scritto: > On 24/11/2007, Manlio Perillo wrote: >> Graham Dumpleton ha scritto: >>> Whatever you do, please do not go releasing any distinct Python >>> module/package called 'mod_wsgi' as the Apache mod_wsgi code is all >>> set up around it being able to do that already, with the assumption >>> that it has ownership of that namespace because it started using the >>> name first. >> But this is a "no problem" :). >> A WSGI application can be embedded in Apache *or* in nginx. >> >> I would like to use a common name/interface so applications can easily >> be ported from Nginx to Apache and viceversa. > > I am not sure you understood what I meant. > > I already put out software called 'mod_wsgi'. This contains an Apache > module only. > > I also already put out software called 'ap_swig_py'. The contents of > this when compiled and installed, will create in Python site-packages > directory a Python package called 'apache'. Ah, ok, sorry. No problem. I'm planning to release a software called `ngx_wsgi` for the same precise purpose as your mod_wsgi. > [...] Manlio Perillo From graham.dumpleton at gmail.com Sat Nov 24 11:16:16 2007 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Sat, 24 Nov 2007 21:16:16 +1100 Subject: [Web-SIG] again about logging in WSGI In-Reply-To: <4747EF4E.8020506@libero.it> References: <4746B218.7040700@libero.it> <88e286470711232340l57480db9nd0f63a7fb161046d@mail.gmail.com> <4747EF4E.8020506@libero.it> Message-ID: <88e286470711240216y81f5548p98e594ce0636e2c2@mail.gmail.com> On 24/11/2007, Manlio Perillo wrote: > > As for everything else, as suggested in a previous email, you would be > > better off basing anything else beyond sys.stderr and wsgi.errors off > > the Python 'logging' module. In doing this, nothing would be required > > in the WSGI environment. Any code would just log output using the > > 'logging' module. > > > > This is what I want to do. > But instead of setup the 'logging' module in nginx mod_wsgi, I would > like to offer a minimal support so that this can be done by the WSGI > application (or middleware). > > In this way, the application is not forced to use the standard logging > system. > Moreover I do not want to make nginx mod_wsgi too "complex". 
> > > The magic missing bit would then be the WSGI adapter somehow making > > available a type of LogHandler class, eg., ApacheLogHandler, which can > > be referenced in configuration mechanism used to setup the 'logging' > > module. As necessary the LogHandler can be tied into the WSGI adapter > > dispatch mechanism so that it can know when something is logged in the > > context of a request handler, as opposed to at global scope on module > > import of WSGI application and correctly log details against the > > internal request structure, thereby enabling of client IP details > > against the request. > > Too complex, IMHO. Actually, if you don't try and bind it back to a specific request, and just use the main error log file, it is actually quite easy. > I think that it is better to have two separate logging objects. > wsgi.errors and sys.stderr, as an example. Huh. These exist now. But you said you don't want to be using sys.stderr since WSGI PEP says nothing about it. > With these two objects a middleware/application can: > - setup the global logging object using sys.stderr > - setup(?) a logging object in the wsgi dictionary using wsgi.errors > > A first improvement is to add to the wsgi dictionary something like > nginx.log_level (this must be done in the server configuration file, > using, as an example, the `wsgi_param` directive) > > so that the middleware knows the log level used by the server. > > In this way, as an example, if the log level is 'NGX_ERROR' and a WSGI > application do something like: > log.debug("a debug message") > > this message does not will be written to the server log. I think I must be missing something about now nginx works. In Apache the user code doesn't make the decision as to whether it needs to log something or not. You log something, passing the notional log value that the code wants that message to be logged at. Internally, Apache will compare that notional log level to what it its threshold is and decide to allow it through or not. Thus, I don't understand why the log level threshold value which dictates what is filtered needs to be exposed in the Python code side of things. I guess I'll just have to wait until later to see what it is you are thinking of. :-) Graham From manlio_perillo at libero.it Sat Nov 24 11:53:09 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Sat, 24 Nov 2007 11:53:09 +0100 Subject: [Web-SIG] again about logging in WSGI In-Reply-To: <88e286470711240216y81f5548p98e594ce0636e2c2@mail.gmail.com> References: <4746B218.7040700@libero.it> <88e286470711232340l57480db9nd0f63a7fb161046d@mail.gmail.com> <4747EF4E.8020506@libero.it> <88e286470711240216y81f5548p98e594ce0636e2c2@mail.gmail.com> Message-ID: <47480295.7060709@libero.it> Graham Dumpleton ha scritto: > [...] >> Too complex, IMHO. > > Actually, if you don't try and bind it back to a specific request, and > just use the main error log file, it is actually quite easy. > >> I think that it is better to have two separate logging objects. >> wsgi.errors and sys.stderr, as an example. > > Huh. These exist now. But you said you don't want to be using > sys.stderr since WSGI PEP says nothing about it. > No, sys.stderr is fine. >> With these two objects a middleware/application can: >> - setup the global logging object using sys.stderr >> - setup(?) 
a logging object in the wsgi dictionary using wsgi.errors >> >> A first improvement is to add to the wsgi dictionary something like >> nginx.log_level (this must be done in the server configuration file, >> using, as an example, the `wsgi_param` directive) >> >> so that the middleware knows the log level used by the server. >> >> In this way, as an example, if the log level is 'NGX_ERROR' and a WSGI >> application do something like: >> log.debug("a debug message") >> >> this message does not will be written to the server log. > > I think I must be missing something about now nginx works. In Apache > the user code doesn't make the decision as to whether it needs to log > something or not. You log something, passing the notional log value > that the code wants that message to be logged at. Internally, Apache > will compare that notional log level to what it its threshold is and > decide to allow it through or not. >

It is the same with Nginx.

> Thus, I don't understand why the log level threshold value which > dictates what is filtered needs to be exposed in the Python code side > of things. >

I have posted an example. Let's suppose that the log levels in the server are:

    NGX_LOG_EMERG   1
    NGX_LOG_ALERT   2
    NGX_LOG_CRIT    3
    NGX_LOG_ERR     4
    NGX_LOG_WARN    5
    NGX_LOG_NOTICE  6
    NGX_LOG_INFO    7
    NGX_LOG_DEBUG   8

Now let's suppose that the wsgi.errors and sys.stderr internally use the NGX_LOG_ERR log level (but this should be modifiable by the user, IMHO). Finally, the WSGI application calls: log.debug("ops"). The problem with this is that the following could then be added to the log file:

    2007/11/24 11:46:09 [err] 29902#0: *1 DEBUG ops

This is not what I want. A message with DEBUG log level should not be written to the log file. The solution is to do:

    {
        # nginx config file
        ...
        wsgi_param nginx.log_level 40   # logging.ERROR
        ...
    }

    logging.basicConfig(level=int(environ['nginx.log_level']),
                        format='%(levelname)s %(message)s',
                        stream=sys.stderr)

> I guess I'll just have to wait until later to see what it is you are > thinking of. :-) >

Manlio Perillo

From chris at simplistix.co.uk Mon Nov 26 11:44:32 2007 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 26 Nov 2007 10:44:32 +0000 Subject: [Web-SIG] multi-threaded or multi-process wsgi apps Message-ID: <474AA390.2080209@simplistix.co.uk>

Hey All, I hope I have the right list, if not please point me in the right direction... Likewise, if there are good docs that cover all of this, please send me their way ;-) Right, I'm curious as to how wsgi applications end up being multi-threaded or multi-process and if they are, how they share resources such as databases and configuration. There's a couple of reasons I'm asking... The first was something Chris McDonough said about one of the issues they're having with the repoze project: when using something like mod_wsgi, it's the first person to hit each thread that takes the hit of loading the configuration and opening up the zodb. Opening the ZODB, in particular, can take a lot of time. How should repoze be structured such that all the threads load their config and open their databases when apache is restarted rather than when each thread is first hit? The second is a problem I see an app I'm working on heading towards. The app has web-alterable configuration, so in a multi-threaded and particularly multi-process environment, I need some way to get the other threads or processes to re-read their configuration when it has changed. Hope you guys can help!
Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From tseaver at palladion.com Mon Nov 26 18:00:36 2007 From: tseaver at palladion.com (Tres Seaver) Date: Mon, 26 Nov 2007 12:00:36 -0500 Subject: [Web-SIG] multi-threaded or multi-process wsgi apps In-Reply-To: <474AA390.2080209@simplistix.co.uk> References: <474AA390.2080209@simplistix.co.uk> Message-ID: <474AFBB4.9060406@palladion.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Chris Withers wrote: > I hope I have the right list, if not please point me in the right > direction... > > Likewise, if there are good docs that cover all of this, please send me > their way ;-) > > Right, I'm curious as to how wsgi applications end up being > multi-threaded or multi-process and if they are, how they share > resources such as databases and configuration. > > There's a couple of reasons I'm asking... > > The first was something Chris McDonough said about one ofthe issues > they're having with the repoze project: when using something like > mod_wsgi, it's the first person to hit each thread that takes the hit of > loading the configuration and opening up the zodb. Opening the ZODB, in > particular, can take a lot of time. How should repoze be structured such > that all the threads load their config and open their databases when > apache is restarted rather than when each thread is first hit? Note first that we use mod_wsgi's "daemon"-mode exclusively, which implies creating one or more dedicated subprocesses for each "process group" defined in the Apache config. In that mode, Apache may create new subprocesses at any time, and may destroy old ones (e.g., after reaching a max-requests threshhold). The real issue isn't opening the ZODB; it is populating a new connection cache. A second issue for multi-process configurations is doing all the product initialization dance (for a Zope2 app) or processing ZCML (for either Zope2 or Zope3). The "frist hit slow" problem is intrinsic to any lazy + scalable system. > The second is a problem I see an app I'm working on heading towards. The > app has web-alterable configuration, so in a multi-threaded and > particular multi-process environment, I need some way to get the other > threads or processes to re-read their configuration when it has changed. > > Hope you guys can help! Making the ZODB connection pool sharable across processes doesn't seem feasible. I have toyed with the idea of creating a sharable "L2" cache (the ZEO client cache), perhaps using something like memcache for the backing store. In that configuration, all running appservers would share the same "pickle cache", which could be distributed across a bunch of servers; they would still have to load the pickles as objects into their "LI" cache before using them. Tres. 
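P.S. Roughly, the lookup order for such a two-level cache might look like the following sketch (the 'memcache' client module and server address are assumptions, and cache invalidation is hand-waved entirely):

    import memcache   # python-memcached client

    class TwoLevelCache(object):
        """L1: per-process dict of live objects; L2: shared memcached of pickles."""

        def __init__(self, servers=('127.0.0.1:11211',)):
            self._l1 = {}
            self._l2 = memcache.Client(list(servers))

        def get(self, oid, loader):
            # 'oid' is assumed to be a plain string key.
            obj = self._l1.get(oid)
            if obj is None:
                obj = self._l2.get(oid)      # shared "pickle cache" hit?
                if obj is None:
                    obj = loader(oid)        # fall back to the real storage
                    self._l2.set(oid, obj)   # python-memcached pickles it for us
                self._l1[oid] = obj          # each process still materialises its own copy
            return obj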
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHSvu0+gerLs4ltQ4RAn3eAJ9GrFNlbDeBZ+hShFlUUjclkWuJmwCeLQqm r6dTsrjtbI/QSre84ZR2glk= =OzgY -----END PGP SIGNATURE----- From ianb at colorstudy.com Mon Nov 26 18:15:12 2007 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 26 Nov 2007 11:15:12 -0600 Subject: [Web-SIG] multi-threaded or multi-process wsgi apps In-Reply-To: <474AA390.2080209@simplistix.co.uk> References: <474AA390.2080209@simplistix.co.uk> Message-ID: <474AFF20.4060406@colorstudy.com> Chris Withers wrote: > Hey All, > > I hope I have the right list, if not please point me in the right > direction... > > Likewise, if there are good docs that cover all of this, please send me > their way ;-) > > Right, I'm curious as to how wsgi applications end up being > multi-threaded or multi-process and if they are, how they share > resources such as databases and configuration. At least in Pylons apps, configuration is setup during instantiation. Configuration is generally copyable (consisting of stuff like strings, not open file objects), so it can be cloned across processes easily. Things like database connections are handled by libraries that do pooling on their own. > There's a couple of reasons I'm asking... > > The first was something Chris McDonough said about one ofthe issues > they're having with the repoze project: when using something like > mod_wsgi, it's the first person to hit each thread that takes the hit of > loading the configuration and opening up the zodb. Opening the ZODB, in > particular, can take a lot of time. How should repoze be structured such > that all the threads load their config and open their databases when > apache is restarted rather than when each thread is first hit? > > The second is a problem I see an app I'm working on heading towards. The > app has web-alterable configuration, so in a multi-threaded and > particular multi-process environment, I need some way to get the other > threads or processes to re-read their configuration when it has changed. In Paste/Pylons the configuration is stored in the environment (which is per-request), and put into a threadlocal object for access. Also, in general using the Paste Deploy style of factory for WSGI applications, *if* the factory is sufficiently fast you can dynamically or lazily instantiate applications. E.g.: def make_dynamic_configurable_application( global_conf, subapp_ep_name, config_source, **config_source_args): if subapp_ep_name.startswith('egg:'): subapp_ep_name = subapp_ep_name[4:] if '#' in subapp_ep_name: dist, ep_name = subapp_ep_name.split('#', 1) else: dist = subapp_ep_name ep_name = 'main' app_factory = pkg_resources.load_entry_point( 'paste.app_factory', dist, ep_name) # You might want to do something similar with config_source # for now we'll just imagine its a function that returns a # dictionary global_conf = global_conf.copy() global_conf['config_source'] = config_source app_cache = {} def application(environ, start_response): config = config_source(environ, **config_source_args) config_key = sorted(config.items()) if config_key not in app_cache: # Probably should do some locking here... 
app = app_factory(global_conf, **config) app_cache[config_key] = app else: app = app_cache[config_key] return app(environ, start_response) return application This all builds off what Paste Deploy already provides, and would allow you to apply dynamic configuration to any Paste Deploy-compatible application, if that application can also safely handle multiple loaded instances/configurations. Pylons applications work fine this way. Also note that the configuration loader itself is configured using the Paste Deploy interfaces. -- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From wilk at flibuste.net Mon Nov 26 22:30:26 2007 From: wilk at flibuste.net (William Dode) Date: Mon, 26 Nov 2007 21:30:26 +0000 (UTC) Subject: [Web-SIG] multi-threaded or multi-process wsgi apps References: <474AA390.2080209@simplistix.co.uk> Message-ID: On 26-11-2007, Chris Withers wrote: > Hey All, > > I hope I have the right list, if not please point me in the right > direction... > > Likewise, if there are good docs that cover all of this, please send me > their way ;-) > > Right, I'm curious as to how wsgi applications end up being > multi-threaded or multi-process and if they are, how they share > resources such as databases and configuration. > > There's a couple of reasons I'm asking... > > The first was something Chris McDonough said about one ofthe issues > they're having with the repoze project: when using something like > mod_wsgi, it's the first person to hit each thread that takes the hit of > loading the configuration and opening up the zodb. Opening the ZODB, in > particular, can take a lot of time. How should repoze be structured such > that all the threads load their config and open their databases when > apache is restarted rather than when each thread is first hit? What about using a distributed object system like pyro ? You could have the heavy loading of the database in a server independant of the wsgi application. http://pyro.sourceforge.net/ -- William Dod? - http://flibuste.net Informaticien ind?pendant From fumanchu at aminus.org Mon Nov 26 23:06:23 2007 From: fumanchu at aminus.org (Robert Brewer) Date: Mon, 26 Nov 2007 14:06:23 -0800 Subject: [Web-SIG] multi-threaded or multi-process wsgi apps In-Reply-To: <474AA390.2080209@simplistix.co.uk> References: <474AA390.2080209@simplistix.co.uk> Message-ID: Chris Withers wrote: > Right, I'm curious as to how wsgi applications end up being > multi-threaded or multi-process and if they are, how they share > resources such as databases and configuration. > > There's a couple of reasons I'm asking... > > The first was something Chris McDonough said about one ofthe issues > they're having with the repoze project: when using something like > mod_wsgi, it's the first person to hit each thread that takes the hit > of loading the configuration and opening up the zodb. Opening the ZODB, > in particular, can take a lot of time. How should repoze be structured > such that all the threads load their config and open their databases > when apache is restarted rather than when each thread is first hit? If I were coding it, repoze would use a database connection pool that is populated at (sub)process startup. The main thread is the only one "loading config". That avoids any waits during the HTTP request, so your req/sec rate will go way up. It also allows the process to fail fast in the event of unreachable databases, so such errors during deployment will be found sooner and will be easier to debug if they occur outside of an HTTP request. 
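For instance, just as a sketch (sqlite3 stands in for whatever DB-API module you actually use, and the pool size is an arbitrary placeholder), the pool can be built and filled when the application module is loaded, so any connection failure surfaces then rather than during a request:

    import sqlite3           # stands in for your real DB-API module
    import threading

    POOL_SIZE = 5            # placeholder value

    class ConnectionPool(object):
        """Tiny pool, filled eagerly so connection failures happen up front."""

        def __init__(self, factory, size):
            self._lock = threading.Lock()
            self._free = [factory() for _ in range(size)]   # fail fast, before any request

        def acquire(self):
            with self._lock:
                if not self._free:
                    raise RuntimeError('pool exhausted')    # real code would block or grow
                return self._free.pop()

        def release(self, conn):
            with self._lock:
                self._free.append(conn)

    # Built when this module is imported, i.e. before the first request is served.
    pool = ConnectionPool(lambda: sqlite3.connect(':memory:', check_same_thread=False),
                          POOL_SIZE)

    # Request code then just borrows a connection:
    #
    #     conn = pool.acquire()
    #     try:
    #         ...  # do the work for this request
    #     finally:
    #         pool.release(conn)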
It's like a stage production: you don't ask your actors to buy props and build the set during the show--instead, you buy/build all that and script/debug/automate the hell out of it before you have an audience. All long-running servers are a lot like that; do everything you can before the first request to make absolutely sure nothing slows or stops you during showtime. > The second is a problem I see an app I'm working on heading towards. > The app has web-alterable configuration, so in a multi-threaded and > particular multi-process environment, I need some way to get the other > threads or processes to re-read their configuration when it has > changed. In a multithreaded environment, I recommend apps read config only at process startup, parse the entries and use them to modify live objects, and then throw away the config. Then, if you need to make changes to settings while live, you just modify the live objects in the same way the config parsing step did (and then modify the config file only if desired). That avoids having to re-read the whole config file for each potential change. In a multiprocess environment, you can notify other process with any of various forms of IPC or shared state mechanisms. Robert Brewer fumanchu at aminus.org From graham.dumpleton at gmail.com Mon Nov 26 23:42:29 2007 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Tue, 27 Nov 2007 09:42:29 +1100 Subject: [Web-SIG] multi-threaded or multi-process wsgi apps In-Reply-To: <474AA390.2080209@simplistix.co.uk> References: <474AA390.2080209@simplistix.co.uk> Message-ID: <88e286470711261442y3b153906le827bb5c92e49a1a@mail.gmail.com> On 26/11/2007, Chris Withers wrote: > Hey All, > > I hope I have the right list, if not please point me in the right > direction... > > Likewise, if there are good docs that cover all of this, please send me > their way ;-) > > Right, I'm curious as to how wsgi applications end up being > multi-threaded or multi-process and if they are, how they share > resources such as databases and configuration. > > There's a couple of reasons I'm asking... > > The first was something Chris McDonough said about one ofthe issues > they're having with the repoze project: when using something like > mod_wsgi, it's the first person to hit each thread that takes the hit of > loading the configuration and opening up the zodb. Opening the ZODB, in > particular, can take a lot of time. How should repoze be structured such > that all the threads load their config and open their databases when > apache is restarted rather than when each thread is first hit? > > The second is a problem I see an app I'm working on heading towards. The > app has web-alterable configuration, so in a multi-threaded and > particular multi-process environment, I need some way to get the other > threads or processes to re-read their configuration when it has changed. > > Hope you guys can help! For those who haven't previously read it, some background reading on issues of data visibility when using Apache and specifically mod_wsgi (although also applies to mod_python to a degree), can be found at: http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading The problem with moving initialisation from the first request to server process initialisation when using Apache/mod_wsgi is that often Apache isn't hosting just the one Python web application. Because of this applications are usually separated to run in different Python sub interpreters, or processes by using mod_wsgi daemon mode. 
First issue therefore, albeit not a major one, is being able to indicate within which daemon process or Python sub interpreter one should do the server process initialisation. The second issue is what do you do if the server process initialisation fails. In worst case scenario, if it causes the server process to crash, then Apache will go and startup a new process straight away, and if it keeps crashing, then you will get in a loop of continual process restarts, possibly affecting machine performance. Even if the failure doesn't crash the process but still leaves the software at that point in an unusable state, what do you do. In mod_python, where PythonImport directive is available and can be used to do server process initialisation, most people don't even consider that startup could fail. Thus, what happens is that code works okay for some time, then they have a problem and the whole server grinds to a halt until someone notices and restarts it. A more reliable way is therefore to have it so that an individual request is able to trigger the server process initialisation if it hasn't previously succeeded. Thus, if a failure has previously occurred, when a new request arrives it can retrigger any initialisation and if it works everything can then keep going. The issue then is the delay of the server process initialisation until the first request and the consequent lag noticeable by the user. To ensure initialisation can be retriggered, but also avoid a delay, one could implement initialisation at process startup as well as it being triggered by the first request if it has previously failed. The question though is whether this will make a difference to what the user sees. If the server is lightly loaded then it probably would, as the infrequent nature of requests means that in all likelihood the server process initialisation would have completed before a request arrives. If however the machine is under load with a high hit rate, the user may still see a lag anyway. Whether this will be true will with mod_wsgi depend on whether embedded or daemon mode is being used, and it using daemon mode how many processes are in the daemon process group. The worst case scenario here is using mod_wsgi daemon mode with a single process for the application. If maximum requests is reached and the process restarted, irrespective of whether you do initialisation when the process starts, you get hit with a few sources of delays. The first, and possibly overlooked as a source of problems, is how long the existing process takes to shutdown. In mod_wsgi daemon mode it will not start a new process until the old one has shutdown. Rather than just kill the process immediately it will give existing requests which are running a chance to complete. The default length of time it will wait is 5 seconds. If the requests haven't completed in that time it will kill off the process and a new one will be restarted. Even if the requests complete promptly, mod_wsgi will trigger proper shutdown of the Python interpreters, including stopping of non daemon threads (not that there should be any) and running atexit registered functions. If this for some reason also takes a long time it can trigger the default 5 second timeout and process will be killed off. Once old process has been shutdown, you still need to start up new one. This is a fork from Apache child process so quick to create process, but you still need to load and initialisation the application. 
As the new process isn't started until old one has been shutdown, if you are only running one daemon process for the application, then any new incoming requests will queue up within the listener socket queue. These requests will not start to be processed until new process is ready. If application initialisation done at process start, then that will still delay any pending requests, just like if the first request triggered instead triggered initialisation. These delays in shutdown and startup aren't going to be as big an issue if running multiple mod_wsgi daemon processes, or if using embedded mode, as the other processes can take over servicing requests while process is being recycled, provided of course that all daemon processes aren't being recycled at the same time. Because of how process recycling works for mod_wsgi daemon mode, if using it, and your processes are slow to startup or shutdown, or you have long running requests, then recommended that you run multiple daemon processes. Obviously, if your application isn't multiprocess safe, that could be an issue. Next is then possibly to look at what may be stopping an application from shutting down promptly. Anyway, hope this explains a few issues and gives you some things to look at. I have cc'd this over to mod_wsgi list as discussion on how mod_wsgi does things more appropriate over there. Maybe go to mod_wsgi list if you want to discuss further any new feature for allowing server process startup. I haven't ruled it out completely, but also don't want to provide a mechanism which people will just end up using in the wrong way and so not consider steps that may still be required to trigger initialisation when requests arrive. As for general issues around best way to perform application initialisation, problem is that what is the most appropriate way may depend on the specific hosting mechanism. There isn't necessarily going to be one way that will suit all ways that WSGI can be hosted, thus why there partly isn't a standard on how to do it. Graham From graham.dumpleton at gmail.com Tue Nov 27 00:26:49 2007 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Tue, 27 Nov 2007 10:26:49 +1100 Subject: [Web-SIG] multi-threaded or multi-process wsgi apps In-Reply-To: References: <474AA390.2080209@simplistix.co.uk> Message-ID: <88e286470711261526v3896954fp4608aa6a03eeccf3@mail.gmail.com> On 27/11/2007, Robert Brewer wrote: > Chris Withers wrote: > > Right, I'm curious as to how wsgi applications end up being > > multi-threaded or multi-process and if they are, how they share > > resources such as databases and configuration. > > > > There's a couple of reasons I'm asking... > > > > The first was something Chris McDonough said about one ofthe issues > > they're having with the repoze project: when using something like > > mod_wsgi, it's the first person to hit each thread that takes the hit > > of loading the configuration and opening up the zodb. Opening the > ZODB, > > in particular, can take a lot of time. How should repoze be structured > > such that all the threads load their config and open their databases > > when apache is restarted rather than when each thread is first hit? > > If I were coding it, repoze would use a database connection pool that is > populated at (sub)process startup. The issue with running under Apache, whether it be mod_wsgi or mod_python, is that the server itself doesn't necessarily know anything about what applications may actually need to be loaded. 
This is because both support the concept of sticking the file representing the entry point to the application in some file system directory. The first that the server knows about the application is when a URL arrives which maps to that application file. Thus, in the general case one cant have pre initialisation at (sub)process startup. To have pre initialisation means providing an explicit means of configuring the server to say that it is possible that some application may get invoked through a URL and so just in case it should preload the application. Because it involves changing main server configuration, obviously can only be used as an option where you control the actual web server. There would be no way you could use such an option if you were just a user in a paid shared web hosting environment. In that case you can't avoid doing delayed initialisation at time that first request arrives. This is the big difference between Apache and pure Python hosting solutions. That is that Apache has to deal with potential shared hosting issues. Pure Python hosting solutions would probably always be under direct control of the user and be only running their own code. Graham From fumanchu at aminus.org Tue Nov 27 01:18:28 2007 From: fumanchu at aminus.org (Robert Brewer) Date: Mon, 26 Nov 2007 16:18:28 -0800 Subject: [Web-SIG] multi-threaded or multi-process wsgi apps In-Reply-To: <88e286470711261526v3896954fp4608aa6a03eeccf3@mail.gmail.com> References: <474AA390.2080209@simplistix.co.uk> <88e286470711261526v3896954fp4608aa6a03eeccf3@mail.gmail.com> Message-ID: Graham Dumpleton wrote: > On 27/11/2007, Robert Brewer wrote: > > Chris Withers wrote: > > > Right, I'm curious as to how wsgi applications end up being > > > multi-threaded or multi-process and if they are, how they share > > > resources such as databases and configuration. > > > > > > There's a couple of reasons I'm asking... > > > > > > The first was something Chris McDonough said about one ofthe issues > > > they're having with the repoze project: when using something like > > > mod_wsgi, it's the first person to hit each thread that takes the > hit > > > of loading the configuration and opening up the zodb. Opening the > > ZODB, > > > in particular, can take a lot of time. How should repoze be > structured > > > such that all the threads load their config and open their > databases > > > when apache is restarted rather than when each thread is first hit? > > > > If I were coding it, repoze would use a database connection pool that > is > > populated at (sub)process startup. > > The issue with running under Apache, whether it be mod_wsgi or > mod_python, is that the server itself doesn't necessarily know > anything about what applications may actually need to be loaded. This > is because both support the concept of sticking the file representing > the entry point to the application in some file system directory. The > first that the server knows about the application is when a URL > arrives which maps to that application file. > > Thus, in the general case one cant have pre initialisation at > (sub)process startup. To have pre initialisation means providing an > explicit means of configuring the server to say that it is possible > that some application may get invoked through a URL and so just in > case it should preload the application. > > Because it involves changing main server configuration, obviously can > only be used as an option where you control the actual web server. 
> There would be no way you could use such an option if you were just a > user in a paid shared web hosting environment. In that case you can't > avoid doing delayed initialisation at time that first request arrives. > > This is the big difference between Apache and pure Python hosting > solutions. That is that Apache has to deal with potential shared > hosting issues. Pure Python hosting solutions would probably always be > under direct control of the user and be only running their own code. True, but that doesn't change my recommendation. Even if you're willing to live with delays on the first request, you still should do as much as possible as early as possible. Any server, application, or framework which *requires* me to live with those delays even though I've taken pains to deploy in a capable, controllable environment would make me seriously question their utility. Robert Brewer fumanchu at aminus.org From tseaver at palladion.com Tue Nov 27 21:55:38 2007 From: tseaver at palladion.com (Tres Seaver) Date: Tue, 27 Nov 2007 15:55:38 -0500 Subject: [Web-SIG] multi-threaded or multi-process wsgi apps In-Reply-To: References: <474AA390.2080209@simplistix.co.uk> Message-ID: <474C844A.3000804@palladion.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Robert Brewer wrote: > Chris Withers wrote: >> Right, I'm curious as to how wsgi applications end up being >> multi-threaded or multi-process and if they are, how they share >> resources such as databases and configuration. >> >> There's a couple of reasons I'm asking... >> >> The first was something Chris McDonough said about one ofthe issues >> they're having with the repoze project: when using something like >> mod_wsgi, it's the first person to hit each thread that takes the hit >> of loading the configuration and opening up the zodb. Opening the > ZODB, >> in particular, can take a lot of time. How should repoze be structured >> such that all the threads load their config and open their databases >> when apache is restarted rather than when each thread is first hit? > > If I were coding it, repoze would use a database connection pool that is > populated at (sub)process startup. The main thread is the only one > "loading config". That avoids any waits during the HTTP request, so your > req/sec rate will go way up. It also allows the process to fail fast in > the event of unreachable databases, so such errors during deployment > will be found sooner and will be easier to debug if they occur outside > of an HTTP request. Zope has already done this for years. The issue is that, under mod_wsgi, one of the most attractive ways to run Zope / repoze is using 'daemon mode', which allows for separate, long-running processes to handle the application code. *Each* of these processes is going to have its own connection pool, which is less than ideal (from the point of view of RAM usage / cache coherence); however, isolating the application in separate process(es) has other large benefits (improved SMP behavior, for one, as well as allowing different applications to run with different configurations). > It's like a stage production: you don't ask your actors to buy props and > build the set during the show--instead, you buy/build all that and > script/debug/automate the hell out of it before you have an audience. > All long-running servers are a lot like that; do everything you can > before the first request to make absolutely sure nothing slows or stops > you during showtime. > >> The second is a problem I see an app I'm working on heading towards. 
>> The app has web-alterable configuration, so in a multi-threaded and >> particular multi-process environment, I need some way to get the other >> threads or processes to re-read their configuration when it has >> changed. > > In a multithreaded environment, I recommend apps read config only at > process startup, parse the entries and use them to modify live objects, > and then throw away the config. Then, if you need to make changes to > settings while live, you just modify the live objects in the same way > the config parsing step did (and then modify the config file only if > desired). That avoids having to re-read the whole config file for each > potential change. In a multiprocess environment, you can notify other > process with any of various forms of IPC or shared state mechanisms People want to be able to edit the config file and HUP the server, in order to ensure that any changes they make persist across restarts; requiring them to change it in two places (a "control panel" as well as the config file) is pretty much a non-starter (pun intended :). Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHTIRK+gerLs4ltQ4RAt26AJ4yqmjtkHJM0uX4twIP66wAf+CdAwCgpK0T rrtH9FAWKzN2vyti+Pxmke4= =2B/f -----END PGP SIGNATURE----- From chris at simplistix.co.uk Wed Nov 28 22:50:21 2007 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 28 Nov 2007 21:50:21 +0000 Subject: [Web-SIG] multi-threaded or multi-process wsgi apps In-Reply-To: <474AFBB4.9060406@palladion.com> References: <474AA390.2080209@simplistix.co.uk> <474AFBB4.9060406@palladion.com> Message-ID: <474DE29D.5060205@simplistix.co.uk> Tres Seaver wrote: > > Note first that we use mod_wsgi's "daemon"-mode exclusively, Forgive me for being uninformed, but what are the other options? > which > implies creating one or more dedicated subprocesses for each "process > group" defined in the Apache config. Does each sub process get its own python interpretter? (ie: does it have to reload all its config and open up its own database connections again?) > In that mode, Apache may create new subprocesses at any time, and may > destroy old ones (e.g., after reaching a max-requests threshhold). The > real issue isn't opening the ZODB; Even if the ZODB doesn't have index files? > cache. A second issue for multi-process configurations is doing all the > product initialization dance (for a Zope2 app) or processing ZCML (for > either Zope2 or Zope3). The "frist hit slow" problem is intrinsic to > any lazy + scalable system. Is there really no way that the "slow" work can be shared? >> The second is a problem I see an app I'm working on heading towards. The >> app has web-alterable configuration, so in a multi-threaded and >> particular multi-process environment, I need some way to get the other >> threads or processes to re-read their configuration when it has changed. >> >> Hope you guys can help! > > Making the ZODB connection pool sharable across processes doesn't seem > feasible. Indeed, but I don't see this app having any zodb connections (necessarilly ;-) ) But, even if you were using, say SQLAlchemy and its connection pooling, wouldn't each process end up having its own connection pool, etc? 
cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Wed Nov 28 22:52:05 2007 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 28 Nov 2007 21:52:05 +0000 Subject: [Web-SIG] multi-threaded or multi-process wsgi apps In-Reply-To: <474AFF20.4060406@colorstudy.com> References: <474AA390.2080209@simplistix.co.uk> <474AFF20.4060406@colorstudy.com> Message-ID: <474DE305.1050104@simplistix.co.uk> Ian Bicking wrote: > At least in Pylons apps, configuration is setup during instantiation. > Configuration is generally copyable (consisting of stuff like strings, > not open file objects), so it can be cloned across processes easily. I can understand sharing across threads, but how do you share across processes? >> The second is a problem I see an app I'm working on heading towards. >> The app has web-alterable configuration, so in a multi-threaded and >> particular multi-process environment, I need some way to get the other >> threads or processes to re-read their configuration when it has changed. > > In Paste/Pylons the configuration is stored in the environment (which is > per-request), and put into a threadlocal object for access. Again, how about across processes? And if the configuration changes once the app is up and running, how do you propogate changes to all the other app processes? cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Wed Nov 28 22:54:23 2007 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 28 Nov 2007 21:54:23 +0000 Subject: [Web-SIG] multi-threaded or multi-process wsgi apps In-Reply-To: References: <474AA390.2080209@simplistix.co.uk> Message-ID: <474DE38F.7030508@simplistix.co.uk> Robert Brewer wrote: > In a multithreaded environment, I recommend apps read config only at > process startup, parse the entries and use them to modify live objects, > and then throw away the config. Then, if you need to make changes to > settings while live, you just modify the live objects in the same way > the config parsing step did (and then modify the config file only if > desired). I completely agree with this :-) > potential change. In a multiprocess environment, you can notify other > process with any of various forms of IPC or shared state mechanisms. Can you suggest some good pythonic (sorry! ;-) ) IPC or shared state mechanisms? cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Wed Nov 28 22:59:32 2007 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 28 Nov 2007 21:59:32 +0000 Subject: [Web-SIG] multi-threaded or multi-process wsgi apps In-Reply-To: <88e286470711261442y3b153906le827bb5c92e49a1a@mail.gmail.com> References: <474AA390.2080209@simplistix.co.uk> <88e286470711261442y3b153906le827bb5c92e49a1a@mail.gmail.com> Message-ID: <474DE4C4.7000603@simplistix.co.uk> Graham Dumpleton wrote: > As for general issues around best way to perform application > initialisation, problem is that what is the most appropriate way may > depend on the specific hosting mechanism. There isn't necessarily > going to be one way that will suit all ways that WSGI can be hosted, > thus why there partly isn't a standard on how to do it. 
I appreciate all the good points you made above that I snipped. However, surely the aim of WSGI is to allow the application author not to have to worry so much about deployment, yet what I'm hearing is that if you get your application implementation wrong, you'll suffer badly in lots of deployment situations.

Of course, as someone trying to write a "good" WSGI app, I'd like it to play nice in as many hosting scenarios as possible. What's the best way to achieve that?

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk

From graham.dumpleton at gmail.com Wed Nov 28 23:33:41 2007
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 29 Nov 2007 09:33:41 +1100
Subject: [Web-SIG] multi-threaded or multi-process wsgi apps
In-Reply-To: <474DE29D.5060205@simplistix.co.uk>
References: <474AA390.2080209@simplistix.co.uk> <474AFBB4.9060406@palladion.com> <474DE29D.5060205@simplistix.co.uk>
Message-ID: <88e286470711281433s4ae070d2tdea499042d07a136@mail.gmail.com>

On 29/11/2007, Chris Withers wrote:
> Tres Seaver wrote:
> > Note first that we use mod_wsgi's "daemon"-mode exclusively,
>
> Forgive me for being uninformed, but what are the other options?

The other mode is embedded mode. Embedded mode is like using mod_python; daemon mode is like using mod_fastcgi. mod_wsgi provides you the flexibility of choosing which you want to use in one package. You can if appropriate even use a combination of both modes. For example, run Django in embedded mode for best performance, but delegate a Trac instance to run in daemon mode so it is separated out of Apache child processes, there being various reasons with Trac why you might want to do that.

> > which implies creating one or more dedicated subprocesses for each
> > "process group" defined in the Apache config.
>
> Does each sub process get its own Python interpreter?

Each process can if necessary have multiple Python sub interpreters, and is not limited to just one. This would be used where you need to run multiple applications in the same process but with sub interpreters being used as a means of separating them so they don't interfere with each other.

Take Django, for instance: you can't run two instances of that inside a pure Python WSGI server process because of the way it uses a global to indicate what its configuration is. I agree that this isn't in the spirit of WSGI, but that is how things are. The Django folks are looking at trying to remove the limitation. In the meantime, you either have to use distinct processes, or, using mod_wsgi or mod_python, run the different Django instances in separate Python sub interpreters within the one process.

> (i.e. does it have to reload all its config and open up its own database
> connections again?)

Being separate processes, they would obviously need to do that. This is generally no different to how people often run multiple instances of a standalone Python web based application and then use a proxy/load balancer to distribute requests across the processes.

> > cache. A second issue for multi-process configurations is doing all the
> > product initialization dance (for a Zope2 app) or processing ZCML (for
> > either Zope2 or Zope3). The "first hit slow" problem is intrinsic to
> > any lazy + scalable system.
>
> Is there really no way that the "slow" work can be shared?

As above, this is usually no different to where someone is creating multiple distinct Python web application instances and proxy/load balancing.
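To make that concrete, a rough sketch of what per-process initialisation tends to look like in a WSGI script file, with the slow part only simulated, would be:

import time

def expensive_startup():
    # Stand-in for whatever is actually slow for a given framework:
    # processing ZCML, product initialisation, warming caches, etc.
    time.sleep(2)
    return {'ready': True}

# Runs when this module is first imported in a given process, so every
# process repeats the work once, independently of the others.
state = expensive_startup()

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['ready: %s\n' % state['ready']]

That doesn't make the work shareable between processes; it just pins it to a predictable point in each process's life.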
Most complex Python web applications don't take kindly to doing complex stuff in a parent process and then forking off worker processes. This is because a lot of stuff, like database connections, can't necessarily be inherited across a fork easily without causing some problems or requiring some complicated coding to make it work. Thus it is generally better for each process to create its own connections etc. Reading the actual file configuration is generally quite a minor overhead in the greater scheme of things.

Even if for a particular system one could gain something by doing it in a parent process and then forking, this isn't practical in Apache, as the parent process typically runs as root and you wouldn't want user code being run as root. User code running in the Apache parent would also cause a range of other problems.

Graham

From graham.dumpleton at gmail.com Wed Nov 28 23:45:50 2007
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 29 Nov 2007 09:45:50 +1100
Subject: [Web-SIG] multi-threaded or multi-process wsgi apps
In-Reply-To: <474DE4C4.7000603@simplistix.co.uk>
References: <474AA390.2080209@simplistix.co.uk> <88e286470711261442y3b153906le827bb5c92e49a1a@mail.gmail.com> <474DE4C4.7000603@simplistix.co.uk>
Message-ID: <88e286470711281445q287d5fbfv48b5ca6fef93112c@mail.gmail.com>

On 29/11/2007, Chris Withers wrote:
> Graham Dumpleton wrote:
> > As for general issues around the best way to perform application
> > initialisation, the problem is that what is the most appropriate way may
> > depend on the specific hosting mechanism. There isn't necessarily
> > going to be one way that will suit all ways that WSGI can be hosted,
> > which is partly why there isn't a standard on how to do it.
>
> I appreciate all the good points you made above that I snipped. However,
> surely the aim of WSGI is to allow the application author not to have to
> worry so much about deployment, yet what I'm hearing is that if you get
> your application implementation wrong, you'll suffer badly in lots of
> deployment situations.

The WSGI specification only talks about the request interface between the application and the underlying web server. It doesn't really say anything about deployment issues. Thus, different hosting solutions provide different means of doing it.

Remember that various underlying servers now used for hosting WSGI applications existed before the WSGI specification came along. Also, various applications existed before as well and have been converted to be able to host on top of WSGI adapters. For some of those, they still carry setup requirements which hark back to the original way they were hosted. There is also no consistency in how configuration is done. Such is the way things are. :-)

Graham

From ianb at colorstudy.com Thu Nov 29 00:44:01 2007
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed, 28 Nov 2007 17:44:01 -0600
Subject: [Web-SIG] multi-threaded or multi-process wsgi apps
In-Reply-To: <474DE305.1050104@simplistix.co.uk>
References: <474AA390.2080209@simplistix.co.uk> <474AFF20.4060406@colorstudy.com> <474DE305.1050104@simplistix.co.uk>
Message-ID: <474DFD41.10007@colorstudy.com>

Chris Withers wrote:
> Ian Bicking wrote:
>> At least in Pylons apps, configuration is set up during instantiation.
>> Configuration is generally copyable (consisting of stuff like strings,
>> not open file objects), so it can be cloned across processes easily.
>
> I can understand sharing across threads, but how do you share across
> processes?
Well, with a forking server like flup it is just inherited from the fork. Otherwise, I'm not sure. The config is pickleable. Usually I'd pass a reference to the config file, or config source, which are probably simple strings.

>>> The second is a problem I see an app I'm working on heading towards.
>>> The app has web-alterable configuration, so in a multi-threaded and
>>> particularly multi-process environment, I need some way to get the
>>> other threads or processes to re-read their configuration when it has
>>> changed.
>>
>> In Paste/Pylons the configuration is stored in the environment (which
>> is per-request), and put into a threadlocal object for access.
>
> Again, how about across processes?
> And if the configuration changes once the app is up and running, how do
> you propagate changes to all the other app processes?

Generally you'd want to kill all the worker processes and start over. If you have a reasonable way to do that (and I think mod_wsgi would give you a reasonable way to do that), restarting the process is always cleanest. I believe even something like Apache's "graceful" restart/reload just restarts the server, while letting existing requests finish.

--
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From graham.dumpleton at gmail.com Thu Nov 29 01:02:16 2007
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 29 Nov 2007 11:02:16 +1100
Subject: [Web-SIG] multi-threaded or multi-process wsgi apps
In-Reply-To: <474DFD41.10007@colorstudy.com>
References: <474AA390.2080209@simplistix.co.uk> <474AFF20.4060406@colorstudy.com> <474DE305.1050104@simplistix.co.uk> <474DFD41.10007@colorstudy.com>
Message-ID: <88e286470711281602re91e4f9r52fe2070629e152f@mail.gmail.com>

> >>> The second is a problem I see an app I'm working on heading towards.
> >>> The app has web-alterable configuration, so in a multi-threaded and
> >>> particularly multi-process environment, I need some way to get the
> >>> other threads or processes to re-read their configuration when it has
> >>> changed.
> >>
> >> In Paste/Pylons the configuration is stored in the environment (which
> >> is per-request), and put into a threadlocal object for access.
> >
> > Again, how about across processes?
> > And if the configuration changes once the app is up and running, how do
> > you propagate changes to all the other app processes?
>
> Generally you'd want to kill all the worker processes and start over.
> If you have a reasonable way to do that (and I think mod_wsgi would give
> you a reasonable way to do that), restarting the process is always
> cleanest. I believe even something like Apache's "graceful"
> restart/reload just restarts the server, while letting existing requests
> finish.

Yes, with Apache one can use 'graceful'. That will cause applications running under either mod_wsgi embedded mode or daemon mode to be restarted.

For daemon mode, in mod_wsgi 2.0 there is also a 'Process' reload mechanism as an option. Just touching the main WSGI script file entry point will cause just the processes for that daemon process group to be restarted on the next request, thereby avoiding a restart of the whole Apache web server and any other hosted applications.

Thus, change your config file, whether it be actual Python code or an ini file, and touch the main WSGI script file for that application. Upon the next request against that application it will detect that the WSGI script file has changed and will do the restart.
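The 'touch' can be done from whatever code handles the configuration change, given the path of the WSGI script file; a minimal sketch, with a made-up path, is just:

import os

WSGI_SCRIPT = '/usr/local/wsgi/scripts/myapp.wsgi'  # made-up path

def trigger_reload():
    # Updating the script file's modification time is enough; mod_wsgi
    # daemon mode notices the change and restarts that daemon process
    # group when the next request arrives.
    os.utime(WSGI_SCRIPT, None)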
If you wanted to force the restart without waiting for the next request to arrive, then you can also just send a SIGINT to all processes in that daemon process group. The trick is knowing which processes they are, because if they are still running as the Apache user, it is hard to tell them from normal Apache child processes, since they are just a fork of the main Apache process. If mod_wsgi is told to change them to run as a different user, it is somewhat easier.

This sort of process reload mechanism based on script file change is also available in things like mod_fastcgi. Based on comments made by various people, I'm not sure how reliable it is in mod_fastcgi though.

For other options on how to trigger automatic reloading of application processes in mod_wsgi when code and/or config changes see:

http://code.google.com/p/modwsgi/wiki/ReloadingSourceCode

The latter example in this is based on similar stuff that Ian has done, just customised to the setting of mod_wsgi and the fact that one can rely on mod_wsgi to automatically restart a daemon process when it is killed off. Thus it is a bit different to where this sort of idea is used elsewhere to trigger an in-process reload of modules on top of existing modules, which will not always work because of dependencies between modules.

Graham

From chris at simplistix.co.uk Fri Nov 30 00:16:07 2007
From: chris at simplistix.co.uk (Chris Withers)
Date: Thu, 29 Nov 2007 23:16:07 +0000
Subject: [Web-SIG] multi-threaded or multi-process wsgi apps
In-Reply-To: <88e286470711281433s4ae070d2tdea499042d07a136@mail.gmail.com>
References: <474AA390.2080209@simplistix.co.uk> <474AFBB4.9060406@palladion.com> <474DE29D.5060205@simplistix.co.uk> <88e286470711281433s4ae070d2tdea499042d07a136@mail.gmail.com>
Message-ID: <474F4837.4060103@simplistix.co.uk>

Graham Dumpleton wrote:
> package. You can if appropriate even use a combination of both modes.

So these would just be separate sections in Apache's config files?

> For example, run Django in embedded mode for best performance,

Why does this give best performance?

> but delegate a Trac instance to run in daemon mode so it is separated out
> of Apache child processes, there being various reasons with Trac why
> you might want to do that.

Such as?

> Each process can if necessary have multiple Python sub interpreters,
> and is not limited to just one. This would be used where you need to
> run multiple applications in the same process but with sub
> interpreters being used as a means of separating them so they don't
> interfere with each other.

Wow, I didn't even know this was possible... what does dirt-simple-hello-world-like Python that does this look like?

>> (i.e. does it have to reload all its config and open up its own database
>> connections again?)
>
> Being separate processes, they would obviously need to do that.
> This is generally no different to how people often run multiple
> instances of a standalone Python web based application and then use a
> proxy/load balancer to distribute requests across the processes.

*nods*

The difficulty comes when you have to invalidate changes across processes. ZEO/ZODB is the only object system I know that does that. If you were using a relational database and/or a mapper such as SQLAlchemy, I wonder how you could poke it such that config-like changes in one process were propagated to another.

That said, I wonder how SQLAlchemy handles invalidations of its object model when the underlying database changes as a result of actions by another process... ...but this is the wrong list for that.
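The cheapest cross-process mechanism I can think of is just leaning on the filesystem: have every process stat a shared config file and re-read it when the mtime changes. A sketch, with a made-up path and a made-up [main] section:

import os
import ConfigParser  # Python 2-era stdlib

CONFIG_PATH = '/etc/myapp/app.ini'  # made-up path

_mtime = None
_settings = {}

def get_settings():
    # Each process checks the shared file's mtime and re-reads it when
    # it has changed, so a change written by one process is eventually
    # picked up by all of the others.
    global _mtime, _settings
    mtime = os.stat(CONFIG_PATH).st_mtime
    if mtime != _mtime:
        parser = ConfigParser.ConfigParser()
        parser.read(CONFIG_PATH)
        _settings = dict(parser.items('main'))
        _mtime = mtime
    return _settings

That's polling rather than notification though, and it doesn't help with anything cached from the database itself.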
> Thus it is generally better for each process to create its own connections
> etc. Reading the actual file configuration is generally quite a minor
> overhead in the greater scheme of things.

You haven't used Zope, right? ;-)

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk

From chris at simplistix.co.uk Fri Nov 30 00:23:20 2007
From: chris at simplistix.co.uk (Chris Withers)
Date: Thu, 29 Nov 2007 23:23:20 +0000
Subject: [Web-SIG] multi-threaded or multi-process wsgi apps
In-Reply-To: <88e286470711281602re91e4f9r52fe2070629e152f@mail.gmail.com>
References: <474AA390.2080209@simplistix.co.uk> <474AFF20.4060406@colorstudy.com> <474DE305.1050104@simplistix.co.uk> <474DFD41.10007@colorstudy.com> <88e286470711281602re91e4f9r52fe2070629e152f@mail.gmail.com>
Message-ID: <474F49E8.4010209@simplistix.co.uk>

Graham Dumpleton wrote:
> For daemon mode, in mod_wsgi 2.0 there is also a 'Process' reload
> mechanism as an option. Just touching the main WSGI script file entry
> point will cause just the processes for that daemon process group to
> be restarted on the next request, thereby avoiding a restart of the whole
> Apache web server and any other hosted applications.

Cool, when's 2.0 out?

> Thus, change your config file, whether it be actual Python code or an
> ini file, and touch the main WSGI script file for that application.
> Upon the next request against that application it will detect that the
> WSGI script file has changed and will do the restart.

Will that restart all processes/sub processes/threads?

> For other options on how to trigger automatic reloading of application
> processes in mod_wsgi when code and/or config changes see:
>
> http://code.google.com/p/modwsgi/wiki/ReloadingSourceCode

Not sure I followed it all, but certainly looks useful :-)

> The latter example in this is based on similar stuff that Ian has done,
> just customised to the setting of mod_wsgi and the fact that one can rely
> on mod_wsgi to automatically restart a daemon process when it is killed off.
> Thus it is a bit different to where this sort of idea is used elsewhere to
> trigger an in-process reload of modules on top of existing modules, which
> will not always work because of dependencies between modules.

Yes, Zope learnt this the hard way :-( There was a "Refresh" add-on for a long time which 70% worked and so caused a lot of trouble...

Chris

--
Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk

From graham.dumpleton at gmail.com Fri Nov 30 00:41:54 2007
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Fri, 30 Nov 2007 10:41:54 +1100
Subject: [Web-SIG] multi-threaded or multi-process wsgi apps
In-Reply-To: <474F4837.4060103@simplistix.co.uk>
References: <474AA390.2080209@simplistix.co.uk> <474AFBB4.9060406@palladion.com> <474DE29D.5060205@simplistix.co.uk> <88e286470711281433s4ae070d2tdea499042d07a136@mail.gmail.com> <474F4837.4060103@simplistix.co.uk>
Message-ID: <88e286470711291541k29ac308axb5a6bf6c7af27e82@mail.gmail.com>

On 30/11/2007, Chris Withers wrote:
> Graham Dumpleton wrote:
> > package. You can if appropriate even use a combination of both modes.
>
> So these would just be separate sections in Apache's config files?

For an example see:

http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modwsgi.html

You just need to use the mod_wsgi WSGIProcessGroup directive in the appropriate Directory or Location context to indicate that that WSGI application, or subset of an application, should be delegated to a different daemon process.
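The WSGI script file being delegated is just an ordinary WSGI application; a dirt-simple one, with a made-up file name, is no more than:

# myapp.wsgi -- nothing mod_wsgi specific in here at all.
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['Hello world!\n']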
> > For example, run Django in embedded mode for best performance,
>
> Why does this give best performance?

For some discussion on this see the following, including the comments:

http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html

In short, everything is done in the first process to accept the request. There is no proxying to a secondary process across another socket connection. Also, Apache has the ability to create additional processes on demand as server load increases. Proxy-based solutions, unless proxying to Apache as a backend, often use a fixed number of backend processes and have no way of scaling up to meet demand by starting up new processes automatically.

As explained in the blog, it is a trade-off. You get extra speed, but there are other issues to contend with which make running multiple applications in embedded mode at the same time problematic. But then, if you want maximum speed, you would have dedicated the one server to that application and so the issues aren't really a problem.

> > but delegate a Trac instance to run in daemon mode so it is separated out
> > of Apache child processes, there being various reasons with Trac why
> > you might want to do that.
>
> Such as?

The Python bindings for Subversion need to run in the main Python interpreter, as they aren't written to work properly within a secondary Python sub interpreter. One can still force this when run in embedded mode, but Trac can chew up a lot of memory over time, especially if GoogleBot decides to browse every revision of your code through the Trac source browser. Yes, one should block search engines from such searching, but even so, it can be beneficial to use daemon mode, as it then takes the main bloat out of the Apache child processes and allows one to set a more aggressive value for maximum requests so that processes are recycled, and memory reclaimed, on a regular basis and reset back to an idle level.

> > Each process can if necessary have multiple Python sub interpreters,
> > and is not limited to just one. This would be used where you need to
> > run multiple applications in the same process but with sub
> > interpreters being used as a means of separating them so they don't
> > interfere with each other.
>
> Wow, I didn't even know this was possible... what does
> dirt-simple-hello-world-like Python that does this look like?

The Python code looks exactly the same, you don't need to change anything. In mod_wsgi it defaults to automatically using a separate sub interpreter for each WSGI application script file. If your WSGI applications don't clash and you want to make them run in the same interpreter to avoid loading multiple copies of modules into memory, you just need to use the mod_wsgi WSGIApplicationGroup directive. For example:

WSGIApplicationGroup %{SERVER}
...

This will result in all WSGI applications running under a specific virtual server (on port 80/443) using the same Python sub interpreter. More details in:

http://code.google.com/p/modwsgi/wiki/ConfigurationGuidelines
http://code.google.com/p/modwsgi/wiki/ConfigurationDirectives

> > Thus it is generally better for each process to create its own connections
> > etc. Reading the actual file configuration is generally quite a minor
> > overhead in the greater scheme of things.
>
> You haven't used Zope, right? ;-)

Not since last century some time.
:-)

Graham

From graham.dumpleton at gmail.com Fri Nov 30 01:01:44 2007
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Fri, 30 Nov 2007 11:01:44 +1100
Subject: [Web-SIG] multi-threaded or multi-process wsgi apps
In-Reply-To: <474F49E8.4010209@simplistix.co.uk>
References: <474AA390.2080209@simplistix.co.uk> <474AFF20.4060406@colorstudy.com> <474DE305.1050104@simplistix.co.uk> <474DFD41.10007@colorstudy.com> <88e286470711281602re91e4f9r52fe2070629e152f@mail.gmail.com> <474F49E8.4010209@simplistix.co.uk>
Message-ID: <88e286470711291601o2c5023b6ic946996e63d9a175@mail.gmail.com>

On 30/11/2007, Chris Withers wrote:
> Graham Dumpleton wrote:
> > For daemon mode, in mod_wsgi 2.0 there is also a 'Process' reload
> > mechanism as an option. Just touching the main WSGI script file entry
> > point will cause just the processes for that daemon process group to
> > be restarted on the next request, thereby avoiding a restart of the whole
> > Apache web server and any other hosted applications.
>
> Cool, when's 2.0 out?

A release candidate for 2.0 including this feature is already available. Everything was okay for release, but then I decided to tweak/enhance some other features, including now finally adding a means of specifying a script to run when a process first starts so that one can preload stuff.

> > Thus, change your config file, whether it be actual Python code or an
> > ini file, and touch the main WSGI script file for that application.
> > Upon the next request against that application it will detect that the
> > WSGI script file has changed and will do the restart.
>
> Will that restart all processes/sub processes/threads?

Where a daemon process group contains multiple processes, it will result in all the processes being restarted eventually. Because the restart only happens when the next request comes in, they all effectively get restarted in a cascade. This is because a request hits the first sub process, which realises it will need to restart. The request is then tried again and will likely hit the next process in the group, which indicates it needs to restart as well, and so on, until it hits a process that has finished its restart. This is all transparent to the user except for the slight delay due to the restart of the application.

Graham