From paul at boddie.org.uk Thu May 1 01:59:05 2008
From: paul at boddie.org.uk (Paul Boddie)
Date: Thu, 1 May 2008 01:59:05 +0200
Subject: [Web-SIG] Web Activities at EuroPython 2008?
Message-ID: <200805010159.05109.paul@boddie.org.uk>

Hello,

It's not often that I find myself posting to the Web-SIG list these
days, but I find myself almost obliged to ask whether anyone is
considering submitting talk proposals about Web programming to the
EuroPython 2008 conference (to be held in Vilnius, Lithuania from 7th
July until 9th July, with sprinting possibilities from 10th July until
12th July).

In my monitoring of the Internet for mentions of EuroPython, I see that
there's a survey out there which is asking for opinions about a Zope
conference [1], with questions related to EuroPython. Last year, there
were quite a few Zope talks, although perhaps not at the level enjoyed
when there was a special Zope track at EuroPython. A few other
Web-related technologies did also get coverage in the schedule, however:
FormEncode, Genshi, KSS, Nevow, Pylons, Silva, WSGI, to name just a few.

Nevertheless, Web programming (including and beyond Zope) has always
been a major component of EuroPython, and it would certainly be
interesting to see talks describing what people are doing with Python on
the Web, whether it be the development of classic server-side Web
applications, the usage of Python on the client side, or even the
management of infrastructure using Python - large-scale computing is
becoming an increasingly popular topic.

Anyway, details of talk submissions and other activities at the
conference can be found here:

http://www.europython.org/community/CallForParticipation

And the EuroPython site can be found here:

http://www.europython.org/

Yes, it's running MoinMoin - a possibly unfashionable choice (and
arguably unpopular in certain circles) - but maybe even the MoinMoin
developers might consider sharing some of their insights on developing
and customising MoinMoin as a talk.
;-)

I look forward to seeing many talk submissions of a Web-related kind!

Paul

[1] http://www.surveymonkey.com/s.aspx?sm=1QRKEu8eTs2gNjiYPOCsBA_3d_3d

From manlio_perillo at libero.it Fri May 2 23:03:43 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Fri, 02 May 2008 23:03:43 +0200
Subject: [Web-SIG] [proposal] wsgiref.util.abs_url
Message-ID: <481B81AF.7050300@libero.it>

Hi.

I think that a function like (not tested):

def abs_url(environ, relative_url):
    """Return the absolute url"""

    url = environ['wsgi.url_scheme']+'://'
    from urllib import quote

    if environ.get('HTTP_HOST'):
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']

        if environ['wsgi.url_scheme'] == 'https':
            if environ['SERVER_PORT'] != '443':
                url += ':' + environ['SERVER_PORT']
        else:
            if environ['SERVER_PORT'] != '80':
                url += ':' + environ['SERVER_PORT']

    url += quote(relative_url)
    return url

would be a useful addition to the wsgiref.util module.


What do you think?


Thanks  Manlio Perillo

From pje at telecommunity.com Sun May 4 19:43:23 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 04 May 2008 13:43:23 -0400
Subject: [Web-SIG] [proposal] wsgiref.util.abs_url
In-Reply-To: <481B81AF.7050300@libero.it>
References: <481B81AF.7050300@libero.it>
Message-ID: <20080504174919.4AB8C3A4036@sparrow.telecommunity.com>

At 11:03 PM 5/2/2008 +0200, Manlio Perillo wrote:
>Hi.
>
>I think that a function like (not tested):
>
>def abs_url(environ, relative_url):
>     """Return the absolute url"""
[...]
>     url += quote(relative_url)
>     return url
>
>would be a useful addition to the wsgiref.util module.
>
>
>What do you think?

I think that it doesn't accept a relative URL, it accepts an absolute
path.

I also think that using urlparse.urljoin() with either request_uri() or
application_uri() would be a clearer (and tested) way to obtain an
absolute URL, and more generally useful.
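
The urljoin() approach suggested above can be sketched roughly as
follows. This is only an illustration, written for Python 3, where
urljoin() lives in urllib.parse rather than urlparse. The SCRIPT_NAME
and PATH_INFO values are made up for the example, and appending a
trailing slash to application_uri() is an assumption about what an
"application-relative" base should look like, not something the thread
specifies.

```python
from urllib.parse import urljoin
from wsgiref.util import application_uri, request_uri, setup_testing_defaults

# Hypothetical request: an application mounted at /app, handling /app/x/y.
environ = {'SCRIPT_NAME': '/app', 'PATH_INFO': '/x/y'}
setup_testing_defaults(environ)  # fills in host, port, scheme and so on

# Assumption: add a trailing slash so that relative references resolve
# underneath the application root instead of replacing its last segment.
app_base = application_uri(environ).rstrip('/') + '/'
req_base = request_uri(environ)

# An application-relative URL, resolved against the application root.
app_relative = urljoin(app_base, 'a/b')

# A request-relative URL, resolved against the current request URL.
request_relative = urljoin(req_base, './a/b')

# A relative URL that is also an absolute path ignores the base path.
absolute_path = urljoin(app_base, '/a/b')
```

With the testing defaults, the three results are
http://127.0.0.1/app/a/b, http://127.0.0.1/app/x/a/b and
http://127.0.0.1/a/b respectively; only the last case is covered by a
function that simply appends the quoted path to the host.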
From manlio_perillo at libero.it Mon May 5 18:27:33 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Mon, 05 May 2008 18:27:33 +0200
Subject: [Web-SIG] [proposal] wsgiref.util.abs_url
In-Reply-To: <20080504174919.4AB8C3A4036@sparrow.telecommunity.com>
References: <481B81AF.7050300@libero.it>
 <20080504174919.4AB8C3A4036@sparrow.telecommunity.com>
Message-ID: <481F3575.50800@libero.it>

Phillip J. Eby wrote:
> At 11:03 PM 5/2/2008 +0200, Manlio Perillo wrote:
>> Hi.
>>
>> I think that a function like (not tested):
>>
>> def abs_url(environ, relative_url):
>>     """Return the absolute url"""
> [...]
>>     url += quote(relative_url)
>>     return url
>>
>> would be a useful addition to the wsgiref.util module.
>>
>>
>> What do you think?
>
> I think that it doesn't accept a relative URL, it accepts an absolute path.
>

What do you mean?

     environ = {}
     setup_testing_defaults(environ)

     url = '/a/b/'
     self.failUnlessEqual(
         util.abs_url(environ, url), 'http://127.0.0.1/a/b/')

> I also think that using urlparse.urljoin() with either request_uri() or
> application_uri() would be a clearer (and tested) way to obtain an
> absolute URL, and more generally useful.
>

But application_uri also includes SCRIPT_NAME.


Regards  Manlio Perillo

From pje at telecommunity.com Mon May 5 19:39:50 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 05 May 2008 13:39:50 -0400
Subject: [Web-SIG] [proposal] wsgiref.util.abs_url
In-Reply-To: <481F3575.50800@libero.it>
References: <481B81AF.7050300@libero.it>
 <20080504174919.4AB8C3A4036@sparrow.telecommunity.com>
 <481F3575.50800@libero.it>
Message-ID: <20080505173931.D3BA13A4036@sparrow.telecommunity.com>

At 06:27 PM 5/5/2008 +0200, Manlio Perillo wrote:
>Phillip J. Eby wrote:
>>I think that it doesn't accept a relative URL, it accepts an absolute path.
>
>What do you mean?
>
>      environ = {}
>      setup_testing_defaults(environ)
>
>      url = '/a/b/'

That's a relative URL that's also an absolute path.
Try a relative URL like './a/b', or just plain 'a/b'.

>      self.failUnlessEqual(
>          util.abs_url(environ, url), 'http://127.0.0.1/a/b/')
>
>>I also think that using urlparse.urljoin() with either
>>request_uri() or application_uri() would be a clearer (and tested)
>>way to obtain an absolute URL, and more generally useful.
>
>But application_uri also includes SCRIPT_NAME.

Yes, and you might want to use it as the base against which a relative
URL will be resolved -- i.e. an application-relative URL, vs. a
request-relative URL. In fact, application_uri() would probably be
*more* useful, since if you want a request-relative URL, there's no need
to turn it into an absolute URL, since you could just use it in its
relative form.

Note, however, that in either case, using a relative URL that's an
absolute path (e.g. '/a/b'), will still produce the same result as your
function would. It's just that urljoin also works properly for all
kinds of relative urls, not just the absolute-path subset.

From cstawarz at csail.mit.edu Tue May 6 03:30:27 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Mon, 5 May 2008 21:30:27 -0400
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
Message-ID: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>

(I'm new to the list, so please forgive me for making my first post a
specification proposal :)

Browsing through the list archives, I see there have been some
inconclusive discussions on adding better support for asynchronous web
servers to the WSGI spec. Since such support would be very useful for
some upcoming projects of mine, I decided to take a shot at speccing
out and implementing it. I'd be grateful for any feedback you have.
If this seems like something worth pursuing, I would also welcome
collaborators to help develop the spec further.

The name for this proposed specification is the Asynchronous Web
Server Gateway Interface (AWSGI).
As the name suggests, the spec is closely related to WSGI and is most
easily described in terms of how it differs from WSGI. AWSGI eliminates
the following parts of WSGI:

   - the environment variables wsgi.version and wsgi.input

   - the write() callable returned by start_response()

AWSGI adds the following environment variables:

   - awsgi.version
   - awsgi.input
   - awsgi.readable
   - awsgi.writable
   - awsgi.timeout

In addition, AWSGI allows the application iterable to yield two types
of data:

   - byte strings, handled as in WSGI

   - the result of calling awsgi.readable or awsgi.writable, which
     indicates that the application should be paused and restarted when
     a specified file descriptor is ready for reading or writing

Because of AWSGI's similarity to WSGI, a simple wrapper can be used to
run AWSGI applications on WSGI servers without alteration.

The following example application demonstrates typical usage of AWSGI.
This application simply reads the request body and sends it back to
the client. Each time it wants to receive data from the client, it
first tests awsgi.input for readability and then calls its recv()
method. If awsgi.input is not readable after one second, the
application sends a "408 Request Timeout" response to the client and
terminates:


   def echo_request_body(environ, start_response):
       input = environ['awsgi.input']
       readable = environ['awsgi.readable']

       nbytes = int(environ.get('CONTENT_LENGTH') or 0)
       output = ''
       while nbytes:
           yield readable(input, 1.0)  # Time out after 1 second

           if environ['awsgi.timeout']:
               msg = 'The request timed out.'
               start_response('408 Request Timeout',
                              [('Content-Type', 'text/plain'),
                               ('Content-Length', str(len(msg)))])
               yield msg
               return

           data = input.recv(nbytes)
           if not data:
               break
           output += data
           nbytes -= len(data)

       start_response('200 OK', [('Content-Type', 'text/plain'),
                                 ('Content-Length', str(len(output)))])
       yield output


I have rough but functional implementations of a number of AWSGI
components available in a Bazaar branch at
http://pseudogreen.org/bzr/awsgiref/. The package includes an
asyncore-based AWSGI server and an AWSGI-to-WSGI application wrapper.
In addition, the file spec.txt contains a more detailed description of
the specification (which is also appended below).

Again, I'd very much appreciate comments and criticism.


Thanks,
Chris




Detailed AWSGI Specification
----------------------------

- Required AWSGI environ variables:

   * All variables required by WSGI, except for wsgi.version and
     wsgi.input, which must *not* be present

   * awsgi.version => the tuple (1, 0)

   * awsgi.input

     This is an object with one method, recv(bufsize), which behaves
     like the socket method of the same name (although it doesn't
     support the optional flags parameter). Before each call to
     recv(), the application must test awsgi.input for readability via
     awsgi.readable. The result of calling recv() without doing so is
     undefined.

     (XXX: Should recv() handle EINTR for the application?)

   * awsgi.readable
   * awsgi.writable

     These are callables with the signature f(fd, timeout=None). fd is
     either a file descriptor (i.e. int or long) or an object with a
     fileno() method that returns a file descriptor.

     timeout has the same semantics as the timeout parameter to
     select.select(). If the operation times out, awsgi.timeout will
     be true when the application resumes.

     In addition to checking readiness for reading or writing, servers
     should also monitor file descriptors for "exceptional" conditions
     (e.g. out-of-band data) and restart the application if they occur.

   * awsgi.timeout => boolean indicating whether the most recent read
     or write wait timed out (false if there have been no waits)

- start_response() must *not* return a write() callable, as this
   method of providing application output to the server is incompatible
   with asynchronous execution.

- The server must accept awsgi.input as input to awsgi.readable,
   either by providing an actual socket object or by special-case
   handling (i.e. awsgi.input needn't have a fileno() method, as long
   as the server handles it as if it did).

- Applications return iterators, which can yield:

   * a string => sent to client, just as in standard WSGI

   * the result of a call to awsgi.readable or awsgi.writable =>
     application is resumed when either the file descriptor is ready
     for reading/writing or the wait times out (in which case,
     awsgi.timeout will be true)

- Although AWSGI applications will *not* be directly compatible with
   WSGI servers, middleware will allow them to run as standard WSGI
   apps (with all I/O waits returning immediately).

- AWSGI servers will not support unmodified WSGI applications. There
   are several reasons for this:

   - If the app does blocking I/O, it will block the entire server.

   - Calls to the read() method of wsgi.input may fail with
     EWOULDBLOCK, which an app expecting synchronous I/O probably won't
     be prepared to deal with.

   - The readline(), readlines(), and __iter__() methods of wsgi.input
     can require multiple network I/O operations, which is incompatible
     with asynchronous execution.

   - The write() callable returned by start_response() is inherently
     incompatible with asynchronous execution.

   Because of these issues, this specification aims for one-way
   compatibility between AWSGI and WSGI (i.e. the ability to run AWSGI
   apps on WSGI servers via middleware, but not vice versa).
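
The one-way compatibility described above (AWSGI apps running on WSGI
servers, with all I/O waits returning immediately) might be sketched
roughly as follows. This is not the wrapper shipped in awsgiref, whose
code is not shown here; the names awsgi_to_wsgi, _Wait and _Input are
hypothetical, and the sketch assumes Python 3 byte strings and a
blocking wsgi.input whose read() honours the requested size.

```python
import io


class _Wait:
    """Hypothetical token returned by awsgi.readable / awsgi.writable."""

    def __init__(self, fd, timeout=None):
        self.fd = fd
        self.timeout = timeout


class _Input:
    """Offer recv(bufsize) on top of a blocking WSGI input stream."""

    def __init__(self, stream):
        self._stream = stream

    def recv(self, bufsize):
        return self._stream.read(bufsize)


def awsgi_to_wsgi(awsgi_app):
    """Wrap an AWSGI application so that a plain WSGI server can run it."""

    def wsgi_app(environ, start_response):
        environ = dict(environ)            # leave the caller's dict alone
        stream = environ.pop('wsgi.input')
        environ.pop('wsgi.version', None)  # must not be present in AWSGI
        environ['awsgi.version'] = (1, 0)
        environ['awsgi.input'] = _Input(stream)
        environ['awsgi.readable'] = _Wait
        environ['awsgi.writable'] = _Wait
        environ['awsgi.timeout'] = False   # waits never time out here

        def start(status, headers, exc_info=None):
            # Deliberately returns None: AWSGI has no write() callable.
            start_response(status, headers, exc_info)

        for item in awsgi_app(environ, start):
            if isinstance(item, _Wait):
                continue                   # the wait "returns immediately"
            yield item

    return wsgi_app


# Usage: a byte-string version of the echo application from the proposal.
def echo_request_body(environ, start_response):
    source = environ['awsgi.input']
    nbytes = int(environ.get('CONTENT_LENGTH') or 0)
    output = b''
    while nbytes:
        yield environ['awsgi.readable'](source, 1.0)
        data = source.recv(nbytes)
        if not data:
            break
        output += data
        nbytes -= len(data)
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(output)))])
    yield output


statuses = []
wsgi_environ = {'wsgi.version': (1, 0),
                'wsgi.input': io.BytesIO(b'hello'),
                'CONTENT_LENGTH': '5'}
wrapped = awsgi_to_wsgi(echo_request_body)
body = b''.join(wrapped(
    wsgi_environ,
    lambda status, headers, exc_info=None: statuses.append(status)))
```

Because the wrapper simply skips the wait tokens, every read falls back
to the blocking behaviour of the underlying WSGI server, which is why
compatibility in the other direction is not attempted.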
From graham.dumpleton at gmail.com Tue May 6 04:09:33 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Tue, 6 May 2008 12:09:33 +1000
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
Message-ID: <88e286470805051909w53ed2491taff222c9645a1f17@mail.gmail.com>

2008/5/6 Christopher Stawarz:
> [...]

No time to understand all this, but a few comments.

If write() isn't to be returned by start_response(), then do away with
start_response() if possible as per discussions for WSGI 2.0. See:

  http://www.wsgi.org/wsgi/WSGI_2.0

In other words, perhaps better aligning it to proposals for WSGI 2.0
and not to WSGI 1.0.

Also take note of:

  http://www.wsgi.org/wsgi/Amendments_1.0

and think about how Python 3.0 would affect things.

I'd also rather it not be called AWSGI, as that is not sufficiently
distinct from WSGI. If you want to pursue this asynchronous style, then
be more explicit and call it ASYNC-WSGI and use 'asyncwsgi' tag in
environ.

Graham

From manlio_perillo at libero.it Tue May 6 12:17:41 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Tue, 06 May 2008 12:17:41 +0200
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
Message-ID: <48203045.60504@libero.it>

Christopher Stawarz wrote:
> (I'm new to the list, so please forgive me for making my first post a
> specification proposal :)
>
> Browsing through the list archives, I see there's been some
> inconclusive discussions on adding better support for asynchronous web
> servers to the WSGI spec. Since such support would be very useful for
> some upcoming projects of mine, I decided to take a shot at specing
> out and implementing it. I'd be grateful for any feedback you have.
> If this seems like something worth pursuing, I would also welcome
> collaborators to help develop the spec further.
>

I'm glad to know that there are other people interested in asynchronous
applications. Have you seen my extensions to WSGI in my module for
Nginx?
The extension is documented here:
http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/README

see the Extensions chapter.

For some examples:
http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-postgres-async.py
http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-poll-sleep.py


Note that in Nginx the request body is pre-read before the application
is called (in fact wsgi.input is either a cStringIO or File object).


Unfortunately there is a *big* usability problem: the extension is based
on a well specified feature of WSGI: the gateway can suspend the
execution of the WSGI application when it yields.

However if the asynchronous code is present in a "child" function, we
have something like this:

def application(environ, start_response):
    def nested():
        while True:
            poll(xxx)
            yield ''

        yield result


    for r in nested():
        if not r:
            yield ''

    yield r


That is, all the functions in the "chain" have to yield, and that is
not very good.


The solution is to use coroutines, and I'm planning to integrate
greenlets (from the pylib project) into the WSGI module for Nginx.


> [...]


Regards  Manlio Perillo

From manlio_perillo at libero.it Tue May 6 12:40:51 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Tue, 06 May 2008 12:40:51 +0200
Subject: [Web-SIG] [proposal] wsgiref.util.abs_url
In-Reply-To: <20080505173931.D3BA13A4036@sparrow.telecommunity.com>
References: <481B81AF.7050300@libero.it>
 <20080504174919.4AB8C3A4036@sparrow.telecommunity.com>
 <481F3575.50800@libero.it>
 <20080505173931.D3BA13A4036@sparrow.telecommunity.com>
Message-ID: <482035B3.1090906@libero.it>

Phillip J. Eby wrote:
> At 06:27 PM 5/5/2008 +0200, Manlio Perillo wrote:
>> Phillip J. Eby wrote:
>>> I think that it doesn't accept a relative URL, it accepts an absolute
>>> path.
>>
>> What do you mean?
>>
>>      environ = {}
>>      setup_testing_defaults(environ)
>>
>>      url = '/a/b/'
>
> That's a relative URL that's also an absolute path. Try a relative URL
> like './a/b', or just plain 'a/b'.
>
>
>
>>      self.failUnlessEqual(
>>          util.abs_url(environ, url), 'http://127.0.0.1/a/b/')
>>
>>> I also think that using urlparse.urljoin() with either request_uri()
>>> or application_uri() would be a clearer (and tested) way to obtain an
>>> absolute URL, and more generally useful.
>>
>> But application_uri also includes SCRIPT_NAME.
>
> Yes, and you might want to use it as the base against which a relative
> URL will be resolved -- i.e. an application-relative URL, vs. a
> request-relative URL. In fact, application_uri() would probably be
> *more* useful, since if you want a request-relative URL, there's no need
> to turn it into an absolute URL, since you could just use it in its
> relative form.
>

Yes, but this is not always the case.

> Note, however, that in either case, using a relative URL that's an
> absolute path (e.g. '/a/b'), will still produce the same result as your
> function would. It's just that urljoin also works properly for all
> kinds of relative urls, not just the absolute-path subset.
>

You are right, thanks.


Regards  Manlio Perillo

From cstawarz at csail.mit.edu Tue May 6 23:37:09 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Tue, 6 May 2008 17:37:09 -0400
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
 <88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com>
Message-ID: <68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu>

On May 5, 2008, at 10:08 PM, Graham Dumpleton wrote:

> If write() isn't to be returned by start_response(), then do away with
> start_response() if possible as per discussions for WSGI 2.0.

I think start_response() is necessary, because the application may
need to yield for I/O readiness (e.g. to read the request body, as in
my example app) before it decides what response status and headers to
send.

> Also take note of:
>
>  http://www.wsgi.org/wsgi/Amendments_1.0
>
> and think about how Python 3.0 would affect things.

OK, will do.

> I'd also rather it not be called AWSGI, as that is not sufficiently
> distinct from WSGI. If you want to pursue this asynchronous style,
> then be more explicit and call it ASYNC-WSGI and use 'asyncwsgi' tag
> in environ.

Good point. It'd be easy to type "wsgi" when you meant "awsgi", or
vice versa. But I think I'd prefer "wsgi_async" to "asyncwsgi".


Thanks,
Chris

From cstawarz at csail.mit.edu Wed May 7 00:01:19 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Tue, 6 May 2008 18:01:19 -0400
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <48203045.60504@libero.it>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
 <48203045.60504@libero.it>
Message-ID: <8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu>

On May 6, 2008, at 6:17 AM, Manlio Perillo wrote:

> I'm glad to know that there are other people interested in
> asynchronous applications. Have you seen my extensions to WSGI in
> my module for Nginx?

Yes, I have, and I had your module in mind as a potential provider of
the AWSGI interface.

> Note that in Nginx the request body is pre-read before the
> application is called (in fact wsgi.input is either a cStringIO or
> File object).

Although I didn't state it explicitly in my spec, my intention is for
the server to be able to implement awsgi.input in any way it likes, as
long as it provides a recv() method. It's totally acceptable for the
request body to be pre-read.

> Unfortunately there is a *big* usability problem: the extension is
> based on a well specified feature of WSGI: the gateway can suspend
> the execution of the WSGI application when it yields.
>
> However if the asynchronous code is present in a "child" function,
> we have something like this:
> ...
> That is, all the functions in the "chain" have to yield, and that is
> not very good.

Yes, you're right. However, if you're willing/able to use Python 2.5,
you can use the new features of generators to implement a call stack
that lets you call child functions and receive return values and
exceptions from them. I've implemented this in awsgiref.callstack.
Have a look at

  http://pseudogreen.org/bzr/awsgiref/examples/echo_request_with_callstack.py

for an example of how it works.

> The solution is to use coroutines, and I'm planning to integrate
> greenlets (from the pylib project) into the WSGI module for Nginx.

Interesting, but it's not clear to me how/if this would work. Can you
explain more or point me to some code?


Thanks,
Chris

From graham.dumpleton at gmail.com Wed May 7 01:02:49 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Wed, 7 May 2008 09:02:49 +1000
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
 <88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com>
 <68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu>
Message-ID: <88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com>

2008/5/7 Christopher Stawarz:
> On May 5, 2008, at 10:08 PM, Graham Dumpleton wrote:
>
> > If write() isn't to be returned by start_response(), then do away with
> > start_response() if possible as per discussions for WSGI 2.0.
>
>  I think start_response() is necessary, because the application may need to
> yield for I/O readiness (e.g. to read the request body, as in my example
> app) before it decides what response status and headers to send.

One could come up with other ways of doing it which align better
with WSGI 2.0. I previously gave an idea as a starting point for
discussion, but don't think others really understood what I was
suggesting. But then I did post it at 4am in the morning in the middle
of a baby induced period of sleep deprivation.
See post 24 in:

  http://groups.google.com/group/python-web-sig/tree/browse_frm/thread/74c1f8cf15adf114/d98086a8db568ebd?rnum=24

I think what was missed by others was that I wasn't suggesting that the
102 code be sent all the way back to the client, but as a convention
between WSGI application and underlying WSGI adapter only, to
facilitate the ability to return control back to the WSGI adapter
before one had decided what actual response headers to send. This
seems to align with what you want.

Graham

From ionel.mc at gmail.com Wed May 7 02:51:13 2008
From: ionel.mc at gmail.com (Ionel Maries Cristian)
Date: Wed, 7 May 2008 03:51:13 +0300
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
Message-ID:

This is a very interesting initiative.

However there are a few problems:

- there is no support for chunked input - that would require having
  support for readline in the first place, also, it should be the
  gateway's business decoding the chunked input.
- the original wsgi spec somewhat has some support for streaming and
  asynchronicity [*1]
- i don't see how removing the write callable will help (i don't see an
  issue having the server providing a stringio.write as the write
  callable for synchronous apps)
- passing nonstring values through middleware will make using/porting
  existing wsgi middleware hairy (suppose you have a middleware that
  applies some filter to the appiter - you'll have your code full of
  isinstance nastiness)

Also, have you looked at the existing gateway implementations with
asynchronous support?
There are a bunch of them: http://trac.wiretooth.com/public/wiki/asycwsgi http://chiral.j4cbo.com/trac http://wiki.secondlife.com/wiki/Eventlet my own shot at the problem: http://code.google.com/p/cogen/ and manlio's mod_wsgi for nginx (I may be missing some) However there is absolutely no unity in handling the wsgi.input (or equivalent) [*1]In my implementation i do a bunch of tricks to make use of regular wsgi middleware with async apps possible - i have a bunch of working examples using pylons: - the extensions in the environ (like your environ['awsgi.readable']) return a empty string that penetrates most[*2] middleware and set the actual message (like your (token, fd, timeout) tuple on some internal object) >From this point of view, an async middleware stack is just a set of middleware that supports streaming. Please see: http://cogen.googlecode.com/svn/trunk/docs/cogen.web.async.html http://cogen.googlecode.com/svn/trunk/docs/cogen.web.wsgi.html [*2] middleware that consume the app iter ruin that pattern, but regardless, they are not compliant to the wsgi spec (see http://www.python.org/dev/peps/pep-0333/#middleware-handling-of-block-boundaries ) - notable examples are most of the exception handling middleware (they can't work otherwise anyway) On Tue, May 6, 2008 at 4:30 AM, Christopher Stawarz wrote: > (I'm new to the list, so please forgive me for making my first post a > specification proposal :) > > Browsing through the list archives, I see there's been some > inconclusive discussions on adding better support for asynchronous web > servers to the WSGI spec. Since such support would be very useful for > some upcoming projects of mine, I decided to take a shot at specing > out and implementing it. I'd be grateful for any feedback you have. > If this seems like something worth pursuing, I would also welcome > collaborators to help develop the spec further. 
>
> The name for this proposed specification is the Asynchronous Web
> Server Gateway Interface (AWSGI). As the name suggests, the spec is
> closely related to WSGI and is most easily described in terms of how
> it differs from WSGI. AWSGI eliminates the following parts of WSGI:
>
>   - the environment variables wsgi.version and wsgi.input
>
>   - the write() callable returned by start_response()
>
> AWSGI adds the following environment variables:
>
>   - awsgi.version
>   - awsgi.input
>   - awsgi.readable
>   - awsgi.writable
>   - awsgi.timeout
>
> In addition, AWSGI allows the application iterable to yield two types
> of data:
>
>   - byte strings, handled as in WSGI
>
>   - the result of calling awsgi.readable or awsgi.writable, which
>     indicates that the application should be paused and restarted when
>     a specified file descriptor is ready for reading or writing
>
> Because of AWSGI's similarity to WSGI, a simple wrapper can be used to
> run AWSGI applications on WSGI servers without alteration.
>
> The following example application demonstrates typical usage of AWSGI.
> This application simply reads the request body and sends it back to
> the client. Each time it wants to receive data from the client, it
> first tests awsgi.input for readability and then calls its recv()
> method. If awsgi.input is not readable after one second, the
> application sends a "408 Request Timeout" response to the client and
> terminates:
>
>
>   def echo_request_body(environ, start_response):
>       input = environ['awsgi.input']
>       readable = environ['awsgi.readable']
>
>       nbytes = int(environ.get('CONTENT_LENGTH') or 0)
>       output = ''
>       while nbytes:
>           yield readable(input, 1.0)  # Time out after 1 second
>
>           if environ['awsgi.timeout']:
>               msg = 'The request timed out.'
>               start_response('408 Request Timeout',
>                              [('Content-Type', 'text/plain'),
>                               ('Content-Length', str(len(msg)))])
>               yield msg
>               return
>
>           data = input.recv(nbytes)
>           if not data:
>               break
>           output += data
>           nbytes -= len(data)
>
>       start_response('200 OK', [('Content-Type', 'text/plain'),
>                                 ('Content-Length', str(len(output)))])
>       yield output
>
>
> I have rough but functional implementations of a number of AWSGI
> components available in a Bazaar branch at
> http://pseudogreen.org/bzr/awsgiref/. The package includes an
> asyncore-based AWSGI server and an AWSGI-to-WSGI application wrapper.
> In addition, the file spec.txt contains a more detailed description of
> the specification (which is also appended below).
>
> Again, I'd very much appreciate comments and criticism.
>
>
> Thanks,
> Chris
>
>
>
>
> Detailed AWSGI Specification
> ----------------------------
>
> - Required AWSGI environ variables:
>
>   * All variables required by WSGI, except for wsgi.version and
>     wsgi.input, which must *not* be present
>
>   * awsgi.version => the tuple (1, 0)
>
>   * awsgi.input
>
>     This is an object with one method, recv(bufsize), which behaves
>     like the socket method of the same name (although it doesn't
>     support the optional flags parameter). Before each call to
>     recv(), the application must test awsgi.input for readability via
>     awsgi.readable. The result of calling recv() without doing so is
>     undefined.
>
>     (XXX: Should recv() handle EINTR for the application?)
>
>   * awsgi.readable
>   * awsgi.writable
>
>     These are callables with the signature f(fd, timeout=None). fd is
>     either a file descriptor (i.e. int or long) or an object with a
>     fileno() method that returns a file descriptor.
>
>     timeout has the same semantics as the timeout parameter to
>     select.select(). If the operation times out, awsgi.timeout will
>     be true when the application resumes.
>
>     In addition to checking readiness for reading or writing, servers
>     should also monitor file descriptors for "exceptional" conditions
>     (e.g. out-of-band data) and restart the application if they occur.
>
>   * awsgi.timeout => boolean indicating whether the most recent read
>     or write wait timed out (false if there have been no waits)
>
> - start_response() must *not* return a write() callable, as this
>   method of providing application output to the server is incompatible
>   with asynchronous execution.
>
> - The server must accept awsgi.input as input to awsgi.readable,
>   either by providing an actual socket object or by special-case
>   handling (i.e. awsgi.input needn't have a fileno() method, as long
>   as the server handles it as if it did).
>
> - Applications return iterators, which can yield:
>
>   * a string => sent to client, just as in standard WSGI
>
>   * the result of a call to awsgi.readable or awsgi.writable =>
>     application is resumed when either the file descriptor is ready
>     for reading/writing or the wait times out (in which case,
>     awsgi.timeout will be true)
>
> - Although AWSGI applications will *not* be directly compatible with
>   WSGI servers, middleware will allow them to run as standard WSGI
>   apps (with all I/O waits returning immediately).
>
> - AWSGI servers will not support unmodified WSGI applications. There
>   are several reasons for this:
>
>   - If the app does blocking I/O, it will block the entire server.
>
>   - Calls to the read() method of wsgi.input may fail with
>     EWOULDBLOCK, which an app expecting synchronous I/O probably won't
>     be prepared to deal with.
>
>   - The readline(), readlines(), and __iter__() methods of wsgi.input
>     can require multiple network I/O operations, which is incompatible
>     with asynchronous execution.
>
>   - The write() callable returned by start_response() is inherently
>     incompatible with asynchronous execution.
>
> Because of these issues, this specification aims for one-way
> compatibility between AWSGI and WSGI (i.e. the ability to run AWSGI
> apps on WSGI servers via middleware, but not vice versa).
>
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe:
> http://mail.python.org/mailman/options/web-sig/ionel.mc%40gmail.com
>

--
http://ionelmc.wordpress.com

From manlio_perillo at libero.it Wed May 7 09:59:47 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Wed, 07 May 2008 09:59:47 +0200
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu> <88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com> <68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu> <88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com>
Message-ID: <48216173.7000502@libero.it>

Graham Dumpleton wrote:
> 2008/5/7 Christopher Stawarz :
>> On May 5, 2008, at 10:08 PM, Graham Dumpleton wrote:
>>
>>
>>> If write() isn't to be returned by start_response(), then do away with
>>> start_response() if possible as per discussions for WSGI 2.0.
>> I think start_response() is necessary, because the application may need to
>> yield for I/O readiness (e.g. to read the request body, as in my example
>> app) before it decides what response status and headers to send.
>
> One could come up with other ways of doing it which aligns better with
> WSGI 2.0. I previously gave an idea as a starting point for
> discussion, but don't think others really understood what I was
> suggesting. But then I did post it at 4am in the morning in the middle
> of a baby induced period of sleep deprivation.
> See post 24 in:
>
> http://groups.google.com/group/python-web-sig/tree/browse_frm/thread/74c1f8cf15adf114/d98086a8db568ebd?rnum=24
>
> I think what was missed by others was that I wasn't suggest that the
> 102 code be sent all the way back to the client, but as a convention
> between WSGI application and underlying WSGI adapter only, to
> facilitate the ability to return control back to the WSGI adapter
> before one had decided what actual response headers to send. This
> seems to align with what you want.
>

It seems a bit more complex to implement than the start_callable.
Moreover the whole point of removing the start_callable is to simplify the
writing of middlewares.

With your solution it seems that writing middlewares will not become any
easier.

> Graham

Manlio Perillo

From manlio_perillo at libero.it Wed May 7 10:20:20 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Wed, 07 May 2008 10:20:20 +0200
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To:
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
Message-ID: <48216644.7020300@libero.it>

Ionel Maries Cristian wrote:
> This is a very interesting initiative.
>
> However there are few problems:
> - there is no support for chunked input - that would require having
> support for readline in the first place, also, it should be the
> gateway's business decoding the chunked input.

Unfortunately Nginx does not yet support chunked input, so I can't help
here.

> - the original wsgi spec somewhat has some support for streaming and
> asynchronicity [*1]

Right, and in fact I have used this for the implementation of some
extensions in the WSGI module for Nginx.

> - i don't see how removing the write callable will help (i don't see a
> issue having the server providing a stringio.write as the write callable
> for synchronous apps)

To summarize: the main problem with the write callable is that after you
call it, control is not returned to the WSGI gateway.
With an asynchronous server this is a problem, since if you write a lot of
data the server may not be able to send it all to the client.

This is not a problem if the application returns a generator, since the
gateway can suspend the execution until the socket is ready to send data.
With the write callable this is not possible.

In my implementation of WSGI for Nginx I provide two separate
implementations of the write callable:
- put the socket temporarily in synchronous mode
  (this is WSGI compliant but it is very bad for Nginx)
- buffer all the written data until control is returned to the gateway
  (this is *not* WSGI compliant)

However if you use greenlets, then implementing the write callable is not
a problem.

> - passing nonstring values though middleware will make using/porting
> existing wsgi middleware hairy (suppose you have a middleware that
> applies some filter to the appiter - you'll have your code full of
> isinstance nastiness)
>

Yes, this should be avoided.

> Also, have you looked at the existing gateway implementations with
> asynchronous support?
> There are a bunch of them:
> http://trac.wiretooth.com/public/wiki/asycwsgi
> http://chiral.j4cbo.com/trac
> http://wiki.secondlife.com/wiki/Eventlet
> my own shot at the problem: http://code.google.com/p/cogen/
> and manlio's mod_wsgi for nginx
> (I may be missing some)
>
> However there is absolutely no unity in handling the wsgi.input (or
> equivalent)
>

The wsgi.input can be handled with ngx.poll:

c = ngx.connection_wrapper(wsgi.input)
...
ngx.poll_register(c, WSGI_POLLIN)
...
ngx.poll(1000)

Unfortunately I can not test if this is implementable.
I have some doubts.

> [...]
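The difference between the write callable and a returned generator can be
sketched with a toy gateway. This is only an illustration in Python 3, not
code from the Nginx module: run_app, writing_app and yielding_app are
hypothetical names, and the write() shown here follows the second
(buffering, not WSGI compliant) strategy listed above.

```python
def run_app(app, environ):
    """Toy gateway: return the events it observed, in order."""
    events = []
    buffered = []  # data passed to write(); held until control returns

    def start_response(status, headers):
        events.append(("status", status))

        def write(data):
            # The gateway cannot wait for the socket while the
            # application is still running, so it can only buffer.
            buffered.append(data)

        return write

    for chunk in app(environ, start_response):
        # Control is back in the gateway here. A real asynchronous
        # gateway could suspend at this point until the socket is
        # ready to send more data.
        if buffered:
            events.append(("flush", b"".join(buffered)))
            del buffered[:]
        if chunk:
            events.append(("send", chunk))
    if buffered:
        events.append(("flush", b"".join(buffered)))
    return events


def writing_app(environ, start_response):
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    for _ in range(3):
        write(b"xxxx")  # the gateway never regains control in this loop
    return []


def yielding_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    for _ in range(3):
        yield b"xxxx"  # the gateway regains control after every chunk
```

With writing_app the gateway sees all the data as a single flush once the
application has returned; with yielding_app it gets a chance to send (or
to wait) after each chunk.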
Manlio Perillo From graham.dumpleton at gmail.com Wed May 7 10:20:52 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Wed, 7 May 2008 18:20:52 +1000 Subject: [Web-SIG] Proposal for asynchronous WSGI variant In-Reply-To: <48216173.7000502@libero.it> References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu> <88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com> <68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu> <88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com> <48216173.7000502@libero.it> Message-ID: <88e286470805070120i32db1239u16455123c310d8a2@mail.gmail.com> 2008/5/7 Manlio Perillo : > Graham Dumpleton ha scritto: > > > > > 2008/5/7 Christopher Stawarz : > > > > > On May 5, 2008, at 10:08 PM, Graham Dumpleton wrote: > > > > > > > > > > > > > If write() isn't to be returned by start_response(), then do away with > > > > start_response() if possible as per discussions for WSGI 2.0. > > > > > > > I think start_response() is necessary, because the application may need > to > > > yield for I/O readiness (e.g. to read the request body, as in my example > > > app) before it decides what response status and headers to send. > > > > > > > One could come up with other ways of doing it which aligns better with > > WSGI 2.0. I previously gave an idea as a starting point for > > discussion, but don't think others really understood what I was > > suggesting. But then I did post it at 4am in the morning in the middle > > of a baby induced period of sleep deprivation. 
See post 24 in: > > > > > http://groups.google.com/group/python-web-sig/tree/browse_frm/thread/74c1f8cf15adf114/d98086a8db568ebd?rnum=24 > > > > I think what was missed by others was that I wasn't suggest that the > > 102 code be sent all the way back to the client, but as a convention > > between WSGI application and underlying WSGI adapter only, to > > facilitate the ability to return control back to the WSGI adapter > > before one had decided what actual response headers to send. This > > seems to align with what you want. > > > > > > Its seems a bit more complex to implement then the start_callable. > > Moreover the whole point of removing the start_callable is to simplify the > writing of middlewares. > > With your solution it seems that writing middlewares will not became more > easy. Part of what I was trying to say was that this needn't be exposed to middlewares, unless it has to be. It was effectively a lower level of interaction which a middleware immediately on top of the WSGI adapter would use to hook into the async type model, but then present it to higher levels as more traditional WSGI interface. That layer would though obviously use something like greenlets to bridge the two. So, a way of bringing the control of that bridge into the Python level, rather than it being interwined and non separable from the underlying WSGI adapter. As I said, it was 4am, so probably didn't explain it very well. 
:-)

Graham

From manlio_perillo at libero.it Wed May 7 10:44:23 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Wed, 07 May 2008 10:44:23 +0200
Subject: [Web-SIG] WSGI and greenlets
In-Reply-To: <8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu> <48203045.60504@libero.it> <8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu>
Message-ID: <48216BE7.5010000@libero.it>

Christopher Stawarz wrote:
> On May 6, 2008, at 6:17 AM, Manlio Perillo wrote:
>
>> I'm glad to know that there are some other people interested in
>> asynchronous application, do you have seen my extensions to WSGI in my
>> module for Nginx?
>
> Yes, I have, and I had your module in mind as a potential provider of
> the AWSGI interface.
>
>> Note that in Nginx the request body is pre-read before the application
>> is called (in fact wsgi.input is either a cStringIO or File object).
>
> Although I didn't state it explicitly in my spec, my intention is for
> the server to be able to implement awsgi.input in any way it likes, as
> long as it provides a recv() method. It's totally acceptable for the
> request body to be pre-read.
>

Ok.
But what I meant was that since Nginx pre-reads the request body I have
not tried to implement an interface for dealing with an asynchronous
wsgi.input ;-).

Moreover I don't see any reason to have a recv method instead of read.

>> Unfortunately there is a *big* usability problem: the extension is
>> based on a well specified feature of WSGI: the gateway can suspend the
>> execution of the WSGI application when it yields.
>>
>> However if the asynchronous code is present in a "child" function, we
>> have something like this:
>> ...
>> That is, all the functions in the "chain" have to yield, and is not
>> very good.
>
> Yes, you're right. However, if you're willing/able to use Python 2.5,
> you can use the new features of generators to implement a call stack
> that lets you call child functions and receive return values and
> exceptions from them. I've implemented this in awsgiref.callstack.
> Have a look at
>
>
> http://pseudogreen.org/bzr/awsgiref/examples/echo_request_with_callstack.py
>
> for an example of how it works.
>

I don't think this will solve the problem.
Moreover in your example you buffer the whole request body so that you
have to yield only one time.

>> The solution is to use coroutines, and I'm planning to integrate
>> greenlets (from the pylib project) into the WSGI module for Nginx.
>
> Interesting, but it's not clear to me how/if this would work. Can you
> explain more or point me to some code?
>

http://codespeak.net/py/dist/greenlet.html

def process_commands(*args):
    while True:
        line = ''
        while not line.endswith('\n'):
            line += read_next_char()
        if line == 'quit\n':
            print "are you sure?"
            if read_next_char() != 'y':
                continue    # ignore the command
        process_command(line)

With greenlets the execution can be suspended by any of the functions
called by the main greenlet.

This has a lot of advantages.
You can implement wsgi.input.read(n) so that it will suspend the execution
of the current greenlet until *all* the n bytes have been read.

You can also implement the write callable so that control is returned to
the main greenlet when the socket is ready to send more data.

And, of course, you can implement a poll-like interface and a sleep-like
interface.

I think that it is a great advantage; moreover it is the only way to
implement truly reusable components.

Note that there is an effort to integrate greenlets with Twisted:
http://radix.twistedmatrix.com/2008/03/corotwine-01.html

The "problem" is that once you add support for greenlets, you no longer
have WSGI.
The interface can be the same, and applications can work on it without
problems, but the semantics are *completely* different.
Also note that with greenlets it should be possible to "magically"
transform blocking applications like Django into non-blocking ones.

The main problem I see with greenlet is that it is not yet stable (there
are some problems with the garbage collector) and that it is not part of
CPython.

This means that it may not be acceptable to write a PEP for a WSGI-like
interface with coroutine support.

>
> Thanks,
> Chris
>

Regards  Manlio Perillo

From manlio_perillo at libero.it Wed May 7 11:23:04 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Wed, 07 May 2008 11:23:04 +0200
Subject: [Web-SIG] WSGI and greenlets
In-Reply-To: <48216BE7.5010000@libero.it>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu> <48203045.60504@libero.it> <8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu> <48216BE7.5010000@libero.it>
Message-ID: <482174F8.6080600@libero.it>

Manlio Perillo wrote:
> [...]
> The main problem I see with greenlet is that is is not yet stable (there
> are some problems with the garbage collector) and that is is not part of
> CPython.
>
> This means that it can be not acceptable to write a PEP for a WSGI like
> interface with coroutine support.
>

Maybe a solution can be to add a new variable to the WSGI environ:
wsgi.microthreads

When it is true it means that the WSGI implementation will execute the
application inside a micro thread (be it stackless, greenlet, or pypy
coroutine).

Also note that when using coroutines there will be no problems with WSGI
2.0.
However I still think that we should release a WSGI 1.1 since many applications still use and will continue to use WSGI 1.x and a gateway will have to support WSGI 1.x in order to support both WSGI 1.x and 2.x Regards Manlio Perillo From cstawarz at csail.mit.edu Wed May 7 20:00:21 2008 From: cstawarz at csail.mit.edu (Christopher Stawarz) Date: Wed, 7 May 2008 14:00:21 -0400 Subject: [Web-SIG] Proposal for asynchronous WSGI variant In-Reply-To: <88e286470805070120i32db1239u16455123c310d8a2@mail.gmail.com> References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu> <88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com> <68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu> <88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com> <48216173.7000502@libero.it> <88e286470805070120i32db1239u16455123c310d8a2@mail.gmail.com> Message-ID: <8CF076DD-8E9F-48E2-A14E-FD1EED3515EB@csail.mit.edu> On May 7, 2008, at 4:20 AM, Graham Dumpleton wrote: > 2008/5/7 Manlio Perillo : >> With your solution it seems that writing middlewares will not >> became more >> easy. > > Part of what I was trying to say was that this needn't be exposed to > middlewares, unless it has to be. It was effectively a lower level of > interaction which a middleware immediately on top of the WSGI adapter > would use to hook into the async type model, but then present it to > higher levels as more traditional WSGI interface. That would be a really elegant solution, except, as you say: > That layer would > though obviously use something like greenlets to bridge the two. The problem being that greenlets aren't part of the Python language. They're an extension that works by doing clever stuff with the C stack. And as much as we might wish that Python supported them natively (which I do, since they're a really nice alternative to OS threads), it doesn't, so I don't think they can play any role in a WSGI-ASYNC spec. 
Chris From manlio_perillo at libero.it Wed May 7 20:12:12 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Wed, 07 May 2008 20:12:12 +0200 Subject: [Web-SIG] Proposal for asynchronous WSGI variant In-Reply-To: <8CF076DD-8E9F-48E2-A14E-FD1EED3515EB@csail.mit.edu> References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu> <88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com> <68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu> <88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com> <48216173.7000502@libero.it> <88e286470805070120i32db1239u16455123c310d8a2@mail.gmail.com> <8CF076DD-8E9F-48E2-A14E-FD1EED3515EB@csail.mit.edu> Message-ID: <4821F0FC.2090302@libero.it> Christopher Stawarz ha scritto: > On May 7, 2008, at 4:20 AM, Graham Dumpleton wrote: > >> 2008/5/7 Manlio Perillo : >>> With your solution it seems that writing middlewares will not became >>> more >>> easy. >> >> Part of what I was trying to say was that this needn't be exposed to >> middlewares, unless it has to be. It was effectively a lower level of >> interaction which a middleware immediately on top of the WSGI adapter >> would use to hook into the async type model, but then present it to >> higher levels as more traditional WSGI interface. > > That would be a really elegant solution, except, as you say: > >> That layer would >> though obviously use something like greenlets to bridge the two. > > The problem being that greenlets aren't part of the Python language. > They're an extension that works by doing clever stuff with the C stack. > And as much as we might wish that Python supported them natively (which > I do, since they're a really nice alternative to OS threads), it > doesn't, so I don't think they can play any role in a WSGI-ASYNC spec. 
> This is not fully true, after all WSGI explicitly exposes the concept of processes and threads (via the relative variable in the WSGI environ and some hints in the specification) and these are not really part of the Python Language. > > Chris > Manlio Perillo From cstawarz at csail.mit.edu Wed May 7 21:00:10 2008 From: cstawarz at csail.mit.edu (Christopher Stawarz) Date: Wed, 7 May 2008 15:00:10 -0400 Subject: [Web-SIG] Proposal for asynchronous WSGI variant In-Reply-To: References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu> Message-ID: <4B7ABD81-E475-44CC-8D45-2E0525BC7503@csail.mit.edu> On May 6, 2008, at 8:51 PM, Ionel Maries Cristian wrote: > - there is no support for chunked input - that would require having > support for readline in the first place, Why is readline a requirement for chunked input? Each chunk specifies its size, and the application receiving a chunk just keeps calling recv() until it's read the specified number of bytes. > also, it should be the gateway's business decoding the chunked input. OK, but if it's the gateway's responsibility, then this isn't an issue at all, as decoding of chunked data takes place before the application ever sees the request body. To be clear, I didn't mean to imply that awsgi.input must be the actual socket object connected to the client. It just has to provide a recv() method with the semantics of a socket. The server is free to pre-read the entire request, or it can receive data on demand, decoding any chunked input before it passes it to the application. > - i don't see how removing the write callable will help (i don't see > a issue having the server providing a stringio.write as the write > callable for synchronous apps) Manlio explained this well, so I'll refer you to his response. 
> - passing nonstring values though middleware will make using/porting > existing wsgi middleware hairy (suppose you have a middleware that > applies some filter to the appiter - you'll have your code full of > isinstance nastiness) Yes, my proposal would require existing middleware to be modified to support AWSGI, which is unfortunate. > Also, have you looked at the existing gateway implementations with > asynchronous support? > There are a bunch of them: > http://trac.wiretooth.com/public/wiki/asycwsgi > http://chiral.j4cbo.com/trac > http://wiki.secondlife.com/wiki/Eventlet > my own shot at the problem: http://code.google.com/p/cogen/ > and manlio's mod_wsgi for nginx > (I may be missing some) I've seen some of these, but I'll be sure to take a look at the others. > [*1]In my implementation i do a bunch of tricks to make use of > regular wsgi middleware with async apps possible - i have a bunch of > working examples using pylons: > - the extensions in the environ (like your > environ['awsgi.readable']) return a empty string that penetrates > most[*2] middleware and set the actual message (like your (token, > fd, timeout) tuple on some internal object) > From this point of view, an async middleware stack is just a set of > middleware that supports streaming. This is an interesting idea that I'd like to explore some more. I really like the fact that it works with existing middleware (or at least fully WSGI-compliant middleware, as you point out). Apart from the write() callable, the biggest issue I see with the WSGI spec for asynchronous servers is wsgi.input. The problem is that this is explicitly a file-like object. This means that input.read(n) reads until it finds n bytes or EOF, input.readline() reads until it finds a newline or EOF, and input.readlines() and input.__iter__() always read to EOF. Every one of these functions implies multiple I/O operations (calls to fread() for a file or recv() for a socket). 
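To make the file-like versus socket-like distinction concrete, here is a
minimal sketch in Python 3. FakeSocket and file_like_read are hypothetical
names invented for this illustration; the fake raises BlockingIOError where
a real non-blocking socket would report EWOULDBLOCK, and where a blocking
socket would simply stall.

```python
class FakeSocket:
    """Hypothetical non-blocking socket: recv() never waits for data."""

    def __init__(self, pending):
        self.pending = pending  # bytes that have already arrived
        self.calls = 0          # number of recv() calls made so far

    def recv(self, bufsize):
        self.calls += 1
        if not self.pending:
            # Nothing has arrived yet (this fake never signals EOF).
            raise BlockingIOError("no data available yet")
        data = self.pending[:bufsize]
        self.pending = self.pending[bufsize:]
        return data


def file_like_read(sock, n):
    """File-like read(n): keep calling recv() until n bytes or EOF."""
    chunks = []
    while n > 0:
        data = sock.recv(n)  # may be called several times
        if not data:         # b'' would mean EOF
            break
        chunks.append(data)
        n -= len(data)
    return b"".join(chunks)
```

With only 4 bytes available, FakeSocket(b"abcd").recv(8) returns them after
a single call, while file_like_read(FakeSocket(b"abcd"), 8) goes back for
the missing 4 bytes and hits the would-block condition on its second
recv().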
This means that if an application calls input.read(8), and only 4 bytes are available, the first call to recv() returns 4 bytes, and the second one blocks. And now your entire server is blocked until data is available on this one socket. (Of course, the server is free to pre-read the entire request at its leisure and feed it to the application from a buffer, but this may not always be practical or desirable, and I don't think asynchronous servers should be forced to do so.) This is why I propose replacing wsgi.input with awsgi.input, which exposes a recv() method with socket-like (rather than file-like) semantics. The meaning of input.recv(n) is therefore "read at most n bytes (possibly less), calling the underlying socket recv() at most one time". So, although your suggestion may eliminate the need to yield non- string output from the application iterable, I still think there needs to be a separate specification for asynchronous gateways, since the semantics of wsgi.input just aren't compatible with an asynchronous model. Chris From duncan.mcgreggor at gmail.com Wed May 7 21:35:31 2008 From: duncan.mcgreggor at gmail.com (Duncan McGreggor) Date: Wed, 07 May 2008 14:35:31 -0500 Subject: [Web-SIG] Proposal for asynchronous WSGI variant In-Reply-To: <8CF076DD-8E9F-48E2-A14E-FD1EED3515EB@csail.mit.edu> References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu> <88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com> <68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu> <88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com> <48216173.7000502@libero.it> <88e286470805070120i32db1239u16455123c310d8a2@mail.gmail.com> <8CF076DD-8E9F-48E2-A14E-FD1EED3515EB@csail.mit.edu> Message-ID: <1210188931.4546.14.camel@gondor> On Wed, 2008-05-07 at 14:00 -0400, Christopher Stawarz wrote: > On May 7, 2008, at 4:20 AM, Graham Dumpleton wrote: > > > 2008/5/7 Manlio Perillo : > >> With your solution it seems that writing middlewares will not > >> became more > >> easy. 
> > > > Part of what I was trying to say was that this needn't be exposed to > > middlewares, unless it has to be. It was effectively a lower level of > > interaction which a middleware immediately on top of the WSGI adapter > > would use to hook into the async type model, but then present it to > > higher levels as more traditional WSGI interface. > > That would be a really elegant solution, except, as you say: > > > That layer would > > though obviously use something like greenlets to bridge the two. > > The problem being that greenlets aren't part of the Python language. > They're an extension that works by doing clever stuff with the C > stack. And as much as we might wish that Python supported them > natively (which I do, since they're a really nice alternative to OS > threads), it doesn't, so I don't think they can play any role in a > WSGI-ASYNC spec. It's my understanding that greenlets are python, not C. Are you thinking of tasklets in stackless? d From cstawarz at csail.mit.edu Wed May 7 21:54:59 2008 From: cstawarz at csail.mit.edu (Christopher Stawarz) Date: Wed, 7 May 2008 15:54:59 -0400 Subject: [Web-SIG] WSGI and greenlets In-Reply-To: <48216BE7.5010000@libero.it> References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu> <48203045.60504@libero.it> <8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu> <48216BE7.5010000@libero.it> Message-ID: <9A1AD097-A95A-4F0C-86B0-2FA50E31014E@csail.mit.edu> On May 7, 2008, at 4:44 AM, Manlio Perillo wrote: > Moreover I don't see any readons to have a revc method instead of > read. I just wanted to emphasize that its behavior is socket-like, not file- like. It could be called read as long as its behavior is made clear to application developers. >>> Unfortunately there is a *big* usability problem: the extension is >>> based on a well specified feature of WSGI: the gateway can suspend >>> the execution of the WSGI application when it yields. 
>>>
>>> However if the asynchronous code is present in a "child" function,
>>> we have something like this:
>>> ...
>>> That is, all the functions in the "chain" have to yield, and is
>>> not very good.
>> Yes, you're right. However, if you're willing/able to use Python
>> 2.5, you can use the new features of generators to implement a call
>> stack that lets you call child functions and receive return values
>> and exceptions from them. I've implemented this in
>> awsgiref.callstack. Have a look at
>> http://pseudogreen.org/bzr/awsgiref/examples/echo_request_with_callstack.py
>> for an example of how it works.
>
> I don't think this will solve the problem.
> Moreover in your example you buffer the whole request body so that
> you have to yield only one time.

Your example was:

   def application(environ, start_response):
      def nested():
         while True:
            poll(xxx)
            yield ''
         yield result

      for r in nested():
         if not r:
            yield ''
         yield r

My suggestion would allow you to rewrite this like so:

   @awsgiref.callstack.add_callstack
   def application(environ, start_response):
      def nested():
         while True:
            poll(xxx)
            yield ''
         yield result

      yield nested()

The nesting can be arbitrarily deep, so nested() could yield
doubly_nested() and so on. While not as elegant as greenlets, I think
this does address your concern.

> The main problem I see with greenlet is that is is not yet stable
> (there are some problems with the garbage collector) and that is is
> not part of CPython.
>
> This means that it can be not acceptable to write a PEP for a WSGI
> like interface with coroutine support.

This is the problem I see with greenlets, too. If they were part of
the stdlib, it'd be a different story, but as things stand, I don't
think they should be part of the spec.
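For readers without access to the Bazaar branch, the call-stack idea
mentioned earlier in this message can be approximated with a small
trampoline. The following is only a sketch in Python 3 and is not the
awsgiref.callstack code: this add_callstack is a hypothetical stand-in
that handles nested generators but, unlike the real module as described
above, does not pass return values or exceptions back to the caller.

```python
import functools
import types


def add_callstack(app):
    """Hypothetical stand-in for awsgiref.callstack.add_callstack."""

    @functools.wraps(app)
    def wrapper(environ, start_response):
        stack = [app(environ, start_response)]
        while stack:
            try:
                item = next(stack[-1])
            except StopIteration:
                stack.pop()  # child finished: resume its caller
                continue
            if isinstance(item, types.GeneratorType):
                stack.append(item)  # "call" the child generator
            else:
                yield item  # output (or a wait token) for the server

    return wrapper


@add_callstack
def application(environ, start_response):
    def doubly_nested():
        yield b""        # stands in for "wait for I/O"
        yield b"result"

    def nested():
        yield doubly_nested()

    start_response("200 OK", [])
    yield nested()
```

The server only ever iterates the wrapper, so every value yielded at any
depth reaches it directly and none of the intermediate functions needs its
own forwarding loop.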
Chris From cstawarz at csail.mit.edu Wed May 7 22:06:37 2008 From: cstawarz at csail.mit.edu (Christopher Stawarz) Date: Wed, 7 May 2008 16:06:37 -0400 Subject: [Web-SIG] Proposal for asynchronous WSGI variant In-Reply-To: <1210188931.4546.14.camel@gondor> References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu> <88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com> <68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu> <88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com> <48216173.7000502@libero.it> <88e286470805070120i32db1239u16455123c310d8a2@mail.gmail.com> <8CF076DD-8E9F-48E2-A14E-FD1EED3515EB@csail.mit.edu> <1210188931.4546.14.camel@gondor> Message-ID: <7F84F89F-2F0A-4556-973E-668034339267@csail.mit.edu> On May 7, 2008, at 3:35 PM, Duncan McGreggor wrote: > It's my understanding that greenlets are python, not C. Are you > thinking > of tasklets in stackless? The version for CPython is a C extension module. Have a look at the comments in http://svn.red-bean.com/bob/greenlet/trunk/greenlet.c The switching is accomplished by saving and restoring chunks of the C stack, which I find both extremely clever and kind of scary :) Chris From ionel.mc at gmail.com Wed May 7 23:36:56 2008 From: ionel.mc at gmail.com (Ionel Maries Cristian) Date: Thu, 8 May 2008 00:36:56 +0300 Subject: [Web-SIG] Proposal for asynchronous WSGI variant In-Reply-To: <4B7ABD81-E475-44CC-8D45-2E0525BC7503@csail.mit.edu> References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu> <4B7ABD81-E475-44CC-8D45-2E0525BC7503@csail.mit.edu> Message-ID: On Wed, May 7, 2008 at 10:00 PM, Christopher Stawarz < cstawarz at csail.mit.edu> wrote: > On May 6, 2008, at 8:51 PM, Ionel Maries Cristian wrote: > > > - there is no support for chunked input - that would require having > > support for readline in the first place, > > > Why is readline a requirement for chunked input? 
Each chunk specifies its
> size, and the application receiving a chunk just keeps calling recv() until
> it's read the specified number of bytes.
>

Well, not really a requirement; I was implying there is some sort of
readline, since one would generally use some sort of readline to get the
size of a chunk - but not necessarily.

> also, it should be the gateway's business decoding the chunked input.
> >
> OK, but if it's the gateway's responsibility, then this isn't an issue at
> all, as decoding of chunked data takes place before the application ever
> sees the request body.
> To be clear, I didn't mean to imply that awsgi.input must be the actual
> socket object connected to the client. It just has to provide a recv()
> method with the semantics of a socket. The server is free to pre-read the
> entire request, or it can receive data on demand, decoding any chunked input
> before it passes it to the application.
> > - i don't see how removing the write callable will help (i don't see an
> > issue having the server providing a stringio.write as the write callable for
> > synchronous apps)
> >
> Manlio explained this well, so I'll refer you to his response.
>
> > - passing nonstring values through middleware will make using/porting
> > existing wsgi middleware hairy (suppose you have a middleware that applies
> > some filter to the appiter - you'll have your code full of isinstance
> > nastiness)
> >
> Yes, my proposal would require existing middleware to be modified to
> support AWSGI, which is unfortunate.
>
> > Also, have you looked at the existing gateway implementations with
> > asynchronous support? There are a bunch of them:
> > http://trac.wiretooth.com/public/wiki/asycwsgi
> > http://chiral.j4cbo.com/trac
> > http://wiki.secondlife.com/wiki/Eventlet
> > my own shot at the problem: http://code.google.com/p/cogen/
> > and manlio's mod_wsgi for nginx
> > (I may be missing some)
> >
> I've seen some of these, but I'll be sure to take a look at the others.
> > > [*1]In my implementation i do a bunch of tricks to make use of regular > > wsgi middleware with async apps possible - i have a bunch of working > > examples using pylons: - the extensions in the environ (like your > > environ['awsgi.readable']) return a empty string that penetrates most[*2] > > middleware and set the actual message (like your (token, fd, timeout) tuple > > on some internal object) > > From this point of view, an async middleware stack is just a set of > > middleware that supports streaming. > > > This is an interesting idea that I'd like to explore some more. I really > like the fact that it works with existing middleware (or at least fully > WSGI-compliant middleware, as you point out). > Apart from the write() callable, the biggest issue I see with the WSGI > spec for asynchronous servers is wsgi.input. The problem is that this is > explicitly a file-like object. This means that input.read(n) reads until it > finds n bytes or EOF, input.readline() reads until it finds a newline or > EOF, and input.readlines() and input.__iter__() always read to EOF. Every > one of these functions implies multiple I/O operations (calls to fread() for > a file or recv() for a socket). > This means that if an application calls input.read(8), and only 4 bytes > are available, the first call to recv() returns 4 bytes, and the second one > blocks. And now your entire server is blocked until data is available on > this one socket. (Of course, the server is free to pre-read the entire > request at its leisure and feed it to the application from a buffer, but > this may not always be practical or desirable, and I don't think > asynchronous servers should be forced to do so.) > This is why I propose replacing wsgi.input with awsgi.input, which > exposes a recv() method with socket-like (rather than file-like) semantics. > The meaning of input.recv(n) is therefore "read at most n bytes (possibly > less), calling the underlying socket recv() at most one time". 
> So, although your suggestion may eliminate the need to yield non-string
> output from the application iterable, I still think there needs to be a
> separate specification for asynchronous gateways, since the semantics of
> wsgi.input just aren't compatible with an asynchronous model.
> Chris
>

The way I see it, asynchronous wsgi is just a matter of deciding how to
handle the input asynchronously - an asynchronous input wsgi extension
specification.

So I suggest completely dropping the idea of an incompatibility between
async_wsgi and wsgi (since it doesn't help anyone in the long run
really - it just fragments the gateway providers and overcomplicates
things) and concentrate more on the async input extension.

So the idea is that the gateways would provide async input by default
and a piece of middleware or config option to make it synchronous (well,
actually, buffer it).

Also, since there already are a bunch of async gateways out there I
would like to hear if the other providers would/could implement the
proposed form of common async input - that would ultimately decide the
success of this proposed spec.

--
http://ionelmc.wordpress.com

From cstawarz at csail.mit.edu Thu May 8 04:59:42 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Wed, 7 May 2008 22:59:42 -0400
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To:
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu> <4B7ABD81-E475-44CC-8D45-2E0525BC7503@csail.mit.edu>
Message-ID: <36B208C9-2503-4AB2-B39B-6EB67398DD8A@csail.mit.edu>

On May 7, 2008, at 5:36 PM, Ionel Maries Cristian wrote:

> The way I see it asynchronous wsgi is just a matter of deciding how
> to handle the input asynchronously - an asynchronous input wsgi
> extension specification.
Another crucial element is the ability to perform non-blocking I/O on other file descriptors (TCP connections to other servers, pipes to other OS processes). This is why the readable/writable functions (or something like them) are necessary. > So I suggest completely dropping the idea of a incompatibility > between async_wsgi and wsgi (since it doesn't help anyone in the > long run really - it just fragments the gateway providers and > overcomplicate things) and concentrate more on the async input > extension. This is a compelling argument. As long as the application iterable yields only strings (which, the more I think about it, seems like the right thing to do), then the remaining functionality I propose can be implemented as extensions to WSGI, perhaps in a "x-wsgiorg.async" namespace. However, the problem remains that, even though an asynchronous server can implement the write() callable and wsgi.input as required by the WSGI spec, they effectively can't be used by applications, since they involve potentially blocking I/O operations. So either WSGI has to be revised to take the needs of asynchronous servers into account, or we have to accept that async servers can never be fully WSGI compliant. > So the idea is that the gateways would provide async input by > default and a piece of middleware or config option to make it > synchronous (well, actually, buffer it). You mean the middleware would be used to make the input synchronous so that an app that uses wsgi.input would function normally (reading from the buffer)? That would fix the problem for wsgi.input, but the issue with write() remains. Another point to keep in mind is that in order to function correctly on an async server, an application really has to be written with that execution environment in mind. For example, an app couldn't use httplib, since it does blocking I/O (which, again, would freeze up the entire server). 
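
As an illustration of the point about blocking I/O, here is a minimal sketch of the "test for readiness, then make a single recv() call" pattern that an application on an asynchronous server would have to follow for its own sockets. `recv_if_ready` is a hypothetical helper, not part of any proposal in this thread; it relies only on the standard library's `select.select` and `socket.recv`.

```python
import select
import socket


def recv_if_ready(sock, size, timeout=0.0):
    """Return the data from one recv() call, or None if sock is not readable.

    Unlike a file-like read(size), this never loops waiting for `size`
    bytes: it returns whatever a single recv() delivers (at most `size`).
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None          # would block: let the server do other work
    return sock.recv(size)


# Small demonstration with a connected pair of local sockets.
left, right = socket.socketpair()
nothing_yet = recv_if_ready(left, 8)        # no data has been sent
right.sendall(b'abcd')
partial = recv_if_ready(left, 8, 1.0)       # fewer than 8 bytes available
left.close()
right.close()
```

A library such as httplib gives the application no place to insert a check like this, which is why it cannot be used safely in that environment.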
> Also, since there already are a bunch of async gateways out there I > would like to hear if the other providers would/could implement the > proposed form of common async input - that would ultimately decide > the success of this proposed spec. I would like to hear their opinions as well. In particular, do any Twisted folks have comments on what we've discussed? Chris From cstawarz at csail.mit.edu Thu May 8 07:49:42 2008 From: cstawarz at csail.mit.edu (Christopher Stawarz) Date: Thu, 8 May 2008 01:49:42 -0400 Subject: [Web-SIG] Proposal for asynchronous WSGI variant In-Reply-To: <36B208C9-2503-4AB2-B39B-6EB67398DD8A@csail.mit.edu> References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu> <4B7ABD81-E475-44CC-8D45-2E0525BC7503@csail.mit.edu> <36B208C9-2503-4AB2-B39B-6EB67398DD8A@csail.mit.edu> Message-ID: <52E01258-D046-4A44-AD35-2E9413C1DAB6@csail.mit.edu> On May 7, 2008, at 10:59 PM, Christopher Stawarz wrote: > However, the problem remains that, even though an asynchronous > server can implement the write() callable and wsgi.input as required > by the WSGI spec, they effectively can't be used by applications, > since they involve potentially blocking I/O operations. So either > WSGI has to be revised to take the needs of asynchronous servers > into account, or we have to accept that async servers can never be > fully WSGI compliant. Maybe this isn't as big a deal as I'm making it. The point of the async extensions is to make it possible for WSGI apps to run effectively on asynchronous servers. Apps that use the extensions won't use write() or wsgi.input, so it really doesn't matter whether they're blocking or not. Although apps that don't use the async extensions *could* be run on an asynchronous server (by using wsgi.input in a blocking fashion), doing so would mean that the server could effectively handle only one request at a time (i.e. serially). If this were unacceptable (which it most likely would be), then you just wouldn't do it. 
Better to use mod_wsgi or some other server that can run your app
effectively.

So I guess the only issue is that authors of asynchronous servers who
want to comply fully with the WSGI spec have to implement functionality
(write() and wsgi.input) that can't be used without severely degrading
the server's performance. But that's an issue that server authors can
address as they see fit, not something that the WSGI spec needs to
account for.

Thanks to everyone who has provided input so far -- please keep the
comments coming! I'm going to work on another draft of my proposal that
takes into account what we've discussed and will post it here when it's
done.

Chris

From cstawarz at csail.mit.edu Mon May 12 00:15:57 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Sun, 11 May 2008 18:15:57 -0400
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
Message-ID: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>

This is a revised version of my AWSGI proposal from last week. While
many of the details remain the same, the big change is that I'm now
proposing a set of extensions to standard WSGI, rather than a separate
specification for asynchronous servers.

The updated proposal is included below. I've also posted it at

http://wsgi.org/wsgi/Specifications/async

The bzr repository for my reference implementation (which is only
partially updated to match the new spec) is now at

http://pseudogreen.org/bzr/wsgiorg_async_ref/

I'd appreciate your comments.

Thanks,
Chris


Abstract
--------

This specification defines a set of extensions that allow WSGI
applications to run effectively on asynchronous (aka event driven)
servers.

Rationale
---------

The architecture of an asynchronous server requires all I/O
operations, including both interprocess and network communication, to
be non-blocking. For a WSGI-compliant server, this requirement
extends to all applications run on the server.
However, the WSGI specification does not provide sufficient facilities
for an application to ensure that its I/O is non-blocking.
Specifically, there are two issues:

* The methods provided by the input stream (``environ['wsgi.input']``)
  follow the semantics of the corresponding methods of the ``file``
  class. In particular, each of these methods can invoke the
  underlying I/O function (in this case, ``recv`` on the socket
  connected to the client) more than once, without giving the
  application the opportunity to check whether each invocation will
  block.

* WSGI does not provide the application with a mechanism to test
  arbitrary file descriptors (such as those belonging to sockets or
  pipes opened by the application) for I/O readiness.

This specification defines a standard interface by which asynchronous
servers can provide the required facilities to applications.

Specification
-------------

Servers that want to allow applications to perform non-blocking I/O
must add four new variables to the WSGI environment:
``x-wsgiorg.async.input``, ``x-wsgiorg.async.readable``,
``x-wsgiorg.async.writable``, and ``x-wsgiorg.async.timeout``. The
following sections describe these extensions.

Non-blocking Input Stream
~~~~~~~~~~~~~~~~~~~~~~~~~

The ``x-wsgiorg.async.input`` variable provides a non-blocking
replacement for ``wsgi.input``. It is an object with one method,
``read(size)``, that behaves like the ``recv`` method of
``socket.socket``. This means that a call to ``read`` will invoke the
underlying socket ``recv`` **no more than once** and return **at
most** ``size`` bytes of data (possibly less). In addition, ``read``
may return an empty string (zero bytes) **only** if the client closes
the connection or the application attempts to read more data than is
specified by the ``CONTENT_LENGTH`` variable.

Before each call to ``read``, the application **must** test the input
stream for readiness with ``x-wsgiorg.async.readable`` (see below).
The result of calling ``read`` on a non-ready input stream is
undefined.

As with ``wsgi.input``, the server is free to implement
``x-wsgiorg.async.input`` using any technique it chooses (performing
reads on demand, pre-reading the request body, etc.). The only
requirements are for ``read`` to obey the expected semantics and the
input object to be accepted as the first argument to
``x-wsgiorg.async.readable``.

Testing File Descriptors for I/O Readiness
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The variables ``x-wsgiorg.async.readable`` and
``x-wsgiorg.async.writable`` are callable objects that accept two
positional arguments, one required and one optional. In the following
description, these arguments are given the names ``fd`` and
``timeout``, but they are not required to have these names, and the
application **must** invoke the callables using positional arguments.

The first argument, ``fd``, is either an integer representing a file
descriptor or an object with a ``fileno`` method that returns such an
integer. (In addition, ``fd`` may be ``x-wsgiorg.async.input``, even
if it lacks a ``fileno`` method.) The second, optional argument,
``timeout``, is either ``None`` or a floating-point value in seconds.
If omitted, it defaults to ``None``.

When called, ``readable`` and ``writable`` return the empty string
(``''``), which **must** be yielded by the application iterable to the
server (passing through any middleware). The server then suspends
execution of the application until one of the following conditions is
met:

* The specified file descriptor is ready for reading or writing.

* ``timeout`` seconds have elapsed without the file descriptor
  becoming ready for I/O.

* The server detects an error or "exceptional" condition (such as
  out-of-band data) on the file descriptor.

Put another way, if the application calls ``readable`` and yields the
empty string, it will be suspended until
``select.select([fd],[],[fd],timeout)`` would return.
If the application calls ``writable`` and yields the empty string, it
will be suspended until ``select.select([],[fd],[fd],timeout)`` would
return.

If ``timeout`` seconds elapse without the file descriptor becoming
ready for I/O, the variable ``x-wsgiorg.async.timeout`` will be true
when the application resumes. Otherwise, it will be false. The value
of ``x-wsgiorg.async.timeout`` when the application is first started
or after it yields each response-body string is undefined.

The server may use any technique it desires to detect when an
application's file descriptors are ready for I/O. (Most likely, it
will add them to the same event loop that it uses for accepting new
client connections, receiving requests, and sending responses.)

Examples
--------

The following application reads the request body and sends it back to
the client unmodified. Each time it wants to receive data from the
client, it first tests ``environ['x-wsgiorg.async.input']`` for
readability and then calls its ``read`` method. If the input stream
is not readable after one second, the application sends a ``408
Request Timeout`` response to the client and terminates::

    def echo_request_body(environ, start_response):
        input = environ['x-wsgiorg.async.input']
        readable = environ['x-wsgiorg.async.readable']

        nbytes = int(environ.get('CONTENT_LENGTH') or 0)
        output = ''
        while nbytes:
            yield readable(input, 1.0)  # Time out after 1 second

            if environ['x-wsgiorg.async.timeout']:
                msg = 'The request timed out.'
                start_response('408 Request Timeout',
                               [('Content-Type', 'text/plain'),
                                ('Content-Length', str(len(msg)))])
                yield msg
                return

            data = input.read(nbytes)
            if not data:
                break
            output += data
            nbytes -= len(data)

        content_type = (environ.get('CONTENT_TYPE') or
                        'application/octet-stream')
        start_response('200 OK', [('Content-Type', content_type),
                                  ('Content-Length', str(len(output)))])
        yield output

The following middleware component allows an application that uses the
``x-wsgiorg.async`` extensions to run on a server that does not
support them, without any modification to the application's code::

    import select

    def dummy_async(application):
        def wrapper(environ, start_response):
            input = environ['wsgi.input']
            environ['x-wsgiorg.async.input'] = input

            # Arguments for the pending select() call, if any
            select_args = [None]

            def readable(fd, timeout=None):
                select_args[0] = ([fd], [], [fd], timeout)
                return ''

            def writable(fd, timeout=None):
                select_args[0] = ([], [fd], [fd], timeout)
                return ''

            environ['x-wsgiorg.async.readable'] = readable
            environ['x-wsgiorg.async.writable'] = writable

            for result in application(environ, start_response):
                if result or (not select_args[0]):
                    yield result
                else:
                    if select_args[0][2][0] is input:
                        # wsgi.input blocks on read, so never time out
                        environ['x-wsgiorg.async.timeout'] = False
                    else:
                        ready = select.select(*select_args[0])
                        environ['x-wsgiorg.async.timeout'] = (ready == ([],[],[]))
                    select_args[0] = None

        return wrapper

Problems
--------

* The empty string yielded by an application after calling
  ``readable`` or ``writable`` must pass through any intervening
  middleware and be detected by the server. Although WSGI explicitly
  requires middleware to relay such strings to the server (see
  `Middleware Handling of Block Boundaries `_), some components may
  not, making them incompatible with this specification.

* Although the extensions described here make it *possible* for
  applications to run effectively on asynchronous servers, they do not
  (and cannot) *ensure* that they do so.
  As is the case with any cooperative multitasking environment, the
  burden of ensuring that all application code is non-blocking rests
  with application authors.

Other Possibilities
-------------------

* To prevent an application that does blocking I/O from blocking the
  entire server, an asynchronous server could run each instance of the
  application in a separate thread. However, since asynchronous
  servers achieve high levels of concurrency by expressly *avoiding*
  multithreading, this technique will almost always be unacceptable.

* The `greenlet `_ package enables the use of cooperatively-scheduled
  micro-threads in Python programs, and a WSGI server could
  potentially use it to pause and resume applications around blocking
  I/O operations. However, such micro-threading is not part of the
  Python language or standard library, and some server authors may be
  unwilling or unable to make use of it.

Open Issues
-----------

* Some third-party libraries (such as `PycURL `_) provide non-blocking
  interfaces that may need to monitor multiple file descriptors for
  I/O readiness simultaneously. Since this specification allows an
  application to wait on only one file descriptor at a time, it may be
  difficult or impossible for applications to use such libraries.

  Although this specification could be extended to include an
  interface for waiting on multiple file descriptors, it is unclear
  whether it would be easy (or even possible) for all servers to
  implement it. Also, the appropriate behavior for a multi-descriptor
  wait is not obvious. (Should the application be resumed when a
  single descriptor is ready? All of them? Some minimum number?)

From pje at telecommunity.com Mon May 12 01:05:33 2008
From: pje at telecommunity.com (Phillip J.
Eby) Date: Sun, 11 May 2008 19:05:33 -0400 Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers In-Reply-To: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> Message-ID: <20080511230511.CE3C13A4061@sparrow.telecommunity.com> At 06:15 PM 5/11/2008 -0400, Christopher Stawarz wrote: >Non-blocking Input Stream >~~~~~~~~~~~~~~~~~~~~~~~~~ > >The ``x-wsgiorg.async.input`` variable provides a non-blocking >replacement for ``wsgi.input``. It is an object with one method, >``read(size)``, that behaves like the ``recv`` method of >``socket.socket``. This means that a call to ``read`` will invoke the >underlying socket ``recv`` **no more than once** and return **at >most** ``size`` bytes of data (possibly less). In addition, ``read`` >may return an empty string (zero bytes) **only** if the client closes >the connection or the application attempts to read more data than is >specified by the ``CONTENT_LENGTH`` variable. > >Before each call to ``read``, the application **must** test the input >stream for readiness with ``x-wsgiorg.async.readable`` (see below). >The result of calling ``read`` on a non-ready input stream is >undefined. For this to work, you're going to need this to take the wsgi.input object as a parameter. If you don't, then this will bypass middleware that replaces wsgi.input. That is, you will need a way for this spec to support middleware that's replacing wsgi.input, without the middleware knowing that this specification exists. In the worst case, it should detect the replaced input and give an error or some response that lets the application know it won't really be able to use the async feature. >If ``timeout`` seconds elapse without the file descriptor becoming >ready for I/O, the variable ``x-wsgiorg.async.timeout`` will be true >when the application resumes. Otherwise, it will be false. 
The value >of ``x-wsgiorg.async.timeout`` when the application is first started >or after it yields each response-body string is undefined. Er, I think you are confused here. There is no way for the server to know what environ dictionary the application is using, unless you explicitly pass it into your extension API. From cstawarz at csail.mit.edu Mon May 12 02:25:57 2008 From: cstawarz at csail.mit.edu (Christopher Stawarz) Date: Sun, 11 May 2008 20:25:57 -0400 Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers In-Reply-To: <20080511230511.CE3C13A4061@sparrow.telecommunity.com> References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> <20080511230511.CE3C13A4061@sparrow.telecommunity.com> Message-ID: <6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu> On May 11, 2008, at 7:05 PM, Phillip J. Eby wrote: > At 06:15 PM 5/11/2008 -0400, Christopher Stawarz wrote: >> Non-blocking Input Stream >> ~~~~~~~~~~~~~~~~~~~~~~~~~ >> >> The ``x-wsgiorg.async.input`` variable provides a non-blocking >> replacement for ``wsgi.input``. It is an object with one method, >> ``read(size)``, that behaves like the ``recv`` method of >> ``socket.socket``. This means that a call to ``read`` will invoke >> the >> underlying socket ``recv`` **no more than once** and return **at >> most** ``size`` bytes of data (possibly less). In addition, ``read`` >> may return an empty string (zero bytes) **only** if the client closes >> the connection or the application attempts to read more data than is >> specified by the ``CONTENT_LENGTH`` variable. >> >> Before each call to ``read``, the application **must** test the input >> stream for readiness with ``x-wsgiorg.async.readable`` (see below). >> The result of calling ``read`` on a non-ready input stream is >> undefined. > > For this to work, you're going to need this to take the wsgi.input > object as a parameter. If you don't, then this will bypass > middleware that replaces wsgi.input. 
>
> That is, you will need a way for this spec to support middleware
> that's replacing wsgi.input, without the middleware knowing that
> this specification exists. In the worst case, it should detect the
> replaced input and give an error or some response that lets the
> application know it won't really be able to use the async feature.

I hadn't considered middleware that replaces wsgi.input. Is there an
example component you can point me to, just so I have something
concrete to look at?

Given that the semantics of wsgi.input are, in general, incompatible
with non-blocking execution, I'm inclined to think that such middleware
would either need to be rewritten to use x-wsgiorg.async.input, or just
couldn't be used with asynchronous servers. But I'll think about it
some more -- maybe there's a way to make this work.

>> If ``timeout`` seconds elapse without the file descriptor becoming
>> ready for I/O, the variable ``x-wsgiorg.async.timeout`` will be true
>> when the application resumes. Otherwise, it will be false. The value
>> of ``x-wsgiorg.async.timeout`` when the application is first started
>> or after it yields each response-body string is undefined.
>
> Er, I think you are confused here. There is no way for the server
> to know what environ dictionary the application is using, unless you
> explicitly pass it into your extension API.

My thinking is that the server *creates* the environ dictionary, so it
can just keep a reference to it and update it as needed. Is middleware
allowed to replace environ with another dict instance before passing it
to the application? I wasn't aware that this was allowed, but if it
is, then I see the problem.

The solution would probably be for the application to pass a mutable
object (e.g. an empty list) to readable/writable that the server could
set a timeout flag on.

Thanks,
Chris

From pje at telecommunity.com Mon May 12 03:09:41 2008
From: pje at telecommunity.com (Phillip J.
Eby)
Date: Sun, 11 May 2008 21:09:41 -0400
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> <20080511230511.CE3C13A4061@sparrow.telecommunity.com> <6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
Message-ID: <20080512010919.B3EB63A4061@sparrow.telecommunity.com>

At 08:25 PM 5/11/2008 -0400, Christopher Stawarz wrote:
>Given that the semantics of wsgi.input are, in general, incompatible
>with non-blocking execution, I'm inclined to think that such
>middleware would either need to be rewritten to use
>x-wsgiorg.async.input, or just couldn't be used with asynchronous
>servers. But I'll think about it some more -- maybe there's a way to
>make this work.

Please read
http://www.python.org/dev/peps/pep-0333/#server-extension-apis for the
lowdown on this. It's only seven paragraphs, but it already covers
this ground thoroughly.

>Is
>middleware allowed to replace environ with another dict instance
>before passing it to the application?

See the same seven paragraphs for the answer to this as well (albeit
somewhat implicitly).

From ionel.mc at gmail.com Mon May 12 06:01:40 2008
From: ionel.mc at gmail.com (Ionel Maries Cristian)
Date: Mon, 12 May 2008 07:01:40 +0300
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> <20080511230511.CE3C13A4061@sparrow.telecommunity.com> <6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
Message-ID:

> My thinking is that the server *creates* the environ dictionary, so it can
> just keep a reference to it and update it as needed. Is middleware allowed
> to replace environ with another dict instance before passing it to the
> application? I wasn't aware that this was allowed, but if it is, then I see
> the problem.
>
> The solution would probably be for the application to pass a mutable
> object (e.g. an empty list) to readable/writable that the server could set a
> timeout flag on.
>

How about an environ['x-wsgiorg.async'].timeout ? I do something like
that in cogen.

--
http://ionelmc.wordpress.com

From ionel.mc at gmail.com Mon May 12 06:45:22 2008
From: ionel.mc at gmail.com (Ionel Maries Cristian)
Date: Mon, 12 May 2008 07:45:22 +0300
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> <20080511230511.CE3C13A4061@sparrow.telecommunity.com> <6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
Message-ID:

On Mon, May 12, 2008 at 3:25 AM, Christopher Stawarz wrote:

> On May 11, 2008, at 7:05 PM, Phillip J. Eby wrote:
> > For this to work, you're going to need this to take the wsgi.input object
> > as a parameter. If you don't, then this will bypass middleware that
> > replaces wsgi.input.
> >
> > That is, you will need a way for this spec to support middleware that's
> > replacing wsgi.input, without the middleware knowing that this specification
> > exists. In the worst case, it should detect the replaced input and give an
> > error or some response that lets the application know it won't really be
> > able to use the async feature.
> >
>
> I hadn't considered middleware that replaces wsgi.input. Is there an
> example component you can point me to, just so I have something concrete to
> look at?
>
> Given that the semantics of wsgi.input are, in general, incompatible with
> non-blocking execution, I'm inclined to think that such middleware would
> either need to be rewritten to use x-wsgiorg.async.input, or just couldn't
> be used with asynchronous servers. But I'll think about it some more --
> maybe there's a way to make this work.
> Making input filters work could be achieved using greenlets - but then again - if one would use greenlets he could use them to simulate a seemingly blocking api for the input so this is pretty much pointless. But I agree, detecting this is good and errors should be thrown in this case. In cogen i'm setting wsgi.input to None - so any use of it would end in a error - though it's not very elegant. -- http://ionelmc.wordpress.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From manlio_perillo at libero.it Mon May 12 15:03:44 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Mon, 12 May 2008 15:03:44 +0200 Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers In-Reply-To: <20080511230511.CE3C13A4061@sparrow.telecommunity.com> References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> <20080511230511.CE3C13A4061@sparrow.telecommunity.com> Message-ID: <48284030.6020300@libero.it> Phillip J. Eby ha scritto: > [...] > > >> If ``timeout`` seconds elapse without the file descriptor becoming >> ready for I/O, the variable ``x-wsgiorg.async.timeout`` will be true >> when the application resumes. Otherwise, it will be false. The value >> of ``x-wsgiorg.async.timeout`` when the application is first started >> or after it yields each response-body string is undefined. > > Er, I think you are confused here. There is no way for the server to > know what environ dictionary the application is using, unless you > explicitly pass it into your extension API. > Interesting, this is something I have never considered. In my implementation ngx.poll returns a function, so there should be no problems. 
Regards

Manlio Perillo

From cstawarz at csail.mit.edu Mon May 12 16:35:09 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Mon, 12 May 2008 10:35:09 -0400
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To:
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> <20080511230511.CE3C13A4061@sparrow.telecommunity.com> <6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
Message-ID:

On May 12, 2008, at 12:01 AM, Ionel Maries Cristian wrote:

> My thinking is that the server *creates* the environ dictionary, so
> it can just keep a reference to it and update it as needed. Is
> middleware allowed to replace environ with another dict instance
> before passing it to the application? I wasn't aware that this was
> allowed, but if it is, then I see the problem.
>
> The solution would probably be for the application to pass a mutable
> object (e.g. an empty list) to readable/writable that the server
> could set a timeout flag on.
>
> How about a environ['x-wsgiorg.async'].timeout ? I do something like
> that in cogen.

Or environ['x-wsgiorg.async.timeout'] could be an object whose truth value can be toggled by the server, like an instance of the following:

    class MutaBool(object):

        def __init__(self):
            self.val = False

        def __nonzero__(self):
            return self.val

Then there's no need for the server to change environ after starting the app. I think that's probably the way to go.
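
A minimal sketch of how that might look from the server's side. The ``set`` method and the variable names are hypothetical (nothing in the proposal defines them), and ``__bool__`` is aliased only so the sketch also runs on Python 3:

```python
class MutaBool(object):
    """Flag whose truth value the server can change in place."""

    def __init__(self):
        self.val = False

    def __nonzero__(self):    # truth protocol on Python 2
        return self.val

    __bool__ = __nonzero__    # truth protocol on Python 3

    def set(self, value):     # hypothetical helper, not in the proposal
        self.val = bool(value)


# The server creates the flag once and keeps its own reference to it.
timeout_flag = MutaBool()
environ = {'x-wsgiorg.async.timeout': timeout_flag}

# Middleware may hand the application a *copy* of environ ...
app_environ = dict(environ)

# ... but the copy still refers to the same flag object, so the
# application sees the change without the server touching its dict.
timeout_flag.set(True)
timed_out = bool(app_environ['x-wsgiorg.async.timeout'])
```

The point of the design is that the server only ever mutates an object it created itself, so it does not matter which dict instance the application ends up holding.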
Chris From cstawarz at csail.mit.edu Mon May 12 17:03:12 2008 From: cstawarz at csail.mit.edu (Christopher Stawarz) Date: Mon, 12 May 2008 11:03:12 -0400 Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers In-Reply-To: References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> <20080511230511.CE3C13A4061@sparrow.telecommunity.com> <6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu> Message-ID: <8FE609A5-6865-4D25-A318-064729FF99E1@csail.mit.edu> On May 12, 2008, at 12:45 AM, Ionel Maries Cristian wrote: > On Mon, May 12, 2008 at 3:25 AM, Christopher Stawarz > wrote: > On May 11, 2008, at 7:05 PM, Phillip J. Eby wrote: > > For this to work, you're going to need this to take the wsgi.input > object as a parameter. If you don't, then this will bypass > middleware that replaces wsgi.input. > > That is, you will need a way for this spec to support middleware > that's replacing wsgi.input, without the middleware knowing that > this specification exists. In the worst case, it should detect the > replaced input and give an error or some response that lets the > application know it won't really be able to use the async feature. > > I hadn't considered middleware that replaces wsgi.input. Is there > an example component you can point me to, just so I have something > concrete to look at? > > Given that the semantics of wsgi.input are, in general, incompatible > with non-blocking execution, I'm inclined to think that such > middleware would either need to be rewritten to use x- > wsgiorg.async.input, or just couldn't be used with asynchronous > servers. But I'll think about it some more -- maybe there's a way > to make this work. > > > Making input filters work could be achieved using greenlets - but > then again - if one would use greenlets he could use them to > simulate a seemingly blocking api for the input so this is pretty > much pointless. > > But I agree, detecting this is good and errors should be thrown in > this case. 
> In cogen i'm setting wsgi.input to None - so any use of it would end
> in a error - though it's not very elegant.

But if your server sets wsgi.input to None, then you really can't claim that it's WSGI-compliant.

It seems like the authors of asynchronous servers have two options for how to handle wsgi.input. The first option is to provide a compliant wsgi.input (with file-like, blocking behavior). This means that middleware that uses/replaces wsgi.input will work properly, but the whole server can block whenever such use takes place. Therefore, apps and middleware will essentially be required to use x-wsgiorg.async.input.

The second option is to provide a non-compliant (i.e. non-blocking) wsgi.input, which works something like x-wsgiorg.async.input. But then any middleware that uses wsgi.input will be broken, since it won't work as expected.

In either case, wsgi.input ends up being unusable. Ugh.

Of course, there is an easy way out of this: Drop the idea of x-wsgiorg.async.input, and push the responsibility for making wsgi.input non-blocking on to server authors. In effect, this would mean that asynchronous servers must *always* pre-read the request body and provide it to the app as a StringIO (or whatever).

I would like to avoid this requirement, since the ability for servers to provide on-demand, non-blocking input to the application seems useful. But if it comes down to a choice between (1) the ability to receive data from the client on-demand and (2) having a wsgi.input that can actually be used, I think I'd choose (2).
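
A rough sketch of what pre-reading could look like. The helper name ``preread_input`` is made up for illustration, ``io.BytesIO`` stands in for "a StringIO (or whatever)", and a real asynchronous server would gather the body through its own event loop rather than with a plain ``read()`` call:

```python
import io


def preread_input(environ):
    """Replace wsgi.input with an in-memory copy of the request body.

    Hypothetical helper: it only shows the shape of the idea, namely that
    the application later reads from memory and therefore cannot block.
    """
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0
    # Read no more than CONTENT_LENGTH bytes from the original stream.
    body = environ['wsgi.input'].read(length) if length > 0 else b''
    environ['wsgi.input'] = io.BytesIO(body)
    return environ


environ = preread_input({
    'CONTENT_LENGTH': '11',
    'wsgi.input': io.BytesIO(b'hello worldEXTRA'),
})
data = environ['wsgi.input'].read()
```

Because the replacement is an ordinary file-like object, middleware that wraps or replaces wsgi.input keeps working unchanged.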
Chris From foom at fuhm.net Mon May 12 18:18:50 2008 From: foom at fuhm.net (James Y Knight) Date: Mon, 12 May 2008 12:18:50 -0400 Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers In-Reply-To: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> Message-ID: <1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net> On May 11, 2008, at 6:15 PM, Christopher Stawarz wrote: > Abstract > -------- > > This specification defines a set of extensions that allow WSGI > applications to run effectively on asynchronous (aka event driven) > servers. > > Rationale > --------- > > The architecture of an asynchronous server requires all I/O > operations, including both interprocess and network communication, to > be non-blocking. For a WSGI-compliant server, this requirement > extends to all applications run on the server. However, the WSGI > specification does not provide sufficient facilities for an > application to ensure that its I/O is non-blocking. Specifically, > there are two issues: > > * The methods provided by the input stream (``environ['wsgi.input']``) > follow the semantics of the corresponding methods of the ``file`` > class. > > * WSGI does not provide the application with a mechanism to test > arbitrary file descriptors (such as those belonging to sockets or > pipes opened by the application) for I/O readiness. There are other issues. How do you do a DNS lookup? How do you get process completion notification? Heck, how do you run a process? Once you have I/O readiness information, what do you do with that? I guess you'd need to write a whole new asynchronous server framework on top of AWSGI? I can't see being able to use it "raw" for any real applications. > The first argument, ``fd``, is either an integer representing a file > descriptor or an object with a ``fileno`` method that returns such an > integer. 
(In addition, ``fd`` may be ``x-wsgiorg.async.input``, even > if it lacks a ``fileno`` method.) The second, optional argument, > ``timeout``, is either ``None`` or a floating-point value in seconds. > If omitted, it defaults to ``None``. What if the event-loop of the server doesn't use integer fds, but windows file handles or a java channel object? Where are you allowed to get these integers from? Is it always a socket from socket.socket().fileno()? Or can it be a file from open().fileno() or os.open()? A pipe from os.pipe()? Note that these distinctions are important everywhere but UNIX. > Other Possibilities > ------------------- > > * To prevent an application that does blocking I/O from blocking the > entire server, an asynchronous server could run each instance of the > application in a separate thread. However, since asynchronous > servers achieve high levels of concurrency by expressly *avoiding* > multithreading, this technique will almost always be unacceptable. Well, my claim would be that it's usually acceptable. Certainly sometimes it's not, which is where the use of an asynchronous server framework comes in handy. But here you're inventing a whole new framework... PS, a minor bug: I notice the spec says wsgiorg.async.input is supposed to have only a read function, but you actually call recv() on it in the examples. James From cstawarz at csail.mit.edu Mon May 12 20:55:27 2008 From: cstawarz at csail.mit.edu (Christopher Stawarz) Date: Mon, 12 May 2008 14:55:27 -0400 Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers In-Reply-To: <1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net> References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> <1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net> Message-ID: <4098D448-63F5-4A71-A79E-8D7CF2BBB345@csail.mit.edu> On May 12, 2008, at 12:18 PM, James Y Knight wrote: > There are other issues. How do you do a DNS lookup? How do you get > process completion notification? Heck, how do you run a process? 
These are valid questions that I'm not attempting to address with this proposal. So maybe the title of my spec should be "Extensions for Asynchronous I/O", since that's the only issue it deals with. I see these other issues as something for other specifications to address. > Once you have I/O readiness information, what do you do with that? I > guess you'd need to write a whole new asynchronous server framework > on top of AWSGI? I can't see being able to use it "raw" for any real > applications. No, you don't need a whole new framework. You need libraries (for making HTTP requests, talking to databases, etc.) that are written to use the extensions the spec provides. These only need to be written once and can then be used with *any* server that supports the extensions. So the existence of a spec like this lets us move from a world where every server/framework (be it Twisted, nginx, cogen, whatever) needs to reimplement these utilities in terms of its own async I/O framework, to one where a single implementation can be written against the spec and then used by any server that implements it. In turn, this should make application developers more comfortable with targeting their apps at async servers, since they won't be tied to any particular server/framework's API. And, yes, the fact that what I just wrote sounds like "write once, run anywhere" sets off alarm bells in my head, too :) But I think the interface I propose is so basic that any async server should be able to provide it with very little trouble. > What if the event-loop of the server doesn't use integer fds, but > windows file handles or a java channel object? Where are you allowed > to get these integers from? Is it always a socket from > socket.socket().fileno()? Or can it be a file from open().fileno() > or os.open()? A pipe from os.pipe()? Note that these distinctions > are important everywhere but UNIX. 
Although I didn't state it in the spec, my thinking was that readable/ writable should accept whatever would be accepted by select() on the platform you're running on. On Windows, they would be limited to sockets; elsewhere, any file descriptor would do. In that light, maybe the title should really be "Extensions for Polling File Descriptors for I/O Readiness". But even limited to that scope, I still think it'd be extremely useful. >> * To prevent an application that does blocking I/O from blocking the >> entire server, an asynchronous server could run each instance of the >> application in a separate thread. However, since asynchronous >> servers achieve high levels of concurrency by expressly *avoiding* >> multithreading, this technique will almost always be unacceptable. > > Well, my claim would be that it's usually acceptable. Certainly > sometimes it's not, which is where the use of an asynchronous server > framework comes in handy. I don't get how it's acceptable. If you spawn a separate thread for each request, then your server is no longer asynchronous. At that point, why not just save yourself some trouble and use Apache? > PS, a minor bug: I notice the spec says wsgiorg.async.input is > supposed to have only a read function, but you actually call recv() > on it in the examples. Thanks. The examples in the spec text are correct, but I haven't updated the examples in my reference code yet. Chris From foom at fuhm.net Mon May 12 23:07:33 2008 From: foom at fuhm.net (James Y Knight) Date: Mon, 12 May 2008 17:07:33 -0400 Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers In-Reply-To: <4098D448-63F5-4A71-A79E-8D7CF2BBB345@csail.mit.edu> References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> <1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net> <4098D448-63F5-4A71-A79E-8D7CF2BBB345@csail.mit.edu> Message-ID: On May 12, 2008, at 2:55 PM, Christopher Stawarz wrote: > >> There are other issues. How do you do a DNS lookup? 
How do you get >> process completion notification? Heck, how do you run a process? > > These are valid questions that I'm not attempting to address with > this proposal. So maybe the title of my spec should be "Extensions > for Asynchronous I/O", since that's the only issue it deals with. I > see these other issues as something for other specifications to > address. Surely you need DNS lookup to make a socket connection? Do you mean to provide that in an external library via a threadpool? > No, you don't need a whole new framework. You need libraries (for > making HTTP requests, talking to databases, etc.) that are written > to use the extensions the spec provides. These only need to be > written once and can then be used with *any* server that supports > the extensions. You do need a framework. Using socket functions correctly (and portably) in non-blocking mode is not trivial. >> Well, my claim would be that it's usually acceptable. Certainly >> sometimes it's not, which is where the use of an asynchronous >> server framework comes in handy. > > I don't get how it's acceptable. If you spawn a separate thread for > each request, then your server is no longer asynchronous. At that > point, why not just save yourself some trouble and use Apache? Well, 1) Using apache is certainly a valid option performance-wise. Apache is pretty fast (obviously not the fastest server ever, but pretty good...). So if it has the features/packaging you need, by all means, use it. The advantage IMO of python servers is that they're lighter- weight deployment-wise and more easily configurable by code. 2) If your app uses a database, you probably might as well just run it in a thread, because you're most likely going to use a blocking database API anyhow. 3) If your app does not make use of outgoing sockets, then 3a) If it also doesn't use wsgi.input, you could inform the WSGI server that it can just run the app not in a thread as it won't be blocking. 
3b) If it does use wsgi.input, but doesn't need to read it incrementally, you could inform the server that it should pre-read the input and then run the app directly, not in a thread, as it won't be blocking. If none of the above apply, that is: you do not use a database, you do use incremental reading of wsgi.input, or an outgoing socket connection, /then/ an async WSGI extension might be useful. I claim that will cover a small subset of WSGI apps. James From cstawarz at csail.mit.edu Tue May 13 00:18:47 2008 From: cstawarz at csail.mit.edu (Christopher Stawarz) Date: Mon, 12 May 2008 18:18:47 -0400 Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers In-Reply-To: References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> <1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net> <4098D448-63F5-4A71-A79E-8D7CF2BBB345@csail.mit.edu> Message-ID: <9FB1544F-79B6-47CE-9918-B95BF24C1B62@csail.mit.edu> On May 12, 2008, at 5:07 PM, James Y Knight wrote: > Surely you need DNS lookup to make a socket connection? Do you mean > to provide that in an external library via a threadpool? No, I don't mean to, because I don't care enough to bother. But if you or someone else did, you'd be free to. > You do need a framework. Using socket functions correctly (and > portably) in non-blocking mode is not trivial. I need a library, not a framework. And I may not even need to write it myself. (For example, for making HTTP requests, I can use pycurl.) > 1) Using apache is certainly a valid option performance-wise. Apache > is pretty fast (obviously not the fastest server ever, but pretty > good...). So if it has the features/packaging you need, by all > means, use it. The advantage IMO of python servers is that they're > lighter-weight deployment-wise and more easily configurable by code. Fair enough. But I'm specifically interested in doing non-blocking I/ O on an asynchronous server. 
> 2) If your app uses a database, you probably might as well just run > it in a thread, because you're most likely going to use a blocking > database API anyhow. Yes, the compatibility of database and other API's with an asynchronous execution model is important. Some (like MySQL) don't support non-blocking connections, so you'd have to work around that with threads or some other technique. Others (like PostgreSQL) do provide an async API, which could be used with my proposed extensions. (Manlio Perillo has an example of how this works with his nginx mod_wsgi module at http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-postgres-async.py.) This is another issue you have to worry about to keep your app non- blocking, but I don't think it's an insurmountable one. And again, any library you develop to support these operations, written in terms of the proposed non-blocking I/O extensions, will be usable on any server that supports the extensions. > 3) If your app does not make use of outgoing sockets, then > 3a) If it also doesn't use wsgi.input, you could inform the WSGI > server that it can just run the app not in a thread as it won't be > blocking. > 3b) If it does use wsgi.input, but doesn't need to read it > incrementally, you could inform the server that it should pre-read > the input and then run the app directly, not in a thread, as it > won't be blocking. > > If none of the above apply, that is: you do not use a database, you > do use incremental reading of wsgi.input, or an outgoing socket > connection, /then/ an async WSGI extension might be useful. I claim > that will cover a small subset of WSGI apps. As I mentioned above, the database issue is a real one, but it can be dealt with. I would like to be able to allow incremental reading of wsgi.input, but I don't see how to do this without breaking middleware. (If you have suggestions, please let me know.) 
As for outgoing socket connections, I'm willing to accept the cost of a DNS lookup; if someone else isn't, then they're free to write some kind of local lookup server that their app talks to over a socket, and other applications running on other servers can enjoy the fruits of their labor. I regret calling my proposal "Extensions for Asynchronous Servers", since clearly that encompasses a much broader range of functionality for you than it does for me. All I'm interested in is the ability to poll file descriptors (and the things that allows me to do), and in the next revision of my proposal I'll strive to make that clear. If you have an application that requires functionality beyond that, then my proposal won't be sufficient for your needs. Chris From manlio_perillo at libero.it Tue May 13 14:51:58 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Tue, 13 May 2008 14:51:58 +0200 Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers In-Reply-To: <1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net> References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu> <1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net> Message-ID: <48298EEE.7080703@libero.it> James Y Knight ha scritto: > > On May 11, 2008, at 6:15 PM, Christopher Stawarz wrote: >> Abstract >> -------- >> >> This specification defines a set of extensions that allow WSGI >> applications to run effectively on asynchronous (aka event driven) >> servers. >> >> Rationale >> --------- >> >> The architecture of an asynchronous server requires all I/O >> operations, including both interprocess and network communication, to >> be non-blocking. For a WSGI-compliant server, this requirement >> extends to all applications run on the server. However, the WSGI >> specification does not provide sufficient facilities for an >> application to ensure that its I/O is non-blocking. 
Specifically,
>> there are two issues:
>>
>> * The methods provided by the input stream (``environ['wsgi.input']``)
>> follow the semantics of the corresponding methods of the ``file``
>> class.
>>
>> * WSGI does not provide the application with a mechanism to test
>> arbitrary file descriptors (such as those belonging to sockets or
>> pipes opened by the application) for I/O readiness.
>
> There are other issues. How do you do a DNS lookup? How do you get
> process completion notification? Heck, how do you run a process? Once
> you have I/O readiness information, what do you do with that? I guess
> you'd need to write a whole new asynchronous server framework on top of
> AWSGI? I can't see being able to use it "raw" for any real applications.
>

This is not a problem with AWSGI. For example, there are libraries like PostgreSQL and curl that can be used with an external event loop.

In the WSGI implementation for Nginx I can provide an interface for using the built-in support for an asynchronous DNS client.

>> The first argument, ``fd``, is either an integer representing a file
>> descriptor or an object with a ``fileno`` method that returns such an
>> integer. (In addition, ``fd`` may be ``x-wsgiorg.async.input``, even
>> if it lacks a ``fileno`` method.) The second, optional argument,
>> ``timeout``, is either ``None`` or a floating-point value in seconds.
>> If omitted, it defaults to ``None``.
>
> What if the event-loop of the server doesn't use integer fds, but
> windows file handles or a java channel object? Where are you allowed to
> get these integers from? Is it always a socket from
> socket.socket().fileno()? Or can it be a file from open().fileno() or
> os.open()? A pipe from os.pipe()? Note that these distinctions are
> important everywhere but UNIX.
>

This has the same problems that we have with wsgi.file_wrapper. This is the reason, among other things, why the API in my implementation uses ngx.connection_wrapper and ngx.poll_register

> [...]
Manlio Perillo

From stephan.diehl at gmx.net Fri May 16 11:13:28 2008
From: stephan.diehl at gmx.net (Stephan Diehl)
Date: Fri, 16 May 2008 11:13:28 +0200
Subject: [Web-SIG] upgrading wsgi.org
Message-ID: <482D5038.9010406@gmx.net>

Hi,

just in case somebody has problems accessing wsgi.org: I'll be upgrading the OS.

Cheers

Stephan

From manlio_perillo at libero.it Tue May 20 18:38:22 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Tue, 20 May 2008 18:38:22 +0200
Subject: [Web-SIG] WSGI and PEP 325
Message-ID: <4832FE7E.2060508@libero.it>

The WSGI PEP explicitly mentions PEP 325 (for the application iterable close method). Maybe this should be updated for the next WSGI spec, since Python 2.5 implements PEP 342?

Regards

Manlio Perillo

From cstawarz at csail.mit.edu Wed May 21 02:42:48 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Tue, 20 May 2008 20:42:48 -0400
Subject: [Web-SIG] Proposed specification: waiting for file descriptor events
Message-ID: <0906B338-B65E-43C8-9DCF-A175C3D34A8A@csail.mit.edu>

This is the third draft of my proposed extensions for better supporting WSGI apps on asynchronous servers. The major changes since the last draft are as follows:

* The title and abstract now accurately reflect the scope of the proposal. In addition, the extensions are now in the namespace "x-wsgiorg.fdevent" (instead of "x-wsgiorg.async").

* The proposal for an alternative, non-blocking input stream has been dropped, since I don't see a way to add one that wouldn't break middleware. Instead, the spec recommends that async servers pre-read the request body before invoking the app (either by default or as a configurable option).

* The mechanism for indicating timeouts no longer requires the server to know what environ dict the app is using (addressing one of PJE's points).

* The examples have been updated. The first one shows how an app can use pycurl to perform an outgoing HTTP request in a non-blocking fashion.
The updated spec is included below and is also available at http://wsgi.org/wsgi/Specifications/fdevent The example code and some utilities are available in a bzr repository at http://pseudogreen.org/bzr/wsgiorg_fdevent_util Once again, I'd appreciate your comments. Thanks, Chris Abstract -------- This specification defines a set of extensions that allow a WSGI application to suspend its execution until an event occurs on a specified file descriptor. Rationale --------- The architecture of asynchronous (aka event driven) servers requires all I/O operations, including both interprocess and network communication, to be non-blocking. For a WSGI-compliant server, this requirement extends to all applications run on the server. However, the WSGI specification does not provide sufficient facilities for an application to ensure that its I/O is non-blocking. Specifically, it lacks a mechanism by which an application can suspend its execution until an arbitrary file descriptor (such as one belonging to a socket or pipe opened by the application) is ready for reading or writing. This specification defines a standard interface by which servers can provide such a mechanism to applications. Specification ------------- This specification introduces three new variables to the WSGI environment: ``x-wsgiorg.fdevent.readable``, ``x-wsgiorg.fdevent.writable``, and ``x-wsgiorg.fdevent.timeout``. The variables ``x-wsgiorg.fdevent.readable`` and ``x-wsgiorg.fdevent.writable`` are callable objects that accept two positional arguments, one required and one optional. In the following description, these arguments are given the names ``fd`` and ``timeout``, but they are not required to have these names, and the application **must** invoke the callables using positional arguments. The first argument, ``fd``, is either an integer representing a file descriptor or an object with a ``fileno`` method that returns such an integer. 
The set of acceptable file descriptors is defined to be those accepted by ``select.select``. (Note that this set is platform dependent: only sockets are allowed on Windows, whereas sockets, pipes, and files are acceptable on Unix-like systems.) The second, optional argument, ``timeout``, is either ``None`` or a floating-point value in seconds. If omitted, it defaults to ``None``. When called, ``x-wsgiorg.fdevent.readable`` and ``x-wsgiorg.fdevent.writable`` return the empty string (``''``), which **must** be yielded by the application iterable to the server (passing through any middleware). The server then suspends execution of the application until one of the following conditions is met: * The specified file descriptor is ready for reading (if the application called ``x-wsgiorg.fdevent.readable``) or writing (if the application called ``x-wsgiorg.fdevent.writable``). * ``timeout`` seconds have elapsed without the desired file descriptor event occurring (unless the value of ``timeout`` is ``None``, in which case the wait will never timeout). * The server detects an error or "exceptional" condition (such as out-of-band data) on the file descriptor. Put another way, if the application calls ``x-wsgiorg.fdevent.readable`` and yields the empty string, it will be suspended until ``select.select([fd],[],[fd],timeout)`` would return. If the application calls ``x-wsgiorg.fdevent.writable`` and yields the empty string, it will be suspended until ``select.select([],[fd],[fd],timeout)`` would return. The variable ``x-wsgiorg.fdevent.timeout`` is an object whose truth value can be changed by the server. (For example, it could be a ``list`` instance, whose truth value is false when empty, true otherwise.) If ``timeout`` seconds elapse without the desired file descriptor event occurring, ``x-wsgiorg.fdevent.timeout`` will be true when the application resumes; otherwise, it will be false. 
The truth value of ``x-wsgiorg.fdevent.timeout`` when the application is first started or after it yields each response-body string is undefined. The server may use any technique it desires to detect events on an application's file descriptors. (Most likely, it will add them to the same event loop that it uses for accepting new client connections, receiving requests, and sending responses.) Handling of the Input Stream ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ While technically outside the scope of this specification, the application's input stream (``environ['wsgi.input']``) is another source of potentially blocking I/O that deserves mention. The methods provided by the input stream follow the semantics of the corresponding methods of the ``file`` class. In particular, each of these methods can invoke the underlying I/O function (in this case, ``recv`` on the socket connected to the client) more than once, without giving the application the opportunity to check whether each invocation will block. Although authors of asynchronous servers may be tempted to provide a non-standard input stream that supports on-demand, non-blocking reads, such an input stream would be incompatible with WSGI middleware. In order to avoid these problems, it is strongly recommended that asynchronous servers pre-read the entire request body before invoking the application, either by default or as a configurable option. Doing so will ensure that the input stream is compatible with middleware and that reads from it are always non-blocking. Examples -------- The following application acts as a proxy to `python.org `_. It uses a ``pycurl.CurlMulti`` instance to perform the outgoing HTTP request in a non-blocking fashion. When the ``CurlMulti.perform`` method detects that its next I/O operation would block, it returns control to the application, which then yields until the file descriptor of interest becomes readable or writable as required. 
If the descriptor is not ready after one second, the application sends a ``504 Gateway Timeout`` response to the client and terminates::

    import pycurl
    from StringIO import StringIO  # Python 2; provides the ``len`` attribute

    def pyorg_proxy(environ, start_response):
        result = StringIO()

        c = pycurl.Curl()
        c.setopt(pycurl.URL, 'http://python.org' + environ['PATH_INFO'])
        c.setopt(pycurl.WRITEFUNCTION, result.write)

        m = pycurl.CurlMulti()
        m.add_handle(c)

        while True:
            # Let libcurl do as much work as it can without blocking.
            while True:
                ret, num_handles = m.perform()
                if ret != pycurl.E_CALL_MULTI_PERFORM:
                    break
            if not num_handles:
                break

            # Suspend until the descriptor of interest is ready.
            read, write, exc = m.fdset()
            if read:
                yield environ['x-wsgiorg.fdevent.readable'](read[0], 1.0)
            else:
                yield environ['x-wsgiorg.fdevent.writable'](write[0], 1.0)

            if environ['x-wsgiorg.fdevent.timeout']:
                msg = 'The request to python.org timed out.'
                start_response('504 Gateway Timeout',
                               [('Content-Type', 'text/plain'),
                                ('Content-Length', str(len(msg)))])
                yield msg
                return

        start_response('200 OK',
                       [('Content-Type', 'application/octet-stream'),
                        ('Content-Length', str(result.len))])
        yield result.getvalue()

The following adapter allows an application that uses the ``x-wsgiorg.fdevent`` extensions to run on a server that does not support them, without any modification to the application's code::

    import select

    def with_fdevent(application):
        def wrapper(environ, start_response):
            # Arguments for the pending select() call, if any.
            select_args = [None]

            def readable(fd, timeout=None):
                select_args[0] = ([fd], [], [fd], timeout)
                return ''

            def writable(fd, timeout=None):
                select_args[0] = ([], [fd], [fd], timeout)
                return ''

            environ['x-wsgiorg.fdevent.readable'] = readable
            environ['x-wsgiorg.fdevent.writable'] = writable

            timeout = False

            class TimeoutWrapper(object):
                def __nonzero__(self):
                    return timeout

            environ['x-wsgiorg.fdevent.timeout'] = TimeoutWrapper()

            for result in application(environ, start_response):
                if result or (not select_args[0]):
                    yield result
                else:
                    # Block in select() on the application's behalf.
                    ready = select.select(*select_args[0])
                    timeout = (ready == ([], [], []))
                    select_args[0] = None

        return wrapper

Problems
--------

* The empty string yielded by an application after calling
  ``x-wsgiorg.fdevent.readable`` or ``x-wsgiorg.fdevent.writable``
  must pass through any intervening middleware and be detected by the
  server.  Although WSGI explicitly requires middleware to relay such
  strings to the server (see `Middleware Handling of Block Boundaries `_),
  some components may not, making them incompatible with this
  specification.


Other Possibilities
-------------------

* To prevent an application that does blocking I/O from blocking the
  entire server, an asynchronous server could run each instance of the
  application in a separate thread.  However, since asynchronous
  servers achieve high levels of concurrency by expressly *avoiding*
  multithreading, this technique will almost always be unacceptable.

* The `greenlet `_ package enables the use of cooperatively-scheduled
  micro-threads in Python programs, and a WSGI server could
  potentially use it to pause and resume applications around blocking
  I/O operations.  However, such micro-threading is not part of the
  Python language or standard library, and some server authors may be
  unwilling or unable to make use of it.


Open Issues
-----------

* Some third-party libraries (such as `PycURL `_) provide non-blocking
  interfaces that may need to monitor multiple file descriptors for
  events simultaneously.  Since this specification allows an
  application to wait on only one file descriptor at a time,
  application authors may find it difficult or impossible to use such
  libraries, or they may be limited to a subset of the libraries'
  capabilities.

  Although this specification could be extended to include an
  interface for waiting on multiple file descriptors, it is unclear
  whether it would be easy (or even possible) for all servers to
  implement it.  Also, the appropriate behavior for a multi-descriptor
  wait is not obvious.  (Should the application be resumed when a
  single descriptor is ready?  All of them?  Some minimum number?)
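
The multi-descriptor question raised under Open Issues can be made
concrete with a small sketch.  It assumes the simplest of the possible
answers (resume as soon as any one descriptor is ready).  The helper
name ``wait_any`` and its return shape are hypothetical and are not
part of this proposal; the sketch only shows what such a wait might
look like on top of ``select.select``.

```python
import select


def wait_any(read_fds, write_fds, timeout=None):
    """Block until any listed descriptor is ready, or the timeout expires.

    Hypothetical helper, not part of the proposal: it illustrates only the
    "resume when a single descriptor is ready" behaviour.
    Returns a tuple (readable, writable, timed_out).
    """
    readable, writable, _ = select.select(read_fds, write_fds, [], timeout)
    # select() returns empty lists when the timeout expires first.
    timed_out = not (readable or writable)
    return readable, writable, timed_out
```

A server adopting this behaviour would still have to decide how the
application learns which descriptors were ready; here that is reported
through the returned lists rather than through the ``environ``.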
From manlio_perillo at libero.it Wed May 21 19:34:19 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Wed, 21 May 2008 19:34:19 +0200
Subject: [Web-SIG] Proposed specification: waiting for file descriptor events
In-Reply-To: <0906B338-B65E-43C8-9DCF-A175C3D34A8A@csail.mit.edu>
References: <0906B338-B65E-43C8-9DCF-A175C3D34A8A@csail.mit.edu>
Message-ID: <48345D1B.2030905@libero.it>

Christopher Stawarz wrote:
> This is the third draft of my proposed extensions for better supporting
> WSGI apps on asynchronous servers.  The major changes since the last
> draft are as follows:
>

First of all, thanks for your effort.

> * The title and abstract now accurately reflect the scope of the proposal.
>   In addition, the extensions are now in the namespace "x-wsgiorg.fdevent"
>   (instead of "x-wsgiorg.async").
>
> * The proposal for an alternative, non-blocking input stream has been
>   dropped, since I don't see a way to add one that wouldn't break
>   middleware.

Well, IMHO the "general" solution here is to use greenlets.

>   Instead, the spec recommends that async servers pre-read the request body
>   before invoking the app (either by default or as a configurable option).
>

This is the best solution most of the time (but not for all of the
time), especially if the "server" can do some "pre-parsing" of
multipart/form-data request body.

In fact I plan to write a custom function (in C for Nginx) that will
"reduce", as an example:

Content-Type: multipart/form-data; boundary=AaB03x

--AaB03x
Content-Disposition: form-data; name="submit-name"

Larry
--AaB03x
Content-Disposition: form-data; name="files"; filename="file1.txt"
Content-Type: text/plain

... contents of file1.txt ...
--AaB03x--

to (not properly escaped):

Content-Type: application/x-www-form-urlencoded

submit-name=Larry&files.filename=file1.txt&files.ctype=text/plain&files.path=xxx

and the contents of file1.txt will be saved to a temporary file 'xxx'.

>
> Once again, I'd appreciate your comments.
>

I have some comments:

1) Why not add a more generic poll-like interface?

Moreover IMHO storing a timeout variable in the environ to check if
the previous call timed out is not the best solution.

In my implementation I return a function, but with generators in
Python 2.5 this can be done in a better way.

2) In Nginx it is not possible to simply handle "plain" file
descriptors, since these are wrapped in a connection structure.

This is the reason why I had to add a connection_wrapper function in
my WSGI module for Nginx.

3) If you read an example that implements a database connection pool:
http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-postgres-async.py

you can see that there is a problem.

In fact the pool is not very flexible; the application can not handle
more than POOL_SIZE concurrent requests.

However it is possible to have a new request wait until a previous
connection is free (or a timeout occurs).

I have attached an example (it is not in the repository since there
are some problems).

The examples use a new extension:

- ctx = environ['ngx.request_context']()
- ctx.resume()

ctx.resume() "asynchronously" resumes the given request
(it will be resumed as soon as control returns to Nginx, when the
application yields something).

Note that the problem of resuming another request is easily solved
with greenlets, without the need for new extensions
(this is one of the reasons why I like greenlets).

> [...]

Regards  Manlio Perillo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nginx-postgres-async-2.py
Type: text/x-python
Size: 4155 bytes
Desc: not available
URL:

From manlio_perillo at libero.it Thu May 22 10:51:09 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Thu, 22 May 2008 10:51:09 +0200
Subject: [Web-SIG] WSGI and greenlets
In-Reply-To: <9A1AD097-A95A-4F0C-86B0-2FA50E31014E@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
 <48203045.60504@libero.it>
 <8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu>
 <48216BE7.5010000@libero.it>
 <9A1AD097-A95A-4F0C-86B0-2FA50E31014E@csail.mit.edu>
Message-ID: <483533FD.8090707@libero.it>

Christopher Stawarz wrote:
> On May 7, 2008, at 4:44 AM, Manlio Perillo wrote:
> [...]
>> I don't think this will solve the problem.
>> Moreover in your example you buffer the whole request body so that you
>> have to yield only one time.
>
> Your example was:
>
> def application(environ, start_response):
>     def nested():
>         while True:
>             poll(xxx)
>             yield ''
>         yield result
>
>     for r in nested():
>         if not r:
>             yield ''
>
>     yield r
>
> My suggestion would allow you to rewrite this like so:
>
> @awsgiref.callstack.add_callstack
> def application(environ, start_response):
>     def nested():
>         while True:
>             poll(xxx)
>             yield ''
>         yield result
>
>     yield nested()
>
> The nesting can be arbitrarily deep, so nested() could yield
> doubly_nested() and so on.  While not as elegant as greenlets, I think
> this does address your concern.
>

I'm reading the PEP 342, and I still think that this will not work as
I want for Nginx (where I have no control over the "scheduler").
In fact the PEP 342 says:
"""However, if it were possible to pass values or exceptions *into* a
generator at the point where it was suspended, a simple co-routine
scheduler or "trampoline function" would let coroutines "call" each
other without blocking."""

However writing a co-routine scheduler or "trampoline function" when
your application is embedded in an external server is not possible
(but please, correct me if I'm wrong).

> [...]

Regards  Manlio Perillo

From cstawarz at csail.mit.edu Thu May 22 18:30:47 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Thu, 22 May 2008 12:30:47 -0400
Subject: [Web-SIG] Proposed specification: waiting for file descriptor events
In-Reply-To: <48345D1B.2030905@libero.it>
References: <0906B338-B65E-43C8-9DCF-A175C3D34A8A@csail.mit.edu>
 <48345D1B.2030905@libero.it>
Message-ID: <4E3FC27D-F2FD-4E70-9AFF-8BEE7E39C2C9@csail.mit.edu>

On May 21, 2008, at 1:34 PM, Manlio Perillo wrote:

>> Instead, the spec recommends that async servers pre-read the
>> request body
>> before invoking the app (either by default or as a configurable
>> option).
>
> This is the best solution most of the time (but not for all of the
> time), especially if the "server" can do some "pre-parsing" of
> multipart/form-data request body.
>
> In fact I plan to write a custom function (in C for Nginx) that will
> "reduce", as an example:
>
> Content-Type: multipart/form-data; boundary=AaB03x
>
> --AaB03x
> Content-Disposition: form-data; name="submit-name"
>
> Larry
> --AaB03x
> Content-Disposition: form-data; name="files"; filename="file1.txt"
> Content-Type: text/plain
>
> ... contents of file1.txt ...
> --AaB03x--
>
> to (not properly escaped):
>
> Content-Type: application/x-www-form-urlencoded
>
> submit-name=Larry&files.filename=file1.txt&files.ctype=text/
> plain&files.path=xxx
>
>
> and the contents of file1.txt will be saved to a temporary file 'xxx'.

It seems like you're making this more complicated than it needs to be.
Why not just store the entire request body in a temporary file, and
then pass an open handle to it as wsgi.input?  That way, the server
doesn't have to rewrite the request, and the application doesn't need
to know how to interpret the files.* parameters.

> 1) Why not add a more generic poll-like interface?

Because such an interface would be more complicated than what I've
proposed and harder for server authors to implement.  Also, I'm not
sure that it gains you much.

Note that I'm not 100% sure on this, as I tried to indicate in the
"Open Issues" section of my proposal.  The approach I'd like to take
is to try writing apps with my interface for a while, and if
real-world usage shows that a poll-like interface would be very useful
(or necessary), then the spec could be extended to add one.  I think
this is a safe route, since the readable/writable functions could
easily be implemented in terms of a more generic poll-like interface,
so existing apps that use the fdevent extensions would continue to
work.

> Moreover IMHO storing a timeout variable in the environ to check if
> the previous call timed out is not the best solution.

I think it's a simple and effective solution.  Server authors don't
need to implement any new functions or data types.  They just create
and hold on to a mutable object instance (the simplest being a list
instance) for each app instance and toggle its truth value as
required.

> In my implementation I return a function, but with generators in
> Python 2.5 this can be done in a better way.

What advantage does this have over what I've proposed?

> 2) In Nginx it is not possible to simply handle "plain" file
> descriptors, since these are wrapped in a connection structure.
>
> This is the reason why I had to add a connection_wrapper function in
> my WSGI module for Nginx.

But the connection structure just wraps an integer file descriptor,
right?  So the readable/writable functions can create the required
wrapper to register with nginx.
There's no reason to make the application author do it.

> 3) If you read an example that implements a database connection pool:
> http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-postgres-async.py
>
> you can see that there is a problem.
>
> In fact the pool is not very flexible; the application can not
> handle
> more than POOL_SIZE concurrent requests.
>
> However it is possible to have a new request wait until a
> previous connection is free (or a timeout occurs).
>
> I have attached an example (it is not in the repository since there
> are some problems).
>
> The examples use a new extension:
>
> - ctx = environ['ngx.request_context']()
> - ctx.resume()
>
> ctx.resume() "asynchronously" resumes the given request
> (it will be resumed as soon as control returns to Nginx, when the
> application yields something).
>
>
> Note that the problem of resuming another request is easily solved
> with greenlets, without the need for new extensions
> (this is one of the reasons why I like greenlets).

Right, you want something like Queue.Queue, but for exchanging data
between request handlers in the same thread.  Since this is a
different problem from waiting on file descriptors, it's outside the
scope of my proposal.

However, one way you might implement something like this using my
proposal would be to run the connection-pool manager in a separate
thread, and have request handlers talk to it over sockets.  Kind of
ugly, but I think it would do the job.

Chris

From cstawarz at csail.mit.edu Thu May 22 20:10:02 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Thu, 22 May 2008 14:10:02 -0400
Subject: [Web-SIG] WSGI and greenlets
In-Reply-To: <483533FD.8090707@libero.it>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
 <48203045.60504@libero.it>
 <8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu>
 <48216BE7.5010000@libero.it>
 <9A1AD097-A95A-4F0C-86B0-2FA50E31014E@csail.mit.edu>
 <483533FD.8090707@libero.it>
Message-ID: <5D29F147-3619-4156-A4F4-D7FD2EE2AFB1@csail.mit.edu>

On May 22, 2008, at 4:51 AM, Manlio Perillo wrote:

> I'm reading the PEP 342, and I still think that this will not work
> as I want for Nginx (where I have no control over the "scheduler").
>
> In fact the PEP 342 says:
> """However, if it were possible to pass values or exceptions *into* a
> generator at the point where it was suspended, a simple co-routine
> scheduler or "trampoline function" would let coroutines "call" each
> other without blocking."""
>
> However writing a co-routine scheduler or "trampoline function" when
> your application is embedded in an external server is not possible
> (but please, correct me if I'm wrong).

That's correct.  My with_callstack wrapper supports calling
subroutines (which can yield values to the server or return results to
their caller) within a single application instance.  It doesn't
support switching between app instances, since that's the server's
job.  Therefore, it doesn't help with your DB connection pool example.
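
The call-stack idea described in this message can be illustrated with
a short sketch.  This is not the awsgiref code referred to in the
thread, which is not shown here; the decorator below is a hypothetical
re-creation written for Python 3.  It assumes that a subroutine is
"called" by yielding a generator and that its result comes back
through ``return``.  It manages subroutines inside one application
instance only, so it does nothing for switching between requests.

```python
import types


def add_callstack(application):
    """Let an application 'call' nested generators by yielding them.

    Hypothetical sketch: the name mirrors the decorator mentioned in the
    thread, but the implementation is an assumption.
    """
    def wrapper(environ, start_response):
        stack = [application(environ, start_response)]
        to_send = None
        while stack:
            try:
                item = stack[-1].send(to_send)
            except StopIteration as stop:
                # The subroutine finished: hand its result to the caller.
                stack.pop()
                to_send = stop.value
                continue
            to_send = None
            if isinstance(item, types.GeneratorType):
                # A yielded generator is treated as a subroutine call.
                stack.append(item)
            else:
                # Anything else is passed through to the server.
                yield item
    return wrapper
```

Because every non-generator value is relayed unchanged, empty strings
yielded by a nested subroutine still reach the server, which is what
the fdevent extensions depend on.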

Chris

From manlio_perillo at libero.it Fri May 23 00:21:13 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Fri, 23 May 2008 00:21:13 +0200
Subject: [Web-SIG] Proposed specification: waiting for file descriptor events
In-Reply-To: <4E3FC27D-F2FD-4E70-9AFF-8BEE7E39C2C9@csail.mit.edu>
References: <0906B338-B65E-43C8-9DCF-A175C3D34A8A@csail.mit.edu>
 <48345D1B.2030905@libero.it>
 <4E3FC27D-F2FD-4E70-9AFF-8BEE7E39C2C9@csail.mit.edu>
Message-ID: <4835F1D9.3070406@libero.it>

Christopher Stawarz wrote:
> On May 21, 2008, at 1:34 PM, Manlio Perillo wrote:
>
>>> Instead, the spec recommends that async servers pre-read the request
>>> body
>>> before invoking the app (either by default or as a configurable
>>> option).
>>
>> This is the best solution most of the time (but not for all of the
>> time), especially if the "server" can do some "pre-parsing" of
>> multipart/form-data request body.
>>
>> In fact I plan to write a custom function (in C for Nginx) that will
>> "reduce", as an example:
>>
>> Content-Type: multipart/form-data; boundary=AaB03x
>>
>> --AaB03x
>> Content-Disposition: form-data; name="submit-name"
>>
>> Larry
>> --AaB03x
>> Content-Disposition: form-data; name="files"; filename="file1.txt"
>> Content-Type: text/plain
>>
>> ... contents of file1.txt ...
>> --AaB03x--
>>
>> to (not properly escaped):
>>
>> Content-Type: application/x-www-form-urlencoded
>>
>> submit-name=Larry&files.filename=file1.txt&files.ctype=text/plain&files.path=xxx
>>
>>
>>
>> and the contents of file1.txt will be saved to a temporary file 'xxx'.
>
> It seems like you're making this more complicated than it needs to be.
> Why not just store the entire request body in a temporary file, and then
> pass an open handle to it as wsgi.input?

Because if you have a big file (like a video of > 100 MB), your
application will block everything while parsing the request body.

Parsing the body incrementally is far more efficient (although it is
harder).
> That way, the server doesn't
> have to rewrite the request, and the application doesn't need to know
> how to interpret the files.* parameters.
>

How to interpret the files.* parameters is not really a problem.

>> 1) Why not add a more generic poll-like interface?
>
> Because such an interface would be more complicated than what I've
> proposed and harder for server authors to implement.  Also, I'm not sure
> that it gains you much.
>

Well, I have modelled my extension so that it has a "well known"
interface and that it is not hard to implement.

But I have to say that I'm not sure if one wants to poll multiple
sockets.  Moreover in my implementation ngx.poll only returns one
"ready" socket at a time.

By the way: I see a problem with your API.
What happens if an application does:

read, write, exc = m.fdset()

environ['x-wsgiorg.fdevent.readable'](read[0], 1.0)
environ['x-wsgiorg.fdevent.writable'](write[0], 1.0)

yield ''

There is no way to know, when the application is resumed, if the
socket is ready for read or write.

This probably should not be a problem, but I'm not sure.

> Note that I'm not 100% sure on this, as I tried to indicate in the "Open
> Issues" section of my proposal.  The approach I'd like to take is to try
> writing apps with my interface for a while, and if real-world usage
> shows that a poll-like interface would be very useful (or necessary),
> then the spec could be extended to add one.  I think this is a safe
> route, since the readable/writable functions could easily be implemented
> in terms of a more generic poll-like interface, so existing apps that
> use the fdevent extensions would continue to work.
>
>> Moreover IMHO storing a timeout variable in the environ to check if
>> the previous call timed out is not the best solution.
>
> I think it's a simple and effective solution.  Server authors don't need
> to implement any new functions or data types.
> They just create and hold
> on to a mutable object instance (the simplest being a list instance) for
> each app instance and toggle its truth value as required.
>
>> In my implementation I return a function, but with generators in
>> Python 2.5 this can be done in a better way.
>
> What advantage does this have over what I've proposed?
>

You don't need to store a mutable variable in the environ.

>> 2) In Nginx it is not possible to simply handle "plain" file
>> descriptors, since these are wrapped in a connection structure.
>>
>> This is the reason why I had to add a connection_wrapper function in
>> my WSGI module for Nginx.
>
> But the connection structure just wraps an integer file descriptor,
> right?  So the readable/writable functions can create the required
> wrapper to register with nginx.  There's no reason to make the
> application author do it.
>

The "problem" is that Nginx keeps a list of preallocated connection
objects (the size of the list being controlled by worker_connections).

This means that a newly constructed connection *must* be freed as soon
as it is no longer used, otherwise it can limit the number of
concurrent connections that can be handled by Nginx.

Since with my API (register/unregister) a connection should be kept
alive until it is unregistered, I have chosen to create a wrapper for
the Nginx connection object.

Probably with your API it can be possible to create temporary
wrappers.  But I don't know if this is a good idea.

> [...]
> Chris
>

Manlio Perillo

From cstawarz at csail.mit.edu Fri May 23 17:12:37 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Fri, 23 May 2008 11:12:37 -0400
Subject: [Web-SIG] Proposed specification: waiting for file descriptor events
In-Reply-To: <4835F1D9.3070406@libero.it>
References: <0906B338-B65E-43C8-9DCF-A175C3D34A8A@csail.mit.edu>
 <48345D1B.2030905@libero.it>
 <4E3FC27D-F2FD-4E70-9AFF-8BEE7E39C2C9@csail.mit.edu>
 <4835F1D9.3070406@libero.it>
Message-ID:

On May 22, 2008, at 6:21 PM, Manlio Perillo wrote:

>> That way, the server doesn't have to rewrite the request, and the
>> application doesn't need to know how to interpret the files.*
>> parameters.
>
> How to interpret the files.* parameters is not really a problem.

It's a problem for a portable application, which will have to be able
to parse both the original request and your server's rewritten version
of it.

In any case, your request rewriting is compatible with my proposal.

> By the way: I see a problem with your API.
> What happens if an application does:
>
> read, write, exc = m.fdset()
>
> environ['x-wsgiorg.fdevent.readable'](read[0], 1.0)
> environ['x-wsgiorg.fdevent.writable'](write[0], 1.0)
>
> yield ''
>
> There is no way to know, when the application is resumed, if the
> socket is ready for read or write.
>
> This probably should not be a problem, but I'm not sure.

The result of doing this is undefined, and I've updated the spec to
say so.  The application shouldn't do it, and the server should
probably throw an error if it does.

>>> In my implementation I return a function, but with generators in
>>> Python 2.5 this can be done in a better way.
>> What advantage does this have over what I've proposed?
>
> You don't need to store a mutable variable in the environ.

I don't see any problem with a mutable environ variable, especially if
it makes things simpler for server and application authors.
But if you want to do something fancier (like raising a Timeout
exception in a Python 2.5 generator), then it's easy to write a
wrapper that does so.

Chris
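
As a rough illustration of the kind of wrapper mentioned above, the
sketch below turns the ``x-wsgiorg.fdevent.timeout`` flag into an
exception thrown into the application's generator.  The names
``FdEventTimeout`` and ``raise_on_timeout`` are hypothetical and not
part of the proposal.  The sketch also assumes that an empty string
yielded by the application always marks a wait, and that the flag is
false when no timeout occurred.

```python
class FdEventTimeout(Exception):
    """Hypothetical exception type; the proposal does not define one."""


def raise_on_timeout(application):
    """Throw FdEventTimeout into the application when a wait times out."""
    def wrapper(environ, start_response):
        gen = application(environ, start_response)
        try:
            item = next(gen)
            while True:
                yield item
                # An empty item is assumed to mean "waiting on a descriptor".
                waited = not item
                if waited and environ['x-wsgiorg.fdevent.timeout']:
                    item = gen.throw(FdEventTimeout())
                else:
                    item = next(gen)
        except StopIteration:
            # The application generator finished normally.
            return
        finally:
            gen.close()
    return wrapper
```

With this in place an application can use ``try``/``except`` around
its ``yield`` instead of testing the flag after every wait, while the
server keeps exposing the plain mutable flag.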