From paul at boddie.org.uk  Thu May  1 01:59:05 2008
From: paul at boddie.org.uk (Paul Boddie)
Date: Thu, 1 May 2008 01:59:05 +0200
Subject: [Web-SIG] Web Activities at EuroPython 2008?
Message-ID: <200805010159.05109.paul@boddie.org.uk>

Hello,

It's not often that I post to the Web-SIG list these days, but I feel almost 
obliged to ask whether anyone is considering submitting 
talk proposals about Web programming to the EuroPython 2008 conference (to be 
held in Vilnius, Lithuania from 7th July until 9th July, with sprinting 
possibilities from 10th July until 12th July).

In my monitoring of the Internet for mentions of EuroPython, I see that 
there's a survey out there which is asking for opinions about a Zope 
conference [1], with questions related to EuroPython. Last year, there were 
quite a few Zope talks, although perhaps not at the level enjoyed when there 
was a special Zope track at EuroPython. A few other Web-related technologies 
did also get coverage in the schedule, however: FormEncode, Genshi, KSS, 
Nevow, Pylons, Silva, WSGI, to name just a few.

Nevertheless, Web programming (including and beyond Zope) has always been a 
major component of EuroPython, and it would certainly be interesting to see 
talks describing what people are doing with Python on the Web, whether it be 
the development of classic server-side Web applications, the usage of Python 
on the client side, or even the management of infrastructure using Python - 
large-scale computing is becoming an increasingly popular topic.

Anyway, details of talk submissions and other activities at the conference can 
be found here:

  http://www.europython.org/community/CallForParticipation

And the EuroPython site can be found here:

  http://www.europython.org/

Yes, it's running MoinMoin - a possibly unfashionable choice (and arguably 
unpopular in certain circles) - but maybe even the MoinMoin developers might 
consider sharing some of their insights on developing and customising 
MoinMoin as a talk. ;-)

I look forward to seeing many talk submissions of a Web-related kind!

Paul

[1] http://www.surveymonkey.com/s.aspx?sm=1QRKEu8eTs2gNjiYPOCsBA_3d_3d

From manlio_perillo at libero.it  Fri May  2 23:03:43 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Fri, 02 May 2008 23:03:43 +0200
Subject: [Web-SIG] [proposal] wsgiref.util.abs_url
Message-ID: <481B81AF.7050300@libero.it>

Hi.

I think that a function like (not tested):

from urllib import quote  # Python 2; on Python 3 use urllib.parse.quote


def abs_url(environ, relative_url):
    """Return the absolute url"""
    url = environ['wsgi.url_scheme'] + '://'

    if environ.get('HTTP_HOST'):
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']

        # Append the port only when it is not the default for the scheme.
        if environ['wsgi.url_scheme'] == 'https':
            if environ['SERVER_PORT'] != '443':
                url += ':' + environ['SERVER_PORT']
        else:
            if environ['SERVER_PORT'] != '80':
                url += ':' + environ['SERVER_PORT']

    url += quote(relative_url)
    return url


would be a useful addition to the wsgiref.util module.


What do you think?


Thanks   Manlio Perillo

From pje at telecommunity.com  Sun May  4 19:43:23 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 04 May 2008 13:43:23 -0400
Subject: [Web-SIG] [proposal] wsgiref.util.abs_url
In-Reply-To: <481B81AF.7050300@libero.it>
References: <481B81AF.7050300@libero.it>
Message-ID: <20080504174919.4AB8C3A4036@sparrow.telecommunity.com>

At 11:03 PM 5/2/2008 +0200, Manlio Perillo wrote:
>Hi.
>
>I think that a function like (not tested):
>
>def abs_url(environ, relative_url):
>     """Return the absolute url"""
[...]
>     url += quote(relative_url)
>     return url
>
>would be an useful addition to the wsgiref.util module.
>
>
>What do you think?

I think that it doesn't accept a relative URL, it accepts an absolute path.

I also think that using urlparse.urljoin() with either request_uri() 
or application_uri() would be a clearer (and tested) way to obtain an 
absolute URL, and more generally useful.
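
A minimal sketch of that suggestion, assuming Python 3 (where
urlparse.urljoin() is available as urllib.parse.urljoin()).  The SCRIPT_NAME
and PATH_INFO values are made up for the illustration; the helpers are the
existing wsgiref.util functions.

```python
from urllib.parse import urljoin
from wsgiref.util import application_uri, request_uri, setup_testing_defaults

# Hypothetical request: application mounted at /app, request path /x/y.
environ = {'SCRIPT_NAME': '/app', 'PATH_INFO': '/x/y'}
setup_testing_defaults(environ)  # fills in scheme, host, port, ...

app_base = application_uri(environ)   # scheme://host + SCRIPT_NAME
req_base = request_uri(environ)       # app_base + PATH_INFO (+ query string)

# Request-relative: resolved against the URL of the current request.
request_relative = urljoin(req_base, 'a/b')

# Application-relative: resolved against the application root.  The trailing
# slash matters, otherwise urljoin() replaces the last path segment ("app").
app_relative = urljoin(app_base + '/', 'a/b')

print(request_relative)
print(app_relative)
```

Which base to use depends on whether the link should be interpreted
relative to the current request or relative to the application root.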


From manlio_perillo at libero.it  Mon May  5 18:27:33 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Mon, 05 May 2008 18:27:33 +0200
Subject: [Web-SIG] [proposal] wsgiref.util.abs_url
In-Reply-To: <20080504174919.4AB8C3A4036@sparrow.telecommunity.com>
References: <481B81AF.7050300@libero.it>
	<20080504174919.4AB8C3A4036@sparrow.telecommunity.com>
Message-ID: <481F3575.50800@libero.it>

Phillip J. Eby wrote:
> At 11:03 PM 5/2/2008 +0200, Manlio Perillo wrote:
>> Hi.
>>
>> I think that a function like (not tested):
>>
>> def abs_url(environ, relative_url):
>>     """Return the absolute url"""
> [...]
>>     url += quote(relative_url)
>>     return url
>>
>> would be an useful addition to the wsgiref.util module.
>>
>>
>> What do you think?
> 
> I think that it doesn't accept a relative URL, it accepts an absolute path.
> 

What do you mean?

  environ = {}
  setup_testing_defaults(environ)

  url = '/a/b/'
  self.failUnlessEqual(
      util.abs_url(environ, url), 'http://127.0.0.1/a/b/')

> I also think that using urlparse.urljoin() with either request_uri() or 
> application_uri() would be a clearer (and tested) way to obtain an 
> absolute URL, and more generally useful.
> 

But application_uri also includes SCRIPT_NAME.


Regards   Manlio Perillo

From pje at telecommunity.com  Mon May  5 19:39:50 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 05 May 2008 13:39:50 -0400
Subject: [Web-SIG] [proposal] wsgiref.util.abs_url
In-Reply-To: <481F3575.50800@libero.it>
References: <481B81AF.7050300@libero.it>
	<20080504174919.4AB8C3A4036@sparrow.telecommunity.com>
	<481F3575.50800@libero.it>
Message-ID: <20080505173931.D3BA13A4036@sparrow.telecommunity.com>

At 06:27 PM 5/5/2008 +0200, Manlio Perillo wrote:
>Phillip J. Eby wrote:
>>I think that it doesn't accept a relative URL, it accepts an absolute path.
>
>What do you mean?
>
>  environ = {}
>  setup_testing_defaults(environ)
>
>  url = '/a/b/'

That's a relative URL that's also an absolute path.  Try a relative 
URL like './a/b', or just plain 'a/b'.


>    self.failUnlessEqual(
>       util.abs_url(environ, url), 'http://127.0.0.1/a/b/')
>
>>I also think that using urlparse.urljoin() with either 
>>request_uri() or application_uri() would be a clearer (and tested) 
>>way to obtain an absolute URL, and more generally useful.
>
>But application_uri also includes SCRIPT_NAME.

Yes, and you might want to use it as the base against which a 
relative URL will be resolved -- i.e. an application-relative URL, 
vs. a request-relative URL.  In fact, application_uri() would 
probably be *more* useful, since if you want a request-relative URL, 
there's no need to turn it into an absolute URL, since you could just 
use it in its relative form.

Note, however, that in either case, using a relative URL that's an 
absolute path (e.g. '/a/b'), will still produce the same result as 
your function would.  It's just that urljoin also works properly for 
all kinds of relative urls, not just the absolute-path subset.
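
The difference can be checked with a short sketch, assuming Python 3.  The
abs_url() below is a simplified port of the proposed function that covers
only the HTTP_HOST case, and the PATH_INFO value is made up for the
illustration.

```python
from urllib.parse import quote, urljoin
from wsgiref.util import request_uri, setup_testing_defaults


def abs_url(environ, relative_url):
    """Simplified port of the proposed function (HTTP_HOST case only)."""
    return (environ['wsgi.url_scheme'] + '://' + environ['HTTP_HOST']
            + quote(relative_url))


environ = {'SCRIPT_NAME': '', 'PATH_INFO': '/x/y'}
setup_testing_defaults(environ)
base = request_uri(environ)

# Absolute path: plain concatenation and urljoin() agree.
same = abs_url(environ, '/a/b') == urljoin(base, '/a/b')

# Other relative forms: only urljoin() resolves them against the base path.
joined_plain = urljoin(base, 'a/b')
joined_dot = urljoin(base, './a/b')
concatenated = abs_url(environ, 'a/b')   # host and path run together

print(same, joined_plain, joined_dot, concatenated)
```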


From cstawarz at csail.mit.edu  Tue May  6 03:30:27 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Mon, 5 May 2008 21:30:27 -0400
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
Message-ID: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>

(I'm new to the list, so please forgive me for making my first post a
specification proposal :)

Browsing through the list archives, I see there have been some
inconclusive discussions on adding better support for asynchronous web
servers to the WSGI spec.  Since such support would be very useful for
some upcoming projects of mine, I decided to take a shot at speccing
out and implementing it.  I'd be grateful for any feedback you have.
If this seems like something worth pursuing, I would also welcome
collaborators to help develop the spec further.

The name for this proposed specification is the Asynchronous Web
Server Gateway Interface (AWSGI).  As the name suggests, the spec is
closely related to WSGI and is most easily described in terms of how
it differs from WSGI.  AWSGI eliminates the following parts of WSGI:

   - the environment variables wsgi.version and wsgi.input

   - the write() callable returned by start_response()

AWSGI adds the following environment variables:

   - awsgi.version
   - awsgi.input
   - awsgi.readable
   - awsgi.writable
   - awsgi.timeout

In addition, AWSGI allows the application iterable to yield two types
of data:

   - byte strings, handled as in WSGI

   - the result of calling awsgi.readable or awsgi.writable, which
     indicates that the application should be paused and restarted when
     a specified file descriptor is ready for reading or writing
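
To make the second kind of yielded value concrete, here is a minimal sketch
(Python 3) of how a server might drive one application iterable.  The Wait
class, its for_write flag, make_environ() and run_app() are hypothetical
names chosen for this illustration; the proposal does not say what object
awsgi.readable and awsgi.writable return.  A real server would multiplex
many applications rather than block in select() for a single one.

```python
import select


class Wait:
    """Hypothetical value returned by awsgi.readable / awsgi.writable."""

    def __init__(self, fd, timeout=None, for_write=False):
        self.fd = fd
        self.timeout = timeout
        self.for_write = for_write


def make_environ():
    """Build only the AWSGI-specific keys needed by this sketch."""
    return {
        'awsgi.readable': lambda fd, timeout=None: Wait(fd, timeout),
        'awsgi.writable': lambda fd, timeout=None: Wait(fd, timeout, True),
        'awsgi.timeout': False,
    }


def run_app(app_iter, environ, send):
    """Drive one application iterable, pausing it on each yielded Wait."""
    for item in app_iter:
        if isinstance(item, Wait):
            rlist = [] if item.for_write else [item.fd]
            wlist = [item.fd] if item.for_write else []
            # The descriptor is also watched for "exceptional" conditions.
            ready = select.select(rlist, wlist, [item.fd], item.timeout)
            # All three lists empty means the wait timed out.
            environ['awsgi.timeout'] = not any(ready)
        else:
            send(item)  # byte string: pass it on to the client
```

Because the application holds a reference to the same environ dictionary,
it sees the updated awsgi.timeout value as soon as it is resumed.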

Because of AWSGI's similarity to WSGI, a simple wrapper can be used to
run AWSGI applications on WSGI servers without alteration.

The following example application demonstrates typical usage of AWSGI.
This application simply reads the request body and sends it back to
the client.  Each time it wants to receive data from the client, it
first tests awsgi.input for readability and then calls its recv()
method.  If awsgi.input is not readable after one second, the
application sends a "408 Request Timeout" response to the client and
terminates:


   def echo_request_body(environ, start_response):
       input = environ['awsgi.input']
       readable = environ['awsgi.readable']

       nbytes = int(environ.get('CONTENT_LENGTH') or 0)
       output = ''
       while nbytes:
           yield readable(input, 1.0)  # Time out after 1 second

           if environ['awsgi.timeout']:
               msg = 'The request timed out.'
               start_response('408 Request Timeout',
                              [('Content-Type', 'text/plain'),
                               ('Content-Length', str(len(msg)))])
               yield msg
               return

           data = input.recv(nbytes)
           if not data:
               break
           output += data
           nbytes -= len(data)

       start_response('200 OK', [('Content-Type', 'text/plain'),
                                 ('Content-Length', str(len(output)))])
       yield output


I have rough but functional implementations of a number of AWSGI
components available in a Bazaar branch at
http://pseudogreen.org/bzr/awsgiref/.  The package includes an
asyncore-based AWSGI server and an AWSGI-to-WSGI application wrapper.
In addition, the file spec.txt contains a more detailed description of
the specification (which is also appended below).

Again, I'd very much appreciate comments and criticism.


Thanks,
Chris


Detailed AWSGI Specification
----------------------------

- Required AWSGI environ variables:

   * All variables required by WSGI, except for wsgi.version and
     wsgi.input, which must *not* be present

   * awsgi.version => the tuple (1, 0)

   * awsgi.input

     This is an object with one method, recv(bufsize), which behaves
     like the socket method of the same name (although it doesn't
     support the optional flags parameter).  Before each call to
     recv(), the application must test awsgi.input for readability via
     awsgi.readable.  The result of calling recv() without doing so is
     undefined.

     (XXX: Should recv() handle EINTR for the application?)

   * awsgi.readable
   * awsgi.writable

     These are callables with the signature f(fd, timeout=None).  fd is
     either a file descriptor (i.e. int or long) or an object with a
     fileno() method that returns a file descriptor.

     timeout has the same semantics as the timeout parameter to
     select.select().  If the operation times out, awsgi.timeout will
     be true when the application resumes.

     In addition to checking readiness for reading or writing, servers
     should also monitor file descriptors for "exceptional" conditions
     (e.g. out-of-band data) and restart the application if they occur.

   * awsgi.timeout => boolean indicating whether the most recent read
     or write wait timed out (false if there have been no waits)

- start_response() must *not* return a write() callable, as this
   method of providing application output to the server is incompatible
   with asynchronous execution.

- The server must accept awsgi.input as input to awsgi.readable,
   either by providing an actual socket object or by special-case
   handling (i.e. awsgi.input needn't have a fileno() method, as long
   as the server handles it as if it did).

- Applications return iterators, which can yield:

   * a string => sent to client, just as in standard WSGI

   * the result of a call to awsgi.readable or awsgi.writable =>
     application is resumed when either the file descriptor is ready
     for reading/writing or the wait times out (in which case,
     awsgi.timeout will be true)

- Although AWSGI applications will *not* be directly compatible with
   WSGI servers, middleware will allow them to run as standard WSGI
   apps (with all I/O waits returning immediately).

- AWSGI servers will not support unmodified WSGI applications.  There
   are several reasons for this:

   - If the app does blocking I/O, it will block the entire server.

   - Calls to the read() method of wsgi.input may fail with
     EWOULDBLOCK, which an app expecting synchronous I/O probably won't
     be prepared to deal with.

   - The readline(), readlines(), and __iter__() methods of wsgi.input
     can require multiple network I/O operations, which is incompatible
     with asynchronous execution.

   - The write() callable returned by start_response() is inherently
     incompatible with asynchronous execution.

   Because of these issues, this specification aims for one-way
   compatibility between AWSGI and WSGI (i.e. the ability to run AWSGI
   apps on WSGI servers via middleware, but not vice versa).
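
The one-way compatibility described above could be provided by middleware
along the following lines.  This is a sketch under assumptions (Python 3),
not the wrapper from awsgiref: _Wait, _InputAdapter and awsgi_to_wsgi() are
hypothetical names, and every wait "returns immediately" simply because the
underlying wsgi.input is read with blocking calls.

```python
class _Wait:
    """Hypothetical marker yielded by the app to request an I/O wait."""


class _InputAdapter:
    """Expose a WSGI input stream through a recv(bufsize) method."""

    def __init__(self, stream):
        self._stream = stream

    def recv(self, bufsize):
        return self._stream.read(bufsize)


def awsgi_to_wsgi(awsgi_app):
    """Wrap an AWSGI-style app so that a plain WSGI server can call it."""

    def wsgi_app(environ, start_response):
        env = dict(environ)
        # The proposal requires wsgi.version and wsgi.input to be absent.
        env.pop('wsgi.version', None)
        env['awsgi.input'] = _InputAdapter(env.pop('wsgi.input'))
        env['awsgi.version'] = (1, 0)
        # Waits complete at once and never time out in this wrapper.
        env['awsgi.readable'] = lambda fd, timeout=None: _Wait()
        env['awsgi.writable'] = lambda fd, timeout=None: _Wait()
        env['awsgi.timeout'] = False

        def awsgi_start_response(status, headers, exc_info=None):
            # Discard the write() callable returned by the WSGI server.
            start_response(status, headers, exc_info)

        app_iter = awsgi_app(env, awsgi_start_response)
        try:
            for item in app_iter:
                if not isinstance(item, _Wait):
                    yield item      # ordinary output, as in WSGI
        finally:
            if hasattr(app_iter, 'close'):
                app_iter.close()

    return wsgi_app
```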


From graham.dumpleton at gmail.com  Tue May  6 04:09:33 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Tue, 6 May 2008 12:09:33 +1000
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
Message-ID: <88e286470805051909w53ed2491taff222c9645a1f17@mail.gmail.com>

2008/5/6 Christopher Stawarz <cstawarz at csail.mit.edu>:
> [...]


No time to understand all this, but a few comments.

 If write() isn't to be returned by start_response(), then do away with
 start_response() if possible as per discussions for WSGI 2.0. See:

  http://www.wsgi.org/wsgi/WSGI_2.0

 In other words, it would perhaps be better to align it with the
 proposals for WSGI 2.0 rather than with WSGI 1.0.

 Also take note of:

  http://www.wsgi.org/wsgi/Amendments_1.0

 and think about how Python 3.0 would affect things.

 I'd also rather it not be called AWSGI, as that is not sufficiently
 distinct from WSGI. If you want to pursue this asynchronous style, then be
 more explicit: call it ASYNC-WSGI and use an 'asyncwsgi' tag in environ.

 Graham

From manlio_perillo at libero.it  Tue May  6 12:17:41 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Tue, 06 May 2008 12:17:41 +0200
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
Message-ID: <48203045.60504@libero.it>

Christopher Stawarz wrote:
> (I'm new to the list, so please forgive me for making my first post a
> specification proposal :)
> 
> Browsing through the list archives, I see there's been some
> inconclusive discussions on adding better support for asynchronous web
> servers to the WSGI spec.  Since such support would be very useful for
> some upcoming projects of mine, I decided to take a shot at specing
> out and implementing it.  I'd be grateful for any feedback you have.
> If this seems like something worth pursuing, I would also welcome
> collaborators to help develop the spec further.
> 

I'm glad to know that there are some other people interested in 
asynchronous applications.  Have you seen my extensions to WSGI in my 
module for Nginx?

The extension is documented here:
http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/README

see the Extensions chapter.

For some examples:
http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-postgres-async.py
http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-poll-sleep.py

Note that in Nginx the request body is pre-read before the application 
is called (in fact wsgi.input is either a cStringIO or File object).


Unfortunately, there is a *big* usability problem.  The extension is based 
on a well-specified feature of WSGI: the gateway can suspend the 
execution of the WSGI application when it yields.

However if the asynchronous code is present in a "child" function, we 
have something like this:

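def application(environ, start_response):
    def nested():
        while True:          # pseudocode: loop until the poll succeeds
            poll(xxx)
            yield ''         # not ready: let the gateway suspend us

        yield result

    for r in nested():
        if not r:
            yield ''         # every caller in the chain must re-yield

    yield r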


That is, all the functions in the "chain" have to yield, which is not very 
good.
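
For comparison, a minimal sketch assuming Python 3.3 or later, where the
"yield from" statement forwards the child's yields automatically.  The
wait_for_result() helper and its polls_needed parameter are hypothetical
stand-ins for the poll(xxx) call above.

```python
def wait_for_result(polls_needed):
    """Child generator: yield b'' while "waiting", then return the result."""
    for _ in range(polls_needed):
        yield b''            # stand-in for poll(...): the gateway may suspend
    return b'done'


def application(environ, start_response):
    # "yield from" passes every b'' up to the gateway and evaluates to the
    # child's return value, so no manual re-yield loop is needed here.
    result = yield from wait_for_result(3)
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield result
```

Every function in the chain still has to be a generator, but the
re-yielding loop in each caller disappears.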


The solution is to use coroutines, and I'm planning to integrate 
greenlets (from the pylib project) into the WSGI module for Nginx.


 > [...]


Regards   Manlio Perillo

From manlio_perillo at libero.it  Tue May  6 12:40:51 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Tue, 06 May 2008 12:40:51 +0200
Subject: [Web-SIG] [proposal] wsgiref.util.abs_url
In-Reply-To: <20080505173931.D3BA13A4036@sparrow.telecommunity.com>
References: <481B81AF.7050300@libero.it>
	<20080504174919.4AB8C3A4036@sparrow.telecommunity.com>
	<481F3575.50800@libero.it>
	<20080505173931.D3BA13A4036@sparrow.telecommunity.com>
Message-ID: <482035B3.1090906@libero.it>

Phillip J. Eby wrote:
> At 06:27 PM 5/5/2008 +0200, Manlio Perillo wrote:
>> Phillip J. Eby wrote:
>>> I think that it doesn't accept a relative URL, it accepts an absolute 
>>> path.
>>
>> What do you mean?
>>
>>  environ = {}
>>  setup_testing_defaults(environ)
>>
>>  url = '/a/b/'
> 
> That's a relative URL that's also an absolute path.  Try a relative URL 
> like './a/b', or just plain 'a/b'.
> 
> 
> 
>>    self.failUnlessEqual(
>>       util.abs_url(environ, url), 'http://127.0.0.1/a/b/')
>>
>>> I also think that using urlparse.urljoin() with either request_uri() 
>>> or application_uri() would be a clearer (and tested) way to obtain an 
>>> absolute URL, and more generally useful.
>>
>> But application_uri also includes SCRIPT_NAME.
> 
> Yes, and you might want to use it as the base against which a relative 
> URL will be resolved -- i.e. an application-relative URL, vs. a 
> request-relative URL.  In fact, application_uri() would probably be 
> *more* useful, since if you want a request-relative URL, there's no need 
> to turn it into an absolute URL, since you could just use it in its 
> relative form.
> 

Yes, but this is not always the case.

> Note, however, that in either case, using a relative URL that's an 
> absolute path (e.g. '/a/b'), will still produce the same result as your 
> function would.  It's just that urljoin also works properly for all 
> kinds of relative urls, not just the absolute-path subset.
>

You are right, thanks.


Regards  Manlio Perillo

From cstawarz at csail.mit.edu  Tue May  6 23:37:09 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Tue, 6 May 2008 17:37:09 -0400
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com>
Message-ID: <68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu>

On May 5, 2008, at 10:08 PM, Graham Dumpleton wrote:

> If write() isn't to be returned by start_response(), then do away with
> start_response() if possible as per discussions for WSGI 2.0.

I think start_response() is necessary, because the application may  
need to yield for I/O readiness (e.g. to read the request body, as in  
my example app) before it decides what response status and headers to  
send.

> Also take note of:
>
>  http://www.wsgi.org/wsgi/Amendments_1.0
>
> and think about how Python 3.0 would affect things.

OK, will do.

> I'd also rather it not be called AWSGI as not sufficient distinct from
> WSGI. If you want to pursue this asynchronous style, then be more
> explicitly and call it ASYNC-WSGI and use 'asyncwsgi' tag in environ.

Good point.  It'd be easy to type "wsgi" when you meant "awsgi", or  
vice versa.  But I think I'd prefer "wsgi_async" to "asyncwsgi".


Thanks,
Chris

From cstawarz at csail.mit.edu  Wed May  7 00:01:19 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Tue, 6 May 2008 18:01:19 -0400
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <48203045.60504@libero.it>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<48203045.60504@libero.it>
Message-ID: <8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu>

On May 6, 2008, at 6:17 AM, Manlio Perillo wrote:

> I'm glad to know that there are some other people interested in  
> asynchronous application, do you have seen my extensions to WSGI in  
> my module for Nginx?

Yes, I have, and I had your module in mind as a potential provider of  
the AWSGI interface.

> Note that in Nginx the request body is pre-read before the  
> application is called (in fact wsgi.input is either a cStringIO or  
> File object).

Although I didn't state it explicitly in my spec, my intention is for  
the server to be able to implement awsgi.input in any way it likes, as  
long as it provides a recv() method.  It's totally acceptable for the  
request body to be pre-read.

> Unfortunately there is a *big* usability problem: the extension is  
> based on a well specified feature of WSGI: the gateway can suspend  
> the execution of the WSGI application when it yields.
>
> However if the asynchronous code is present in a "child" function,  
> we have something like this:
> ...
> That is, all the functions in the "chain" have to yield, and is not  
> very good.

Yes, you're right.  However, if you're willing/able to use Python 2.5,  
you can use the new features of generators to implement a call stack  
that lets you call child functions and receive return values and  
exceptions from them.  I've implemented this in awsgiref.callstack.   
Have a look at

   http://pseudogreen.org/bzr/awsgiref/examples/echo_request_with_callstack.py

for an example of how it works.
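
The general technique (often called a trampoline) can be sketched as
follows.  This is not the code in awsgiref.callstack: Return and
run_with_callstack() are hypothetical names, and the sketch is written for
Python 3 while keeping the explicit Return marker that a Python 2.5
generator would need in order to hand back a value.

```python
import types


class Return:
    """Hypothetical marker: a child yields Return(value) to hand back a result."""

    def __init__(self, value):
        self.value = value


def run_with_callstack(root):
    """Flatten nested generators into one iterable for the server.

    Yielding a generator means "call this child"; yielding Return(v) means
    "return v to my caller"; anything else is passed through unchanged.
    """
    stack = [root]
    to_send = None
    pending = None                     # exception to raise in the caller
    while stack:
        gen = stack[-1]
        try:
            if pending is not None:
                error, pending = pending, None
                item = gen.throw(error)
            else:
                value, to_send = to_send, None
                item = gen.send(value)
        except StopIteration:
            stack.pop()                # child ended without a Return
            continue
        except Exception as exc:
            stack.pop()
            if not stack:
                raise                  # nobody left to handle it
            pending = exc              # re-raise inside the caller
            continue
        if isinstance(item, types.GeneratorType):
            stack.append(item)         # "call" the child
        elif isinstance(item, Return):
            gen.close()
            stack.pop()
            to_send = item.value       # becomes the value of the caller's yield
        else:
            yield item                 # output or wait request for the server
```

A caller then writes "body = yield read_body(...)" and receives whatever
the child passed to Return, while wait requests yielded anywhere in the
chain still reach the server.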

> The solution is to use coroutines, and I'm planning to integrate  
> greenlets (from the pylib project) into the WSGI module for Nginx.

Interesting, but it's not clear to me how/if this would work.  Can you  
explain more or point me to some code?


Thanks,
Chris

From graham.dumpleton at gmail.com  Wed May  7 01:02:49 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Wed, 7 May 2008 09:02:49 +1000
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com>
	<68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu>
Message-ID: <88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com>

2008/5/7 Christopher Stawarz <cstawarz at csail.mit.edu>:
> On May 5, 2008, at 10:08 PM, Graham Dumpleton wrote:
>
>
> > If write() isn't to be returned by start_response(), then do away with
> > start_response() if possible as per discussions for WSGI 2.0.
>
>  I think start_response() is necessary, because the application may need to
> yield for I/O readiness (e.g. to read the request body, as in my example
> app) before it decides what response status and headers to send.

One could come up with other ways of doing it which align better with
WSGI 2.0. I previously gave an idea as a starting point for
discussion, but I don't think others really understood what I was
suggesting. But then I did post it at 4am in the middle
of a baby-induced period of sleep deprivation. See post 24 in:

http://groups.google.com/group/python-web-sig/tree/browse_frm/thread/74c1f8cf15adf114/d98086a8db568ebd?rnum=24

I think what was missed by others was that I wasn't suggesting that the
102 code be sent all the way back to the client, but that it be used as a
convention between the WSGI application and the underlying WSGI adapter
only, to facilitate returning control to the WSGI adapter
before one has decided what actual response headers to send. This
seems to align with what you want.

Graham

From ionel.mc at gmail.com  Wed May  7 02:51:13 2008
From: ionel.mc at gmail.com (Ionel Maries Cristian)
Date: Wed, 7 May 2008 03:51:13 +0300
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
Message-ID: <b322b4e60805061751ne2e9618ta3fc519b87997e6a@mail.gmail.com>

This is a very interesting initiative.

However, there are a few problems:
- there is no support for chunked input - that would require having support
for readline in the first place; also, decoding the chunked input should be
the gateway's business.
- the original WSGI spec already has some support for streaming and
asynchronicity [*1]
- I don't see how removing the write callable will help (I don't see an issue
with the server providing a StringIO's write method as the write callable for
synchronous apps)
- passing non-string values through middleware will make using/porting
existing WSGI middleware hairy (suppose you have a middleware that applies
some filter to the app iter - you'll have your code full of isinstance
nastiness)

Also, have you looked at the existing gateway implementations with
asynchronous support?
There are a bunch of them:
http://trac.wiretooth.com/public/wiki/asycwsgi
http://chiral.j4cbo.com/trac
http://wiki.secondlife.com/wiki/Eventlet
my own shot at the problem: http://code.google.com/p/cogen/
and manlio's mod_wsgi for nginx
(I may be missing some)

However, there is absolutely no uniformity in how they handle wsgi.input (or
its equivalent)

[*1] In my implementation I do a bunch of tricks to make it possible to use
regular WSGI middleware with async apps - I have a bunch of working examples
using Pylons:
 - the extensions in the environ (like your environ['awsgi.readable'])
return an empty string that penetrates most[*2] middleware, and set the actual
message (like your (token, fd, timeout) tuple) on some internal object
From this point of view, an async middleware stack is just a set of
middleware that supports streaming.
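
To make the empty-string trick a bit more concrete, here is a minimal sketch. It is not cogen's actual API: the environ key 'x.wait_readable', the PendingOp class and the toy run() gateway are all made up for illustration. The point is only that the wait request travels out of band while the middleware sees an ordinary (empty) string.

```python
# Sketch only: 'x.wait_readable', PendingOp and run() are hypothetical names,
# not taken from cogen or from the AWSGI proposal.

class PendingOp(object):
    """Internal object on which the app records what it is waiting for."""
    def __init__(self):
        self.request = None


def make_wait_readable(pending):
    def wait_readable(fd, timeout=None):
        # Record the real message out of band ...
        pending.request = ('readable', fd, timeout)
        # ... and hand back a plain string that middleware passes through.
        return ''
    return wait_readable


def app(environ, start_response):
    wait_readable = environ['x.wait_readable']
    yield wait_readable(7, 1.0)      # the middleware stack only sees ''
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield 'done'


def upper_middleware(application):
    """Streaming-friendly middleware: filters block by block."""
    def wrapped(environ, start_response):
        for block in application(environ, start_response):
            yield block.upper()
    return wrapped


def run(application):
    """Toy gateway: returns the waits it observed and the response body."""
    pending = PendingOp()
    environ = {'x.wait_readable': make_wait_readable(pending)}
    waits, body = [], []
    for block in application(environ, lambda status, headers: None):
        if block == '' and pending.request is not None:
            waits.append(pending.request)   # a real gateway would suspend here
            pending.request = None
        else:
            body.append(block)
    return waits, ''.join(body)
```

A middleware that joins the whole app iter before returning would break this, since the gateway would never see the empty string at the moment the wait was requested.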

Please see:
 http://cogen.googlecode.com/svn/trunk/docs/cogen.web.async.html
http://cogen.googlecode.com/svn/trunk/docs/cogen.web.wsgi.html


[*2] middleware that consumes the app iter ruins that pattern, but regardless,
such middleware is not compliant with the WSGI spec (see
http://www.python.org/dev/peps/pep-0333/#middleware-handling-of-block-boundaries
)
- notable examples are most of the exception-handling middleware (which can't
work otherwise anyway)

On Tue, May 6, 2008 at 4:30 AM, Christopher Stawarz <cstawarz at csail.mit.edu>
wrote:

> (I'm new to the list, so please forgive me for making my first post a
> specification proposal :)
>
> Browsing through the list archives, I see there's been some
> inconclusive discussions on adding better support for asynchronous web
> servers to the WSGI spec.  Since such support would be very useful for
> some upcoming projects of mine, I decided to take a shot at specing
> out and implementing it.  I'd be grateful for any feedback you have.
> If this seems like something worth pursuing, I would also welcome
> collaborators to help develop the spec further.
>
> The name for this proposed specification is the Asynchronous Web
> Server Gateway Interface (AWSGI).  As the name suggests, the spec is
> closely related to WSGI and is most easily described in terms of how
> it differs from WSGI.  AWSGI eliminates the following parts of WSGI:
>
>  - the environment variables wsgi.version and wsgi.input
>
>  - the write() callable returned by start_response()
>
> AWSGI adds the following environment variables:
>
>  - awsgi.version
>  - awsgi.input
>  - awsgi.readable
>  - awsgi.writable
>  - awsgi.timeout
>
> In addition, AWSGI allows the application iterable to yield two types
> of data:
>
>  - byte strings, handled as in WSGI
>
>  - the result of calling awsgi.readable or awsgi.writable, which
>    indicates that the application should be paused and restarted when
>    a specified file descriptor is ready for reading or writing
>
> Because of AWSGI's similarity to WSGI, a simple wrapper can be used to
> run AWSGI applications on WSGI servers without alteration.
>
> The following example application demonstrates typical usage of AWSGI.
> This application simply reads the request body and sends it back to
> the client.  Each time it wants to receive data from the client, it
> first tests awsgi.input for readability and then calls its recv()
> method.  If awsgi.input is not readable after one second, the
> application sends a "408 Request Timeout" response to the client and
> terminates:
>
>
>  def echo_request_body(environ, start_response):
>      input = environ['awsgi.input']
>      readable = environ['awsgi.readable']
>
>      nbytes = int(environ.get('CONTENT_LENGTH') or 0)
>      output = ''
>      while nbytes:
>          yield readable(input, 1.0)  # Time out after 1 second
>
>          if environ['awsgi.timeout']:
>              msg = 'The request timed out.'
>              start_response('408 Request Timeout',
>                             [('Content-Type', 'text/plain'),
>                              ('Content-Length', str(len(msg)))])
>              yield msg
>              return
>
>          data = input.recv(nbytes)
>          if not data:
>              break
>          output += data
>          nbytes -= len(data)
>
>      start_response('200 OK', [('Content-Type', 'text/plain'),
>                                ('Content-Length', str(len(output)))])
>      yield output
>
>
> I have rough but functional implementations of a number of AWSGI
> components available in a Bazaar branch at
> http://pseudogreen.org/bzr/awsgiref/.  The package includes an
> asyncore-based AWSGI server and an AWSGI-to-WSGI application wrapper.
> In addition, the file spec.txt contains a more detailed description of
> the specification (which is also appended below).
>
> Again, I'd very much appreciate comments and criticism.
>
>
> Thanks,
> Chris
>
>
>
>
> Detailed AWSGI Specification
> ----------------------------
>
> - Required AWSGI environ variables:
>
>  * All variables required by WSGI, except for wsgi.version and
>    wsgi.input, which must *not* be present
>
>  * awsgi.version => the tuple (1, 0)
>
>  * awsgi.input
>
>    This is an object with one method, recv(bufsize), which behaves
>    like the socket method of the same name (although it doesn't
>    support the optional flags parameter).  Before each call to
>    recv(), the application must test awsgi.input for readability via
>    awsgi.readable.  The result of calling recv() without doing so is
>    undefined.
>
>    (XXX: Should recv() handle EINTR for the application?)
>
>  * awsgi.readable
>  * awsgi.writable
>
>    These are callables with the signature f(fd, timeout=None).  fd is
>    either a file descriptor (i.e. int or long) or an object with a
>    fileno() method that returns a file descriptor.
>
>    timeout has the same semantics as the timeout parameter to
>    select.select().  If the operation times out, awsgi.timeout will
>    be true when the application resumes.
>
>    In addition to checking readiness for reading or writing, servers
>    should also monitor file descriptors for "exceptional" conditions
>    (e.g. out-of-band data) and restart the application if they occur.
>
>  * awsgi.timeout => boolean indicating whether the most recent read
>    or write wait timed out (false if there have been no waits)
>
> - start_response() must *not* return a write() callable, as this
>  method of providing application output to the server is incompatible
>  with asynchronous execution.
>
> - The server must accept awsgi.input as input to awsgi.readable,
>  either by providing an actual socket object or by special-case
>  handling (i.e. awsgi.input needn't have a fileno() method, as long
>  as the server handles it as if it did).
>
> - Applications return iterators, which can yield:
>
>  * a string => sent to client, just as in standard WSGI
>
>  * the result of a call to awsgi.readable or awsgi.writable =>
>    application is resumed when either the file descriptor is ready
>    for reading/writing or the wait times out (in which case,
>    awsgi.timeout will be true)
>
> - Although AWSGI applications will *not* be directly compatible with
>  WSGI servers, middleware will allow them to run as standard WSGI
>  apps (with all I/O waits returning immediately).
>
> - AWSGI servers will not support unmodified WSGI applications.  There
>  are several reasons for this:
>
>  - If the app does blocking I/O, it will block the entire server.
>
>  - Calls to the read() method of wsgi.input may fail with
>    EWOULDBLOCK, which an app expecting synchronous I/O probably won't
>    be prepared to deal with.
>
>  - The readline(), readlines(), and __iter__() methods of wsgi.input
>    can require multiple network I/O operations, which is incompatible
>    with asynchronous execution.
>
>  - The write() callable returned by start_response() is inherently
>    incompatible with asynchronous execution.
>
>  Because of these issues, this specification aims for one-way
>  compatibility between AWSGI and WSGI (i.e. the ability to run AWSGI
>  apps on WSGI servers via middleware, but not vice versa).
>


-- 
http://ionelmc.wordpress.com

From manlio_perillo at libero.it  Wed May  7 09:59:47 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Wed, 07 May 2008 09:59:47 +0200
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>	<88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com>	<68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu>
	<88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com>
Message-ID: <48216173.7000502@libero.it>

Graham Dumpleton wrote:
> 2008/5/7 Christopher Stawarz <cstawarz at csail.mit.edu>:
>> On May 5, 2008, at 10:08 PM, Graham Dumpleton wrote:
>>
>>
>>> If write() isn't to be returned by start_response(), then do away with
>>> start_response() if possible as per discussions for WSGI 2.0.
>>  I think start_response() is necessary, because the application may need to
>> yield for I/O readiness (e.g. to read the request body, as in my example
>> app) before it decides what response status and headers to send.
> 
> One could come up with other ways of doing it which aligns better with
> WSGI 2.0. I previously gave an idea as a starting point for
> discussion, but don't think others really understood what I was
> suggesting. But then I did post it at 4am in the morning in the middle
> of a baby induced period of sleep deprivation. See post 24 in:
> 
> http://groups.google.com/group/python-web-sig/tree/browse_frm/thread/74c1f8cf15adf114/d98086a8db568ebd?rnum=24
> 
> I think what was missed by others was that I wasn't suggest that the
> 102 code be sent all the way back to the client, but as a convention
> between WSGI application and underlying WSGI adapter only, to
> facilitate the ability to return control back to the WSGI adapter
> before one had decided what actual response headers to send. This
> seems to align with what you want.
> 

It seems a bit more complex to implement than the start_response callable.

Moreover, the whole point of removing the start_response callable is to
simplify the writing of middleware.

With your solution it seems that writing middleware will not become
any easier.


> Graham


Manlio Perillo

From manlio_perillo at libero.it  Wed May  7 10:20:20 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Wed, 07 May 2008 10:20:20 +0200
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <b322b4e60805061751ne2e9618ta3fc519b87997e6a@mail.gmail.com>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<b322b4e60805061751ne2e9618ta3fc519b87997e6a@mail.gmail.com>
Message-ID: <48216644.7020300@libero.it>

Ionel Maries Cristian wrote:
> This is a very interesting initiative.
>  
> However there are few problems:
> - there is no support for chunked input - that would require having 
> support for readline in the first place, also, it should be the 
> gateway's business decoding the chunked input.

Unfortunately Nginx does not yet support chunked input, so I can't help 
here.

> - the original wsgi spec somewhat has some support for streaming and 
> asynchronicity [*1]

Right, and in fact I have used this for the implementation of some 
extensions in the WSGI module for Nginx.

> - i don't see how removing the write callable will help (i don't see a 
> issue having the server providing a stringio.write as the write callable 
> for synchronous apps)

To summarize: the main problem with the write callable is that after you
call it, control is not returned to the WSGI gateway.

With an asynchronous server this is a problem, since if you write a lot of
data the server may not be able to send it all to the client right away.

This is not a problem if the application returns a generator, since the
gateway can suspend the execution until the socket is ready to send data.

With the write callable this is not possible.

In my implementation of WSGI for Nginx I provide two separate
implementations of the write callable:
- temporarily put the socket in synchronous mode
   (this is WSGI compliant but it is very bad for Nginx)
- buffer all the written data until control is returned to the
   gateway (this is *not* WSGI compliant)
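
As a rough illustration of the second (buffering) variant, and not the actual Nginx module code, the idea can be sketched like this. The run_buffered() gateway and legacy_app are invented for the example; a real gateway would wait for socket writability inside flush().

```python
# Sketch only: run_buffered() and legacy_app are hypothetical names used to
# illustrate a buffering write() callable; this is not the Nginx mod_wsgi code.

def run_buffered(application, environ):
    """Toy gateway whose write() only buffers data (not WSGI compliant)."""
    buffered = []      # data passed to write(), not yet sent
    sent = []          # data the gateway has actually pushed to the client

    def start_response(status, headers, exc_info=None):
        return buffered.append       # write callable: just remember the data

    def flush():
        # Only here, back in the gateway, could we wait for the socket
        # to become writable without blocking the whole server.
        sent.extend(buffered)
        del buffered[:]

    result = application(environ, start_response)
    try:
        for block in result:
            flush()                  # control is back in the gateway
            if block:
                sent.append(block)
        flush()
    finally:
        if hasattr(result, 'close'):
            result.close()
    return sent


def legacy_app(environ, start_response):
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    write(b'a')
    write(b'b')      # nothing has reached the client at this point
    return [b'c']
```

The output order is preserved, but the application cannot assume that data passed to write() has been transmitted when write() returns, which is why this variant breaks the WSGI contract.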


However, if you use greenlets, then implementing the write callable is
not a problem.

> - passing nonstring values though middleware will make using/porting 
> existing wsgi middleware hairy (suppose you have a middleware that 
> applies some filter to the appiter - you'll have your code full of 
> isinstance nastiness)
>  

Yes, this should be avoided.

> Also, have you looked at the existing gateway implementations with 
> asynchronous support?
> There are a bunch of them:
> http://trac.wiretooth.com/public/wiki/asycwsgi
> http://chiral.j4cbo.com/trac
> http://wiki.secondlife.com/wiki/Eventlet
> my own shot at the problem: http://code.google.com/p/cogen/
> and manlio's mod_wsgi for nginx
> (I may be missing some)
>  
> However there is absolutely no unity in handling the wsgi.input (or 
> equivalent)
>  

The wsgi.input can be handled with ngx.poll:

c = ngx.connection_wrapper(wsgi.input)
...

ngx.poll_register(c, WSGI_POLLIN)
...

ngx.poll(1000)


Unfortunately I can not test if this is implementable.
I have some doubts.


 > [...]


Manlio Perillo

From graham.dumpleton at gmail.com  Wed May  7 10:20:52 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Wed, 7 May 2008 18:20:52 +1000
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <48216173.7000502@libero.it>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com>
	<68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu>
	<88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com>
	<48216173.7000502@libero.it>
Message-ID: <88e286470805070120i32db1239u16455123c310d8a2@mail.gmail.com>

2008/5/7 Manlio Perillo <manlio_perillo at libero.it>:
> Graham Dumpleton wrote:
>
>
>
> > 2008/5/7 Christopher Stawarz <cstawarz at csail.mit.edu>:
> >
> > > On May 5, 2008, at 10:08 PM, Graham Dumpleton wrote:
> > >
> > >
> > >
> > > > If write() isn't to be returned by start_response(), then do away with
> > > > start_response() if possible as per discussions for WSGI 2.0.
> > > >
> > >  I think start_response() is necessary, because the application may need to
> > > yield for I/O readiness (e.g. to read the request body, as in my example
> > > app) before it decides what response status and headers to send.
> > >
> >
> > One could come up with other ways of doing it which aligns better with
> > WSGI 2.0. I previously gave an idea as a starting point for
> > discussion, but don't think others really understood what I was
> > suggesting. But then I did post it at 4am in the morning in the middle
> > of a baby induced period of sleep deprivation. See post 24 in:
> >
> >
> > http://groups.google.com/group/python-web-sig/tree/browse_frm/thread/74c1f8cf15adf114/d98086a8db568ebd?rnum=24
> >
> > I think what was missed by others was that I wasn't suggest that the
> > 102 code be sent all the way back to the client, but as a convention
> > between WSGI application and underlying WSGI adapter only, to
> > facilitate the ability to return control back to the WSGI adapter
> > before one had decided what actual response headers to send. This
> > seems to align with what you want.
> >
> >
>
>  Its seems a bit more complex to implement then the start_callable.
>
>  Moreover the whole point of removing the start_callable is to simplify the
> writing of middlewares.
>
>  With your solution it seems that writing middlewares will not became more
> easy.

Part of what I was trying to say was that this needn't be exposed to
middleware unless it has to be. It was effectively a lower level of
interaction which a middleware immediately on top of the WSGI adapter
would use to hook into the async model, and then present to
higher levels as a more traditional WSGI interface. That layer would
obviously use something like greenlets to bridge the two. So it is a
way of bringing control of that bridge up to the Python level,
rather than having it intertwined with, and inseparable from, the
underlying WSGI adapter.

As I said, it was 4am, so probably didn't explain it very well. :-)

Graham

From manlio_perillo at libero.it  Wed May  7 10:44:23 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Wed, 07 May 2008 10:44:23 +0200
Subject: [Web-SIG] WSGI and greenlets
In-Reply-To: <8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<48203045.60504@libero.it>
	<8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu>
Message-ID: <48216BE7.5010000@libero.it>

Christopher Stawarz wrote:
> On May 6, 2008, at 6:17 AM, Manlio Perillo wrote:
> 
>> I'm glad to know that there are some other people interested in 
>> asynchronous application, do you have seen my extensions to WSGI in my 
>> module for Nginx?
> 
> Yes, I have, and I had your module in mind as a potential provider of 
> the AWSGI interface.
> 
>> Note that in Nginx the request body is pre-read before the application 
>> is called (in fact wsgi.input is either a cStringIO or File object).
> 
> Although I didn't state it explicitly in my spec, my intention is for 
> the server to be able to implement awsgi.input in any way it likes, as 
> long as it provides a recv() method.  It's totally acceptable for the 
> request body to be pre-read.
> 

Ok.
But what I meant was that since Nginx pre-reads the request body, I have
not tried to implement an interface for dealing with an asynchronous
wsgi.input ;-).


Moreover, I don't see any reason to have a recv method instead of read.

>> Unfortunately there is a *big* usability problem: the extension is 
>> based on a well specified feature of WSGI: the gateway can suspend the 
>> execution of the WSGI application when it yields.
>>
>> However if the asynchronous code is present in a "child" function, we 
>> have something like this:
>> ...
>> That is, all the functions in the "chain" have to yield, and is not 
>> very good.
> 
> Yes, you're right.  However, if you're willing/able to use Python 2.5, 
> you can use the new features of generators to implement a call stack 
> that lets you call child functions and receive return values and 
> exceptions from them.  I've implemented this in awsgiref.callstack.  
> Have a look at
> 
>   
> http://pseudogreen.org/bzr/awsgiref/examples/echo_request_with_callstack.py
> 
> for an example of how it works.
> 

I don't think this will solve the problem.
Moreover in your example you buffer the whole request body so that you 
have to yield only one time.
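
For readers who have not seen the technique, a generator "call stack" can be sketched roughly as below. This is not the awsgiref.callstack code, just a generic trampoline with made-up names, and it relies on generators being able to return values (available in current Python; in Python 2.5 the return value has to be signalled some other way). Note that every function in the chain still has to be a generator and still has to yield.

```python
# Sketch only: run(), read_body() and app() are hypothetical names; this is a
# generic trampoline, not the implementation in awsgiref.callstack.
import types


def run(gen):
    """Drive nested generators; collect non-generator yields as I/O waits."""
    stack, waits, value = [gen], [], None
    while stack:
        try:
            item = stack[-1].send(value)
        except StopIteration as stop:
            stack.pop()
            value = stop.value        # return value goes to the caller frame
            continue
        value = None
        if isinstance(item, types.GeneratorType):
            stack.append(item)        # "call" the child generator
        else:
            waits.append(item)        # a real server would suspend here
    return waits, value


def read_body():
    yield ('readable', 'input')       # child function must yield for I/O
    return b'body'


def app():
    body = yield read_body()          # parent must yield to call the child
    yield ('writable', 'output')
    return body.upper()
```
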

>> The solution is to use coroutines, and I'm planning to integrate 
>> greenlets (from the pylib project) into the WSGI module for Nginx.
> 
> Interesting, but it's not clear to me how/if this would work.  Can you 
> explain more or point me to some code?
> 

http://codespeak.net/py/dist/greenlet.html

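def process_commands(*args):
    # read_next_char() and process_command() are assumed to be defined
    # elsewhere, as in the greenlet documentation linked above;
    # read_next_char() may switch to another greenlet while it waits.
    while True:
        line = ''
        while not line.endswith('\n'):
            line += read_next_char()
        if line == 'quit\n':
            print("are you sure?")
            if read_next_char() != 'y':
                continue    # ignore the command
        process_command(line)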


With greenlets, execution can be suspended by any of the functions
called by the main greenlet.

This has a lot of advantages.

You can implement wsgi.input.read(n) so that it will suspend the 
execution of the current greenlet until *all* the n bytes have been read.

You can also implement the write callable so that control is returned to
the main greenlet when the socket is ready to send more data.

And, of course, you can implement a poll-like interface and a sleep-like
interface.


I think this is a great advantage; moreover, it is the only way to
implement truly reusable components.

Note that there is an effort to integrate greenlets with Twisted:
http://radix.twistedmatrix.com/2008/03/corotwine-01.html


The "problem" is that once you add support for greenlets, you no longer
have WSGI.

The interface can be the same, and applications can work on it without
problems, but the semantics are *completely* different.


Also note that with greenlets it should be possible to "magically"
transform blocking applications like Django into non-blocking ones.


The main problem I see with greenlet is that it is not yet stable (there
are some problems with the garbage collector) and that it is not part of
CPython.

This means that it may not be acceptable to write a PEP for a WSGI-like
interface with coroutine support.


> 
> Thanks,
> Chris
> 


Regards  Manlio Perillo

From manlio_perillo at libero.it  Wed May  7 11:23:04 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Wed, 07 May 2008 11:23:04 +0200
Subject: [Web-SIG] WSGI and greenlets
In-Reply-To: <48216BE7.5010000@libero.it>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>	<48203045.60504@libero.it>	<8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu>
	<48216BE7.5010000@libero.it>
Message-ID: <482174F8.6080600@libero.it>

Manlio Perillo wrote:
> [...]
> The main problem I see with greenlet is that is is not yet stable (there 
> are some problems with the garbage collector) and that is is not part of 
> CPython.
> 
> This means that it can be not acceptable to write a PEP for a WSGI like 
> interface with coroutine support.
> 

Maybe a solution would be to add a new variable to the WSGI environ:
wsgi.microthreads


When it is true, it means that the WSGI implementation will execute the
application inside a micro thread (be it Stackless, greenlet, or a PyPy
coroutine).


Also note that when using coroutines there will be no problems with WSGI 
2.0.

However I still think that we should release a WSGI 1.1 since many 
applications still use and will continue to use WSGI 1.x and a gateway 
will have to support WSGI 1.x in order to support both WSGI 1.x and 2.x


Regards  Manlio Perillo

From cstawarz at csail.mit.edu  Wed May  7 20:00:21 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Wed, 7 May 2008 14:00:21 -0400
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <88e286470805070120i32db1239u16455123c310d8a2@mail.gmail.com>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com>
	<68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu>
	<88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com>
	<48216173.7000502@libero.it>
	<88e286470805070120i32db1239u16455123c310d8a2@mail.gmail.com>
Message-ID: <8CF076DD-8E9F-48E2-A14E-FD1EED3515EB@csail.mit.edu>

On May 7, 2008, at 4:20 AM, Graham Dumpleton wrote:

> 2008/5/7 Manlio Perillo <manlio_perillo at libero.it>:
>> With your solution it seems that writing middlewares will not  
>> became more
>> easy.
>
> Part of what I was trying to say was that this needn't be exposed to
> middlewares, unless it has to be. It was effectively a lower level of
> interaction which a middleware immediately on top of the WSGI adapter
> would use to hook into the async type model, but then present it to
> higher levels as more traditional WSGI interface.

That would be a really elegant solution, except, as you say:

> That layer would
> though obviously use something like greenlets to bridge the two.

The problem being that greenlets aren't part of the Python language.   
They're an extension that works by doing clever stuff with the C  
stack.  And as much as we might wish that Python supported them  
natively (which I do, since they're a really nice alternative to OS  
threads), it doesn't, so I don't think they can play any role in a  
WSGI-ASYNC spec.


Chris

From manlio_perillo at libero.it  Wed May  7 20:12:12 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Wed, 07 May 2008 20:12:12 +0200
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <8CF076DD-8E9F-48E2-A14E-FD1EED3515EB@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com>
	<68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu>
	<88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com>
	<48216173.7000502@libero.it>
	<88e286470805070120i32db1239u16455123c310d8a2@mail.gmail.com>
	<8CF076DD-8E9F-48E2-A14E-FD1EED3515EB@csail.mit.edu>
Message-ID: <4821F0FC.2090302@libero.it>

Christopher Stawarz wrote:
> On May 7, 2008, at 4:20 AM, Graham Dumpleton wrote:
> 
>> 2008/5/7 Manlio Perillo <manlio_perillo at libero.it>:
>>> With your solution it seems that writing middlewares will not became 
>>> more
>>> easy.
>>
>> Part of what I was trying to say was that this needn't be exposed to
>> middlewares, unless it has to be. It was effectively a lower level of
>> interaction which a middleware immediately on top of the WSGI adapter
>> would use to hook into the async type model, but then present it to
>> higher levels as more traditional WSGI interface.
> 
> That would be a really elegant solution, except, as you say:
> 
>> That layer would
>> though obviously use something like greenlets to bridge the two.
> 
> The problem being that greenlets aren't part of the Python language.  
> They're an extension that works by doing clever stuff with the C stack.  
> And as much as we might wish that Python supported them natively (which 
> I do, since they're a really nice alternative to OS threads), it 
> doesn't, so I don't think they can play any role in a WSGI-ASYNC spec.
> 

This is not entirely true; after all, WSGI explicitly exposes the concept of
processes and threads (via the relevant variables in the WSGI environ and
some hints in the specification), and these are not really part of the
Python language.


> 
> Chris
> 


Manlio Perillo

From cstawarz at csail.mit.edu  Wed May  7 21:00:10 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Wed, 7 May 2008 15:00:10 -0400
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <b322b4e60805061751ne2e9618ta3fc519b87997e6a@mail.gmail.com>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<b322b4e60805061751ne2e9618ta3fc519b87997e6a@mail.gmail.com>
Message-ID: <4B7ABD81-E475-44CC-8D45-2E0525BC7503@csail.mit.edu>

On May 6, 2008, at 8:51 PM, Ionel Maries Cristian wrote:

> - there is no support for chunked input - that would require having  
> support for readline in the first place,

Why is readline a requirement for chunked input?  Each chunk specifies  
its size, and the application receiving a chunk just keeps calling  
recv() until it's read the specified number of bytes.

> also, it should be the gateway's business decoding the chunked input.

OK, but if it's the gateway's responsibility, then this isn't an issue  
at all, as decoding of chunked data takes place before the application  
ever sees the request body.

To be clear, I didn't mean to imply that awsgi.input must be the  
actual socket object connected to the client.  It just has to provide  
a recv() method with the semantics of a socket.  The server is free to  
pre-read the entire request, or it can receive data on demand,  
decoding any chunked input before it passes it to the application.
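
As a rough sketch of why readline() isn't needed, here is how chunked data could be decoded using nothing but recv(). The FakeInput class and decode_chunked() are made up for illustration, trailers are ignored, and in a real AWSGI application (or gateway) each recv() would be preceded by a readability wait, which is left out here to keep the example short.

```python
# Sketch only: FakeInput and decode_chunked() are hypothetical names, not
# part of awsgiref. Trailers are ignored and readability waits are omitted.

class FakeInput(object):
    """Stand-in for awsgi.input: recv() returns at most bufsize bytes."""
    def __init__(self, data, max_per_call=5):
        self.data = data
        self.max_per_call = max_per_call

    def recv(self, bufsize):
        n = min(bufsize, self.max_per_call)
        piece, self.data = self.data[:n], self.data[n:]
        return piece


def decode_chunked(inp, bufsize=4096):
    """Return the decoded body of a chunked request."""
    buf = b''
    body = []
    while True:
        # Read the chunk-size line by accumulating until CRLF shows up.
        while b'\r\n' not in buf:
            data = inp.recv(bufsize)
            if not data:
                raise ValueError('truncated chunk header')
            buf += data
        line, buf = buf.split(b'\r\n', 1)
        size = int(line.split(b';', 1)[0], 16)   # drop any chunk extension
        if size == 0:
            return b''.join(body)
        # Read the chunk data plus its trailing CRLF.
        while len(buf) < size + 2:
            data = inp.recv(bufsize)
            if not data:
                raise ValueError('truncated chunk data')
            buf += data
        body.append(buf[:size])
        buf = buf[size + 2:]
```
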

> - i don't see how removing the write callable will help (i don't see  
> a issue having the server providing a stringio.write as the write  
> callable for synchronous apps)

Manlio explained this well, so I'll refer you to his response.

> - passing nonstring values though middleware will make using/porting  
> existing wsgi middleware hairy (suppose you have a middleware that  
> applies some filter to the appiter - you'll have your code full of  
> isinstance nastiness)

Yes, my proposal would require existing middleware to be modified to  
support AWSGI, which is unfortunate.

> Also, have you looked at the existing gateway implementations with  
> asynchronous support?
> There are a bunch of them:
> http://trac.wiretooth.com/public/wiki/asycwsgi
> http://chiral.j4cbo.com/trac
> http://wiki.secondlife.com/wiki/Eventlet
> my own shot at the problem: http://code.google.com/p/cogen/
> and manlio's mod_wsgi for nginx
> (I may be missing some)

I've seen some of these, but I'll be sure to take a look at the others.

> [*1]In my implementation i do a bunch of tricks to make use of  
> regular wsgi middleware with async apps possible - i have a bunch of  
> working examples using pylons:
>  - the extensions in the environ (like your  
> environ['awsgi.readable']) return a empty string that penetrates  
> most[*2] middleware and set the actual message (like your (token,  
> fd, timeout) tuple on some internal object)
> From this point of view, an async middleware stack is just a set of  
> middleware that supports streaming.

This is an interesting idea that I'd like to explore some more.  I  
really like the fact that it works with existing middleware (or at  
least fully WSGI-compliant middleware, as you point out).

Apart from the write() callable, the biggest issue I see with the WSGI  
spec for asynchronous servers is wsgi.input.  The problem is that this  
is explicitly a file-like object.  This means that input.read(n) reads  
until it finds n bytes or EOF, input.readline() reads until it finds a  
newline or EOF, and input.readlines() and input.__iter__() always read  
to EOF.  Every one of these functions implies multiple I/O operations  
(calls to fread() for a file or recv() for a socket).

This means that if an application calls input.read(8), and only 4  
bytes are available, the first call to recv() returns 4 bytes, and the  
second one blocks.  And now your entire server is blocked until data  
is available on this one socket.   (Of course, the server is free to  
pre-read the entire request at its leisure and feed it to the  
application from a buffer, but this may not always be practical or  
desirable, and I don't think asynchronous servers should be forced to  
do so.)

This is why I propose replacing wsgi.input with awsgi.input, which  
exposes a recv() method with socket-like (rather than file-like)  
semantics.  The meaning of input.recv(n) is therefore "read at most n  
bytes (possibly less), calling the underlying socket recv() at most  
one time".
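
The difference between the two semantics can be shown with a small sketch. The class names below are invented for the example, and CountingSocket is a fake socket that hands out data in fixed pieces so the number of underlying recv() calls can be counted.

```python
# Sketch only: FileLikeInput, SocketLikeInput and CountingSocket are
# hypothetical names used to contrast read(n) with recv(n).

class FileLikeInput(object):
    """wsgi.input-style read(n): loops until n bytes have arrived or EOF."""
    def __init__(self, sock):
        self.sock = sock

    def read(self, n):
        parts = []
        while n > 0:
            data = self.sock.recv(n)     # any call after the first may block
            if not data:
                break
            parts.append(data)
            n -= len(data)
        return b''.join(parts)


class SocketLikeInput(object):
    """awsgi.input-style recv(n): at most one underlying recv() per call."""
    def __init__(self, sock):
        self.sock = sock

    def recv(self, bufsize):
        return self.sock.recv(bufsize)


class CountingSocket(object):
    """Fake socket delivering data in fixed pieces; counts recv() calls."""
    def __init__(self, pieces):
        self.pieces = list(pieces)
        self.calls = 0

    def recv(self, bufsize):
        self.calls += 1
        if not self.pieces:
            return b''
        piece = self.pieces.pop(0)
        head, rest = piece[:bufsize], piece[bufsize:]
        if rest:
            self.pieces.insert(0, rest)
        return head
```

With two 4-byte pieces queued, read(8) needs two recv() calls (the second of which could block on a real socket), while recv(8) makes one call and returns only the first 4 bytes.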

So, although your suggestion may eliminate the need to yield
non-string output from the application iterable, I still think there needs
to be a separate specification for asynchronous gateways, since the
semantics of wsgi.input just aren't compatible with an asynchronous
model.


Chris

From duncan.mcgreggor at gmail.com  Wed May  7 21:35:31 2008
From: duncan.mcgreggor at gmail.com (Duncan McGreggor)
Date: Wed, 07 May 2008 14:35:31 -0500
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <8CF076DD-8E9F-48E2-A14E-FD1EED3515EB@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com>
	<68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu>
	<88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com>
	<48216173.7000502@libero.it>
	<88e286470805070120i32db1239u16455123c310d8a2@mail.gmail.com>
	<8CF076DD-8E9F-48E2-A14E-FD1EED3515EB@csail.mit.edu>
Message-ID: <1210188931.4546.14.camel@gondor>


On Wed, 2008-05-07 at 14:00 -0400, Christopher Stawarz wrote:
> On May 7, 2008, at 4:20 AM, Graham Dumpleton wrote:
> 
> > 2008/5/7 Manlio Perillo <manlio_perillo at libero.it>:
> >> With your solution it seems that writing middleware will not
> >> become any easier.
> >
> > Part of what I was trying to say was that this needn't be exposed to
> > middlewares, unless it has to be. It was effectively a lower level of
> > interaction which a middleware immediately on top of the WSGI adapter
> > would use to hook into the async type model, but then present it to
> > higher levels as more traditional WSGI interface.
> 
> That would be a really elegant solution, except, as you say:
> 
> > That layer would
> > though obviously use something like greenlets to bridge the two.
> 
> The problem being that greenlets aren't part of the Python language.   
> They're an extension that works by doing clever stuff with the C  
> stack.  And as much as we might wish that Python supported them  
> natively (which I do, since they're a really nice alternative to OS  
> threads), it doesn't, so I don't think they can play any role in a  
> WSGI-ASYNC spec.

It's my understanding that greenlets are python, not C. Are you thinking
of tasklets in stackless?

d


From cstawarz at csail.mit.edu  Wed May  7 21:54:59 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Wed, 7 May 2008 15:54:59 -0400
Subject: [Web-SIG] WSGI and greenlets
In-Reply-To: <48216BE7.5010000@libero.it>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<48203045.60504@libero.it>
	<8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu>
	<48216BE7.5010000@libero.it>
Message-ID: <9A1AD097-A95A-4F0C-86B0-2FA50E31014E@csail.mit.edu>

On May 7, 2008, at 4:44 AM, Manlio Perillo wrote:

> Moreover I don't see any reason to have a recv method instead of  
> read.

I just wanted to emphasize that its behavior is socket-like, not
file-like.  It could be called read as long as its behavior is made
clear to application developers.

>>> Unfortunately there is a *big* usability problem: the extension is  
>>> based on a well specified feature of WSGI: the gateway can suspend  
>>> the execution of the WSGI application when it yields.
>>>
>>> However if the asynchronous code is present in a "child" function,  
>>> we have something like this:
>>> ...
>>> That is, all the functions in the "chain" have to yield, and is  
>>> not very good.
>> Yes, you're right.  However, if you're willing/able to use Python  
>> 2.5, you can use the new features of generators to implement a call  
>> stack that lets you call child functions and receive return values  
>> and exceptions from them.  I've implemented this in  
>> awsgiref.callstack.  Have a look at
>>  http://pseudogreen.org/bzr/awsgiref/examples/echo_request_with_callstack.py
>> for an example of how it works.
>
> I don't think this will solve the problem.
> Moreover in your example you buffer the whole request body so that  
> you have to yield only one time.

Your example was:

def application(environ, start_response):
   def nested():
      while True:
         poll(xxx)
         yield ''
      yield result

   for r in nested():
      if not r:
          yield ''

   yield r

My suggestion would allow you to rewrite this like so:

@awsgiref.callstack.add_callstack
def application(environ, start_response):
   def nested():
      while True:
         poll(xxx)
         yield ''
      yield result

   yield nested()

The nesting can be arbitrarily deep, so nested() could yield  
doubly_nested() and so on.  While not as elegant as greenlets, I think  
this does address your concern.
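
For readers who have not seen the technique, the general idea can be sketched as a small trampoline. This is not the awsgiref.callstack code: the decorator name with_callstack is hypothetical, and the sketch uses Python 3 generator return values, which did not exist in Python 2.5. Exceptions raised in a child are not thrown back into the parent here; they simply propagate out of the wrapper.

```python
import types


def with_callstack(app):
    """Hypothetical decorator: run nested generators on an explicit stack."""
    def wrapper(environ, start_response):
        stack = [app(environ, start_response)]
        value = None                      # value to send into the top generator
        while stack:
            try:
                item = stack[-1].send(value)
            except StopIteration as exc:
                stack.pop()               # child finished: "return" to parent
                value = exc.value
                continue
            value = None
            if isinstance(item, types.GeneratorType):
                stack.append(item)        # "call" the child generator
            else:
                yield item                # ordinary output goes to the gateway
    return wrapper


@with_callstack
def application(environ, start_response):
    def nested():
        for _ in range(2):
            yield b""                     # stand-in for "not ready yet"
        return b"result"

    body = yield nested()                 # receives nested()'s return value
    yield body


output = list(application({}, None))
```

Only the outermost iterable is seen by the gateway, so each empty string yielded by nested() still reaches it as a pause point.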

> The main problem I see with greenlet is that it is not yet stable  
> (there are some problems with the garbage collector) and that it is  
> not part of CPython.
>
> This means that it may not be acceptable to write a PEP for a  
> WSGI-like interface with coroutine support.

This is the problem I see with greenlets, too.  If they were part of  
the stdlib, it'd be a different story, but as things stand, I don't  
think they should be part of the spec.


Chris

From cstawarz at csail.mit.edu  Wed May  7 22:06:37 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Wed, 7 May 2008 16:06:37 -0400
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <1210188931.4546.14.camel@gondor>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<88e286470805051908g2c06fb08uff4ebdad9cfecd8a@mail.gmail.com>
	<68C527C1-15F0-49E2-87CC-27DEBB95B530@csail.mit.edu>
	<88e286470805061602l6b9cc001t11b28e3ce3d3798d@mail.gmail.com>
	<48216173.7000502@libero.it>
	<88e286470805070120i32db1239u16455123c310d8a2@mail.gmail.com>
	<8CF076DD-8E9F-48E2-A14E-FD1EED3515EB@csail.mit.edu>
	<1210188931.4546.14.camel@gondor>
Message-ID: <7F84F89F-2F0A-4556-973E-668034339267@csail.mit.edu>

On May 7, 2008, at 3:35 PM, Duncan McGreggor wrote:

> It's my understanding that greenlets are python, not C. Are you  
> thinking
> of tasklets in stackless?

The version for CPython is a C extension module.  Have a look at the  
comments in

   http://svn.red-bean.com/bob/greenlet/trunk/greenlet.c

The switching is accomplished by saving and restoring chunks of the C  
stack, which I find both extremely clever and kind of scary :)


Chris

From ionel.mc at gmail.com  Wed May  7 23:36:56 2008
From: ionel.mc at gmail.com (Ionel Maries Cristian)
Date: Thu, 8 May 2008 00:36:56 +0300
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <4B7ABD81-E475-44CC-8D45-2E0525BC7503@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<b322b4e60805061751ne2e9618ta3fc519b87997e6a@mail.gmail.com>
	<4B7ABD81-E475-44CC-8D45-2E0525BC7503@csail.mit.edu>
Message-ID: <b322b4e60805071436y7841fdc0q8c3ca09a654232a@mail.gmail.com>

 On Wed, May 7, 2008 at 10:00 PM, Christopher Stawarz <
cstawarz at csail.mit.edu> wrote:

> On May 6, 2008, at 8:51 PM, Ionel Maries Cristian wrote:
>
> > - there is no support for chunked input - that would require having
> > support for readline in the first place,
> >
> Why is readline a requirement for chunked input?  Each chunk specifies its
> size, and the application receiving a chunk just keeps calling recv() until
> it's read the specified number of bytes.
>

Well, not really a requirement. I was implying there is some sort of
readline, since that is what one would generally use to get the size of
a chunk - but not necessarily.
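
As a rough illustration of decoding chunks without readline, here is a sketch that consumes the body in whatever pieces recv()-style reads happen to deliver. The function name decode_chunked is made up; chunk extensions are ignored and trailers are not handled:

```python
def decode_chunked(pieces):
    """Decode an HTTP/1.1 chunked body from arbitrarily sized byte pieces."""
    pieces = iter(pieces)
    buf = b""
    body = []

    while True:
        # Gather data until the chunk-size line is complete.
        while b"\r\n" not in buf:
            buf += next(pieces)
        line, buf = buf.split(b"\r\n", 1)
        size = int(line.split(b";", 1)[0], 16)   # drop any chunk extension
        if size == 0:
            return b"".join(body)                # last chunk; trailers ignored
        # Gather data until the chunk and its trailing CRLF are buffered.
        while len(buf) < size + 2:
            buf += next(pieces)
        body.append(buf[:size])
        buf = buf[size + 2:]


decoded = decode_chunked([b"4\r\nWi", b"ki\r\n5\r\npedia\r\n0\r\n\r\n"])
```

In a gateway, each element of pieces would be the result of one non-blocking read, with a readiness check in between.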


>   also, it should be the gateway's business decoding the chunked input.
> >
> OK, but if it's the gateway's responsibility, then this isn't an issue at
> all, as decoding of chunked data takes place before the application ever
> sees the request body.
> To be clear, I didn't mean to imply that awsgi.input must be the actual
> socket object connected to the client.  It just has to provide a recv()
> method with the semantics of a socket.  The server is free to pre-read the
> entire request, or it can receive data on demand, decoding any chunked input
> before it passes it to the application.
>


>  - i don't see how removing the write callable will help (i don't see a
> > issue having the server providing a stringio.write as the write callable for
> > synchronous apps)
> >
> Manlio explained this well, so I'll refer you to his response.
>
> > - passing nonstring values though middleware will make using/porting
> > existing wsgi middleware hairy (suppose you have a middleware that applies
> > some filter to the appiter - you'll have your code full of isinstance
> > nastiness)
> >
> Yes, my proposal would require existing middleware to be modified to
> support AWSGI, which is unfortunate.
>
> > Also, have you looked at the existing gateway implementations with
> > asynchronous support? There are a bunch of them:
> > http://trac.wiretooth.com/public/wiki/asycwsgi
> > http://chiral.j4cbo.com/trac
> > http://wiki.secondlife.com/wiki/Eventlet
> > my own shot at the problem: http://code.google.com/p/cogen/
> > and manlio's mod_wsgi for nginx
> > (I may be missing some)
> >
> I've seen some of these, but I'll be sure to take a look at the others.
>
> > [*1]In my implementation i do a bunch of tricks to make use of regular
> > wsgi middleware with async apps possible - i have a bunch of working
> > examples using pylons:  - the extensions in the environ (like your
> > environ['awsgi.readable']) return a empty string that penetrates most[*2]
> > middleware and set the actual message (like your (token, fd, timeout) tuple
> > on some internal object)
> > From this point of view, an async middleware stack is just a set of
> > middleware that supports streaming.
> >
> This is an interesting idea that I'd like to explore some more.  I really
> like the fact that it works with existing middleware (or at least fully
> WSGI-compliant middleware, as you point out).
> Apart from the write() callable, the biggest issue I see with the WSGI
> spec for asynchronous servers is wsgi.input.  The problem is that this is
> explicitly a file-like object.  This means that input.read(n) reads until it
> finds n bytes or EOF, input.readline() reads until it finds a newline or
> EOF, and input.readlines() and input.__iter__() always read to EOF.  Every
> one of these functions implies multiple I/O operations (calls to fread() for
> a file or recv() for a socket).
>  This means that if an application calls input.read(8), and only 4 bytes
> are available, the first call to recv() returns 4 bytes, and the second one
> blocks.  And now your entire server is blocked until data is available on
> this one socket.   (Of course, the server is free to pre-read the entire
> request at its leisure and feed it to the application from a buffer, but
> this may not always be practical or desirable, and I don't think
> asynchronous servers should be forced to do so.)
>  This is why I propose replacing wsgi.input with awsgi.input, which
> exposes a recv() method with socket-like (rather than file-like) semantics.
>  The meaning of input.recv(n) is therefore "read at most n bytes (possibly
> less), calling the underlying socket recv() at most one time".
>  So, although your suggestion may eliminate the need to yield non-string
> output from the application iterable, I still think there needs to be a
> separate specification for asynchronous gateways, since the semantics of
> wsgi.input just aren't compatible with an asynchronous model.
>  Chris
>

The way I see it, asynchronous WSGI is just a matter of deciding how to
handle the input asynchronously - an asynchronous input WSGI extension
specification.
So I suggest completely dropping the idea of an incompatibility between
async_wsgi and wsgi (since it doesn't help anyone in the long run really -
it just fragments the gateway providers and overcomplicates things) and
concentrating more on the async input extension.

So the idea is that the gateways would provide async input by default and a
piece of middleware or config option to make it synchronous (well, actually,
buffer it).
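
A sketch of what such a buffering middleware might look like, in Python 3. The environ keys 'awsgi.input' and 'awsgi.readable' and the convention that readable() returns an empty string to be yielded are assumptions for illustration, not an agreed interface:

```python
import io


def buffered_input(app):
    """Hypothetical middleware: pre-read the body, then expose wsgi.input."""
    def wrapper(environ, start_response):
        ainput = environ["awsgi.input"]          # assumed async input object
        readable = environ["awsgi.readable"]     # assumed readiness callable
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
        chunks = []
        while remaining:
            yield readable(ainput)               # let the gateway wait for data
            data = ainput.read(remaining)
            if not data:                         # client went away early
                break
            chunks.append(data)
            remaining -= len(data)
        # The wrapped app now reads from memory and can never block.
        environ["wsgi.input"] = io.BytesIO(b"".join(chunks))
        for item in app(environ, start_response):
            yield item
    return wrapper


class FakeInput:
    """Test double that hands out the body in two pieces."""
    def __init__(self, pieces):
        self.pieces = list(pieces)

    def read(self, size):
        return self.pieces.pop(0) if self.pieces else b""


def sync_app(environ, start_response):
    yield environ["wsgi.input"].read()


environ = {
    "CONTENT_LENGTH": "6",
    "awsgi.input": FakeInput([b"abc", b"def"]),
    "awsgi.readable": lambda fd, timeout=None: b"",
}
result = list(buffered_input(sync_app)(environ, None))
```

The two leading empty strings in result are the pause points the gateway would use to wait for input.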

Also, since there already are a bunch of async gateways out there I would
like to hear if the other providers would/could implement the proposed form
of common async input - that would ultimately decide the success of this
proposed spec.

 --
http://ionelmc.wordpress.com

From cstawarz at csail.mit.edu  Thu May  8 04:59:42 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Wed, 7 May 2008 22:59:42 -0400
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <b322b4e60805071436y7841fdc0q8c3ca09a654232a@mail.gmail.com>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<b322b4e60805061751ne2e9618ta3fc519b87997e6a@mail.gmail.com>
	<4B7ABD81-E475-44CC-8D45-2E0525BC7503@csail.mit.edu>
	<b322b4e60805071436y7841fdc0q8c3ca09a654232a@mail.gmail.com>
Message-ID: <36B208C9-2503-4AB2-B39B-6EB67398DD8A@csail.mit.edu>

On May 7, 2008, at 5:36 PM, Ionel Maries Cristian wrote:

> The way I see it, asynchronous WSGI is just a matter of deciding how  
> to handle the input asynchronously - an asynchronous input WSGI  
> extension specification.

Another crucial element is the ability to perform non-blocking I/O on  
other file descriptors (TCP connections to other servers, pipes to  
other OS processes).  This is why the readable/writable functions (or  
something like them) are necessary.

> So I suggest completely dropping the idea of an incompatibility  
> between async_wsgi and wsgi (since it doesn't help anyone in the  
> long run really - it just fragments the gateway providers and  
> overcomplicates things) and concentrating more on the async input  
> extension.

This is a compelling argument.  As long as the application iterable  
yields only strings (which, the more I think about it, seems like the  
right thing to do), then the remaining functionality I propose can be  
implemented as extensions to WSGI, perhaps in an "x-wsgiorg.async"  
namespace.

However, the problem remains that, even though an asynchronous server  
can implement the write() callable and wsgi.input as required by the  
WSGI spec, they effectively can't be used by applications, since they  
involve potentially blocking I/O operations.  So either WSGI has to be  
revised to take the needs of asynchronous servers into account, or we  
have to accept that async servers can never be fully WSGI compliant.

> So the idea is that the gateways would provide async input by  
> default and a piece of middleware or config option to make it  
> synchronous (well, actually, buffer it).

You mean the middleware would be used to make the input synchronous so  
that an app that uses wsgi.input would function normally (reading from  
the buffer)?  That would fix the problem for wsgi.input, but the issue  
with write() remains.

Another point to keep in mind is that in order to function correctly  
on an async server, an application really has to be written with that  
execution environment in mind.  For example, an app couldn't use  
httplib, since it does blocking I/O (which, again, would freeze up the  
entire server).

> Also, since there already are a bunch of async gateways out there I  
> would like to hear if the other providers would/could implement the  
> proposed form of common async input - that would ultimately decide  
> the success of this proposed spec.

I would like to hear their opinions as well.  In particular, do any  
Twisted folks have comments on what we've discussed?


Chris

From cstawarz at csail.mit.edu  Thu May  8 07:49:42 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Thu, 8 May 2008 01:49:42 -0400
Subject: [Web-SIG] Proposal for asynchronous WSGI variant
In-Reply-To: <36B208C9-2503-4AB2-B39B-6EB67398DD8A@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<b322b4e60805061751ne2e9618ta3fc519b87997e6a@mail.gmail.com>
	<4B7ABD81-E475-44CC-8D45-2E0525BC7503@csail.mit.edu>
	<b322b4e60805071436y7841fdc0q8c3ca09a654232a@mail.gmail.com>
	<36B208C9-2503-4AB2-B39B-6EB67398DD8A@csail.mit.edu>
Message-ID: <52E01258-D046-4A44-AD35-2E9413C1DAB6@csail.mit.edu>

On May 7, 2008, at 10:59 PM, Christopher Stawarz wrote:

> However, the problem remains that, even though an asynchronous  
> server can implement the write() callable and wsgi.input as required  
> by the WSGI spec, they effectively can't be used by applications,  
> since they involve potentially blocking I/O operations.  So either  
> WSGI has to be revised to take the needs of asynchronous servers  
> into account, or we have to accept that async servers can never be  
> fully WSGI compliant.

Maybe this isn't as big a deal as I'm making it.  The point of the  
async extensions is to make it possible for WSGI apps to run  
effectively on asynchronous servers.  Apps that use the extensions  
won't use write() or wsgi.input, so it really doesn't matter whether  
they're blocking or not.

Although apps that don't use the async extensions *could* be run on an  
asynchronous server (by using wsgi.input in a blocking fashion), doing  
so would mean that the server could effectively handle only one  
request at a time (i.e. serially).  If this were unacceptable (which  
it most likely would be), then you just wouldn't do it.  Better to use  
mod_wsgi or some other server that can run your app effectively.

So I guess the only issue is that authors of asynchronous servers who  
want to comply fully with the WSGI spec have to implement  
functionality (write() and wsgi.input) that can't be used without  
severely degrading the server's performance.  But that's an issue that  
server authors can address as they see fit, not something that the  
WSGI spec needs to account for.

Thanks to everyone who has provided input so far -- please keep the  
comments coming!  I'm going to work on another draft of my proposal  
that takes into account what we've discussed and will post it here  
when it's done.


Chris

From cstawarz at csail.mit.edu  Mon May 12 00:15:57 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Sun, 11 May 2008 18:15:57 -0400
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
Message-ID: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>

This is a revised version of my AWSGI proposal from last week.  While  
many of the details remain the same, the big change is that I'm now  
proposing a set of extensions to standard WSGI, rather than a separate  
specification for asynchronous servers.

The updated proposal is included below.  I've also posted it at

   http://wsgi.org/wsgi/Specifications/async

The bzr repository for my reference implementation (which is only  
partially updated to match the new spec) is now at

   http://pseudogreen.org/bzr/wsgiorg_async_ref/

I'd appreciate your comments.


Thanks,
Chris


Abstract
--------

This specification defines a set of extensions that allow WSGI
applications to run effectively on asynchronous (aka event driven)
servers.

Rationale
---------

The architecture of an asynchronous server requires all I/O
operations, including both interprocess and network communication, to
be non-blocking.  For a WSGI-compliant server, this requirement
extends to all applications run on the server.  However, the WSGI
specification does not provide sufficient facilities for an
application to ensure that its I/O is non-blocking.  Specifically,
there are two issues:

* The methods provided by the input stream (``environ['wsgi.input']``)
   follow the semantics of the corresponding methods of the ``file``
   class.  In particular, each of these methods can invoke the
   underlying I/O function (in this case, ``recv`` on the socket
   connected to the client) more than once, without giving the
   application the opportunity to check whether each invocation will
   block.

* WSGI does not provide the application with a mechanism to test
   arbitrary file descriptors (such as those belonging to sockets or
   pipes opened by the application) for I/O readiness.

This specification defines a standard interface by which asynchronous
servers can provide the required facilities to applications.

Specification
-------------

Servers that want to allow applications to perform non-blocking I/O
must add four new variables to the WSGI environment:
``x-wsgiorg.async.input``, ``x-wsgiorg.async.readable``,
``x-wsgiorg.async.writable``, and ``x-wsgiorg.async.timeout``.  The
following sections describe these extensions.

Non-blocking Input Stream
~~~~~~~~~~~~~~~~~~~~~~~~~

The ``x-wsgiorg.async.input`` variable provides a non-blocking
replacement for ``wsgi.input``.  It is an object with one method,
``read(size)``, that behaves like the ``recv`` method of
``socket.socket``.  This means that a call to ``read`` will invoke the
underlying socket ``recv`` **no more than once** and return **at
most** ``size`` bytes of data (possibly less).  In addition, ``read``
may return an empty string (zero bytes) **only** if the client closes
the connection or the application attempts to read more data than is
specified by the ``CONTENT_LENGTH`` variable.

Before each call to ``read``, the application **must** test the input
stream for readiness with ``x-wsgiorg.async.readable`` (see below).
The result of calling ``read`` on a non-ready input stream is
undefined.

As with ``wsgi.input``, the server is free to implement
``x-wsgiorg.async.input`` using any technique it chooses (performing
reads on demand, pre-reading the request body, etc.).  The only
requirements are for ``read`` to obey the expected semantics and the
input object to be accepted as the first argument to
``x-wsgiorg.async.readable``.
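
As an illustration only, a server that performs reads on demand might implement the input object roughly as follows (Python 3, so data are bytes). The class name ``AsyncInput`` and its constructor arguments are hypothetical and not part of this specification:

```python
import socket


class AsyncInput:
    """Hypothetical input object whose read() behaves like socket recv()."""

    def __init__(self, sock, content_length):
        self._sock = sock
        self._remaining = content_length

    def fileno(self):
        # Lets the object be passed directly to a readiness test.
        return self._sock.fileno()

    def read(self, size):
        size = min(size, self._remaining)
        if size <= 0:
            return b""                     # past CONTENT_LENGTH: empty result
        data = self._sock.recv(size)       # exactly one recv() call
        self._remaining -= len(data)
        return data


client, server = socket.socketpair()
client.sendall(b"hello world")             # 11 bytes sent, but only 5 declared
stream = AsyncInput(server, content_length=5)
first = stream.read(100)
second = stream.read(100)
client.close()
server.close()
```

The second call returns an empty string without touching the socket, since the declared request body has already been consumed.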

Testing File Descriptors for I/O Readiness
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The variables ``x-wsgiorg.async.readable`` and
``x-wsgiorg.async.writable`` are callable objects that accept two
positional arguments, one required and one optional.  In the following
description, these arguments are given the names ``fd`` and
``timeout``, but they are not required to have these names, and the
application **must** invoke the callables using positional arguments.

The first argument, ``fd``, is either an integer representing a file
descriptor or an object with a ``fileno`` method that returns such an
integer.  (In addition, ``fd`` may be ``x-wsgiorg.async.input``, even
if it lacks a ``fileno`` method.)  The second, optional argument,
``timeout``, is either ``None`` or a floating-point value in seconds.
If omitted, it defaults to ``None``.

When called, ``readable`` and ``writable`` return the empty string
(``''``), which **must** be yielded by the application iterable to the
server (passing through any middleware).  The server then suspends
execution of the application until one of the following conditions is
met:

* The specified file descriptor is ready for reading or writing.

* ``timeout`` seconds have elapsed without the file descriptor
   becoming ready for I/O.

* The server detects an error or "exceptional" condition (such as
   out-of-band data) on the file descriptor.

Put another way, if the application calls ``readable`` and yields the
empty string, it will be suspended until
``select.select([fd],[],[fd],timeout)`` would return.  If the
application calls ``writable`` and yields the empty string, it will be
suspended until ``select.select([],[fd],[fd],timeout)`` would return.

If ``timeout`` seconds elapse without the file descriptor becoming
ready for I/O, the variable ``x-wsgiorg.async.timeout`` will be true
when the application resumes.  Otherwise, it will be false.  The value
of ``x-wsgiorg.async.timeout`` when the application is first started
or after it yields each response-body string is undefined.
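
The relationship to ``select`` and the timeout flag can be sketched with a blocking stand-in. A real asynchronous server would register the descriptor with its event loop rather than block; the function name ``wait_readable`` is an assumption made for this example:

```python
import select
import socket


def wait_readable(fd, timeout=None):
    """Wait until fd is readable or in error; return True on timeout."""
    ready = select.select([fd], [], [fd], timeout)
    # select() returns three empty lists only when the timeout expired.
    return ready == ([], [], [])


a, b = socket.socketpair()
timed_out_before = wait_readable(b, 0.05)   # nothing sent yet
a.sendall(b"x")
timed_out_after = wait_readable(b, 1.0)     # data is waiting
a.close()
b.close()
```

The returned flag corresponds to the value the server would store in ``x-wsgiorg.async.timeout`` before resuming the application.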

The server may use any technique it desires to detect when an
application's file descriptors are ready for I/O.  (Most likely, it
will add them to the same event loop that it uses for accepting new
client connections, receiving requests, and sending responses.)

Examples
--------

The following application reads the request body and sends it back to
the client unmodified.  Each time it wants to receive data from the
client, it first tests ``environ['x-wsgiorg.async.input']`` for
readability and then calls its ``read`` method.  If the input stream
is not readable after one second, the application sends a ``408
Request Timeout`` response to the client and terminates::

   def echo_request_body(environ, start_response):
       input = environ['x-wsgiorg.async.input']
       readable = environ['x-wsgiorg.async.readable']

       nbytes = int(environ.get('CONTENT_LENGTH') or 0)
       output = ''
       while nbytes:
           yield readable(input, 1.0)  # Time out after 1 second

           if environ['x-wsgiorg.async.timeout']:
               msg = 'The request timed out.'
               start_response('408 Request Timeout',
                              [('Content-Type', 'text/plain'),
                               ('Content-Length', str(len(msg)))])
               yield msg
               return

           data = input.read(nbytes)
           if not data:
               break
           output += data
           nbytes -= len(data)

       content_type = (environ.get('CONTENT_TYPE')
                       or 'application/octet-stream')
       start_response('200 OK', [('Content-Type', content_type),
                                 ('Content-Length', str(len(output)))])
       yield output

The following middleware component allows an application that uses the
``x-wsgiorg.async`` extensions to run on a server that does not
support them, without any modification to the application's code::

   import select

   def dummy_async(application):
       def wrapper(environ, start_response):
           input = environ['wsgi.input']
           environ['x-wsgiorg.async.input'] = input

           select_args = [None]

           def readable(fd, timeout=None):
               select_args[0] = ([fd], [], [fd], timeout)
               return ''

           def writable(fd, timeout=None):
               select_args[0] = ([], [fd], [fd], timeout)
               return ''

           environ['x-wsgiorg.async.readable'] = readable
           environ['x-wsgiorg.async.writable'] = writable

           for result in application(environ, start_response):
               if result or (not select_args[0]):
                   yield result
               else:
                   if select_args[0][2][0] is input:
                       environ['x-wsgiorg.async.timeout'] = False
                   else:
                       ready = select.select(*select_args[0])
                       environ['x-wsgiorg.async.timeout'] = (
                           ready == ([], [], []))
                   select_args[0] = None

       return wrapper

Problems
--------

* The empty string yielded by an application after calling
   ``readable`` or ``writable`` must pass through any intervening
   middleware and be detected by the server.  Although WSGI explicitly
   requires middleware to relay such strings to the server (see
   `Middleware Handling of Block Boundaries
   <http://python.org/dev/peps/pep-0333/#middleware-handling-of-block-boundaries>`_),
   some components may not, making them incompatible with this
   specification.

* Although the extensions described here make it *possible* for
   applications to run effectively on asynchronous servers, they do not
   (and cannot) *ensure* that they do so.  As is the case with any
   cooperative multitasking environment, the burden of ensuring that
   all application code is non-blocking rests with application authors.

Other Possibilities
-------------------

* To prevent an application that does blocking I/O from blocking the
   entire server, an asynchronous server could run each instance of the
   application in a separate thread.  However, since asynchronous
   servers achieve high levels of concurrency by expressly *avoiding*
   multithreading, this technique will almost always be unacceptable.

* The `greenlet <http://codespeak.net/py/dist/greenlet.html>`_ package
   enables the use of cooperatively-scheduled micro-threads in Python
   programs, and a WSGI server could potentially use it to pause and
   resume applications around blocking I/O operations.  However, such
   micro-threading is not part of the Python language or standard
   library, and some server authors may be unwilling or unable to make
   use of it.

Open Issues
-----------

* Some third-party libraries (such as `PycURL
   <http://pycurl.sourceforge.net/>`_) provide non-blocking interfaces
   that may need to monitor multiple file descriptors for I/O readiness
   simultaneously.  Since this specification allows an application to
   wait on only one file descriptor at a time, it may be difficult or
   impossible for applications to use such libraries.

   Although this specification could be extended to include an
   interface for waiting on multiple file descriptors, it is unclear
   whether it would be easy (or even possible) for all servers to
   implement it.  Also, the appropriate behavior for a multi-descriptor
   wait is not obvious.  (Should the application be resumed when a
   single descriptor is ready?  All of them?  Some minimum number?)


From pje at telecommunity.com  Mon May 12 01:05:33 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 11 May 2008 19:05:33 -0400
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
Message-ID: <20080511230511.CE3C13A4061@sparrow.telecommunity.com>

At 06:15 PM 5/11/2008 -0400, Christopher Stawarz wrote:
>Non-blocking Input Stream
>~~~~~~~~~~~~~~~~~~~~~~~~~
>
>The ``x-wsgiorg.async.input`` variable provides a non-blocking
>replacement for ``wsgi.input``.  It is an object with one method,
>``read(size)``, that behaves like the ``recv`` method of
>``socket.socket``.  This means that a call to ``read`` will invoke the
>underlying socket ``recv`` **no more than once** and return **at
>most** ``size`` bytes of data (possibly less).  In addition, ``read``
>may return an empty string (zero bytes) **only** if the client closes
>the connection or the application attempts to read more data than is
>specified by the ``CONTENT_LENGTH`` variable.
>
>Before each call to ``read``, the application **must** test the input
>stream for readiness with ``x-wsgiorg.async.readable`` (see below).
>The result of calling ``read`` on a non-ready input stream is
>undefined.

For this to work, you're going to need this to take the wsgi.input 
object as a parameter.  If you don't, then this will bypass 
middleware that replaces wsgi.input.

That is, you will need a way for this spec to support middleware 
that's replacing wsgi.input, without the middleware knowing that this 
specification exists.  In the worst case, it should detect the 
replaced input and give an error or some response that lets the 
application know it won't really be able to use the async feature.


>If ``timeout`` seconds elapse without the file descriptor becoming
>ready for I/O, the variable ``x-wsgiorg.async.timeout`` will be true
>when the application resumes.  Otherwise, it will be false.  The value
>of ``x-wsgiorg.async.timeout`` when the application is first started
>or after it yields each response-body string is undefined.

Er, I think you are confused here.  There is no way for the server to 
know what environ dictionary the application is using, unless you 
explicitly pass it into your extension API.
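
A small sketch of the situation in question (Python 3). The middleware here is hypothetical, but handing the application a fresh copy of environ is something WSGI middleware is allowed to do:

```python
def copying_middleware(app):
    """Hypothetical middleware that gives the app its own environ dict."""
    def wrapper(environ, start_response):
        inner = dict(environ)              # a copy, not the server's dict
        inner["example.flag"] = True
        return app(inner, start_response)
    return wrapper


seen = []


def app(environ, start_response):
    yield b""                              # pretend readable() was just called
    seen.append(environ.get("x-wsgiorg.async.timeout"))
    yield b"done"


server_environ = {}
iterable = copying_middleware(app)(server_environ, None)
next(iterable)                             # app pauses at the empty string
server_environ["x-wsgiorg.async.timeout"] = True   # server sets its own dict
next(iterable)                             # app resumes and looks at its copy
```

The application never sees the flag, because the server wrote it into a dictionary the application does not hold.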


From cstawarz at csail.mit.edu  Mon May 12 02:25:57 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Sun, 11 May 2008 20:25:57 -0400
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <20080511230511.CE3C13A4061@sparrow.telecommunity.com>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
	<20080511230511.CE3C13A4061@sparrow.telecommunity.com>
Message-ID: <6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>

On May 11, 2008, at 7:05 PM, Phillip J. Eby wrote:

> At 06:15 PM 5/11/2008 -0400, Christopher Stawarz wrote:
>> Non-blocking Input Stream
>> ~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> The ``x-wsgiorg.async.input`` variable provides a non-blocking
>> replacement for ``wsgi.input``.  It is an object with one method,
>> ``read(size)``, that behaves like the ``recv`` method of
>> ``socket.socket``.  This means that a call to ``read`` will invoke the
>> underlying socket ``recv`` **no more than once** and return **at
>> most** ``size`` bytes of data (possibly less).  In addition, ``read``
>> may return an empty string (zero bytes) **only** if the client closes
>> the connection or the application attempts to read more data than is
>> specified by the ``CONTENT_LENGTH`` variable.
>>
>> Before each call to ``read``, the application **must** test the input
>> stream for readiness with ``x-wsgiorg.async.readable`` (see below).
>> The result of calling ``read`` on a non-ready input stream is
>> undefined.
>
> For this to work, you're going to need this to take the wsgi.input  
> object as a parameter.  If you don't, then this will bypass  
> middleware that replaces wsgi.input.
>
> That is, you will need a way for this spec to support middleware  
> that's replacing wsgi.input, without the middleware knowing that  
> this specification exists.  In the worst case, it should detect the  
> replaced input and give an error or some response that lets the  
> application know it won't really be able to use the async feature.

I hadn't considered middleware that replaces wsgi.input.  Is there an  
example component you can point me to, just so I have something  
concrete to look at?

Given that the semantics of wsgi.input are, in general, incompatible
with non-blocking execution, I'm inclined to think that such
middleware would either need to be rewritten to use
x-wsgiorg.async.input, or just couldn't be used with asynchronous
servers.  But I'll think about it some more -- maybe there's a way to
make this work.

>> If ``timeout`` seconds elapse without the file descriptor becoming
>> ready for I/O, the variable ``x-wsgiorg.async.timeout`` will be true
>> when the application resumes.  Otherwise, it will be false.  The value
>> of ``x-wsgiorg.async.timeout`` when the application is first started
>> or after it yields each response-body string is undefined.
>
> Er, I think you are confused here.  There is no way for the server  
> to know what environ dictionary the application is using, unless you  
> explicitly pass it into your extension API.

My thinking is that the server *creates* the environ dictionary, so it  
can just keep a reference to it and update it as needed.  Is  
middleware allowed to replace environ with another dict instance  
before passing it to the application?  I wasn't aware that this was  
allowed, but if it is, then I see the problem.
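
A minimal sketch of the problem case, assuming that middleware does hand the application a copy of environ.  The middleware and the key it adds are hypothetical; only the 'x-wsgiorg.async.timeout' key comes from the draft.

```python
def copying_middleware(app):
    """Hypothetical middleware that passes a *copy* of environ downstream."""
    def wrapper(environ, start_response):
        new_environ = dict(environ)
        new_environ['example.added_by_middleware'] = True
        return app(new_environ, start_response)
    return wrapper


def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield b'first'
    # By the time the app resumes, the server has updated *its* dict,
    # but the app only ever sees the copy made by the middleware.
    yield str(environ.get('x-wsgiorg.async.timeout')).encode('ascii')


# Stand-in for the server: it keeps a reference to the dict it created.
server_environ = {'x-wsgiorg.async.timeout': False}
body = copying_middleware(app)(server_environ, lambda status, headers: None)

first = next(body)
server_environ['x-wsgiorg.async.timeout'] = True   # server-side update
second = next(body)                                 # app still sees False
```

The update made through the server's own reference never reaches the application, because the two sides are no longer looking at the same dictionary.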

The solution would probably be for the application to pass a mutable  
object (e.g. an empty list) to readable/writable that the server could  
set a timeout flag on.


Thanks,
Chris

From pje at telecommunity.com  Mon May 12 03:09:41 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 11 May 2008 21:09:41 -0400
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
	<20080511230511.CE3C13A4061@sparrow.telecommunity.com>
	<6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
Message-ID: <20080512010919.B3EB63A4061@sparrow.telecommunity.com>

At 08:25 PM 5/11/2008 -0400, Christopher Stawarz wrote:
>Given that the semantics of wsgi.input are, in general, incompatible
>with non-blocking execution, I'm inclined to think that such
>middleware would either need to be rewritten to use
>x-wsgiorg.async.input, or just couldn't be used with asynchronous
>servers.  But I'll think about it some more -- maybe there's a way to
>make this work.

Please read 
http://www.python.org/dev/peps/pep-0333/#server-extension-apis for 
the lowdown on this.  It's only seven paragraphs, but it already 
covers this ground thoroughly.


>Is
>middleware allowed to replace environ with another dict instance
>before passing it to the application?

See the same seven paragraphs for the answer to this as well (albeit 
somewhat implicitly).


From ionel.mc at gmail.com  Mon May 12 06:01:40 2008
From: ionel.mc at gmail.com (Ionel Maries Cristian)
Date: Mon, 12 May 2008 07:01:40 +0300
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
	<20080511230511.CE3C13A4061@sparrow.telecommunity.com>
	<6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
Message-ID: <b322b4e60805112101g7aa8e5f1lcfc08cf9e5482888@mail.gmail.com>

> My thinking is that the server *creates* the environ dictionary, so it can
> just keep a reference to it and update it as needed.  Is middleware allowed
> to replace environ with another dict instance before passing it to the
> application?  I wasn't aware that this was allowed, but if it is, then I see
> the problem.
>
> The solution would probably be for the application to pass a mutable
> object (e.g. an empty list) to readable/writable that the server could set a
> timeout flag on.
>

How about environ['x-wsgiorg.async'].timeout?  I do something like that in
cogen.

-- 
http://ionelmc.wordpress.com

From ionel.mc at gmail.com  Mon May 12 06:45:22 2008
From: ionel.mc at gmail.com (Ionel Maries Cristian)
Date: Mon, 12 May 2008 07:45:22 +0300
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
	<20080511230511.CE3C13A4061@sparrow.telecommunity.com>
	<6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
Message-ID: <b322b4e60805112145n5373fc40g80e0622f04230a5@mail.gmail.com>

On Mon, May 12, 2008 at 3:25 AM, Christopher Stawarz <cstawarz at csail.mit.edu>
wrote:

> On May 11, 2008, at 7:05 PM, Phillip J. Eby wrote:
>
> > For this to work, you're going to need this to take the wsgi.input object
> > as a parameter.  If you don't, then this will bypass middleware that
> > replaces wsgi.input.
> >
> > That is, you will need a way for this spec to support middleware that's
> > replacing wsgi.input, without the middleware knowing that this specification
> > exists.  In the worst case, it should detect the replaced input and give an
> > error or some response that lets the application know it won't really be
> > able to use the async feature.
> >
>
> I hadn't considered middleware that replaces wsgi.input.  Is there an
> example component you can point me to, just so I have something concrete to
> look at?
>
> Given that the semantics of wsgi.input are, in general, incompatible with
> non-blocking execution, I'm inclined to think that such middleware would
> either need to be rewritten to use x-wsgiorg.async.input, or just couldn't
> be used with asynchronous servers.  But I'll think about it some more --
> maybe there's a way to make this work.
>


Input filters could be made to work using greenlets.  But then again,
anyone using greenlets could also use them to simulate a seemingly
blocking API for the input, so this is pretty much pointless.

But I agree: detecting this is good, and an error should be raised in
this case.
In cogen I'm setting wsgi.input to None, so any use of it ends in an
error, though that's not very elegant.


-- 
http://ionelmc.wordpress.com

From manlio_perillo at libero.it  Mon May 12 15:03:44 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Mon, 12 May 2008 15:03:44 +0200
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <20080511230511.CE3C13A4061@sparrow.telecommunity.com>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
	<20080511230511.CE3C13A4061@sparrow.telecommunity.com>
Message-ID: <48284030.6020300@libero.it>

Phillip J. Eby wrote:
> [...]
> 
> 
>> If ``timeout`` seconds elapse without the file descriptor becoming
>> ready for I/O, the variable ``x-wsgiorg.async.timeout`` will be true
>> when the application resumes.  Otherwise, it will be false.  The value
>> of ``x-wsgiorg.async.timeout`` when the application is first started
>> or after it yields each response-body string is undefined.
> 
> Er, I think you are confused here.  There is no way for the server to 
> know what environ dictionary the application is using, unless you 
> explicitly pass it into your extension API.
> 

Interesting, this is something I have never considered.
In my implementation ngx.poll returns a function, so there should be no 
problems.


Regards  Manlio Perillo

From cstawarz at csail.mit.edu  Mon May 12 16:35:09 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Mon, 12 May 2008 10:35:09 -0400
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <b322b4e60805112101g7aa8e5f1lcfc08cf9e5482888@mail.gmail.com>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
	<20080511230511.CE3C13A4061@sparrow.telecommunity.com>
	<6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
	<b322b4e60805112101g7aa8e5f1lcfc08cf9e5482888@mail.gmail.com>
Message-ID: <CB1C38C5-7EAE-4B92-929C-751371D1471A@csail.mit.edu>

On May 12, 2008, at 12:01 AM, Ionel Maries Cristian wrote:

> My thinking is that the server *creates* the environ dictionary, so  
> it can just keep a reference to it and update it as needed.  Is  
> middleware allowed to replace environ with another dict instance  
> before passing it to the application?  I wasn't aware that this was  
> allowed, but if it is, then I see the problem.
>
> The solution would probably be for the application to pass a mutable  
> object (e.g. an empty list) to readable/writable that the server  
> could set a timeout flag on.
>
> How about a environ['x-wsgiorg.async'].timeout ? I do something like  
> that in cogen.

Or environ['x-wsgiorg.async.timeout'] could be an object whose truth  
value can be toggled by the server, like an instance of the following:

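   class MutaBool(object):
       def __init__(self):
           self.val = False   # the server sets this to True on timeout
       def __nonzero__(self):
           return self.val
       __bool__ = __nonzero__   # Python 3 spelling of the same hook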

Then there's no need for the server to change environ after starting  
the app.  I think that's probably the way to go.
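
A short sketch of why this holds up even if middleware copies environ.  It restates the class so that it is self-contained and adds the Python 3 spelling of the truth-value hook; the server-side variable names are hypothetical.

```python
class MutaBool(object):
    """Object whose truth value the server can toggle in place."""

    def __init__(self):
        self.val = False

    def __bool__(self):          # Python 3
        return self.val

    __nonzero__ = __bool__       # Python 2


# Server side: create the flag once and keep a reference to it.
timeout_flag = MutaBool()
server_environ = {'x-wsgiorg.async.timeout': timeout_flag}

# Middleware may copy the dict; a shallow copy still shares the flag.
app_environ = dict(server_environ)

before = bool(app_environ['x-wsgiorg.async.timeout'])
timeout_flag.val = True          # server signals a timeout
after = bool(app_environ['x-wsgiorg.async.timeout'])
```

The flag object is shared by every shallow copy of the dictionary, so the server never has to touch environ again after the application has started.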


Chris

From cstawarz at csail.mit.edu  Mon May 12 17:03:12 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Mon, 12 May 2008 11:03:12 -0400
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <b322b4e60805112145n5373fc40g80e0622f04230a5@mail.gmail.com>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
	<20080511230511.CE3C13A4061@sparrow.telecommunity.com>
	<6DC53B84-A8C6-4CD2-8174-E3ABA05AEFF0@csail.mit.edu>
	<b322b4e60805112145n5373fc40g80e0622f04230a5@mail.gmail.com>
Message-ID: <8FE609A5-6865-4D25-A318-064729FF99E1@csail.mit.edu>

On May 12, 2008, at 12:45 AM, Ionel Maries Cristian wrote:

> On Mon, May 12, 2008 at 3:25 AM, Christopher Stawarz
> <cstawarz at csail.mit.edu> wrote:
> On May 11, 2008, at 7:05 PM, Phillip J. Eby wrote:
>
> For this to work, you're going to need this to take the wsgi.input  
> object as a parameter.  If you don't, then this will bypass  
> middleware that replaces wsgi.input.
>
> That is, you will need a way for this spec to support middleware  
> that's replacing wsgi.input, without the middleware knowing that  
> this specification exists.  In the worst case, it should detect the  
> replaced input and give an error or some response that lets the  
> application know it won't really be able to use the async feature.
>
> I hadn't considered middleware that replaces wsgi.input.  Is there  
> an example component you can point me to, just so I have something  
> concrete to look at?
>
> Given that the semantics of wsgi.input are, in general, incompatible  
> with non-blocking execution, I'm inclined to think that such  
> middleware would either need to be rewritten to use
> x-wsgiorg.async.input, or just couldn't be used with asynchronous
> servers.  But I'll think about it some more -- maybe there's a way  
> to make this work.
>
>
> Making input filters work could be achieved using greenlets - but  
> then again - if one would use greenlets he could use them to  
> simulate a seemingly blocking api for the input so this is pretty  
> much pointless.
>
> But I agree, detecting this is good and errors should be thrown in  
> this case.
> In cogen i'm setting wsgi.input to None - so any use of it would end  
> in a error - though it's not very elegant.

But if your server sets wsgi.input to None, then you really can't  
claim that it's WSGI-compliant.

It seems like the authors of asynchronous servers have two options for
how to handle wsgi.input.  The first option is to provide a compliant
wsgi.input (with file-like, blocking behavior).  This means that
middleware that uses/replaces wsgi.input will work properly, but the
whole server can block whenever such use takes place.  Therefore, apps
and middleware will essentially be required to use
x-wsgiorg.async.input.

The second option is to provide a non-compliant (i.e. non-blocking)  
wsgi.input, which works something like x-wsgiorg.async.input.  But  
then any middleware that uses wsgi.input will be broken, since it  
won't work as expected.

In either case, wsgi.input ends up being unusable.  Ugh.

Of course, there is an easy way out of this:  Drop the idea of
x-wsgiorg.async.input, and push the responsibility for making wsgi.input
non-blocking onto server authors.  In effect, this would mean that
asynchronous servers must *always* pre-read the request body and
provide it to the app as a StringIO (or whatever).
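
Roughly, the pre-read approach could look like the sketch below.  The function name and the ``recv`` callable are hypothetical, and the plain loop stands in for what a real asynchronous server would do piecemeal from its event loop as data arrives; io.BytesIO plays the role of the StringIO.

```python
import io


def preread_input(environ, recv):
    """Buffer the whole request body, then expose it as wsgi.input.

    ``recv(n)`` is a hypothetical callable that returns up to ``n``
    bytes of request body, or b'' once the client has closed.
    """
    try:
        remaining = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        remaining = 0

    chunks = []
    while remaining > 0:
        chunk = recv(remaining)
        if not chunk:
            break                      # client closed early
        chunks.append(chunk)
        remaining -= len(chunk)

    # Reads on an in-memory buffer never block, so ordinary middleware
    # and applications can use wsgi.input without stalling the server.
    environ['wsgi.input'] = io.BytesIO(b''.join(chunks))
    return environ
```

The cost is that the application cannot start until the entire body has arrived, which is exactly the on-demand capability being given up.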

I would like to avoid this requirement, since the ability for servers
to provide on-demand, non-blocking input to the application seems
useful.  But if it comes down to a choice between (1) the ability to
receive data from the client on-demand and (2) having a wsgi.input
that can actually be used, I think I'd choose (2).


Chris

From foom at fuhm.net  Mon May 12 18:18:50 2008
From: foom at fuhm.net (James Y Knight)
Date: Mon, 12 May 2008 12:18:50 -0400
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
Message-ID: <1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net>


On May 11, 2008, at 6:15 PM, Christopher Stawarz wrote:
> Abstract
> --------
>
> This specification defines a set of extensions that allow WSGI
> applications to run effectively on asynchronous (aka event driven)
> servers.
>
> Rationale
> ---------
>
> The architecture of an asynchronous server requires all I/O
> operations, including both interprocess and network communication, to
> be non-blocking.  For a WSGI-compliant server, this requirement
> extends to all applications run on the server.  However, the WSGI
> specification does not provide sufficient facilities for an
> application to ensure that its I/O is non-blocking.  Specifically,
> there are two issues:
>
> * The methods provided by the input stream (``environ['wsgi.input']``)
>  follow the semantics of the corresponding methods of the ``file``
>  class.
>
> * WSGI does not provide the application with a mechanism to test
>  arbitrary file descriptors (such as those belonging to sockets or
>  pipes opened by the application) for I/O readiness.

There are other issues. How do you do a DNS lookup? How do you get  
process completion notification? Heck, how do you run a process? Once  
you have I/O readiness information, what do you do with that? I guess  
you'd need to write a whole new asynchronous server framework on top  
of AWSGI? I can't see being able to use it "raw" for any real  
applications.

> The first argument, ``fd``, is either an integer representing a file
> descriptor or an object with a ``fileno`` method that returns such an
> integer.  (In addition, ``fd`` may be ``x-wsgiorg.async.input``, even
> if it lacks a ``fileno`` method.)  The second, optional argument,
> ``timeout``, is either ``None`` or a floating-point value in seconds.
> If omitted, it defaults to ``None``.

What if the event loop of the server doesn't use integer fds, but
Windows file handles or a Java channel object? Where are you allowed
to get these integers from? Is it always a socket from
socket.socket().fileno()? Or can it be a file from open().fileno() or
os.open()? A pipe from os.pipe()? Note that these distinctions are
important everywhere but UNIX.

> Other Possibilities
> -------------------
>
> * To prevent an application that does blocking I/O from blocking the
>  entire server, an asynchronous server could run each instance of the
>  application in a separate thread.  However, since asynchronous
>  servers achieve high levels of concurrency by expressly *avoiding*
>  multithreading, this technique will almost always be unacceptable.

Well, my claim would be that it's usually acceptable. Certainly  
sometimes it's not, which is where the use of an asynchronous server  
framework comes in handy. But here you're inventing a whole new  
framework...

PS, a minor bug: I notice the spec says x-wsgiorg.async.input is
supposed to have only a read function, but you actually call recv() on
it in the examples.

James

From cstawarz at csail.mit.edu  Mon May 12 20:55:27 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Mon, 12 May 2008 14:55:27 -0400
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
	<1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net>
Message-ID: <4098D448-63F5-4A71-A79E-8D7CF2BBB345@csail.mit.edu>

On May 12, 2008, at 12:18 PM, James Y Knight wrote:

> There are other issues. How do you do a DNS lookup? How do you get  
> process completion notification? Heck, how do you run a process?

These are valid questions that I'm not attempting to address with this  
proposal.  So maybe the title of my spec should be "Extensions for  
Asynchronous I/O", since that's the only issue it deals with.  I see  
these other issues as something for other specifications to address.

> Once you have I/O readiness information, what do you do with that? I  
> guess you'd need to write a whole new asynchronous server framework  
> on top of AWSGI? I can't see being able to use it "raw" for any real  
> applications.

No, you don't need a whole new framework.  You need libraries (for  
making HTTP requests, talking to databases, etc.) that are written to  
use the extensions the spec provides.  These only need to be written  
once and can then be used with *any* server that supports the  
extensions.

So the existence of a spec like this lets us move from a world where  
every server/framework (be it Twisted, nginx, cogen, whatever) needs  
to reimplement these utilities in terms of its own async I/O  
framework, to one where a single implementation can be written against  
the spec and then used by any server that implements it.  In turn,  
this should make application developers more comfortable with  
targeting their apps at async servers, since they won't be tied to any  
particular server/framework's API.

And, yes, the fact that what I just wrote sounds like "write once, run  
anywhere" sets off alarm bells in my head, too :)  But I think the  
interface I propose is so basic that any async server should be able  
to provide it with very little trouble.

> What if the event-loop of the server doesn't use integer fds, but  
> windows file handles or a java channel object? Where are you allowed  
> to get these integers from? Is it always a socket from  
> socket.socket().fileno()? Or can it be a file from open().fileno()  
> or os.open()? A pipe from os.pipe()? Note that these distinctions  
> are important everywhere but UNIX.

Although I didn't state it in the spec, my thinking was that
readable/writable should accept whatever would be accepted by select()
on the platform you're running on.  On Windows, they would be limited
to sockets; elsewhere, any file descriptor would do.
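
As a rough illustration of "whatever select() accepts", here is a sketch built directly on select.select(), which takes either an integer descriptor or an object with a fileno() method.  The helper name wait_readable is hypothetical, and the call blocks the caller; a real asynchronous server would instead register the descriptor with its event loop and suspend only the application.

```python
import select
import socket


def wait_readable(fd, timeout=None):
    """Return True if ``fd`` became readable, False if the timeout expired.

    ``fd`` may be an integer descriptor or any object with fileno(),
    which is what select.select() itself accepts.
    """
    ready, _, _ = select.select([fd], [], [], timeout)
    return bool(ready)


# Demonstration with a connected pair of sockets.
a, b = socket.socketpair()
try:
    idle = wait_readable(a, 0.0)     # nothing has been sent yet
    b.sendall(b'x')
    ready = wait_readable(a, 1.0)    # one byte is now waiting on ``a``
finally:
    a.close()
    b.close()
```

Sockets are used for the demonstration because they are the one kind of object select() accepts on every platform mentioned here.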

In that light, maybe the title should really be "Extensions for  
Polling File Descriptors for I/O Readiness".  But even limited to that  
scope, I still think it'd be extremely useful.

>> * To prevent an application that does blocking I/O from blocking the
>> entire server, an asynchronous server could run each instance of the
>> application in a separate thread.  However, since asynchronous
>> servers achieve high levels of concurrency by expressly *avoiding*
>> multithreading, this technique will almost always be unacceptable.
>
> Well, my claim would be that it's usually acceptable. Certainly  
> sometimes it's not, which is where the use of an asynchronous server  
> framework comes in handy.

I don't get how it's acceptable.  If you spawn a separate thread for  
each request, then your server is no longer asynchronous.  At that  
point, why not just save yourself some trouble and use Apache?

> PS, a minor bug: I notice the spec says wsgiorg.async.input is  
> supposed to have only a read function, but you actually call recv()  
> on it in the examples.

Thanks.  The examples in the spec text are correct, but I haven't  
updated the examples in my reference code yet.


Chris

From foom at fuhm.net  Mon May 12 23:07:33 2008
From: foom at fuhm.net (James Y Knight)
Date: Mon, 12 May 2008 17:07:33 -0400
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <4098D448-63F5-4A71-A79E-8D7CF2BBB345@csail.mit.edu>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
	<1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net>
	<4098D448-63F5-4A71-A79E-8D7CF2BBB345@csail.mit.edu>
Message-ID: <DEDC821E-E6E8-4100-9F4B-04158A03A307@fuhm.net>


On May 12, 2008, at 2:55 PM, Christopher Stawarz wrote:

>
>> There are other issues. How do you do a DNS lookup? How do you get  
>> process completion notification? Heck, how do you run a process?
>
> These are valid questions that I'm not attempting to address with  
> this proposal.  So maybe the title of my spec should be "Extensions  
> for Asynchronous I/O", since that's the only issue it deals with.  I  
> see these other issues as something for other specifications to  
> address.

Surely you need DNS lookup to make a socket connection? Do you mean to  
provide that in an external library via a threadpool?

> No, you don't need a whole new framework.  You need libraries (for  
> making HTTP requests, talking to databases, etc.) that are written  
> to use the extensions the spec provides.  These only need to be  
> written once and can then be used with *any* server that supports  
> the extensions.

You do need a framework. Using socket functions correctly (and  
portably) in non-blocking mode is not trivial.

>> Well, my claim would be that it's usually acceptable. Certainly  
>> sometimes it's not, which is where the use of an asynchronous  
>> server framework comes in handy.
>
> I don't get how it's acceptable.  If you spawn a separate thread for  
> each request, then your server is no longer asynchronous.  At that  
> point, why not just save yourself some trouble and use Apache?

Well,

1) Using Apache is certainly a valid option performance-wise. Apache
is pretty fast (obviously not the fastest server ever, but pretty
good...). So if it has the features/packaging you need, by all means,
use it. The advantage IMO of Python servers is that they're
lighter-weight deployment-wise and more easily configurable by code.
2) If your app uses a database, you might as well just run it
in a thread, because you're most likely going to use a blocking
database API anyhow.
3) If your app does not make use of outgoing sockets, then
  3a) If it also doesn't use wsgi.input, you could inform the WSGI  
server that it can just run the app not in a thread as it won't be  
blocking.
  3b) If it does use wsgi.input, but doesn't need to read it  
incrementally, you could inform the server that it should pre-read the  
input and then run the app directly, not in a thread, as it won't be  
blocking.

If none of the above apply, that is: you do not use a database, and you
do use either incremental reading of wsgi.input or an outgoing socket
connection, /then/ an async WSGI extension might be useful. I claim
that will cover a small subset of WSGI apps.
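
The cases above can be written down as a small decision function.  This is only one reading of the list; the function name, the argument names and the returned labels are all made up for illustration and are not an API any server provides.

```python
def choose_strategy(uses_database, uses_outgoing_sockets,
                    uses_wsgi_input, reads_input_incrementally):
    """Map the cases 2), 3a) and 3b) onto a way of running the app."""
    if uses_database:
        # 2) A blocking database API is likely, so just use a thread.
        return 'thread'
    if not uses_outgoing_sockets:
        if not uses_wsgi_input:
            # 3a) Nothing in the app can block: run it directly.
            return 'inline'
        if not reads_input_incrementally:
            # 3b) Let the server pre-read the body, then run directly.
            return 'preread-then-inline'
    # Remaining cases: outgoing sockets or incremental input reads.
    return 'async-extension'
```

Only the last branch is where an async extension would pay off.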

James

From cstawarz at csail.mit.edu  Tue May 13 00:18:47 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Mon, 12 May 2008 18:18:47 -0400
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <DEDC821E-E6E8-4100-9F4B-04158A03A307@fuhm.net>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
	<1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net>
	<4098D448-63F5-4A71-A79E-8D7CF2BBB345@csail.mit.edu>
	<DEDC821E-E6E8-4100-9F4B-04158A03A307@fuhm.net>
Message-ID: <9FB1544F-79B6-47CE-9918-B95BF24C1B62@csail.mit.edu>

On May 12, 2008, at 5:07 PM, James Y Knight wrote:

> Surely you need DNS lookup to make a socket connection? Do you mean  
> to provide that in an external library via a threadpool?

No, I don't mean to, because I don't care enough to bother.  But if  
you or someone else did, you'd be free to.

> You do need a framework. Using socket functions correctly (and  
> portably) in non-blocking mode is not trivial.

I need a library, not a framework.  And I may not even need to write  
it myself.  (For example, for making HTTP requests, I can use pycurl.)

> 1) Using apache is certainly a valid option performance-wise. Apache  
> is pretty fast (obviously not the fastest server ever, but pretty  
> good...). So if it has the features/packaging you need, by all  
> means, use it. The advantage IMO of python servers is that they're  
> lighter-weight deployment-wise and more easily configurable by code.

Fair enough.  But I'm specifically interested in doing non-blocking
I/O on an asynchronous server.

> 2) If your app uses a database, you probably might as well just run  
> it in a thread, because you're most likely going to use a blocking  
> database API anyhow.

Yes, the compatibility of database and other APIs with an
asynchronous execution model is important.  Some (like MySQL) don't  
support non-blocking connections, so you'd have to work around that  
with threads or some other technique.  Others (like PostgreSQL) do  
provide an async API, which could be used with my proposed  
extensions.  (Manlio Perillo has an example of how this works with his  
nginx mod_wsgi module at http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-postgres-async.py.)

This is another issue you have to worry about to keep your app
non-blocking, but I don't think it's an insurmountable one.  And again,
any library you develop to support these operations, written in terms  
of the proposed non-blocking I/O extensions, will be usable on any  
server that supports the extensions.

> 3) If your app does not make use of outgoing sockets, then
> 3a) If it also doesn't use wsgi.input, you could inform the WSGI  
> server that it can just run the app not in a thread as it won't be  
> blocking.
> 3b) If it does use wsgi.input, but doesn't need to read it  
> incrementally, you could inform the server that it should pre-read  
> the input and then run the app directly, not in a thread, as it  
> won't be blocking.
>
> If none of the above apply, that is: you do not use a database, you  
> do use incremental reading of wsgi.input, or an outgoing socket  
> connection, /then/ an async WSGI extension might be useful. I claim  
> that will cover a small subset of WSGI apps.

As I mentioned above, the database issue is a real one, but it can be  
dealt with.  I would like to be able to allow incremental reading of  
wsgi.input, but I don't see how to do this without breaking  
middleware.  (If you have suggestions, please let me know.)  As for  
outgoing socket connections, I'm willing to accept the cost of a DNS  
lookup; if someone else isn't, then they're free to write some kind of  
local lookup server that their app talks to over a socket, and other  
applications running on other servers can enjoy the fruits of their  
labor.

I regret calling my proposal "Extensions for Asynchronous Servers",  
since clearly that encompasses a much broader range of functionality  
for you than it does for me.  All I'm interested in is the ability to  
poll file descriptors (and the things that allows me to do), and in  
the next revision of my proposal I'll strive to make that clear.  If  
you have an application that requires functionality beyond that, then  
my proposal won't be sufficient for your needs.


Chris

From manlio_perillo at libero.it  Tue May 13 14:51:58 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Tue, 13 May 2008 14:51:58 +0200
Subject: [Web-SIG] Proposed WSGI extensions for asynchronous servers
In-Reply-To: <1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net>
References: <580DA7BB-921D-4665-AED9-0EE1DDAB14D5@csail.mit.edu>
	<1BA84A35-CB5C-45F4-9CF2-2BC66608CC9C@fuhm.net>
Message-ID: <48298EEE.7080703@libero.it>

James Y Knight wrote:
> 
> On May 11, 2008, at 6:15 PM, Christopher Stawarz wrote:
>> Abstract
>> --------
>>
>> This specification defines a set of extensions that allow WSGI
>> applications to run effectively on asynchronous (aka event driven)
>> servers.
>>
>> Rationale
>> ---------
>>
>> The architecture of an asynchronous server requires all I/O
>> operations, including both interprocess and network communication, to
>> be non-blocking.  For a WSGI-compliant server, this requirement
>> extends to all applications run on the server.  However, the WSGI
>> specification does not provide sufficient facilities for an
>> application to ensure that its I/O is non-blocking.  Specifically,
>> there are two issues:
>>
>> * The methods provided by the input stream (``environ['wsgi.input']``)
>>  follow the semantics of the corresponding methods of the ``file``
>>  class.
>>
>> * WSGI does not provide the application with a mechanism to test
>>  arbitrary file descriptors (such as those belonging to sockets or
>>  pipes opened by the application) for I/O readiness.
> 
> There are other issues. How do you do a DNS lookup? How do you get 
> process completion notification? Heck, how do you run a process? Once 
> you have I/O readiness information, what do you do with that? I guess 
> you'd need to write a whole new asynchronous server framework on top of 
> AWSGI? I can't see being able to use it "raw" for any real applications.
> 

This is not a problem with AWSGI.
As an example, there are libraries like PostgreSQL and curl that can be
used with an external event loop.

In the WSGI implementation for Nginx I can provide an interface for
using the built-in support for an asynchronous DNS client.


>> The first argument, ``fd``, is either an integer representing a file
>> descriptor or an object with a ``fileno`` method that returns such an
>> integer.  (In addition, ``fd`` may be ``x-wsgiorg.async.input``, even
>> if it lacks a ``fileno`` method.)  The second, optional argument,
>> ``timeout``, is either ``None`` or a floating-point value in seconds.
>> If omitted, it defaults to ``None``.
> 
> What if the event-loop of the server doesn't use integer fds, but 
> windows file handles or a java channel object? Where are you allowed to 
> get these integers from? Is it always a socket from 
> socket.socket().fileno()? Or can it be a file from open().fileno() or 
> os.open()? A pipe from os.pipe()? Note that these distinctions are 
> important everywhere but UNIX.
> 

This has the same problems that we have with wsgi.file_wrapper.

This is the reason, among other things, why the API in my implementation 
uses ngx.connection_wrapper and ngx.poll_register

 > [...]


Manlio Perillo

From stephan.diehl at gmx.net  Fri May 16 11:13:28 2008
From: stephan.diehl at gmx.net (Stephan Diehl)
Date: Fri, 16 May 2008 11:13:28 +0200
Subject: [Web-SIG] upgrading wsgi.org
Message-ID: <482D5038.9010406@gmx.net>

Hi,

Just in case somebody has problems accessing wsgi.org: I'll be upgrading the OS.

Cheers

Stephan

From manlio_perillo at libero.it  Tue May 20 18:38:22 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Tue, 20 May 2008 18:38:22 +0200
Subject: [Web-SIG] WSGI and PEP 325
Message-ID: <4832FE7E.2060508@libero.it>

The WSGI PEP explicitly mentions PEP 325 (for the application 
iterable's close method).

Maybe this should be updated for the next WSGI spec, since Python 2.5 
implements PEP 342?


Regards
Manlio Perillo

From cstawarz at csail.mit.edu  Wed May 21 02:42:48 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Tue, 20 May 2008 20:42:48 -0400
Subject: [Web-SIG] Proposed specification: waiting for file descriptor events
Message-ID: <0906B338-B65E-43C8-9DCF-A175C3D34A8A@csail.mit.edu>

This is the third draft of my proposed extensions for better  
supporting WSGI apps on asynchronous servers.  The major changes since  
the last draft are as follows:

* The title and abstract now accurately reflect the scope of the proposal.
   In addition, the extensions are now in the namespace "x-wsgiorg.fdevent"
   (instead of "x-wsgiorg.async").

* The proposal for an alternative, non-blocking input stream has been
   dropped, since I don't see a way to add one that wouldn't break middleware.
   Instead, the spec recommends that async servers pre-read the request body
   before invoking the app (either by default or as a configurable option).

* The mechanism for indicating timeouts no longer requires the server to
   know what environ dict the app is using (addressing one of PJE's points).

* The examples have been updated.  The first one shows how an app can use
   pycurl to perform an outgoing HTTP request in a non-blocking fashion.

The updated spec is included below and is also available at

   http://wsgi.org/wsgi/Specifications/fdevent

The example code and some utilities are available in a bzr repository at

   http://pseudogreen.org/bzr/wsgiorg_fdevent_util

Once again, I'd appreciate your comments.


Thanks,
Chris


Abstract
--------

This specification defines a set of extensions that allow a WSGI
application to suspend its execution until an event occurs on a
specified file descriptor.

Rationale
---------

The architecture of asynchronous (aka event driven) servers requires
all I/O operations, including both interprocess and network
communication, to be non-blocking.  For a WSGI-compliant server, this
requirement extends to all applications run on the server.  However,
the WSGI specification does not provide sufficient facilities for an
application to ensure that its I/O is non-blocking.  Specifically, it
lacks a mechanism by which an application can suspend its execution
until an arbitrary file descriptor (such as one belonging to a socket
or pipe opened by the application) is ready for reading or writing.
This specification defines a standard interface by which servers can
provide such a mechanism to applications.

Specification
-------------

This specification introduces three new variables to the WSGI
environment: ``x-wsgiorg.fdevent.readable``,
``x-wsgiorg.fdevent.writable``, and ``x-wsgiorg.fdevent.timeout``.

The variables ``x-wsgiorg.fdevent.readable`` and
``x-wsgiorg.fdevent.writable`` are callable objects that accept two
positional arguments, one required and one optional.  In the following
description, these arguments are given the names ``fd`` and
``timeout``, but they are not required to have these names, and the
application **must** invoke the callables using positional arguments.

The first argument, ``fd``, is either an integer representing a file
descriptor or an object with a ``fileno`` method that returns such an
integer.  The set of acceptable file descriptors is defined to be
those accepted by ``select.select``.  (Note that this set is platform
dependent: only sockets are allowed on Windows, whereas sockets,
pipes, and files are acceptable on Unix-like systems.)  The second,
optional argument, ``timeout``, is either ``None`` or a floating-point
value in seconds.  If omitted, it defaults to ``None``.
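
As a rough illustration of the ``fd`` argument described above, a server
could normalize it with a small helper before handing it to its event
loop.  This is only a sketch: ``as_fileno`` is a hypothetical name and
is not part of this specification.

```python
def as_fileno(fd):
    """Return an integer file descriptor for ``fd``.

    ``as_fileno`` is a hypothetical helper, not part of the spec.
    Accepts either an integer or an object with a ``fileno`` method.
    """
    if isinstance(fd, int):
        return fd
    fileno = getattr(fd, 'fileno', None)
    if fileno is None:
        raise TypeError('expected an integer or an object with fileno()')
    return fileno()
```

A server would then call ``as_fileno(fd)`` inside its ``readable`` and
``writable`` callables, so that applications can pass sockets directly.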

When called, ``x-wsgiorg.fdevent.readable`` and
``x-wsgiorg.fdevent.writable`` return the empty string (``''``), which
**must** be yielded by the application iterable to the server (passing
through any middleware).  The server then suspends execution of the
application until one of the following conditions is met:

* The specified file descriptor is ready for reading (if the
   application called ``x-wsgiorg.fdevent.readable``) or writing (if
   the application called ``x-wsgiorg.fdevent.writable``).

* ``timeout`` seconds have elapsed without the desired file descriptor
   event occurring (unless the value of ``timeout`` is ``None``, in
   which case the wait will never time out).

* The server detects an error or "exceptional" condition (such as
   out-of-band data) on the file descriptor.

Put another way, if the application calls
``x-wsgiorg.fdevent.readable`` and yields the empty string, it will be
suspended until ``select.select([fd],[],[fd],timeout)`` would return.
If the application calls ``x-wsgiorg.fdevent.writable`` and yields the
empty string, it will be suspended until
``select.select([],[fd],[fd],timeout)`` would return.

The variable ``x-wsgiorg.fdevent.timeout`` is an object whose truth
value can be changed by the server.  (For example, it could be a
``list`` instance, whose truth value is false when empty, true
otherwise.)  If ``timeout`` seconds elapse without the desired file
descriptor event occurring, ``x-wsgiorg.fdevent.timeout`` will be true
when the application resumes; otherwise, it will be false.  The truth
value of ``x-wsgiorg.fdevent.timeout`` when the application is first
started or after it yields each response-body string is undefined.
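
A minimal sketch of the list-based flag mentioned above, from the
server's side.  The helper names ``make_timeout_flag`` and
``set_timed_out`` are assumptions made for illustration; a real server
is free to use any object whose truth value it can change.

```python
def make_timeout_flag():
    """Return a mutable object that is false until marked as timed out."""
    return []


def set_timed_out(flag, timed_out):
    """Change the truth value of ``flag`` in place (hypothetical helper)."""
    del flag[:]
    if timed_out:
        flag.append(True)


# The server creates one flag per application instance and keeps a
# reference to it, so it does not need to see the environ again later.
environ = {}
flag = make_timeout_flag()
environ['x-wsgiorg.fdevent.timeout'] = flag
```

Because the server mutates the object rather than rebinding the environ
key, the approach still works when middleware hands the application a
copy of the environ dict.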

The server may use any technique it desires to detect events on an
application's file descriptors.  (Most likely, it will add them to the
same event loop that it uses for accepting new client connections,
receiving requests, and sending responses.)

Handling of the Input Stream
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While technically outside the scope of this specification, the
application's input stream (``environ['wsgi.input']``) is another
source of potentially blocking I/O that deserves mention.

The methods provided by the input stream follow the semantics of the
corresponding methods of the ``file`` class.  In particular, each of
these methods can invoke the underlying I/O function (in this case,
``recv`` on the socket connected to the client) more than once,
without giving the application the opportunity to check whether each
invocation will block.  Although authors of asynchronous servers may
be tempted to provide a non-standard input stream that supports
on-demand, non-blocking reads, such an input stream would be
incompatible with WSGI middleware.

In order to avoid these problems, it is strongly recommended that
asynchronous servers pre-read the entire request body before invoking
the application, either by default or as a configurable option.  Doing
so will ensure that the input stream is compatible with middleware and
that reads from it are always non-blocking.
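
One possible way for a server to pre-read the body is sketched below.
It assumes the server's event loop calls ``feed`` with each chunk it
obtains from a non-blocking ``recv``; the ``BodyBuffer`` class and its
method names are hypothetical, and bodies are treated as bytes.

```python
import tempfile


class BodyBuffer(object):
    """Accumulates a request body chunk by chunk (hypothetical helper)."""

    def __init__(self, content_length, max_in_memory=64 * 1024):
        self.remaining = content_length
        # Kept in memory at first; spills to a temporary file on disk
        # once more than max_in_memory bytes have been written.
        self.spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)

    def feed(self, chunk):
        """Store one chunk; return True once the whole body has arrived."""
        chunk = chunk[:self.remaining]
        self.spool.write(chunk)
        self.remaining -= len(chunk)
        return self.remaining == 0

    def as_input(self):
        """Rewind and return the file-like object to use as wsgi.input."""
        self.spool.seek(0)
        return self.spool
```

Only after ``feed`` returns true would the server set
``environ['wsgi.input'] = buf.as_input()`` and invoke the application,
so reads never touch the client socket.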

Examples
--------

The following application acts as a proxy to `python.org
<http://python.org/>`_.  It uses a ``pycurl.CurlMulti`` instance to
perform the outgoing HTTP request in a non-blocking fashion.  When the
``CurlMulti.perform`` method detects that its next I/O operation would
block, it returns control to the application, which then yields until
the file descriptor of interest becomes readable or writable as
required.  If the descriptor is not ready after one second, the
application sends a ``504 Gateway Timeout`` response to the client and
terminates::

   import pycurl
   from StringIO import StringIO  # Python 2 module, as used by this example

   def pyorg_proxy(environ, start_response):
       result = StringIO()

       c = pycurl.Curl()
       c.setopt(pycurl.URL, 'http://python.org' + environ['PATH_INFO'])
       c.setopt(pycurl.WRITEFUNCTION, result.write)

       m = pycurl.CurlMulti()
       m.add_handle(c)

       while True:
           while True:
               ret, num_handles = m.perform()
               if ret != pycurl.E_CALL_MULTI_PERFORM:
                   break
           if not num_handles:
               break

           read, write, exc = m.fdset()
           if read:
               yield environ['x-wsgiorg.fdevent.readable'](read[0], 1.0)
           else:
               yield environ['x-wsgiorg.fdevent.writable'](write[0], 1.0)

           if environ['x-wsgiorg.fdevent.timeout']:
               msg = 'The request to python.org timed out.'
               start_response('504 Gateway Timeout',
                              [('Content-Type', 'text/plain'),
                               ('Content-Length', str(len(msg)))])
               yield msg
               return

       start_response('200 OK',
                      [('Content-Type', 'application/octet-stream'),
                       ('Content-Length', str(result.len))])
       yield result.getvalue()

The following adapter allows an application that uses the
``x-wsgiorg.fdevent`` extensions to run on a server that does not
support them, without any modification to the application's code::

   import select

   def with_fdevent(application):
       def wrapper(environ, start_response):
           select_args = [None]

           def readable(fd, timeout=None):
               select_args[0] = ([fd], [], [fd], timeout)
               return ''

           def writable(fd, timeout=None):
               select_args[0] = ([], [fd], [fd], timeout)
               return ''

           environ['x-wsgiorg.fdevent.readable'] = readable
           environ['x-wsgiorg.fdevent.writable'] = writable

           timeout = False
           class TimeoutWrapper(object):
               def __nonzero__(self):
                   return timeout

           environ['x-wsgiorg.fdevent.timeout'] = TimeoutWrapper()

           for result in application(environ, start_response):
               if result or (not select_args[0]):
                   yield result
               else:
                   ready = select.select(*select_args[0])
                   timeout = (ready == ([], [], []))
                   select_args[0] = None

       return wrapper

Problems
--------

* The empty string yielded by an application after calling
   ``x-wsgiorg.fdevent.readable`` or ``x-wsgiorg.fdevent.writable``
   must pass through any intervening middleware and be detected by the
   server.  Although WSGI explicitly requires middleware to relay such
   strings to the server (see `Middleware Handling of Block Boundaries
   <http://python.org/dev/peps/pep-0333/#middleware-handling-of-block-boundaries>`_),
   some components may not, making them incompatible with this
   specification.

Other Possibilities
-------------------

* To prevent an application that does blocking I/O from blocking the
   entire server, an asynchronous server could run each instance of the
   application in a separate thread.  However, since asynchronous
   servers achieve high levels of concurrency by expressly *avoiding*
   multithreading, this technique will almost always be unacceptable.

* The `greenlet <http://codespeak.net/py/dist/greenlet.html>`_ package
   enables the use of cooperatively-scheduled micro-threads in Python
   programs, and a WSGI server could potentially use it to pause and
   resume applications around blocking I/O operations.  However, such
   micro-threading is not part of the Python language or standard
   library, and some server authors may be unwilling or unable to make
   use of it.

Open Issues
-----------

* Some third-party libraries (such as `PycURL
   <http://pycurl.sourceforge.net/>`_) provide non-blocking interfaces
   that may need to monitor multiple file descriptors for events
   simultaneously.  Since this specification allows an application to
   wait on only one file descriptor at a time, application authors may
   find it difficult or impossible to use such libraries, or they may
   be limited to a subset of the libraries' capabilities.

   Although this specification could be extended to include an
   interface for waiting on multiple file descriptors, it is unclear
   whether it would be easy (or even possible) for all servers to
   implement it.  Also, the appropriate behavior for a multi-descriptor
   wait is not obvious.  (Should the application be resumed when a
   single descriptor is ready?  All of them?  Some minimum number?)


From manlio_perillo at libero.it  Wed May 21 19:34:19 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Wed, 21 May 2008 19:34:19 +0200
Subject: [Web-SIG] Proposed specification: waiting for file descriptor
 events
In-Reply-To: <0906B338-B65E-43C8-9DCF-A175C3D34A8A@csail.mit.edu>
References: <0906B338-B65E-43C8-9DCF-A175C3D34A8A@csail.mit.edu>
Message-ID: <48345D1B.2030905@libero.it>

Christopher Stawarz ha scritto:
> This is the third draft of my proposed extensions for better supporting 
> WSGI apps on asynchronous servers.  The major changes since the last 
> draft are as follows:
> 

First of all, thanks for your effort.

> * The title and abstract now accurately reflect the scope of the proposal.
>   In addition, the extensions are now in the namespace "x-wsgiorg.fdevent"
>   (instead of "x-wsgiorg.async").
> 
> * The proposal for an alternative, non-blocking input stream has been
>   dropped, since I don't see a way to add one that wouldn't break 
> middleware.

Well, IMHO the "general" solution here is to use greenlets.

>   Instead, the spec recommends that async servers pre-read the request body
>   before invoking the app (either by default or as a configurable option).
> 

This is the best solution most of the time (but not all of the 
time), especially if the "server" can do some "pre-parsing" of a 
multipart/form-data request body.

In fact I plan to write a custom function (in C for Nginx) that will 
"reduce", as an example:

    Content-Type: multipart/form-data; boundary=AaB03x

    --AaB03x
    Content-Disposition: form-data; name="submit-name"

    Larry
    --AaB03x
    Content-Disposition: form-data; name="files"; filename="file1.txt"
    Content-Type: text/plain

    ... contents of file1.txt ...
    --AaB03x--

to (not properly escaped):

Content-Type: application/x-www-form-urlencoded

submit-name=Larry&files.filename=file1.txt&files.ctype=text/plain&files.path=xxx


and the contents of file1.txt will be saved to a temporary file 'xxx'.


> 
> Once again, I'd appreciate your comments.
>


I have some comments:

1) Why not add a more generic poll-like interface?

    Moreover, IMHO, storing a timeout variable in the environ to check
    whether the previous call timed out is not the best solution.

    In my implementation I return a function, but with generators in
    Python 2.5 this can be done in a better way.

2) In Nginx it is not possible to simply handle "plain" file
    descriptors, since these are wrapped in a connection structure.

    This is the reason why I had to add a connection_wrapper function in
    my WSGI module for Nginx.

3) If you read an example that implements a database connection pool:
http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-postgres-async.py

    you can see that there is a problem.

    In fact the pool is not very flexible; the application can not handle
    more than POOL_SIZE concurrent requests.

    However, it is possible to have a new request simply wait until a
    previous connection is free (or a timeout occurs).

    I have attached an example (it is not in the repository since there
    are some problems).

    The example uses a new extension:

      - ctx = environ['ngx.request_context']()
      - ctx.resume()

    ctx.resume() "asynchronously" resumes the given request
    (it will be resumed as soon as control returns to Nginx, when the
     application yields something).


    Note that the problem of resuming another request is easily solved
    with greenlets, without the need for new extensions
    (this is one of the reasons why I like greenlets).


 > [...]


Regards  Manlio Perillo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nginx-postgres-async-2.py
Type: text/x-python
Size: 4155 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/web-sig/attachments/20080521/4e5ccc3b/attachment.py>

From manlio_perillo at libero.it  Thu May 22 10:51:09 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Thu, 22 May 2008 10:51:09 +0200
Subject: [Web-SIG] WSGI and greenlets
In-Reply-To: <9A1AD097-A95A-4F0C-86B0-2FA50E31014E@csail.mit.edu>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<48203045.60504@libero.it>
	<8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu>
	<48216BE7.5010000@libero.it>
	<9A1AD097-A95A-4F0C-86B0-2FA50E31014E@csail.mit.edu>
Message-ID: <483533FD.8090707@libero.it>

Christopher Stawarz ha scritto:
> On May 7, 2008, at 4:44 AM, Manlio Perillo wrote:
> [...]
>> I don't think this will solve the problem.
>> Moreover in your example you buffer the whole request body so that you 
>> have to yield only one time.
> 
> Your example was:
> 
> def application(environ, start_response):
>   def nested():
>      while True:
>         poll(xxx)
>         yield ''
>      yield result
> 
>   for r in nested():
>      if not r:
>          yield ''
> 
>   yield r
> 
> My suggestion would allow you to rewrite this like so:
> 
> @awsgiref.callstack.add_callstack
> def application(environ, start_response):
>   def nested():
>      while True:
>         poll(xxx)
>         yield ''
>      yield result
> 
>   yield nested()
> 
> The nesting can be arbitrarily deep, so nested() could yield 
> doubly_nested() and so on.  While not as elegant as greenlets, I think 
> this does address your concern.
> 


I'm reading PEP 342, and I still think that this will not work as I 
want for Nginx (where I have no control over the "scheduler").

In fact, PEP 342 says:
"""However, if it were possible to pass values or exceptions *into* a
generator at the point where it was suspended, a simple co-routine
scheduler or "trampoline function" would let coroutines "call" each
other without blocking."""


However writing a co-routine scheduler or "trampoline function" when 
your application is embedded in an external server is not possible (but 
please, correct me if I'm wrong).


 > [...]


Regards   Manlio Perillo

From cstawarz at csail.mit.edu  Thu May 22 18:30:47 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Thu, 22 May 2008 12:30:47 -0400
Subject: [Web-SIG] Proposed specification: waiting for file descriptor
	events
In-Reply-To: <48345D1B.2030905@libero.it>
References: <0906B338-B65E-43C8-9DCF-A175C3D34A8A@csail.mit.edu>
	<48345D1B.2030905@libero.it>
Message-ID: <4E3FC27D-F2FD-4E70-9AFF-8BEE7E39C2C9@csail.mit.edu>

On May 21, 2008, at 1:34 PM, Manlio Perillo wrote:

>>  Instead, the spec recommends that async servers pre-read the request body
>>  before invoking the app (either by default or as a configurable option).
>
> This is the best solution most of the time (but not for all of the  
> time), especially if the "server" can do some "pre-parsing" of  
> multipart/form-data request body.
>
> In fact I plan to write a custom function (in C for Nginx) that will  
> "reduce", as an example:
>
>   Content-Type: multipart/form-data; boundary=AaB03x
>
>   --AaB03x
>   Content-Disposition: form-data; name="submit-name"
>
>   Larry
>   --AaB03x
>   Content-Disposition: form-data; name="files"; filename="file1.txt"
>   Content-Type: text/plain
>
>   ... contents of file1.txt ...
>   --AaB03x--
>
> to (not properly escaped):
>
> Content-Type: application/x-www-form-urlencoded
>
> submit-name=Larry&files.filename=file1.txt&files.ctype=text/plain&files.path=xxx
>
>
> and the contents of file1.txt will be saved to a temporary file 'xxx'.

It seems like you're making this more complicated than it needs to  
be.  Why not just store the entire request body in a temporary file,  
and then pass an open handle to it as wsgi.input?  That way, the  
server doesn't have to rewrite the request, and the application  
doesn't need to know how to interpret the files.* parameters.

> 1) Why not add a more generic poll-like interface?

Because such an interface would be more complicated than what I've  
proposed and harder for server authors to implement.  Also, I'm not  
sure that it gains you much.

Note that I'm not 100% sure on this, as I tried to indicate in the  
"Open Issues" section of my proposal.  The approach I'd like to take  
is to try writing apps with my interface for a while, and if  
real-world usage shows that a poll-like interface would be very useful (or  
necessary), then the spec could be extended to add one.  I think this  
is a safe route, since the readable/writable functions could easily be  
implemented in terms of a more generic poll-like interface, so  
existing apps that use the fdevent extensions would continue to work.
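
To make that last point concrete, here is a rough sketch of how the two
callables could be layered on a more generic interface.  The
``poll_wait(interest, timeout)`` function is purely hypothetical: I'm
assuming it takes a list of (fd, mode) pairs and returns the empty
string that the app must yield.

```python
READ, WRITE = 'r', 'w'


def make_fdevent_callables(poll_wait):
    """Build readable/writable from a hypothetical generic ``poll_wait``.

    ``poll_wait(interest, timeout)`` is assumed to register a list of
    (fd, mode) pairs with the server and to return the empty string
    that the application must then yield.
    """
    def readable(fd, timeout=None):
        return poll_wait([(fd, READ)], timeout)

    def writable(fd, timeout=None):
        return poll_wait([(fd, WRITE)], timeout)

    return readable, writable
```

A server offering the generic interface would just put the two returned
functions into the environ under the fdevent names.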

>   Moreover, IMHO, storing a timeout variable in the environ to check
>   whether the previous call timed out is not the best solution.

I think it's a simple and effective solution.  Server authors don't  
need to implement any new functions or data types.  They just create  
and hold on to a mutable object instance (the simplest being a list  
instance) for each app instance and toggle its truth value as required.

>   In my implementation I return a function, but with generators in
>   Python 2.5 this can be done in a better way.

What advantage does this have over what I've proposed?

> 2) In Nginx it is not possible to simply handle "plain" file
>   descriptors, since these are wrapped in a connection structure.
>
>   This is the reason why I had to add a connection_wrapper function in
>   my WSGI module for Nginx.

But the connection structure just wraps an integer file descriptor,  
right?  So the readable/writable functions can create the required  
wrapper to register with nginx. There's no reason to make the  
application author do it.

> 3) If you read an example that implements a database connection pool:
> http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-postgres-async.py
>
>   you can see that there is a problem.
>
>   In fact the pool is not very flexible; the application can not handle
>   more than POOL_SIZE concurrent requests.
>
>   However it is possible to just have a new request to wait until a
>   previous connection is free (or a timeout occurs).
>
>   I have attached an example (it is not in the repository since there
>   are some problems).
>
>   The examples use a new extension:
>
>     - ctx = environ['ngx.request_context']()
>     - ctx.resume()
>
>   ctx.resume() "asynchronously" resumes the given request
>   (it will be resumed as soon as control returns to Nginx, when the
>    application yields something).
>
>
>   Note that the problem of resuming another request is easily solved
>   with greenlets, without the need for new extensions
>   (this is one of the reasons why I like greenlets).

Right, you want something like Queue.Queue, but for exchanging data  
between request handlers in the same thread.  Since this is a  
different problem from waiting on file descriptors, it's outside the  
scope of my proposal.  However, one way you might implement something  
like this using my proposal would be to run the connection-pool  
manager in a separate thread, and have request handlers talk to it  
over sockets.  Kind of ugly, but I think it would do the job.


Chris

From cstawarz at csail.mit.edu  Thu May 22 20:10:02 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Thu, 22 May 2008 14:10:02 -0400
Subject: [Web-SIG] WSGI and greenlets
In-Reply-To: <483533FD.8090707@libero.it>
References: <9DE10165-DC87-4AD0-A1EC-606C7E35F108@csail.mit.edu>
	<48203045.60504@libero.it>
	<8FF19D43-D9DF-4BEA-B2F1-31A4BD4A7296@csail.mit.edu>
	<48216BE7.5010000@libero.it>
	<9A1AD097-A95A-4F0C-86B0-2FA50E31014E@csail.mit.edu>
	<483533FD.8090707@libero.it>
Message-ID: <5D29F147-3619-4156-A4F4-D7FD2EE2AFB1@csail.mit.edu>

On May 22, 2008, at 4:51 AM, Manlio Perillo wrote:

> I'm reading the PEP 342, and I still think that this will not work  
> as I want for Nginx (where I have no control over the "scheduler").
>
> In fact the PEP 342 says:
> """However, if it were possible to pass values or exceptions *into* a
> generator at the point where it was suspended, a simple co-routine
> scheduler or "trampoline function" would let coroutines "call" each
> other without blocking."""
>
> However writing a co-routine scheduler or "trampoline function" when  
> your application is embedded in an external server is not possible  
> (but please, correct me if I'm wrong).

That's correct.  My with_callstack wrapper supports calling  
subroutines (which can yield values to the server or return results to  
their caller) within a single application instance.  It doesn't  
support switching between app instances, since that's the server's  
job.  Therefore, it doesn't help with your DB connection pool example.
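
For anyone who hasn't seen the idea, here is a stripped-down sketch of
such a wrapper.  It is not the actual awsgiref code: the name
``simple_callstack`` is made up, and it assumes a Python version in
which a generator can ``return`` a value (3.3 or later).

```python
import types


def simple_callstack(application):
    """Flatten nested generators into one WSGI iterable (illustrative only)."""
    def wrapper(environ, start_response):
        stack = [application(environ, start_response)]
        result = None
        while stack:
            try:
                item = stack[-1].send(result)
            except StopIteration as stop:
                # Subroutine finished: hand its return value to the caller.
                stack.pop()
                result = stop.value
                continue
            result = None
            if isinstance(item, types.GeneratorType):
                stack.append(item)   # "call" the subroutine
            else:
                yield item           # pass strings through to the server
    return wrapper
```

Everything happens inside one application iterable, which is why it
cannot resume a different request.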


Chris

From manlio_perillo at libero.it  Fri May 23 00:21:13 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Fri, 23 May 2008 00:21:13 +0200
Subject: [Web-SIG] Proposed specification: waiting for file descriptor
 events
In-Reply-To: <4E3FC27D-F2FD-4E70-9AFF-8BEE7E39C2C9@csail.mit.edu>
References: <0906B338-B65E-43C8-9DCF-A175C3D34A8A@csail.mit.edu>
	<48345D1B.2030905@libero.it>
	<4E3FC27D-F2FD-4E70-9AFF-8BEE7E39C2C9@csail.mit.edu>
Message-ID: <4835F1D9.3070406@libero.it>

Christopher Stawarz ha scritto:
> On May 21, 2008, at 1:34 PM, Manlio Perillo wrote:
> 
>>>  Instead, the spec recommends that async servers pre-read the request body
>>>  before invoking the app (either by default or as a configurable option).
>>
>> This is the best solution most of the time (but not for all of the 
>> time), especially if the "server" can do some "pre-parsing" of 
>> multipart/form-data request body.
>>
>> In fact I plan to write a custom function (in C for Nginx) that will 
>> "reduce", as an example:
>>
>>   Content-Type: multipart/form-data; boundary=AaB03x
>>
>>   --AaB03x
>>   Content-Disposition: form-data; name="submit-name"
>>
>>   Larry
>>   --AaB03x
>>   Content-Disposition: form-data; name="files"; filename="file1.txt"
>>   Content-Type: text/plain
>>
>>   ... contents of file1.txt ...
>>   --AaB03x--
>>
>> to (not properly escaped):
>>
>> Content-Type: application/x-www-form-urlencoded
>>
>> submit-name=Larry&files.filename=file1.txt&files.ctype=text/plain&files.path=xxx 
>>
>>
>>
>> and the contents of file1.txt will be saved to a temporary file 'xxx'.
> 
> It seems like you're making this more complicated than it needs to be.  
> Why not just store the entire request body in a temporary file, and then 
> pass an open handle to it as wsgi.input?  

Because if you have a big file (like a video of > 100 MB), your 
application will block everything while parsing the request body.

Parsing the body incrementally is far more efficient (although it is 
harder).


> That way, the server doesn't 
> have to rewrite the request, and the application doesn't need to know 
> how to interpret the files.* parameters.
> 

How to interpret the files.* parameters is not really a problem.

>> 1) Why not add a more generic poll-like interface?
> 
> Because such an interface would be more complicated than what I've 
> proposed and harder for server authors to implement.  Also, I'm not sure 
> that it gains you much.
> 

Well, I have modelled my extension so that it has a "well-known" 
interface and is not hard to implement.

But I have to say that I'm not sure whether one would want to poll 
multiple sockets.

Moreover in my implementation ngx.poll only returns one "ready" socket 
at a time.


By the way: I see a problem with your API.
What happens if an application does:

     read, write, exc = m.fdset()

     environ['x-wsgiorg.fdevent.readable'](read[0], 1.0)
     environ['x-wsgiorg.fdevent.writable'](write[0], 1.0)

     yield ''


There is no way to know, when the application is resumed, if the socket 
is ready for read or write.

This probably should not be a problem, but I'm not sure.

> Note that I'm not 100% sure on this, as I tried to indicate in the "Open 
> Issues" section of my proposal.  The approach I'd like to take is to try 
> writing apps with my interface for a while, and if real-world usage 
> shows that a poll-like interface would be very useful (or necessary), 
> then the spec could be extended to add one.  I think this is a safe 
> route, since the readable/writable functions could easily be implemented 
> in terms of a more generic poll-like interface, so existing apps that 
> use the fdevent extensions would continue to work.
> 
>>   Moreover, IMHO, storing a timeout variable in the environ to check
>>   whether the previous call timed out is not the best solution.
> 
> I think it's a simple and effective solution.  Server authors don't need 
> to implement any new functions or data types.  They just create and hold 
> on to a mutable object instance (the simplest being a list instance) for 
> each app instance and toggle its truth value as required.
> 
>>   In my implementation I return a function, but with generators in
>>   Python 2.5 this can be done in a better way.
> 
> What advantage does this have over what I've proposed?
> 

You don't need to store a mutable variable in the environ.

>> 2) In Nginx it is not possible to simply handle "plain" file
>>   descriptors, since these are wrapped in a connection structure.
>>
>>   This is the reason why I had to add a connection_wrapper function in
>>   my WSGI module for Nginx.
> 
> But the connection structure just wraps an integer file descriptor, 
> right?  So the readable/writable functions can create the required 
> wrapper to register with nginx. There's no reason to make the 
> application author do it.
> 

The "problem" is that Nginx keeps a list of preallocated connection 
objects (the size of the list being controlled by worker_connections).

This means that a newly constructed connection *must* be freed as soon 
as it is no longer used, otherwise it can limit the number of concurrent 
connections that can be handled by Nginx.

Since with my API (register/unregister) a connection should be kept 
alive until it is unregistered, I have chosen to create a wrapper for 
the Nginx connection object.


Probably with your API it would be possible to create temporary wrappers.
But I don't know if this is a good idea.

> [...]


> Chris
> 


Manlio Perillo

From cstawarz at csail.mit.edu  Fri May 23 17:12:37 2008
From: cstawarz at csail.mit.edu (Christopher Stawarz)
Date: Fri, 23 May 2008 11:12:37 -0400
Subject: [Web-SIG] Proposed specification: waiting for file descriptor
	events
In-Reply-To: <4835F1D9.3070406@libero.it>
References: <0906B338-B65E-43C8-9DCF-A175C3D34A8A@csail.mit.edu>
	<48345D1B.2030905@libero.it>
	<4E3FC27D-F2FD-4E70-9AFF-8BEE7E39C2C9@csail.mit.edu>
	<4835F1D9.3070406@libero.it>
Message-ID: <B7D0479F-5216-4BA7-A25F-45DF314809A3@csail.mit.edu>

On May 22, 2008, at 6:21 PM, Manlio Perillo wrote:

>> That way, the server doesn't have to rewrite the request, and the  
>> application doesn't need to know how to interpret the files.*  
>> parameters.
>
> How to interpret the files.* parameters is not really a problem.

It's a problem for a portable application, which will have to be able  
to parse both the original request and your server's rewritten version  
of it.

In any case, your request rewriting is compatible with my proposal.

> By the way: I see a problem with your API.
> What happens if an application does:
>
>    read, write, exc = m.fdset()
>
>    environ['x-wsgiorg.fdevent.readable'](read[0], 1.0)
>    environ['x-wsgiorg.fdevent.writable'](write[0], 1.0)
>
>    yield ''
>
> There is no way to know, when the application is resumed, if the  
> socket is ready for read or write.
>
> This probably should not be a problem, but I'm not sure.

The result of doing this is undefined, and I've updated the spec to  
say so.  The application shouldn't do it, and the server should  
probably throw an error if it does.
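
A server that wants to throw such an error could do something like the
following.  This is only a sketch: ``PendingWait`` and its methods are
hypothetical names, and the choice of ``RuntimeError`` is mine, not
something the spec mandates.

```python
class PendingWait(object):
    """Tracks the single wait an app may register before yielding."""

    def __init__(self):
        self.request = None

    def _register(self, mode, fd, timeout):
        if self.request is not None:
            raise RuntimeError('only one fdevent wait may be pending per yield')
        self.request = (mode, fd, timeout)
        return b''  # the empty string the app must yield

    def readable(self, fd, timeout=None):
        return self._register('read', fd, timeout)

    def writable(self, fd, timeout=None):
        return self._register('write', fd, timeout)

    def take(self):
        """Called by the server after the app yields; clears the slot."""
        request, self.request = self.request, None
        return request
```

The server would expose ``readable`` and ``writable`` through the
environ and call ``take()`` each time the app yields an empty string.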

>>>  In my implementation I return a function, but with generators in
>>>  Python 2.5 this can be done in a better way.
>> What advantage does this have over what I've proposed?
>
> You don't need to store a mutable variable in the environ.

I don't see any problem with a mutable environ variable, especially if  
it makes things simpler for server and application authors.  But if  
you want to do something fancier (like raising a Timeout exception in  
a Python 2.5 generator), then it's easy to write a wrapper that does so.
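
Here is roughly what I have in mind, as an untested-in-production
sketch.  ``FdTimeout`` and ``with_timeout_exception`` are made-up names,
and the wrapper assumes the app yields an empty string only right after
calling readable or writable.

```python
class FdTimeout(Exception):
    """Hypothetical exception raised inside the app when a wait times out."""


def with_timeout_exception(application):
    """Turn the environ timeout flag into an exception thrown into the app."""
    def wrapper(environ, start_response):
        gen = application(environ, start_response)
        waiting = False
        while True:
            try:
                if waiting and environ['x-wsgiorg.fdevent.timeout']:
                    item = gen.throw(FdTimeout())
                else:
                    item = next(gen)
            except StopIteration:
                return
            # An empty string means the app asked the server to wait.
            waiting = not item
            yield item
    return wrapper
```

The app can then wrap its ``yield`` in ``try``/``except FdTimeout``
instead of checking the environ variable after every wait.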


Chris