[Web-SIG] Use cases for file-like objects (was Re: Bill's comments on WSGI draft 1.4)

Wed Sep 8 17:55:20 CEST 2004

At 03:25 PM 9/8/04 +0100, Alan Kennedy wrote:
>Well, I see sendfile functionality as being much more widespread than 
>that. Java.nio, for example, has excellent support for fast "channel 
>transfers" between file channels and other writable channel types.

Well, I see a few options here, then.  We can use 'wsgi.file_wrapper' to 
wrap Python 'file' objects, allowing each platform to dig into the file 
object and get at the file descriptor, nio, or what-have-you in a platform 
specific way.  As long as it remains an optional extension, I'm fine with that.

Another option is to have separate 'wsgi.nio_wrapper', 'wsgi.fd_wrapper', 
and so on, for different physical backend types.

> > [other cases snipped]
> >
> > These are all very simple, one-line solutions (at least for 2.2+) and
> > have the advantage of being explicit, and refusing the temptation to
> > guess.  The application is in total control of how the resource will
> > be transmitted.
>
>Well, I suppose the key question here is "should the application be in 
>total control of how the resource is transmitted"?

Yes, because of the need for backward compatibility.  I realize that most 
people discussing WSGI here on the Web-SIG seem more interested in new 
applications than old, but backward compatibility is critical and that 
means apps must have control that's comparable to what they have today.

>Welsh describes the use of thread-pools of a fixed "width" to service 
>particular request types, with requests shunted between those (otherwise 
>isolated) thread pools using queues.

The description you use here sounds exactly like typical Python async 
servers today: they have fixed-size thread pools for running "application" 
code, and another fixed-size thread pool (width=1) for I/O.

>So I suppose my real concern is that by relegating disk-originating byte 
>streams to being second-class citizens under WSGI, we might hinder the 
>portability of some highly-desirable server architectural approaches.

We're not; we're simply requiring that any functionality more sophisticated 
than an iterable be treated as an optional extension, that the application 
has to check for and opt to use.  The application developer is motivated to 
do this because of the promise of extra performance when run on platforms 
that support the boost.  But middleware developers don't have to think 
about it because they always have access to the data in iterable form.
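
To make the opt-in concrete, here is a minimal sketch of an application that checks for the extension and falls back to a plain iterable when it is absent. It assumes the extension is exposed as a callable under the 'wsgi.file_wrapper' environ key that accepts a file-like object; 'make_file_app', 'open_file' and 'iter_blocks' are hypothetical names used only for illustration.

```python
import io

def iter_blocks(fileobj, block_size=8192):
    """Portable fallback: yield the file's contents in fixed-size blocks."""
    while True:
        block = fileobj.read(block_size)
        if not block:
            break
        yield block

def make_file_app(open_file):
    """Return a WSGI app serving whatever open_file() returns.

    make_file_app and open_file are hypothetical names for this sketch.
    """
    def app(environ, start_response):
        fileobj = open_file()
        start_response('200 OK', [('Content-Type', 'application/octet-stream')])
        # Assumption: the optional extension is a callable stored under this key.
        wrapper = environ.get('wsgi.file_wrapper')
        if wrapper is not None:
            return wrapper(fileobj)   # server may use sendfile, nio, etc.
        return iter_blocks(fileobj)   # plain iterable; works on any server
    return app

# Usage on a server that does not offer the extension:
app = make_file_app(lambda: io.BytesIO(b'example payload'))
body = b''.join(app({}, lambda status, headers: None))
```

Either way the return value is an iterable, so middleware that only knows how to iterate keeps working. The fallback here does not close the file itself; that is the closing issue discussed further down.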

> > After thinking about the 'file_wrapper' idea some more, I'm thinking
> > that this way works better for everything but the issue of closing
> > files.  However, my example 'file_wrapper' class should maybe be
> > included in the PEP under an application note about sending files and
> > file-like objects.
>
>Perhaps a "finalise" method might be appropriate?
>
>Just thinking through some scenarios here:
>
>What happens if the server is just about to start serving a multi-megabyte 
>PDF file back to a client socket, and then the client closes the socket, 
>i.e. the user cancelled their request. What should the server do in that 
>case? Should it continue to iterate through the iterable right until the 
>end, discarding the results? Or should it just drop the iterable on the 
>floor, to be sorted out by GC (and thus potentially wasting 
>file-descriptors)? Or should it attempt to finalise the iterable, so that 
>all related resources are freed?

The current spec requires that the iterable's 'close()' method be called at 
the termination of the request, whether the iterator was exhausted or 
not.  So, the server is free to cancel iteration when a client connection 
is lost.
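
As a rough sketch of what that means on the server side (not taken from any actual server), the loop below stops iterating as soon as a write fails and still calls 'close()' if the iterable has one. 'run_response' and 'write' are hypothetical names, and treating ConnectionError as the signal for a lost client is an assumption of this example.

```python
def run_response(result, write):
    """Send each block with write(); always close the iterable afterwards."""
    try:
        for block in result:
            try:
                write(block)
            except ConnectionError:
                # Client went away: stop pulling data from the application.
                break
    finally:
        # Called whether or not the iterator was exhausted.
        close = getattr(result, 'close', None)
        if close is not None:
            close()
```

Because the call sits in a 'finally' block, the application gets its cleanup hook on normal completion, on a dropped connection, and on an unexpected error alike.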

>Do these considerations also apply when the bytestream being transferred 
>is not "physical", i.e. not coming from a file-descriptor/channel? What if the 
>bytestream is coming from an iterable yielding several megabytes of python 
>strings, from a page rendering component, for example? How does the server 
>tell the application to stop, because the client is no longer interested? 
>Does it simply drop the iterable on the floor and forget about it?
>
>Might the application have a need to know that the client aborted the 
>request, for example in E-commerce scenarios? If the application did need 
>to know, how could the server inform the application?

By calling 'close()' on the iterable, as the spec requires.  Until PEP 325 
is implemented, though, generators have to be wrapped in a custom iterable 
in order to support this functionality, e.g.:

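     class MyApp:

         def __init__(self, environ, start_response):
             # setup code here
             self.environ = environ
             self.start = start_response

         def __iter__(self):
             # generator yielding results
             self.start('200 OK', [('Content-Type', 'text/plain')])
             yield 'Hello world!\n'

         def close(self):
             # cleanup code here; the server calls this at the end of
             # the request, even if iteration stopped early
             pass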

There are of course other ways to do the same basic thing, such as my 
file_wrapper example class.  But, once PEP 325 is implemented, you'll be 
able to use try/finally in the generator body, and the finally block will 
be executed when close() is called or the generator is garbage 
collected.  (PEP 325 was written by Samuele Pedroni, so I assume he intends 
to implement it in Jython, too.)
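
For illustration, here is a minimal sketch of the try/finally style, assuming an interpreter whose generators already provide a 'close()' method. 'make_body' and 'log' are hypothetical names for this example.

```python
def make_body(log):
    """Generator body whose cleanup runs when close() is called."""
    try:
        for i in range(3):
            yield ('block %d\n' % i).encode('ascii')
    finally:
        # Runs on exhaustion, on close(), or when the generator is collected.
        log.append('cleaned up')

log = []
body = make_body(log)
first = next(body)   # iteration has started, so the try block is active
body.close()         # simulates the server ending the request early
```

Note that the finally block only runs if the generator was actually started; calling 'close()' on a generator that was never iterated skips the body entirely.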