[Web-SIG] WSGI & transfer-encodings

Thu Sep 16 21:03:53 CEST 2004

On Sep 16, 2004, at 2:30 PM, Phillip J. Eby wrote:

> Hm.  An interesting conundrum.  Do any Python servers or applications 
> exist today that *work* when there's no content-length?

Unknown.

> Personally, I'm thinking that WSGI should follow CGI here, and decode 
> incoming transfer encodings.  If this means HTTP/1.1 servers have to 
> dump the incoming data to a file first, so be it.

Following CGI means: do not allow requests without a Content-Length. No 
server I know of will dump the data to a file to learn its length before 
passing it to a CGI. I would not ask them to, either: that's 
like saying "Pleeease denial of service me!". And, really, the only 
place I've seen incoming chunked requests used is for streaming data -- 
and that will "never" finish.
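A minimal sketch of that CGI-style policy, assuming Python 3 and a server that sees the request headers as a list of (name, value) pairs. `require_content_length` is a hypothetical helper, not part of WSGI or any server's API. Under HTTP/1.1, a Transfer-Encoding other than "identity" overrides any Content-Length (RFC 2616 section 4.4), so such requests are refused with 411 Length Required rather than spooled to disk.

```python
def require_content_length(headers):
    """Hypothetical server-side check: refuse bodies whose length is not known up front.

    Returns None if the request may proceed, else a status line to send back.
    """
    fields = {name.lower(): value.strip().lower() for name, value in headers}
    coding = fields.get('transfer-encoding', 'identity')
    if coding != 'identity':
        # Content-Length must be ignored when a Transfer-Encoding is present,
        # so the body length is only known once the last chunk arrives.
        return '411 Length Required'
    return None
```

A server following CGI would send that status and close the connection instead of invoking the application.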

>> The only way to tell if there's incoming data is therefore to attempt 
>> to read() the input stream. read() will either immediately return an 
>> EOF condition (returning '') or else read the data. Also, it seems 
>> that read() with no args isn't allowed? Perhaps it should be.
>
> A no-argument read would be problematic in some environments -- CGI 
> for example.

No -- CGI requires CONTENT_LENGTH, so in the CGI environment it is 
perfectly possible to simulate EOF at the end of the data. read() could 
look something like this:

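import os
import sys

class CGIReq:
    def __init__(self, environ=os.environ, stream=None):
        # CGI supplies CONTENT_LENGTH; a missing or empty value means no body.
        self.maxlength = int(environ.get('CONTENT_LENGTH') or 0)
        # Read raw bytes, since CONTENT_LENGTH counts bytes, not characters.
        self.stream = stream if stream is not None else sys.stdin.buffer

    def read(self, length=None):
        # No argument (or a negative one) means "the rest of the body".
        if length is None or length < 0:
            length = self.maxlength
        else:
            length = min(self.maxlength, length)
        data = self.stream.read(length)
        # Once maxlength reaches 0, every read returns '' (EOF).
        self.maxlength -= len(data)
        return data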

>> - Wouldn't providing pre-encoded data screw up middleware that is 
>> expecting to do something useful with the data going through it?
>
> Yes, it would.  There are at least two ways to handle it, though:
>
> 1. Don't use middleware that's not smart enough to handle your app's 
> output
>
> 2. Have the server or middleware munge HTTP_ACCEPT_ENCODING or other 
> parameters on the way in to the application, so that the application 
> (if written correctly) won't send data the server or middleware can't 
> handle.

You've confused Content-Encoding with Transfer-Encoding. TE is the 
request header that goes with the Transfer-Encoding response header. And 
according to HTTP/1.1, chunked is always acceptable, so no amount of 
header munging can change that. So under the "WSGI application is an 
HTTP origin server" interpretation, every piece of middleware must be 
prepared to deal with chunked output. I think that's silly -- there is 
no reason for a WSGI application to produce chunked-encoded strings, as 
it already has a way to produce chunks via the iterator.
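To illustrate the iterator point, here is a minimal sketch assuming Python 3 and bytes bodies: the application yields plain blocks and never sets Transfer-Encoding, while an HTTP/1.1 server can frame those same blocks as chunks itself. `app` and `chunk_encode` are hypothetical names for illustration only.

```python
def app(environ, start_response):
    # No Transfer-Encoding header and no chunk framing: just plain body pieces.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield b'Hello, '
    yield b'world\n'


def chunk_encode(body_iter):
    """Server side: frame each block as an HTTP/1.1 chunk, then send the last-chunk."""
    for block in body_iter:
        if block:  # a zero-length chunk would signal the end, so skip empty blocks
            yield b'%X\r\n' % len(block) + block + b'\r\n'
    yield b'0\r\n\r\n'
```

The chunk boundaries come straight from the application's iterator, so the application never needs to know whether the connection is chunked at all.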

>> I would suggest that the correct answer is: the application 
>> should have nothing to do with any connection-oriented behavior. It 
>> should not send a Connection or Transfer-Encoding header and should 
>> not expect to receive the Connection, Keep-Alive, TE, Trailers, 
>> Transfer-Encoding, or Upgrade headers, although it is optional for 
>> the server to strip them. The application should not apply a 
>> transfer-encoding to its output and the server should not give it a 
>> transfer-encoded input.
>
> I like most of this, *except* that I'd like to leave open the option 
> of an application providing transfer-encoding on its output.  I'd 
> rather have servers and middleware set HTTP_ACCEPT_ENCODING to 
> "identity;q=1.0, *;q=0" (or an empty string, or delete the entry), if 
> they interpret content, and have applications be required to respect 
> this.  Specifically, an application can only apply a content-encoding 
> if it matches a non-zero quality in HTTP_ACCEPT_ENCODING.
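The rule quoted above can be sketched as follows, assuming Python 3. `coding_acceptable` is a hypothetical helper, and its parsing is deliberately simplified. Following the proposal (not the RFC's default for an absent header), an empty or missing HTTP_ACCEPT_ENCODING means no content-coding may be applied.

```python
def coding_acceptable(accept_encoding, coding):
    """True only if `coding` gets a non-zero q-value in an Accept-Encoding value."""
    if not accept_encoding:
        return False  # empty or deleted entry: apply no content-coding
    explicit = None  # q-value from an entry naming this coding
    wildcard = None  # q-value from a "*" entry, used as a fallback
    for item in accept_encoding.split(','):
        parts = [p.strip() for p in item.split(';')]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            if param.lower().startswith('q='):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat an unparseable weight as "not acceptable"
        if name == coding.lower():
            explicit = q
        elif name == '*':
            wildcard = q
    chosen = explicit if explicit is not None else wildcard
    return chosen is not None and chosen > 0
```

With the suggested "identity;q=1.0, *;q=0", `coding_acceptable(environ.get('HTTP_ACCEPT_ENCODING'), 'gzip')` is False, so a well-behaved application sends its body uncompressed.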

Again: I'm talking only about Transfer-Encoding, not Content-Encoding. 
Content-Encoding is an end-to-end function and thus properly belongs to 
the application. Transfer-Encoding is a hop-by-hop header, and properly 
belongs to the server. If you want a transfer-encoded output, you can 
always request it via a server-specific extension or configuration 
mechanism.
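A minimal sketch of the server owning hop-by-hop headers, assuming Python 3 and a CGI-style environ dict; `HOP_BY_HOP`, `strip_hop_by_hop_environ`, and `check_app_headers` are hypothetical names. It uses the header list from the proposal quoted above, with the field spelled "Trailer" as HTTP/1.1 defines it, and for brevity ignores any extra names a client lists in its Connection header.

```python
# Hop-by-hop header names, lower-cased (hypothetical constant for this sketch).
HOP_BY_HOP = frozenset([
    'connection', 'keep-alive', 'te', 'trailer', 'transfer-encoding', 'upgrade',
])


def strip_hop_by_hop_environ(environ):
    """Return a copy of a CGI-style environ without hop-by-hop HTTP_* entries."""
    cleaned = {}
    for key, value in environ.items():
        if key.startswith('HTTP_'):
            # e.g. HTTP_TRANSFER_ENCODING -> "transfer-encoding"
            header = key[5:].replace('_', '-').lower()
            if header in HOP_BY_HOP:
                continue
        cleaned[key] = value
    return cleaned


def check_app_headers(headers):
    """Raise ValueError if an application's response headers include a hop-by-hop one."""
    for name, _value in headers:
        if name.lower() in HOP_BY_HOP:
            raise ValueError('hop-by-hop header from application: %s' % name)
    return headers
```

Stripping on the way in and rejecting on the way out keeps the connection-level decisions, including whether to chunk, entirely inside the server.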

Both Transfer-Encoding and Content-Encoding have a gzip value, but 
these mean significantly different things. The first is connection 
compression; the second is transferring a compressed file over an 
uncompressed connection.

James