[Web-SIG] environ["wsgi.input"].read()

Sun Jan 27 07:44:42 CET 2008

On 26/01/2008, Brian Smith <brian at briansmith.org> wrote:
> 1. PEP 333 doesn't indicate that the size parameter for the read() method is optional. Is it optional or required? If it is optional, is the default value -1?
>
> 2. What are the semantics of environ["wsgi.input"].read(-1) when Content-Length is provided? Is it guaranteed to return the entire request entity, up to at most <Content-Length> bytes?
>
> 3. What are the semantics of environ["wsgi.input"].read(-1) when the request has no Content-Length? Can environ["wsgi.input"].read(-1) be used (as the only available mechanism) to read a chunked request entity?
>
> Putting all this together, are these two programs correct?:
>
> def application(environ, start_response):
>         start_response("200 OK", [])
>         yield environ["wsgi.input"].read()
>
> def application(environ, start_response):
>         start_response("200 OK", [])
>         yield environ["wsgi.input"].read(-1)
>
> This is another issue where there is a lot of variance between gateways, where I think a clarification in the specification is needed.

I have brought up the issue of chunked encoding and mutating input
filters previously, whether they be implemented in Apache or as WSGI
middleware. For the outcome of that discussion see:

  http://groups.google.com/group/python-web-sig/browse_frm/thread/25bf70b49a90e0c0

As to your questions about read() with no argument, or with the
traditional Python file-like object default of -1, the only WSGI
server/adapter I know of where this will NOT work as one would expect,
i.e., read the remainder of the request content, is the CherryPy WSGI
adapter.

As far as I know it works fine with Apache CGI WSGI adapters, Apache
mod_wsgi, the SCGI, FastCGI and AJP adapters via flup, and the Paste
WSGI server. I'm not sure what wsgiref will do, though.

The reason it doesn't work with the CherryPy WSGI server comes down to
the problem I highlighted recently, namely the questions I posed in:

  http://groups.google.com/group/python-web-sig/browse_frm/thread/e46e72cc812870c6

about WSGI adapters not discarding request content that was not
consumed.

What it all comes down to is that the CherryPy WSGI server, unless it
has changed, chooses not to simulate EOF as permitted by this passage
from the specification:

"""The server is not required to read past the client's specified
Content-Length, and is allowed to simulate an end-of-file condition if
the application attempts to read past that point. The application
should not attempt to read more data than is specified by the
CONTENT_LENGTH variable."""
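Until EOF simulation is required, the portable course for an
application is the one that passage asks for: never read past
CONTENT_LENGTH. A minimal sketch, assuming a missing or empty
CONTENT_LENGTH means there is no body; read_request_body is a
hypothetical helper, not part of any server or library:

```python
def read_request_body(environ, blocksize=8192):
    """Read the request body without relying on the server simulating EOF.

    Hypothetical helper: reads at most CONTENT_LENGTH bytes, as PEP 333
    asks applications to do, so it never blocks on a raw socket.
    """
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0  # malformed header: treat as "no body"
    stream = environ["wsgi.input"]
    parts = []
    while remaining > 0:
        data = stream.read(min(blocksize, remaining))
        if not data:
            break  # client went away before sending everything
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)


def application(environ, start_response):
    # Echo the request body back, read the portable way.
    body = read_request_body(environ)
    start_response("200 OK", [("Content-Type", "application/octet-stream"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Note that a chunked request carrying no Content-Length simply comes
back empty with this approach, which is the gap discussed further on.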

Because it just supplies the socket as wsgi.input, it can't do this.
Not simulating EOF also means it can't discard request content that
wasn't consumed, which causes problems when request pipelining is
occurring, as the unconsumed input gets interpreted as the headers of
the subsequent request.
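The failure mode is easy to reproduce without a real socket. The toy
server loop below is entirely hypothetical (io.BytesIO stands in for
the connection, and read_head/serve are illustrative names); it
handles two pipelined requests whose application never reads its POST
body. Unless the server drains the unread bytes, they end up glued onto
the next request line:

```python
PIPELINED = (b"POST /upload HTTP/1.1\r\nContent-Length: 5\r\n\r\nhello"
             b"GET /next HTTP/1.1\r\n\r\n")


def read_head(rfile):
    """Read a request line plus headers; return (request_line, headers)."""
    request_line = rfile.readline().rstrip(b"\r\n")
    headers = {}
    while True:
        line = rfile.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # blank line ends the header block
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers


def serve(rfile, discard_unread):
    """Toy server loop; the 'application' never reads its request body."""
    seen = []
    while True:
        request_line, headers = read_head(rfile)
        if not request_line:
            break  # connection closed
        seen.append(request_line)
        unread = int(headers.get(b"content-length", b"0"))
        # A real server would pass rfile to the application here and the
        # application would ignore it, leaving `unread` bytes behind.
        if discard_unread and unread:
            rfile.read(unread)  # drain the body before the next request
    return seen
```

Run against io.BytesIO(PIPELINED), the second request line comes out as
b"helloGET /next HTTP/1.1" without draining, and as b"GET /next
HTTP/1.1" with it.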

In contrast, the Paste server wraps the actual socket in a
LimitedLengthFile, which simulates EOF and also tracks how much content
remains, thus allowing any unconsumed content to be discarded at the
end of the request.
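A wrapper in that spirit might look roughly like the following. This
is an illustrative sketch, not Paste's actual LimitedLengthFile; the
class name LengthLimitedInput and its discard() method are
hypothetical:

```python
class LengthLimitedInput:
    """Sketch of an EOF-simulating wsgi.input (not Paste's actual code)."""

    def __init__(self, rfile, length):
        self._rfile = rfile
        self.remaining = length  # body bytes not yet handed to the app

    def _clamp(self, size):
        # Never let the application read past the end of this request.
        if size is None or size < 0 or size > self.remaining:
            return self.remaining
        return size

    def read(self, size=-1):
        size = self._clamp(size)
        data = self._rfile.read(size) if size else b""
        self.remaining -= len(data)
        return data  # b"" once the body is exhausted: simulated EOF

    def readline(self, size=-1):
        size = self._clamp(size)
        data = self._rfile.readline(size) if size else b""
        self.remaining -= len(data)
        return data

    def discard(self, blocksize=8192):
        """Drain unconsumed body so the next request parses cleanly."""
        while self.remaining:
            if not self.read(min(blocksize, self.remaining)):
                break  # client closed the connection early
```

A server would construct one per request from CONTENT_LENGTH, hand it to
the application as wsgi.input, and call discard() once the response is
finished. With such a wrapper, read() with no argument safely returns
just the rest of this request's body.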

If the WSGI specification simply required that EOF be simulated, then
read() with no argument, or with an argument of -1, could mean "return
all remaining content" with absolutely no problems. Implementations
would also naturally lend themselves to dealing with unconsumed input
correctly.

This would in turn also allow mutating input filters that change the
content length; such a change could then be flagged by setting the
Content-Length header to -1.
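As an illustration of such a filter, the middleware below gunzips a
request body sent with "Content-Encoding: gzip" as the application
reads it, so the decoded length is not known in advance. It is a sketch
under two assumptions: that the server's wsgi.input simulates EOF
(otherwise the final read() could block on the socket), and that a
CONTENT_LENGTH of "-1" is accepted as the flag proposed here, which is
not PEP 333 behaviour. The names InflatingInput and
inflate_request_body are hypothetical:

```python
import zlib


class InflatingInput:
    """Hypothetical wsgi.input wrapper that gunzips the body as it is read."""

    def __init__(self, raw, blocksize=8192):
        self._raw = raw
        self._blocksize = blocksize
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        self._inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
        self._buffer = b""
        self._eof = False

    def _fill(self, want):
        # Pull compressed blocks until `want` decoded bytes are buffered
        # (everything, if want < 0) or the underlying input reports EOF.
        while not self._eof and (want < 0 or len(self._buffer) < want):
            block = self._raw.read(self._blocksize)
            if block:
                self._buffer += self._inflater.decompress(block)
            else:
                self._buffer += self._inflater.flush()
                self._eof = True

    def read(self, size=-1):
        if size is None:
            size = -1
        self._fill(size)
        if size < 0:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


def inflate_request_body(app):
    """Hypothetical middleware: decode gzip-encoded request bodies."""
    def middleware(environ, start_response):
        if environ.get("HTTP_CONTENT_ENCODING", "").lower() == "gzip":
            environ["wsgi.input"] = InflatingInput(environ["wsgi.input"])
            environ.pop("HTTP_CONTENT_ENCODING")
            # Decoded length is unknown up front; flag it with -1, per the
            # proposal in this message (not current PEP 333 behaviour).
            environ["CONTENT_LENGTH"] = "-1"
        return app(environ, start_response)
    return middleware
```

An application below this middleware can then only rely on read()
returning b"" at the end of the decoded body, which is why the proposal
hinges on EOF simulation being mandatory.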

What this still doesn't solve is chunked request content. But then, I
don't believe the existing read() method is suitable for that, as what
you want with chunked request content is not "return me all input" but
"return me the next available chunk". As such, some sort of separate
abstraction may be required for dealing with chunked request content;
using a special argument to read() just isn't going to work.
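One possible shape for such an abstraction is a generator that hands
back one decoded chunk at a time rather than "all input". The sketch
below assumes it is given the raw, still-chunked stream (HTTP/1.1
chunked transfer coding); iter_chunks is a hypothetical name, not part
of WSGI or of any server's API:

```python
def iter_chunks(raw):
    """Yield each chunk of an HTTP/1.1 chunked body from file-like `raw`.

    Hypothetical helper: nothing like this exists in WSGI 1.0.
    """
    while True:
        size_line = raw.readline()
        if not size_line:
            raise ValueError("connection closed before final chunk")
        # Chunk size is hex; ignore any ";name=value" chunk extensions.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            # Skip optional trailer fields up to the terminating blank line.
            while raw.readline() not in (b"\r\n", b"\n", b""):
                pass
            return
        data = raw.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        raw.readline()  # consume the CRLF that ends the chunk data
        yield data
```

Because the terminating zero-length chunk marks the end of the body,
this stops at the right place without any Content-Length, and each
chunk is available to the application as soon as it has arrived.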

Anyway, in the past, as with many issues, it seems people just want to
defer all of this to WSGI 2.0 rather than actually trying to fix all
the inconsistencies and suboptimal stuff in WSGI 1.0.

All in all, I can appreciate the problems some feel in trying to write
a truly portable WSGI application. If you keep to the core stuff, all
is okay; start to do complex stuff where the PEP perhaps isn't well
defined, and you start to run into problems as to what it means and
whether it is actually portable. Waiting for WSGI 2.0 isn't really an
option, since it isn't even going to be interface compatible, and
frankly it may never get done anyway because people will think 1.0 is
good enough, even if it is not as good as it could be.

Because I still feel that these details should be fixed prior to WSGI
2.0, I am going to add this and some of the other issues raised
recently to:

  http://www.wsgi.org/wsgi/Amendments_1.0

Graham