[Web-SIG] Chunked Transfer encoding on request content.

Graham Dumpleton graham.dumpleton at gmail.com
Wed Sep 5 13:55:14 CEST 2007


On 05/09/07, Mark Nottingham <mnot at mnot.net> wrote:
> Are you actually seeing chunked request bodies in the wild? If so,
> from what UAs?
>
> IME they're not very common, because of lack of support in most
> servers, and some interop issues with proxies (IIRC).

It has come up as an issue on the mod_python list a couple of times. I
agree though that it isn't common. From memory, the people were using
custom user agents designed for a special purpose.

Just because it isn't common doesn't mean that an attempt shouldn't be
made to support it, especially if it is part of the HTTP standard.

Also, the same solution for handling this would be applicable in cases
where mutating input filters are used which change the length of the
request content but are unable to update the Content-Length header.
Thus, as with chunked encoding, a way is needed in this circumstance to
indicate that there is content, but that the length isn't known.
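
As a rough illustration of how an application copes when no length is
available (just a sketch; the block size is arbitrary, and it assumes
read() returns an empty string once the input is exhausted, which is
the behaviour argued for further below):

    def application(environ, start_response):
        # Read the request body in blocks until read() returns an
        # empty string, rather than trusting CONTENT_LENGTH, which
        # may be absent or wrong when chunked encoding or a mutating
        # input filter is involved.
        input = environ['wsgi.input']
        blocks = []
        while 1:
            data = input.read(8192)
            if not data:
                break
            blocks.append(data)
        body = ''.join(blocks)

        start_response('200 OK', [('Content-Type', 'text/plain')])
        return ['Received %d bytes.\n' % len(body)]

The application never needs to know a length up front; it simply keeps
reading until there is nothing more.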

Graham

> On 05/03/2007, at 10:28 AM, Graham Dumpleton wrote:
>
> > The WSGI specification doesn't really say much about chunked
> > transfer encoding for content sent within the body of a request.
> > The only thing that appears to apply is the comment:
> >
> >   WSGI servers must handle any supported inbound "hop-by-hop"
> >   headers on their own, such as by decoding any inbound
> >   Transfer-Encoding, including chunked encoding if applicable.
> >
> > What does this really mean in practice though?
> >
> > As a means of getting feedback on what is the correct approach,
> > I'll go through how the CherryPy WSGI server handles it. The
> > problem is that the CherryPy approach raises a few issues which
> > make me wonder whether it is doing it in the most appropriate way.
> >
> > In CherryPy, when it sees that the Transfer-Encoding is set to
> > 'chunked' while parsing the HTTP headers, it will at that point,
> > even before it has called start_response for the WSGI application,
> > read in all content from the body of the request.
> >
> > CherryPy reads in the content like this for two reasons. The first
> > is so that it can then determine the overall length of the content
> > that was available and set the CONTENT_LENGTH value in the WSGI
> > environ. The second reason is so that it can read in any additional
> > HTTP header fields that may occur in the trailer after the last
> > data chunk and also incorporate them into the WSGI environ.
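> >
> > To make that concrete, the read-everything-up-front pattern amounts
> > to something roughly like the following (a simplified sketch only,
> > not the actual CherryPy code; rfile is assumed to be a file-like
> > object wrapping the socket, and error handling is omitted):
> >
> >     def read_chunked_body(rfile, environ):
> >         # Dechunk the whole request body before the application
> >         # runs, record its final length as CONTENT_LENGTH, and
> >         # fold any trailer headers into the WSGI environ.
> >         chunks = []
> >         while 1:
> >             line = rfile.readline().strip()
> >             size = int(line.split(';', 1)[0], 16)  # chunk-size
> >             if size == 0:
> >                 break
> >             chunks.append(rfile.read(size))
> >             rfile.read(2)                          # CRLF after chunk
> >         # Trailer: zero or more header lines, then a blank line.
> >         while 1:
> >             line = rfile.readline().strip()
> >             if not line:
> >                 break
> >             name, value = line.split(':', 1)
> >             key = 'HTTP_' + name.strip().upper().replace('-', '_')
> >             environ[key] = value.strip()
> >         body = ''.join(chunks)
> >         environ['CONTENT_LENGTH'] = str(len(body))
> >         return body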
> >
> > The first issue with what it does is that it has read in all the
> > content. This denies a WSGI application the ability to stream
> > content from the body of a request and process it a bit at a time.
> > If the content is huge, buffering it can also mean that the
> > application process size will grow significantly.
> >
> > The second issue, although I am confused about whether the CherryPy
> > WSGI server actually implements this correctly, is that if the
> > client was expecting to see a 100 continue response, this will need
> > to be sent back to the client before any content can be read. When
> > chunked transfer encoding is not used, such a 100 continue response
> > would in a good WSGI server only be sent when the WSGI application
> > called read() on wsgi.input for the first time. I.e., the 100
> > continue indicates that the application which is consuming the data
> > is actually ready to start processing it. What the CherryPy WSGI
> > server is doing circumvents that, and the client could think the
> > final consumer application is ready before it actually is.
> >
> > Note that I am assuming here that 100 continue is still usable in
> > conjunction with chunked transfer encoding. The CherryPy WSGI
> > server only actually sends the 100 continue after it attempts to
> > read content in the presence of a chunked transfer encoding header.
> > Not sure if this is actually a bug or not.
> >
> > The CherryPy WSGI server also doesn't wait until the first read()
> > by the WSGI application before sending back the 100 continue, and
> > instead sends it as soon as the headers are parsed. This may be
> > fine, but it is possibly not optimal, as it denies an application
> > the ability to fail a request and avoid the client sending the
> > actual content.
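> >
> > A thin wrapper around wsgi.input could defer that decision until
> > the application first asks for data, along these lines (a sketch
> > only; the names are made up, and wfile is assumed to be the
> > file-like object used to write back to the client):
> >
> >     class ExpectContinueInput:
> >         # Only write the interim 100 response back to the client
> >         # the first time the application actually asks for content.
> >         def __init__(self, rfile, wfile, expect_100):
> >             self.rfile = rfile
> >             self.wfile = wfile
> >             self.expect_100 = expect_100
> >
> >         def _send_continue(self):
> >             if self.expect_100:
> >                 self.wfile.write('HTTP/1.1 100 Continue\r\n\r\n')
> >                 self.wfile.flush()
> >                 self.expect_100 = False
> >
> >         def read(self, size=-1):
> >             self._send_continue()
> >             return self.rfile.read(size)
> >
> >         def readline(self):
> >             self._send_continue()
> >             return self.rfile.readline()
> >
> > An application which decides to fail the request without ever
> > calling read() then never triggers the 100 continue at all.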
> >
> > Now, to my mind, the preferred approach would be that the content
> > would not be read up front like this, and instead CONTENT_LENGTH
> > would simply be unset in the WSGI environ.
> >
> > From prior discussions related to input filtering on the list, a
> > WSGI application shouldn't really be paying much attention to
> > CONTENT_LENGTH anyway and should just be using read() to get data
> > until it returns an empty string. Thus, for chunked data, not
> > knowing the content length up front shouldn't matter, as the
> > application should just call read() until there is no more. BTW,
> > it may not be this simple for something like a proxy, but that is
> > a discussion for another time.
> >
> > Doing this also means that the 100 continue only gets sent when the
> > application is ready, and there is no need for the content to be
> > buffered up.
> >
> > That it is the actual application which is consuming the data, and
> > not some intermediary, means that an application could implement
> > some mechanism whereby it reads some data, acts on that, and starts
> > sending some data in response. The client might then send more data
> > based on that response, which the application only then reads, and
> > sends more data in response, and so on. Thus an end to end
> > communication stream can be established where the actual overall
> > content length of the request could never be established up front.
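> >
> > As an illustration of the sort of thing I mean (again only a
> > sketch, and it assumes the server writes out each yielded block as
> > it is produced rather than buffering the whole response):
> >
> >     def application(environ, start_response):
> >         start_response('200 OK', [('Content-Type', 'text/plain')])
> >         input = environ['wsgi.input']
> >
> >         def respond():
> >             # Read a block, act on it, send something back, and
> >             # repeat. The client can decide what to send next based
> >             # on what it has already received, so no overall length
> >             # is ever known up front.
> >             while 1:
> >                 data = input.read(8192)
> >                 if not data:
> >                     break
> >                 yield 'acknowledged %d bytes\n' % len(data)
> >
> >         return respond()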
> >
> > The only problem with deferring any reading of data until the
> > application actually wants to read it is that, if the overall
> > length of content in the request is bounded, there is no way to get
> > access to the additional headers in the trailer of the request and
> > have them available in the WSGI environ, since processing of the
> > WSGI environ has already occurred before any data was read.
> >
> > So, what gives? What should a WSGI server do for chunked transfer
> > encoding on a request?
> >
> > I may not totally understand 100 continue and chunked transfer
> > encoding, and am happy to be corrected in my understanding of them,
> > but what the CherryPy WSGI server does doesn't seem right to me at
> > first look.
> >
> > Graham
>
>
> --
> Mark Nottingham     http://www.mnot.net/
>

