[Web-SIG] Proposal: Handling POST forms in WSGI

Ian Bicking ianb at colorstudy.com
Sat Oct 21 23:04:39 CEST 2006


I've added another spec to wsgi.org: 
http://wsgi.org/wsgi/Specifications/handling_post_forms

This one is a little more intrusive than wsgi.url_vars, but it addresses 
an outstanding source of problems: contention over wsgi.input.

Text copied:


:Title: Handling POST forms in WSGI
:Author: Ian Bicking <ianb at colorstudy.com>
:Discussions-To: Python Web-SIG <web-sig at python.org>
:Status: Draft
:Created: 21-Oct-2006

.. contents::

Abstract
--------

This document suggests a way that WSGI middleware, applications, and
frameworks can access POST form bodies so that there is less contention
for the ``wsgi.input`` stream.

Rationale
---------

Currently ``environ['wsgi.input']`` points to a stream that represents 
the body of the HTTP request.  Once this stream has been read, it cannot 
necessarily be read again.  It may not have a ``seek`` method (none is 
required by the WSGI specification, and frequently none is provided by 
WSGI servers).

As a result, any piece of a system that reads the request body
essentially takes ownership of it, and no other piece is able to
access it.  This is particularly problematic for POST form requests,
because many framework components (middleware as well as the
application itself) expect to read the submitted form fields.
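The ownership problem described above can be reproduced with a minimal sketch. ``OneShotInput`` and ``read_body`` are hypothetical names used only for illustration; ``OneShotInput`` assumes a server stream that, like many real ``wsgi.input`` objects, offers no ``seek``.

```python
class OneShotInput(object):
    """Hypothetical stand-in for a server's wsgi.input: readable once, no seek()."""

    def __init__(self, data):
        self._data = data

    def read(self, size=-1):
        # Hand out bytes from the front; consumed bytes are gone for good.
        if size is None or size < 0:
            size = len(self._data)
        chunk, self._data = self._data[:size], self._data[size:]
        return chunk


def read_body(environ):
    """Hypothetical consumer that reads the body the way a framework might."""
    length = int(environ.get('CONTENT_LENGTH') or 0)
    return environ['wsgi.input'].read(length)


body = b'name=Ian&lang=python'
environ = {'CONTENT_LENGTH': str(len(body)), 'wsgi.input': OneShotInput(body)}

first = read_body(environ)   # the first consumer gets the whole body
second = read_body(environ)  # a later consumer gets b'' and cannot recover it
```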

Specification
-------------

This applies when certain requirements of the WSGI environment are met::

     def is_post_request(environ):
         if environ['REQUEST_METHOD'].upper() != 'POST':
             return False
         # A missing CONTENT_TYPE is treated as a urlencoded form.
         content_type = environ.get('CONTENT_TYPE',
                                    'application/x-www-form-urlencoded')
         return (content_type.startswith('application/x-www-form-urlencoded')
                 or content_type.startswith('multipart/form-data'))

That is, it must be a POST request, and it must be a form submission
(normally ``application/x-www-form-urlencoded``, or
``multipart/form-data`` when the form includes file uploads).

When these conditions are met, the form can be parsed by
``cgi.FieldStorage``.  The result of this parsing should be stored in
``environ['wsgi.post_form']`` as follows::

     import cgi

     def get_post_form(environ):
         assert is_post_request(environ)
         input = environ['wsgi.input']
         post_form = environ.get('wsgi.post_form')
         # Reuse the cached parse only if wsgi.input is still the
         # replacement stream installed when the form was parsed.
         if (post_form is not None
             and post_form[0] is input):
             return post_form[2]
         fs = cgi.FieldStorage(fp=input,
                               environ=environ,
                               keep_blank_values=1)
         new_input = InputProcessed()
         # (replacement input, original input, parsed form)
         post_form = (new_input, input, fs)
         environ['wsgi.post_form'] = post_form
         environ['wsgi.input'] = new_input
         return fs

     class InputProcessed(object):
         """Stands in for wsgi.input once the body has been parsed."""
         def read(self, *args):
             raise EOFError(
                 'The wsgi.input stream has already been consumed')
         readline = readlines = __iter__ = read

This way multiple consumers can parse a POST form, accessing the form 
data in any order (later consumers will get the already-parsed data). 
The replacement ``wsgi.input`` guards against non-conforming access to 
the data, while the value in ``wsgi.post_form`` allows for access to the 
original ``wsgi.input`` in case it may be useful.
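The following runnable sketch applies the same protocol, with two consumers sharing one parse. It swaps in ``urllib.parse.parse_qs`` for ``cgi.FieldStorage`` (the ``cgi`` module was removed from the standard library in Python 3.13). As a result it only handles ``application/x-www-form-urlencoded`` bodies. It also assumes they are UTF-8 and omits the ``is_post_request`` check. ``middleware_consumer`` and ``app_consumer`` are hypothetical names.

```python
import io
from urllib.parse import parse_qs


class InputProcessed(object):
    """Replacement wsgi.input that refuses any further reads."""

    def read(self, *args):
        raise EOFError('The wsgi.input stream has already been consumed')

    readline = readlines = __iter__ = read


def get_post_form(environ):
    """Parse a urlencoded body once; later callers reuse the cached result.

    Assumption: parse_qs replaces cgi.FieldStorage, so the parsed form is a
    dict mapping each field name to a list of values.
    """
    input = environ['wsgi.input']
    post_form = environ.get('wsgi.post_form')
    if post_form is not None and post_form[0] is input:
        return post_form[2]
    length = int(environ.get('CONTENT_LENGTH') or 0)
    body = input.read(length).decode('utf-8')
    form = parse_qs(body, keep_blank_values=True)
    new_input = InputProcessed()
    environ['wsgi.post_form'] = (new_input, input, form)
    environ['wsgi.input'] = new_input
    return form


def middleware_consumer(environ):
    # e.g. a middleware layer that only needs one field
    return get_post_form(environ)['name'][0]


def app_consumer(environ):
    # the application later asks for the whole form again
    return get_post_form(environ)


body = b'name=Ian&comment='
environ = {
    'REQUEST_METHOD': 'POST',
    'CONTENT_TYPE': 'application/x-www-form-urlencoded',
    'CONTENT_LENGTH': str(len(body)),
    'wsgi.input': io.BytesIO(body),
}
name = middleware_consumer(environ)  # 'Ian'
form = app_consumer(environ)         # {'name': ['Ian'], 'comment': ['']}
```

The second call finds its own ``InputProcessed`` instance still installed as ``wsgi.input``. It therefore returns the cached dictionary instead of touching the already-drained original stream.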

Because a stored ``wsgi.post_form`` is only used when its recorded
replacement stream is still the current ``wsgi.input``, this scheme does
not get in the way of WSGI middleware that may replace that key.  If the
key is replaced, then the parsed data is implicitly invalidated.
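The invalidation rule can be isolated in a few lines. ``cached_post_form`` is a hypothetical helper, and the stored values below are made up for the demonstration. Only the identity check on the first tuple element comes from the specification.

```python
import io


def cached_post_form(environ):
    """Return the cached parsed form, or None if it no longer applies.

    The cache is valid only while environ['wsgi.input'] is still the
    replacement stream recorded as the first element of the tuple.
    """
    post_form = environ.get('wsgi.post_form')
    if post_form is not None and post_form[0] is environ['wsgi.input']:
        return post_form[2]
    return None


# Simulate a consumer that already parsed the form (values are hypothetical).
replacement = object()  # stands in for the InputProcessed instance
environ = {
    'wsgi.input': replacement,
    'wsgi.post_form': (replacement, io.BytesIO(b'a=1'), {'a': ['1']}),
}
before = cached_post_form(environ)  # {'a': ['1']}

# Middleware later installs a different request body, e.g. a rewritten one.
environ['wsgi.input'] = io.BytesIO(b'a=2')
after = cached_post_form(environ)   # None: the old parse is ignored
```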

Query String data
-----------------

Note that nothing in this specification touches or applies to the query
string (in ``environ['QUERY_STRING']``).  The query string is not parsed
as part of this process, and nothing in this specification applies to
GET requests or to a query string that may be present in a POST request.
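Because the query string is left alone, a framework that wants a single view of request parameters has to parse ``QUERY_STRING`` itself and combine it with the POST data. This is a sketch that assumes the parsed POST form is a dict of value lists (as ``urllib.parse.parse_qs`` produces) rather than a ``cgi.FieldStorage``. ``combined_params`` is a hypothetical helper, and listing query-string values first is an arbitrary choice.

```python
from urllib.parse import parse_qs


def combined_params(environ, post_form):
    """Hypothetical helper: query-string values first, then POST values.

    Nothing here touches wsgi.input; post_form is assumed to be a dict
    mapping field names to lists of values that some earlier step parsed.
    """
    params = parse_qs(environ.get('QUERY_STRING', ''), keep_blank_values=True)
    for key, values in post_form.items():
        params.setdefault(key, []).extend(values)
    return params


environ = {'REQUEST_METHOD': 'POST', 'QUERY_STRING': 'page=2&sort=name'}
post_form = {'page': ['3'], 'q': ['wsgi']}
params = combined_params(environ, post_form)
# params == {'page': ['2', '3'], 'sort': ['name'], 'q': ['wsgi']}
```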

Open Issues
-----------

1. Is ``cgi.FieldStorage`` the best way to store the parsed data?  It's
the most common way, at least.

2. This doesn't address non-form-submission POST requests.  Most of the
same issues apply to such requests, except that frameworks tend not to
touch the request body in that case.  The body may be large, so the
actual contents of the request body shouldn't go in the environment.
Perhaps they could go in a temporary file, but this too might be an
unnecessary indirection in many cases.  Other kinds of requests that
carry a body (such as PUT) are also not covered, for largely the same
reason.  In both of these cases, it is much easier to construct a new
``wsgi.input`` that reads from whatever internal representation of the
request body you are using.

3. Is the tuple of information necessary in ``wsgi.post_form``, or could 
it just be the ``FieldStorage`` instance?

4. Should ``wsgi.input`` be replaced by ``InputProcessed``, or just left 
as is?
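For the non-form bodies in open issue 2, a consumer can build its own seekable replacement ``wsgi.input``. This minimal sketch assumes a ``CONTENT_LENGTH`` is present and that spooling to a temporary file above an arbitrary 1 MiB threshold is acceptable. ``buffer_request_body`` is a hypothetical helper.

```python
import io
import tempfile


def buffer_request_body(environ, max_memory=1024 * 1024):
    """Hypothetical helper: read the body once into a seekable buffer and
    install it as wsgi.input so later consumers can rewind and reread it."""
    length = int(environ.get('CONTENT_LENGTH') or 0)
    # Kept in memory up to max_memory bytes, then rolled over to a real file.
    buffer = tempfile.SpooledTemporaryFile(max_size=max_memory)
    stream = environ['wsgi.input']
    remaining = length
    while remaining > 0:
        chunk = stream.read(min(remaining, 64 * 1024))
        if not chunk:  # client sent less than CONTENT_LENGTH
            break
        buffer.write(chunk)
        remaining -= len(chunk)
    buffer.seek(0)
    environ['wsgi.input'] = buffer
    return buffer


body = b'{"id": 1}'
environ = {
    'REQUEST_METHOD': 'PUT',
    'CONTENT_LENGTH': str(len(body)),
    'wsgi.input': io.BytesIO(body),
}
buf = buffer_request_body(environ)
first = environ['wsgi.input'].read()   # the full body
buf.seek(0)
second = environ['wsgi.input'].read()  # the full body again, after rewinding
```

Unlike ``InputProcessed``, this replacement can be rewound. The cost is holding or spooling the whole body, which is the trade-off open issue 2 describes.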

