[Web-SIG] Proposal: Handling POST forms in WSGI
Ian Bicking
ianb at colorstudy.com
Sat Oct 21 23:04:39 CEST 2006
I've added another spec to wsgi.org:
http://wsgi.org/wsgi/Specifications/handling_post_forms
This one is a little more intrusive than wsgi.url_vars, but it addresses
an outstanding source of problems: contention over wsgi.input.
Text copied:
:Title: Handling POST forms in WSGI
:Author: Ian Bicking <ianb at colorstudy.com>
:Discussions-To: Python Web-SIG <web-sig at python.org>
:Status: Draft
:Created: 21-Oct-2006
.. contents::
Abstract
--------
This specification suggests a way for WSGI middleware, applications, and
frameworks to access POST form bodies so that there is less contention
for the ``wsgi.input`` stream.
Rationale
---------
Currently ``environ['wsgi.input']`` points to a stream that represents
the body of the HTTP request. Once this stream has been read, it cannot
necessarily be read again. It may not have a ``seek`` method (none is
required by the WSGI specification, and frequently none is provided by
WSGI servers).
As a result any piece of a system that looks at the request body
essentially takes ownership of that body, and no one else is able to
access it. This is particularly problematic for POST form requests, as
many framework pieces expect to have access to the form data.
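To make the ownership problem concrete, here is a minimal sketch. ``OneShotInput`` and the two consumer functions are hypothetical names invented for this illustration; they stand in for a server-provided stream without ``seek`` and for two framework pieces that both want the body.

```python
import io


class OneShotInput:
    """Hypothetical forward-only stream, like many servers' wsgi.input."""

    def __init__(self, body):
        self._stream = io.BytesIO(body)

    def read(self, size=-1):
        # No seek() is exposed, so data that has been read is gone.
        return self._stream.read(size)


def first_consumer(environ):
    # Reads the whole body, leaving nothing for anyone else.
    return environ['wsgi.input'].read()


def second_consumer(environ):
    # Sees an empty body because the stream is already exhausted.
    return environ['wsgi.input'].read()


environ = {'wsgi.input': OneShotInput(b'name=Ian&lang=python')}
first = first_consumer(environ)
second = second_consumer(environ)
```

Whichever piece reads first effectively owns the body; the second reader has no way to tell an empty request from a consumed one.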
Specification
-------------
This applies when certain requirements of the WSGI environment are met::
    def is_post_request(environ):
        # Only POST requests are covered by this specification.
        if environ['REQUEST_METHOD'].upper() != 'POST':
            return False
        # A missing Content-Type is treated as a URL-encoded form.
        content_type = environ.get('CONTENT_TYPE',
                                   'application/x-www-form-urlencoded')
        return (
            content_type.startswith('application/x-www-form-urlencoded')
            or content_type.startswith('multipart/form-data'))
That is, it must be a POST request, and it must be a form request
(generally ``application/x-www-form-urlencoded`` or, when there are file
uploads, ``multipart/form-data``).
When these conditions are met, the form can be parsed by ``cgi.FieldStorage``. The
results of this parsing should be put in ``environ['wsgi.post_form']``
in a particular fashion::
    import cgi  # removed from the standard library in Python 3.13

    def get_post_form(environ):
        assert is_post_request(environ)
        input = environ['wsgi.input']
        post_form = environ.get('wsgi.post_form')
        # Reuse the parsed form only if wsgi.input is still our replacement.
        if (post_form is not None
            and post_form[0] is input):
            return post_form[2]
        fs = cgi.FieldStorage(fp=input,
                              environ=environ,
                              keep_blank_values=1)
        new_input = InputProcessed()
        # Tuple layout: (replacement input, original input, parsed form).
        post_form = (new_input, input, fs)
        environ['wsgi.post_form'] = post_form
        environ['wsgi.input'] = new_input
        return fs

    class InputProcessed(object):
        def read(self, *args):
            raise EOFError(
                'The wsgi.input stream has already been consumed')
        readline = readlines = __iter__ = read
This way multiple consumers can parse a POST form, accessing the form
data in any order (later consumers will get the already-parsed data).
The replacement ``wsgi.input`` guards against non-conforming access to
the data, while the value in ``wsgi.post_form`` allows for access to the
original ``wsgi.input`` in case it may be useful.
By checking for the replacement ``wsgi.input`` when checking if
``wsgi.post_form`` applies, this does not get in the way of WSGI
middleware that may replace that key. If the key is replaced, then the
parsed data is implicitly invalidated.
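The caching and invalidation behaviour can be sketched as follows. This is an illustration under stated assumptions, not the specification's code: ``urllib.parse.parse_qs`` stands in for ``cgi.FieldStorage`` so that the example runs on Python versions without the ``cgi`` module, and only URL-encoded bodies are handled.

```python
import io
from urllib.parse import parse_qs


class InputProcessed:
    def read(self, *args):
        raise EOFError('The wsgi.input stream has already been consumed')
    readline = readlines = __iter__ = read


def get_post_form(environ):
    # Assumption: parse_qs replaces cgi.FieldStorage for this sketch only.
    stream = environ['wsgi.input']
    post_form = environ.get('wsgi.post_form')
    if post_form is not None and post_form[0] is stream:
        return post_form[2]          # cache hit: nobody replaced wsgi.input
    length = int(environ.get('CONTENT_LENGTH') or 0)
    raw = stream.read(length).decode('ascii')
    form = parse_qs(raw, keep_blank_values=True)
    new_input = InputProcessed()
    environ['wsgi.post_form'] = (new_input, stream, form)
    environ['wsgi.input'] = new_input
    return form


body = b'name=Ian&empty='
environ = {
    'REQUEST_METHOD': 'POST',
    'CONTENT_TYPE': 'application/x-www-form-urlencoded',
    'CONTENT_LENGTH': str(len(body)),
    'wsgi.input': io.BytesIO(body),
}

first = get_post_form(environ)    # parses the body
second = get_post_form(environ)   # reuses the cached result

# Middleware that swaps in its own stream implicitly invalidates the cache.
new_body = b'name=Guido'
environ['wsgi.input'] = io.BytesIO(new_body)
environ['CONTENT_LENGTH'] = str(len(new_body))
third = get_post_form(environ)    # parsed afresh from the new stream
```

The identity check on ``post_form[0]`` is what makes the invalidation implicit: no consumer needs to know that a middleware replaced the stream.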
Query String data
-----------------
Note that nothing in this specification touches or applies to the query
string (in ``environ['QUERY_STRING']``). This is not parsed as part of
the process, and nothing in this specification applies to GET requests,
or to the query string which may be present in a POST request.
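As an illustration of that separation, a consumer that wants query string variables could parse them independently, without touching ``wsgi.input`` or ``wsgi.post_form``. The helper name ``get_query_vars`` is hypothetical and not part of this proposal.

```python
from urllib.parse import parse_qs


def get_query_vars(environ):
    # Hypothetical helper: reads only QUERY_STRING, never wsgi.input.
    return parse_qs(environ.get('QUERY_STRING', ''), keep_blank_values=True)


# A POST request that also carries a query string.
environ = {
    'REQUEST_METHOD': 'POST',
    'QUERY_STRING': 'page=2&sort=date',
    'CONTENT_TYPE': 'application/x-www-form-urlencoded',
}
query_vars = get_query_vars(environ)
```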
Open Issues
-----------
1. Is ``cgi.FieldStorage`` the best way to store the parsed data? It's the
most common way, at least.
2. This doesn't address non-form-submission POST requests. Most of the
same issues apply to such requests, except that frameworks tend not to
touch the request body in that case. The body may be large, so the
actual contents of the request body shouldn't go in the environment.
Perhaps they could go in a temporary file, but this too might be an
unnecessary indirection in many cases. Also other kinds of request
(like PUT) that have a request body are not covered, for largely the
same reason. In both these cases, it is much easier to construct a new
``wsgi.input`` that accesses whatever your internal representation of
the request body is.
3. Is the tuple of information necessary in ``wsgi.post_form``, or could
it just be the ``FieldStorage`` instance?
4. Should ``wsgi.input`` be replaced by ``InputProcessed``, or just left
as is?
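For the non-form case raised in issue 2, constructing a new ``wsgi.input`` might look like the sketch below. The names ``buffer_body`` and ``echo_app`` are hypothetical, and holding the body in memory is an assumption made for brevity; a large body might warrant a temporary file instead.

```python
import io


def buffer_body(app):
    """Hypothetical middleware: buffer the body, pass on a seekable stream."""
    def middleware(environ, start_response):
        length = int(environ.get('CONTENT_LENGTH') or 0)
        body = environ['wsgi.input'].read(length)
        # The internal representation here is just bytes in memory.
        environ['wsgi.input'] = io.BytesIO(body)
        return app(environ, start_response)
    return middleware


def echo_app(environ, start_response):
    data = environ['wsgi.input'].read()
    start_response('200 OK', [('Content-Type', 'application/octet-stream'),
                              ('Content-Length', str(len(data)))])
    return [data]


payload = b'{"op": "update"}'
environ = {
    'REQUEST_METHOD': 'PUT',
    'CONTENT_LENGTH': str(len(payload)),
    'wsgi.input': io.BytesIO(payload),
}
statuses = []
result = buffer_body(echo_app)(
    environ, lambda status, headers: statuses.append(status))
```

Because the replacement stream is seekable, a later consumer can rewind it and read the body again.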